As you may know, transparency is one of the key values at Buffer. We’re striving to be fully open about how Buffer and our happiness team are doing. Now that we’ve finished the first month of 2014, I’m happy to introduce the first monthly Engineering Report at Buffer!
Since transparency is a key value at Buffer, we want to strive to be completely open within our engineering team too. That means creating monthly reports like these, renewing our focus on contributing to the amazing software community through open source, and writing more blog posts about interesting challenges and approaches from the team.
Reliability & Security
At our core, we’re striving to make Buffer the most reliable tool for social media on the web. We’ve got a long way to go with this and it’s something we’ll always work towards. We also believe that nothing is more important than your trust. That’s why we’re striving to ensure that Buffer remains a secure platform that you can feel safe using.
Building a great engineering culture
It’s amazing to see how quickly Buffer is growing. With that, we want to ensure that 2014 is a great year for us to expand our engineering team. Since we’re all distributed and located all over the world, it’s important that our team develops a great sense of trust and cohesiveness. It’s amazing to know that with the current growth rate, we’re hoping to be a team of 20 by the end of 2014.
How did we do in January?
Overall, January was such an amazing month.
We had over 300 amazing people apply for an engineering position in January. After we released our salaries to the public, the number of applicants who are potential culture fits dramatically increased! I’ve been incredibly blessed and it has been such a pleasure to read through emails from such awesome candidates every week.
It’s also the first time that we’ve had more than 4 engineers at once on the team! In January, we added 2 hackers to the Buffer Bootcamp, and have made 2 more offers! Since it’s really the first time we’ve experienced this level of parallelism, we put a few things in place. Here’s what we did:
- Defined a way to coordinate our testing and deployments
- Set up a slightly more formalized code review process. The goal here is to improve our overall maintenance knowledge and the quality of our product. That means that if there’s an issue in production with some feature and the person who wrote it isn’t online, whoever code reviewed it has a good idea of how it works.
- Made it easier than ever to create a ‘feature flip’ and A/B test. A feature flip is a simple logic block that lets the team test any new feature internally ahead of time. This makes it super easy to build and release features even before they’re finished. We can easily turn a ‘feature flip’ into an A/B test and release to a subset of users.
- We’ve renewed our metrics/growth focus. We’ve built out our experiments dashboard. This makes it easy to track our key metrics and understand how a new feature or A/B test affects them.
- We launched 8 A/B experiments (homepage and feature experiments)
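To make the feature flip idea from the list above a bit more concrete, here is a minimal sketch in Python. It is not Buffer’s actual code: `FeatureFlip`, `team_ids`, `rollout_percent`, `is_enabled` and the flip name are all hypothetical, and hashing the user ID into a bucket is just one common way to pick a stable subset of users.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FeatureFlip:
    """Hypothetical feature flip: on for the team, optionally for a share of users."""
    name: str
    team_ids: set = field(default_factory=set)  # internal testers always see the feature
    rollout_percent: int = 0  # 0 = team only; above 0 the flip acts as an A/B test

    def bucket(self, user_id: str) -> int:
        # Hash flip name + user ID so each user lands in a stable bucket from 0 to 99.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100

    def is_enabled(self, user_id: str) -> bool:
        if user_id in self.team_ids:
            return True
        return self.bucket(user_id) < self.rollout_percent


# Assumed names, for illustration only.
flip = FeatureFlip(name="weekly_digest", team_ids={"team-1"})
team_sees_it = flip.is_enabled("team-1")   # internal tester
user_sees_it = flip.is_enabled("user-42")  # regular user, rollout still at 0

flip.rollout_percent = 50  # same flip, now released to roughly half of users
exposed = sum(flip.is_enabled(f"user-{i}") for i in range(1000))
```

Because the bucket depends only on the flip name and the user ID, a given user stays in the same group for the whole experiment, which keeps the metrics for each variant comparable.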
We’ve focused a great deal on reducing vulnerabilities in various areas of our app. This month, we worked (and continue to work) closely with Egor Homakov, who is an expert in OAuth security. As our app is heavily reliant on OAuth, this was critical. We’ve moved fast to fix some major OAuth and XSS vulnerabilities. We’ve also created a security page thanking all of the wonderful whitehat hackers who have helped make Buffer more secure!
One of the biggest challenges we’ve had in the past was ensuring that posts go out at the exact time they’re scheduled for. During peak times, we’re now hitting up to 4k posts per minute. This level of scale has introduced some interesting technical challenges: we’ve noticed some posting delay during high-load times. In January, we re-architected our scheduling flow so that we queue up posts ahead of time, which allows us to send off posts at the exact time they were scheduled for. There is still some slight delay (< 60 seconds) at peak times (on the hour between 6-9am PST), but this is a significant improvement from before.
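The idea of queueing posts ahead of time can be sketched roughly as follows. This is only an illustration under stated assumptions, not our production scheduler: the `Scheduler` class, its method names and the 60-second `LOOKAHEAD_SECONDS` window are all hypothetical, and time is passed in as plain numbers to keep the example self-contained.

```python
import heapq

LOOKAHEAD_SECONDS = 60  # assumed window; the real value is not stated in this report


class Scheduler:
    def __init__(self):
        self._scheduled = []  # min-heap of (due_time, post_id): not yet queued
        self._ready = []      # min-heap of (due_time, post_id): queued ahead of time

    def schedule(self, post_id, due_time):
        heapq.heappush(self._scheduled, (due_time, post_id))

    def prequeue(self, now):
        """Move posts due within the lookahead window into the ready queue."""
        moved = 0
        while self._scheduled and self._scheduled[0][0] <= now + LOOKAHEAD_SECONDS:
            heapq.heappush(self._ready, heapq.heappop(self._scheduled))
            moved += 1
        return moved

    def send_due(self, now):
        """Return the IDs of ready posts whose scheduled time has arrived."""
        sent = []
        while self._ready and self._ready[0][0] <= now:
            sent.append(heapq.heappop(self._ready)[1])
        return sent


s = Scheduler()
s.schedule("a", 100)
s.schedule("b", 130)
s.schedule("c", 500)

queued = s.prequeue(now=70)    # "a" and "b" fall inside the window, "c" does not
early = s.send_due(now=99)     # nothing is due yet
on_time = s.send_due(now=100)  # "a" goes out at its scheduled time
later = s.send_due(now=130)    # "b" follows at its own scheduled time
```

The point of the split is that the expensive lookup happens in `prequeue`, before the post is due, so the sending step only has to pop items that are already prepared.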
The other area we’re working hard to improve is overall downtime. On Tuesday, January 28th, we had 35 minutes of downtime. This was caused by an unindexed database query that was executing as we were load testing a new feature (weekly email digests). This ended up creating several long-running queries which eventually locked up our core database. This caused a bad experience for everyone using Buffer during that time and I want to sincerely apologize for that.
This gave us a chance to fully reflect on what happened. We created a full report of the incident and wrote up all of our learnings to try and ensure it doesn’t happen again.
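As a generic illustration of how an unindexed query can be spotted before it runs against a large table, here is a small sketch. It makes loud assumptions: this report does not say which database is involved, so SQLite (from Python’s standard library) stands in, and the `updates` table, its columns and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for whatever the digest feature reads.
conn.execute(
    "CREATE TABLE updates (id INTEGER PRIMARY KEY, user_id INTEGER, sent_at INTEGER)"
)
conn.executemany(
    "INSERT INTO updates (user_id, sent_at) VALUES (?, ?)",
    [(i % 100, i) for i in range(1000)],
)

QUERY = "SELECT COUNT(*) FROM updates WHERE user_id = ?"


def plan(connection, sql, params):
    """Return the human-readable detail strings from EXPLAIN QUERY PLAN."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the detail text is the last column


before = plan(conn, QUERY, (7,))  # no index yet: the plan reports a full table scan
conn.execute("CREATE INDEX idx_updates_user ON updates (user_id)")
after = plan(conn, QUERY, (7,))   # with the index: the plan reports an index search
```

Checking the query plan for a scan like this during code review or load testing is one possible safeguard; most databases offer a comparable explain command.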
Looking forward to February
It’s been so amazing to reflect back on what the engineering team focused on in January. I’m hoping we can continue this momentum into next month. In February, we’ll look to improve our testing framework and coverage, and continue focusing on security and transparency! If you have any questions, just shoot me an email! We’ve got a ton to do and we’d love your help!