Scaling Buffer in 2013


Since I joined Buffer last September, it’s been amazing to see how much we’ve grown in just under a year.  This is the first time any of us on the team has built something that’s reached this level of scale, and we’ve learned a great deal in doing so.  I want to share a general overview of Buffer’s scale and the technology stack that we’ve built to support it.

Some Quick Stats about Buffer’s Scale

  • 2 backend engineers, Colin and Sunil, work on web/API (really 1 and a half, because my day-to-day is now primarily focused on hiring and Android)
  • 3 million daily API requests (33 requests per second)
  • 9 million daily ‘Buffer buttons’ served (100 requests per second)
  • 2.6 million daily application web requests (i.e. serving the Buffer popup and pages)
  • 30k daily active users and 147k monthly active users
  • 5500 API clients
  • 1.6 million social media updates posted per week
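
As a rough sanity check on the per-second figures above, a daily total can be divided by the 86,400 seconds in a day. The sketch below is only illustrative: it gives a flat average (about 35 and 104 requests per second for the API and the buttons, in the same range as the rounded figures quoted above), and real traffic is not spread evenly over the day. The function name is made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(daily_requests: int) -> float:
    """Average requests per second, assuming traffic is spread evenly over a day."""
    return daily_requests / SECONDS_PER_DAY


# Daily totals taken from the list above.
daily_totals = {
    "API": 3_000_000,
    "Buffer buttons": 9_000_000,
    "Application web requests": 2_600_000,
}

for label, daily in daily_totals.items():
    print(f"{label}: {average_rps(daily):.1f} requests/second on average")
```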


The Tech

The front-end of Buffer is built with the Backbone.js MVC framework, and our backend is written in CodeIgniter (PHP) and Django (Python).  Our servers are all fairly standard Linux with Apache web servers.  Everything is hosted on Amazon Web Services. With fewer than 2 full-time backend engineers, it makes the most sense not to reinvent anything that’s already been done and instead focus on developing roadmap features. AWS allows us to do everything we’ve wanted without spending too much time on ops or hiring an infrastructure engineer.  We use Elastic Beanstalk (with EC2 instances and Elastic Load Balancers) to easily configure, deploy, and scale all of our services, Route 53 for DNS management, ElastiCache for our Memcached configuration, and Simple Queue Service (quite heavily) for all of our event/message handling and processing. All of our static assets and user-uploaded photos are stored on S3, with CloudFront as our CDN.


Buffer Buttons (Widgets)

We serve our ‘Buffer buttons’ (the button in the floating bar on the left of this post) from 2-4 Apache2 (m1.small) servers that scale up as connections increase.  Since the buttons are embedded on several third-party websites, performance has been extremely important.  Button counts are cached in Memcached and refreshed every 15 minutes. Caching allows us to serve buttons faster and reduces reads on our database.  There’s very little work going on to display these buttons, which allows us to serve them at scale with just a few small servers.
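
The caching pattern described above can be sketched roughly as follows. This is an illustration, not Buffer's actual code: `TTLCache` is a tiny in-memory stand-in for Memcached, and `get_button_count`, `load_count_from_db` and the key format are hypothetical names. The only detail taken from the post is the 15-minute lifetime.

```python
import time
from typing import Callable, Dict, Optional, Tuple

CACHE_TTL_SECONDS = 15 * 60  # the post says counts are refreshed every 15 minutes


class TTLCache:
    """Minimal in-memory stand-in for Memcached: values expire after a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store: Dict[str, Tuple[float, int]] = {}

    def get(self, key: str) -> Optional[int]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it so the caller reloads from the database.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: int, ttl: float) -> None:
        self._store[key] = (self._clock() + ttl, value)


def get_button_count(url: str, cache: TTLCache,
                     load_count_from_db: Callable[[str], int]) -> int:
    """Return the share count for a URL, hitting the database only on a cache miss."""
    key = f"button_count:{url}"  # hypothetical key format
    cached = cache.get(key)
    if cached is not None:
        return cached
    count = load_count_from_db(url)
    cache.set(key, count, CACHE_TTL_SECONDS)
    return count
```

With this shape, a burst of button loads for the same URL within 15 minutes results in a single database read; every other request is answered from the cache.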


Buffer API and Web servers

The Buffer API is the backbone of our application and houses most of our logic.  Our web app, mobile clients and partners are all dependent on it, so we’ve built the API focusing on both speed and availability.  The API handles everything from authentication and posting/adding to Buffer to image uploads and all other user actions.  We scale between 6 and 15 m1.small servers based on average CPU load and number of connections.  One way we try to keep response times low is by moving major logic or high-latency segments behind queues (using Amazon SQS) and processing the events with workers.  For example, API response times when adding to a user’s Buffer were slowed down because we were making an HTTP request to Pusher.  We were able to reduce response times by moving this external request behind a message queue and processing these requests with workers.
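
The idea of moving a slow external call out of the request path can be sketched as below. This is a simplified illustration under stated assumptions, not the real implementation: Python's in-process `queue.Queue` and a thread stand in for SQS and a separate worker server, and `add_to_buffer` and `send_realtime_notification` are hypothetical names.

```python
import queue
import threading

events: "queue.Queue" = queue.Queue()  # in-process stand-in for an SQS queue
delivered = []                         # records what the worker has handled


def send_realtime_notification(event: dict) -> None:
    """Hypothetical slow external call (for example, an HTTP request to Pusher)."""
    delivered.append(event)


def add_to_buffer(user_id: str, text: str) -> dict:
    """Request handler: build the update, enqueue the slow work, return right away."""
    update = {"user_id": user_id, "text": text}
    events.put({"type": "update_added", "update": update})
    return {"success": True, "update": update}


def worker() -> None:
    """Pull events off the queue and perform the slow work outside the request."""
    while True:
        event = events.get()
        try:
            if event is None:  # sentinel used here to stop the worker
                return
            send_realtime_notification(event)
        finally:
            events.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

response = add_to_buffer("user-1", "Hello from Buffer")  # returns without waiting
events.join()      # for the demo only: wait until the worker has caught up
events.put(None)   # ask the worker to exit
thread.join()
```

The handler's response time no longer depends on how long the external service takes; the cost is that the notification is delivered slightly later and has to be handled by a separate process.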


Buffer Workers

As mentioned, we use Amazon’s Simple Queue Service heavily to manage and process messages.  We use it to handle sending posts to Facebook, Twitter, etc., receiving analytics from services, processing links and internal metrics, and sending push notifications and emails.  With SQS, we’re able to do passive retries if a failure in processing occurs.  Our workers run on 10-15 m1.small servers, and each worker runs as a daemon managed by supervisord. Our application workers are written in PHP and our metrics workers in Python.
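
One way to picture passive retries: in SQS, a received message is hidden from other consumers for a visibility timeout, and it reappears on the queue if the worker never deletes it. The sketch below simulates that behavior with an in-memory queue. It assumes this is what "passive retries" refers to here; `FakeQueue`, `process_one` and `flaky_post` are made-up names and not the AWS SDK API.

```python
from typing import Callable, Dict, Optional, Tuple


class FakeQueue:
    """In-memory simulation of a queue with an SQS-style visibility timeout."""

    def __init__(self, visibility_timeout: float = 30.0) -> None:
        self.visibility_timeout = visibility_timeout
        self.now = 0.0  # simulated clock, advanced manually in the demo
        self._messages: Dict[int, dict] = {}
        self._next_id = 0

    def send(self, body: str) -> None:
        self._messages[self._next_id] = {"body": body, "visible_at": self.now}
        self._next_id += 1

    def receive(self) -> Optional[Tuple[int, str]]:
        for msg_id, msg in self._messages.items():
            if msg["visible_at"] <= self.now:
                # Hide the message while a worker is (supposedly) processing it.
                msg["visible_at"] = self.now + self.visibility_timeout
                return msg_id, msg["body"]
        return None

    def delete(self, msg_id: int) -> None:
        self._messages.pop(msg_id, None)


def process_one(q: FakeQueue, handler: Callable[[str], None]) -> bool:
    """Process one message; delete it only if the handler succeeds."""
    received = q.receive()
    if received is None:
        return False
    msg_id, body = received
    try:
        handler(body)
    except Exception:
        return False  # no delete: the message reappears after the timeout
    q.delete(msg_id)
    return True


attempts = []


def flaky_post(body: str) -> None:
    """Hypothetical handler that fails on its first attempt."""
    attempts.append(body)
    if len(attempts) == 1:
        raise RuntimeError("simulated network failure")


q = FakeQueue(visibility_timeout=30.0)
q.send("post update 123 to Twitter")
first = process_one(q, flaky_post)    # handler fails, message is kept
hidden = q.receive()                  # nothing visible yet
q.now += 31                           # visibility timeout elapses
second = process_one(q, flaky_post)   # same message is retried and deleted
```

Because a message can be delivered more than once under this model, handlers generally need to tolerate repeats.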


Our Datastore

Since our core technical team is super small, we really don’t have time to fully manage our MongoDB database configuration. This is why we use MongoHQ.  Our experience with them has been great.  While we are constantly thinking about the optimal setup and our application’s query schema, it’s been great knowing that they’re our devops/db management team.  We often shoot them emails that trigger discussions when we’re thinking of making a configuration change or setting up a new query.  Here’s the configuration we’ve set up with them:  The Buffer application runs on a 3-member replica set.  The members are on m2.4xlarge EC2 instances, each with 68 GB of RAM and 2000 Provisioned IOPS.  Two members of the set are the usual primary/secondary setup, which allows for high availability.  We allow reads from this secondary to load balance the queries across the two servers.  The last member is a secondary that is held at priority 0 so that it never becomes primary.  We use this server for our internal queries and administration, and therefore we don’t want to run production queries on this member.  While developing, every new query is run and tested manually to ensure it’s optimized (often by creating an index) before we use it in production.  We also cache queries in Memcached, which enables us to both reduce load on our DB and provide a faster response.
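
For readers less familiar with replica sets, the layout described above corresponds roughly to a MongoDB replica set configuration document like the one sketched here as a plain Python dictionary. The set name and hostnames are hypothetical placeholders; only the member count and the priority-0 member come from the post. The helper simply lists which members are eligible to become primary.

```python
from typing import List

# Shape of a MongoDB replica set configuration document.
# The set name and hostnames are hypothetical placeholders.
replica_set_config = {
    "_id": "buffer-app",
    "members": [
        {"_id": 0, "host": "db1.example.com:27017", "priority": 1},
        {"_id": 1, "host": "db2.example.com:27017", "priority": 1},
        # Priority 0: this member replicates data but can never be elected primary.
        {"_id": 2, "host": "db3.example.com:27017", "priority": 0},
    ],
}


def electable_hosts(config: dict) -> List[str]:
    """Hosts that may become primary (priority greater than 0; the default is 1)."""
    return [m["host"] for m in config["members"] if m.get("priority", 1) > 0]


print(electable_hosts(replica_set_config))
```

Note that priority 0 only controls elections. Keeping application reads off the third member needs something more, such as marking it hidden or using tag-based read preferences; the post does not say which approach is used.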


Buffer Metrics

We’ve built out our own metrics tools to measure the usage of Buffer.  Every link clicked, API request, visitor and pre-conversion page visit is tracked.  We used to use third-party solutions to track this, but as Buffer has grown, we realized that building our own solution was inevitable.  Our custom metrics allow us to store and query raw metrics in the ways that are most useful to us.  We’ve essentially built our own Google Analytics, and now we have control over the data. We use Python for processing metrics events.  They’re stored in a separate MongoDB database managed by MongoHQ.  We run a replica set with 2 data members and 1 arbiter that has 500 GB of SSD storage.  SSD is extremely important for us here, as it allows us to run unindexed queries that we may never have initially planned for, so we can slice and query our data in various ways at much faster speeds than on hard disk.  Using MongoDB, we have a lot of flexibility in how we structure our data, and can change it at any time.  Our internal metrics application was built by Michelle using Django.


Over the next few months I’ll be describing our stack in depth and providing much more detail on the challenges we’ve faced and our thought process in architecting Buffer.  I’d love to hear from you if you have any thoughts at all about our tech, and what you’d like me to detail further in coming blog posts.

  • Christopher Egner

    Thanks for sharing details about your architecture! Can’t wait to hear more!

  • It’s interesting that you almost exclusively use m1.small servers.
    My understanding was that bigger instances were better

    i.e. (for your workers) 2 x m1.xlarge would be more effective than 16 x m1.small

    but you seem to have the opposite conclusion,
    did you do benchmarks on this?

    • sunils34

      Hi Matthew! I’m so glad you brought this up as it got me thinking more about how suboptimal the worker set up is!

      We use more small servers on our web and API environments because we’re connection and bandwidth limited, and not limited by the CPU or memory of m1.small servers. It’s also for reliability/high availability that we prefer to load balance across several servers.

      I completely agree however that our workers are more so CPU/Memory bound and it makes more sense to use larger instances. For HA purposes, we’re still going to want multiple servers, as our workers are critical (we’d prefer to over provision a bit more here). I’m going to do some more experimenting with numbers to see what makes sense! I’ll report back my findings. Thanks for the great question here!

    • Eduard Bulai

      Matthew, if I may, when scaling an app/service you have to adjust to the growth rate as well. If you used xlarge instances from the start, the instances would be underused in the beginning, and that is money thrown out the window. This way, aside from HA, you pay as you grow. When the growth rate rises, so will the instance size.

      • I’m more worried about the cost of doing a round robin load balance across many small machines, versus a few bigger machines.

        This was the whole problem with the RapGenius issues on Heroku earlier this year.

  • Martin Henk

    Thanks for sharing. Everything else makes a lot of sense, but why use Apache over something like Nginx? Don’t you have performance/scaling issues with Apache?

    • sunils34

      Hi Martin! Thanks for sparking this discussion! I personally love Nginx and think it’s great! We may end up switching down the road as it’s easier to configure, but (as mentioned in the other comment above) this will be done when we have more resources to spare. As of now we actually haven’t had any significant performance/scaling issues with Apache for us to prioritize the switch.

  • Abhinav A

    Completely agree with you re: SQS – it’s a lifesaver.
    +1 for nginx – the gunicorn+nginx combination is absolutely amazing for running Django/Flask apps. May be worth your time to give it a look.

    • sunils34

      Hi Abhinav!

      Thanks for commenting! I definitely agree. I love nginx and had a similar gunicorn+nginx setup in my previous startup. I do think that we may end up switching to nginx down the road as it’s much easier to configure than apache. The main reason we’re on apache is because AWS elastic beanstalk has been a very solid option for us to scale and have enough control over what’s going on. Right now Elastic Beanstalk seems to work well with Apache for us (the default webserver). Down the road, as we have more resources to spare, we’ll be looking at ways to boost performance (in terms of both development/configuration and response times). I’d love to hear more about the setup that you have!

  • That was fascinating to read Sunil. Always been interested in the complete tech stack at Buffer! Great overview.

  • Kailash

    Sunil, a question on SQS. In my experience with it, there’s a very high chance of getting the same Q msg again, so there’s a necessity of having a local data store to mark which of the messages have already been processed. Do you do this? Or do you make sure your operations are idempotent?

  • hdra

    Awesome read!
    p.s.: RSS for the blog please, 🙂

  • Fernando Segura

    It’s great to read this kind of thing. Why has nobody tried to submit this content to highscalability? Never mind, I will do it. Thanks Sunil for posting this kind of info. I read both of your blogs, the business side and the architectural design, and they are both fascinating. Keep up the good work and keep sharing the knowledge.
