The popular photo-sharing application Instagram came on the scene just two years ago and has managed to create quite a stir in the social photo-sharing world. The app was becoming so popular that it apparently ruffled the feathers of the social networking giant Facebook, which went as far as making an offer and purchasing Instagram (much to the annoyance of many).
After a mere two years in existence, how did Instagram become popular enough to be seen as a threat to Facebook? In that time the app has managed to attract 40 million users. There are a few interesting lessons to learn from this new giant and the way they managed their application. How have they managed to serve so many users? How do they scale? And what technology did they use to achieve all they have achieved?
Now to answer those questions.
Instagram took a proactive approach towards their application before it was launched and followed this up extensively once it had been launched. They did not just sit around and wait for things to happen before implementing corrective strategies. A few weeks before the app was launched, the team worked on the infrastructural aspects that would increase its capacity. They were engaged in capacity-planning and ensuring that all was in place for the launch, but once the launch was done they set about the task of quickly identifying problems that arose, and implementing measures to fix those problems as fast as they could. It was a challenge for the team to accomplish this, but with the help of particular tools and techniques they were able to get it done.
The technological approach and choices of the Instagram team are nothing short of pure forward thinking. They chose to use technologies that had already been proven. On their own blog they wrote that it is best as a startup to:
1. Keep it simple.
2. Don’t re-invent the wheel.
3. Go with proven and solid technologies when you can.
Rather than building their own programs, the small team of 13, which included just three engineers, decided to go with a few Amazon-based technologies plus many other proven open source projects. The chosen tools perform various functions that help the app scale. Take, for example, Statsd, which the team used during the very early days following the launch.
Statsd – This is a network daemon that was written by Etsy. Its core function is to aggregate data for graphing, using two main types of statistics: counters and timers. The team used counters to track anything countable that the app generated, such as the number of sign-ups per second and the number of likes. Timers were used to time major actions such as generating feeds and following users.
The team at Instagram found Statsd to be close to real time, with a delay of only about 10 seconds. It was easy to add any new metric that they wanted to track. This helped the team evaluate the app's system and make immediate code changes when there was a need to do so.
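To make the counter and timer distinction concrete, here is a minimal sketch of how a client might report both kinds of statistic to a Statsd daemon. It builds the plain-text datagrams that Statsd accepts (`name:value|c` for counters, `name:value|ms` for timers) and sends them over UDP. The metric names, the helper function names, and the host and port defaults are assumptions for illustration (8125 is Statsd's usual default port); this is not Instagram's actual client code.

```python
import socket


def format_counter(name: str, value: int = 1) -> bytes:
    """Build a Statsd counter datagram, e.g. b"signups:1|c"."""
    return f"{name}:{value}|c".encode("ascii")


def format_timer(name: str, millis: float) -> bytes:
    """Build a Statsd timer datagram with the duration in whole milliseconds."""
    return f"{name}:{int(round(millis))}|ms".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one datagram over UDP; no reply is expected (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Hypothetical metric names, chosen only for this example.
    send_metric(format_counter("signups"))
    send_metric(format_timer("feed.generate", 320.4))
```

Because the transport is UDP, a new metric costs one line of code at the point of interest and cannot block the request that emits it, which fits the account above of metrics being easy to add.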
By using technologies that were already proven to work well, the team was able to focus on sorting out the problems that came up rather than having to spend time building a program to take care of each problem that occurred. Other tools that were used, with a synopsis of their function, include:
Gearman: a task queue system, originally written at Danga, used for asynchronously sharing photos to sites such as Twitter and Facebook. It was also used to notify real-time Instagram subscribers that a new photo had been posted, and for feed fan-out. The system allows media uploads to finish quickly while it does the heavy lifting in the background. Instagram found the system to be a cost-effective solution for doing its push notification tasks.
Amazon S3: stores several terabytes of photos.
Munin: used for graphing metrics across all their systems. It also sends out alerts if anything is outside its normal range. The team wrote many custom Munin plugins, building on Python-Munin, to graph non-system-level metrics such as signups per second.
PagerDuty: handles notifications and incidents.
Sentry: used for Python error reporting. With Sentry the team could sign on whenever they chose and see, in real time, the errors that were taking place across their system. This Django app was written by Disqus.
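The Gearman entry above describes a common pattern: the upload request only enqueues follow-up jobs and returns, while a worker does the slow work afterwards. The sketch below illustrates that pattern using Python's standard `queue` and `threading` modules as a stand-in; it does not use Gearman or its client library, and the job names, function names, and photo ID are hypothetical.

```python
import queue
import threading


def worker(tasks: queue.Queue, completed: list) -> None:
    """Pull jobs off the queue until a None sentinel arrives."""
    while True:
        job = tasks.get()
        if job is None:
            tasks.task_done()
            break
        job_name, photo_id = job
        # A real worker would call Twitter, Facebook, etc. here.
        completed.append((job_name, photo_id))
        tasks.task_done()


def handle_upload(tasks: queue.Queue, photo_id: int) -> str:
    """Enqueue the slow follow-up jobs and return without waiting for them."""
    for job_name in ("share_to_twitter", "notify_subscribers", "feed_fanout"):
        tasks.put((job_name, photo_id))
    return "upload accepted"


def run_demo(photo_id: int = 42):
    tasks: queue.Queue = queue.Queue()
    completed: list = []
    thread = threading.Thread(target=worker, args=(tasks, completed))
    thread.start()
    status = handle_upload(tasks, photo_id)  # returns as soon as jobs are queued
    tasks.join()      # demo only: wait for the background work to finish
    tasks.put(None)   # tell the worker to stop
    thread.join()
    return status, completed


if __name__ == "__main__":
    print(run_demo())
```

In a real deployment the queue lives in a separate service and the workers run on other machines, so the web request never waits on third-party APIs.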
Instagram uses several technologies to cater to its data storage needs. Their data includes users, photo metadata and tags, among other things. Technologies and applications used for data storage include the following:
PostgreSQL: stores the data on 12 Quadruple Extra-Large memory instances, with 12 PostgreSQL replicas running in a different availability zone.
Vmtouch: used for managing which data is held in memory, especially when failing over from one machine to another.
Redis: powers their main feed and activity feed, among other functions. It runs in a master-replica setup, with the replicas constantly saving to disk.
Pgbouncer: used to pool connections to PostgreSQL.
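Connection pooling, the job Pgbouncer does for PostgreSQL, means keeping a fixed set of open connections and lending them out instead of opening a new one for every request. The following is a minimal in-process sketch of that idea. Pgbouncer itself is a separate proxy process that sits between the application and the database, so this is only an illustration of the concept, and `FakeConnection`, `ConnectionPool`, and `make_factory` are hypothetical names invented for the example.

```python
import itertools
import queue
from contextlib import contextmanager


class FakeConnection:
    """Stand-in for a real database connection (hypothetical)."""

    def __init__(self, conn_id: int) -> None:
        self.conn_id = conn_id

    def execute(self, sql: str) -> str:
        return f"conn{self.conn_id}: {sql}"


def make_factory():
    """Return a callable that creates connections numbered 1, 2, 3, ..."""
    ids = itertools.count(1)
    return lambda: FakeConnection(next(ids))


class ConnectionPool:
    """Open `size` connections up front and reuse them for every caller."""

    def __init__(self, size: int, factory) -> None:
        self._idle: queue.Queue = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks (up to `timeout` seconds) if every connection is in use.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the connection back for reuse


if __name__ == "__main__":
    pool = ConnectionPool(2, make_factory())
    for _ in range(5):
        with pool.connection() as conn:
            print(conn.execute("SELECT 1"))
```

However many requests arrive, the database only ever sees the two pooled connections; callers beyond that wait their turn rather than opening more.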