by zhirayr | Jul 18, 2012
This is a guest post by the team behind Xaffo, a social media monitoring and analytics service we covered previously on our blog. The post is part of the Startup League series, which covers topics for those who want to start an online business or have just started one.
Calcey Technologies developed Xaffo.com, a social media analytics tool that provides intelligence about how an organization's web pages are shared and discussed on leading social networks. Early in Xaffo's development it became obvious that performance and scalability would be important considerations for its architecture, as well as for its detailed design and deployment. Tackling these performance management challenges was essential to delivering Xaffo as a scalable, seamless, and simple-to-use social media monitoring tool.
Because Xaffo focuses on monitoring web pages across social networks, its core functionality involves reading sitemaps and passing every URL in those sitemaps to several different social network APIs, then obtaining raw activity statistics for each URL and saving them in a database for subsequent aggregation and graphical analysis.
The Task: Ensure Data-Gathering Performance
Making some assumptions about the data structure, one can do a crude calculation of the approximate number of "data gathering operations" Xaffo performs per day.
Under those assumptions, the approximate number of API calls scheduled in the daily data-gathering operation is 1,000 × 2 × 1,000 × 7 × 2 = 28,000,000 (28 million). The individual API calls to the social networks travel across the pipe, so delays in Internet access speed directly affect the efficiency of the process. Then there was the issue of failed API returns: sometimes a request to a social network returned no tangible result, and a separate process had to retry each failed request up to three times. All in all, given the assumptions above, we had to ensure that the daily data-gathering process concluded within an 18-hour window.
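As a rough sanity check, 28 million calls in an 18-hour window works out to roughly 432 calls per second, sustained. A minimal sketch of that arithmetic together with a retry wrapper of the kind described, where `call_api` is a hypothetical placeholder for a real social-network API call, not Xaffo's actual code:

```python
# Crude throughput estimate for the daily data-gathering run.
DAILY_CALLS = 1_000 * 2 * 1_000 * 7 * 2   # = 28,000,000 API calls/day
WINDOW_HOURS = 18

calls_per_second = DAILY_CALLS / (WINDOW_HOURS * 3600)
print(round(calls_per_second))            # sustained rate needed, ~432/sec

MAX_RETRIES = 3  # each failed request is retried up to three times

def fetch_with_retries(call_api, url):
    """Call a flaky API, retrying up to MAX_RETRIES times on empty results."""
    for _attempt in range(1 + MAX_RETRIES):   # one initial try + 3 retries
        result = call_api(url)
        if result is not None:                # a tangible result came back
            return result
    return None                               # give up after all attempts
```

The exact retry policy (fixed count, no backoff) mirrors the "up to three times" rule above; a production system might add exponential backoff on top.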
Turning to a Scalable Provider to Ensure Scalability
The first key infrastructure decision we made was to run the processing on Google AppEngine's cloud servers. AppEngine provides automatic scaling for hosted apps: as an application grows, Google adjusts the hardware resources allocated to it to keep the system running. Users do not have to worry about security either; Google takes care of it.
Using AppEngine’s data store enabled us to reduce the data processing performance overhead that would otherwise have been severe if we had used SQL queries for data aggregation and analysis. In addition, we picked Python as our programming language, as it promised rapid development and extensibility.
We refactored our design as development progressed, to meet performance needs identified through empirical trials. One of the first issues we encountered was that AppEngine could not handle the daily data-gathering task beyond 500 URLs – it would simply time out. So we built a task queue, feeding URLs into the task in batches of 500, which allowed the data-gathering process to continue uninterrupted.
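The batching scheme can be sketched as a simple chunking helper that feeds URLs to the queue 500 at a time. The `enqueue_batch` callback here is a stand-in for AppEngine's task queue API, which we don't reproduce:

```python
BATCH_SIZE = 500  # the largest batch AppEngine handled without timing out

def chunked(items, size=BATCH_SIZE):
    """Yield successive fixed-size batches from a list of URLs."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def schedule_gathering(urls, enqueue_batch):
    """Feed URLs to the task queue in batches so no single task times out."""
    for batch in chunked(urls):
        enqueue_batch(batch)   # e.g. a taskqueue.add(...) call on AppEngine
```

Each queued task then processes only its own 500 URLs, so no single request runs long enough to hit AppEngine's timeout.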
Adjustments on the Fly
Another issue we faced was that we initially used the DOM (Document Object Model) XML-reading mechanism to retrieve individual URLs from the sitemaps. Over time we observed that DOM slowed down the overall process and sometimes caused instability at runtime; it also prevented us from stopping the URL retrieval process midway. After some research, we switched the XML-reading mechanism to SAX (Simple API for XML), which allowed us to stop reading at any point we wished and proved very stable at runtime.
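A minimal sketch of the SAX approach, assuming a standard sitemap layout: a streaming handler collects `<loc>` URLs and aborts the parse once it has enough, which is exactly what DOM parsing could not do. The class and function names are illustrative, not Xaffo's actual code:

```python
import xml.sax

class StopParsing(xml.sax.SAXException):
    """Raised to abort the parse early -- the ability DOM lacked."""
    def __init__(self):
        xml.sax.SAXException.__init__(self, "enough URLs collected")

class SitemapHandler(xml.sax.ContentHandler):
    """Collect <loc> URLs from a sitemap, stopping after `limit` of them."""
    def __init__(self, limit):
        xml.sax.ContentHandler.__init__(self)
        self.limit = limit
        self.urls = []
        self._in_loc = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "loc":
            self._in_loc = True
            self._buf = []

    def characters(self, content):
        if self._in_loc:          # text may arrive in several chunks
            self._buf.append(content)

    def endElement(self, name):
        if name == "loc":
            self._in_loc = False
            self.urls.append("".join(self._buf).strip())
            if len(self.urls) >= self.limit:
                raise StopParsing()   # stop mid-stream, unlike DOM

def read_sitemap(xml_bytes, limit):
    """Return up to `limit` URLs from sitemap XML, parsing no further."""
    handler = SitemapHandler(limit)
    try:
        xml.sax.parseString(xml_bytes, handler)
    except StopParsing:
        pass   # early stop is the expected behaviour
    return handler.urls
```

Because SAX streams events instead of building the whole document tree in memory, large sitemaps no longer dominate memory or runtime.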
A third problem concerned our use of cron jobs to manage tasks. Every time a cron job ended, AppEngine kept the CPU instance alive for a further 15 minutes before killing it, which incurred unnecessary expense. We resolved this by introducing a master cron job to manage all the other cron jobs.
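The master-cron idea can be sketched as follows, with hypothetical job names: instead of AppEngine firing several independent cron entries (each leaving an instance idling for 15 minutes afterwards), a single cron entry invokes one dispatcher that runs the sub-jobs in sequence, so only one instance incurs the idle tail:

```python
def run_all_jobs(jobs):
    """Master cron handler: run every sub-job inside one instance.

    `jobs` maps a job name to a zero-argument callable; both the names
    and the callables are placeholders for real per-task cron handlers.
    """
    results = {}
    for name, job in jobs.items():
        try:
            results[name] = job()
        except Exception as exc:   # one failing job must not block the rest
            results[name] = exc
    return results
```

With this shape, the cron configuration needs only one scheduled entry pointing at the dispatcher rather than one per task.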
As a result, we now have in Xaffo a high-performance, scalable solution that tracks the performance of an organization's web pages across leading social media networks. It captures key social media analytics data points for important social sites such as Facebook and Twitter, among many others, and easily tracks the top 100 performing web pages per domain, with graphical displays of information alongside data-centric mouse-over insights and more.
We have adopted a living architecture and design for Xaffo. As with any successful SaaS architecture, we continue to make occasional design tweaks based on long-term observation and user feedback. To learn more, visit www.xaffo.com.