Chances are good that you’ve heard of Hadoop. After all, it’s been all the buzz in the world of Big Data over the past several years, and more and more businesses are asking how to get on board with this technology. The analyst firm IDC recently reported that the big data market will be worth $100 billion by 2020 and that half of this will be driven by Hadoop.
For those who need a quick review, Hadoop is a system for storing and processing data: it takes in large amounts of data from servers and breaks it into smaller, manageable chunks. The technology is complex, but at a high level the Hadoop ecosystem takes a “divide and conquer” approach to processing Big Data, spreading the work across many machines, instead of processing data in tables, as in a relational database like Oracle or MySQL.
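To make the “divide and conquer” idea concrete, here is a minimal sketch in plain Python. It is not Hadoop's API and it runs on a single machine; the function names and the sample lines are made up for illustration. It only shows the shape of the approach: split the input into chunks, process each chunk independently, then merge the partial results.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(lines, chunk_size):
    """Break the input into smaller chunks (hypothetical helper)."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def count_words(chunk):
    """Process one chunk on its own, without looking at any other chunk."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Combine two partial results into a single result."""
    return left + right


# Illustrative sample data, not taken from any real data set.
lines = [
    "big data big ideas",
    "small business big data",
    "data for small business",
]

chunks = split_into_chunks(lines, chunk_size=1)
partial_counts = [count_words(chunk) for chunk in chunks]
total = reduce(merge_counts, partial_counts, Counter())

print(total.most_common(3))
```

In a real cluster, each chunk would live on a different machine and the per-chunk step would run in parallel; here the list comprehension simply stands in for that.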
A fairly simple way to think about this is to imagine a machine with 4 hard drives that each process data at 100MB/s, or 400MB/s in total. At this rate, the machine would process a terabyte in a little over 40 minutes. Alternatively, if that same terabyte of data is divided across 10 machines, each with 4 hard drives, then processing time is cut to a little over 4 minutes. Hadoop can be scaled from 2 nodes up to thousands of nodes, which can mean an order-of-magnitude gain or more in the processing of Big Data.
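The arithmetic behind this example can be sketched in a few lines of Python. This is a back-of-the-envelope model only: it assumes decimal units (1 TB = 1,000,000 MB), perfectly parallel reads, and no coordination overhead between machines, none of which holds exactly in a real cluster. The function name and constants are illustrative.

```python
MB_PER_TB = 1_000_000  # assumption: decimal units


def processing_minutes(data_tb, machines, drives_per_machine=4,
                       mb_per_sec_per_drive=100):
    """Estimate the time to scan data_tb terabytes, ignoring all overhead."""
    total_mb_per_sec = machines * drives_per_machine * mb_per_sec_per_drive
    seconds = data_tb * MB_PER_TB / total_mb_per_sec
    return seconds / 60


one_machine = processing_minutes(1, machines=1)
ten_machines = processing_minutes(1, machines=10)

print(f"1 machine:   {one_machine:.1f} minutes")
print(f"10 machines: {ten_machines:.1f} minutes")
print(f"speedup:     {one_machine / ten_machines:.0f}x")
```

Changing the `machines` argument shows how the estimate shrinks as nodes are added, which is the scaling behavior the example describes.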
Some people say you shouldn’t use Hadoop if your data isn’t big enough. Hadoop, they say, is only necessary if you have a “Big Data” problem, in other words, if your data flow is at least 1 terabyte. The implication is that most small businesses are not at this scale of data processing and don’t need to leverage Hadoop. Granted, you’re not facing data streams the size of Facebook’s, but this way of thinking is wrong-headed; adopting new technology early is a good way to stay ahead of the competition and to keep innovating.
Though Hadoop is still in the early stages of its growth and is fairly complex to implement natively, a number of companies are creating out-of-the-box solutions to make it much more user-friendly. And for those who are still exploring use cases, there are plenty of industry-wide examples of companies using Hadoop to manage their Big Data challenges.
In this short series we’ll review some of the best platforms for helping your small business get the most out of Hadoop.
Hortonworks is a business software company based in Palo Alto, California, that focuses on the development and support of the open-source Apache Hadoop distribution. Hortonworks has become a market leader in the Hadoop space and has received accolades and high reviews for its industry-leading initiatives to make enterprise-level Hadoop accessible and scalable for businesses of all sizes. Hortonworks’ major platform is called Hortonworks Data Platform (HDP). It includes the Hadoop Distributed File System, MapReduce, Pig, Hive, HBase, and ZooKeeper, along with additional components that can all be leveraged to help businesses with storing, processing, and analyzing large volumes of data.
The Hortonworks ecosystem of open-source tools and products is easily accessible, and the firm provides a wide array of training and certification options. Through its partner program with players across the Big Data market, Hortonworks strives to make enterprise-level Hadoop available to as many organizations as possible.
The best way to get started with the Hortonworks ecosystem of tools is through its sandbox. Anyone can download the HDP sandbox, which is provided as a self-contained virtual machine (preferably for VirtualBox), and start playing with Hadoop in no time. Hortonworks provides a very useful series of tutorials to help you learn the basics of Hadoop (including Hive, Pig, and HCatalog) through sample data sets.
Hortonworks provides a great selection of Hadoop tools, learning resources, and support. It’s one of the most accessible ways for businesses of any size to get started with Hadoop. If you need some Big Data use cases to generate new ideas and inspiration, Hortonworks offers plenty of examples as well.
To be continued . . .