Hadoop – The past and present
Hadoop is a great solution for Big Data. But why has it got such weird name and what does it actually mean? The story of the project getting the name Hadoop goes like this: long back when the creator of Hadoop Doug Cutting’s 3-year-old son was playing with his yellow elephant toy he named it Hadoop. Doug found the name quite interesting and it had no meanings attached to it. He noted the name at that time itself and when this project sprung up he named it Hadoop. Doug Cutting says that his son who is now a teenager is very proud of his feat and wants his share of copyrights as well!
Till 2005-06 Yahoo was having distributed clusters as well and their own computing framework dreadnought which was struggling when the nodes in the cluster increased. They appointed Doug with a dedicated team to run their services which were web scaled and decided to adopt a framework in 2005. In August 2010 Doug has moved to Cloudera; towards the end of 2013 Apache released a stable of Hadoop 2 which is also known as Yarn, which is the short form of Yet Another Social Negotiator, also known as next generation Hadoop.
All Hadoop versions were launched as open source and were managed by Apache which works extensively on open source projects. We just like to add that, Apache is a volunteer run organization and runs a number of open sources projects, Hadoop being one of them. it was in 2005-06 that the Hadoop initial versions were launched but it was only in 2013 when Hadoop 2 was launched was Hadoop 1 named; before that they were just 0.xx releases. The modification in Hadoop 2, also known as Yarn, makes it closer to the original Google MapReduce paper.
So, as of 2014 Google is ahead of the world in terms of design of a technical framework as compared to the rest of the world as they were the first to start it and all the others have been following them to this point. They have just released their papers in 2004-05 which talks about their design at a high level but of course they never released the actually working designs. Doug and Mike were the first to see and implement the framework for distributed computing which could be used for the lot more than just web searches, which is crawling and indexing.
Yahoo is one of the largest companies to have a Hadoop cluster and which makes open source contributions to Hadoop and so do FaceBook and Twitter. Yahoo contributed a lot to Hadoop, Pig being one of the major contributions; likewise FaceBook contributed Hive and Twitter contributed Storm. The business advantage these companies seek by making these contributions is that it becomes easier for them to find trained resources in that technology if it is out there in the market; or else it would be very difficult for them if they keep the tools in-house. On top of it, they get branding benefits and they contribute to the society as well.
So, it is obvious that it was Google’s research paper and Doug and Mike’s effort to realize the potential of the framework for other situations too lead to the birth of Hadoop and the Big Data problems in IT were first handled by search engines.