The world of Hadoop – From the laymen’s point of view
Hadoop is the platform onto which you aggregate all your organization’s data and additional data from partners and marketplaces if you like. And the core Big Data management layer of Hadoop is the province of that sort of IT administrators in your organization; their job is to get the data on that, schedule applications and workflows that run on this distributed architecture, what kind of tools do they need, etc.
So, now that we have the infrastructure and we have put some data on it, now what? One of the things that people have started to recognize in the past couple of years is that there is another layer! This is the Big Analytics layer that is starting to become available to technologies and products that are available to move one’s skills into this Big Analytics arena sitting on top of Hadoop’s Big Data.
So, now we have data and then we have figured out some insights that have value in the organization; now what? The next step is you got to look at operationalizing that. Operationalization is taking those results and putting it into some kind of operational database, which can be the historical NoSQL thin or it could be an MPP database; increasingly it is a NoSQL database. Those databases that sort of sit behind customer facing web applications and that are sort of interfaces to traditional BI and visualization tools.
So, these are like macro-components in the Hadoop world. And finally the lower layer isn’t about the analysts. It is about the IT guys; and so data scientists should only care about the upper layer and take care to get access to it with these new tools and technologies by applying one’s skills through Big Analytics sitting on Hadoop.
A couple of years ago, people were able to get to their destination when using Hadoop but it wasn’t the most comfortable ride in the world because using Hadoop was really as hard as rocket science; people had to put a lot of efforts, in many cases read the source code to understand this thing called MapReduce. People got to where they wanted to be and what was encouraging was even with the kind of chunkiness of this thing they were getting very exciting results. But now things have changed for the better; and this is actually due to the efforts put in by the open source family and the Apache foundation and the commercial companies that were looking at the problems people have had and looking at the ways to make using it more comfortable, faster and enable to take existing skill sets and move them onto this thing called Hadoop.