Introducing Hadoop into one’s organization
Here are some important factors to consider when dealing with Hadoop:
- Plan: Some people have jumped in expecting the water to be two feet deep and found it to be ten, so it is important to plan. At the same time, don’t try to boil the ocean: don’t assume you have to pull in all of your data at once. Find a meaningful set of data, make it a short project, and look for help, because you don’t have to do it all yourself. There are a number of resources out there to make things easier for you.
And there isn’t just wisdom in crowds; there is wisdom in the cloud as well. Companies like Amazon have taken their EC2 and S3 infrastructure and created an on-demand service, so you don’t have to stand up the infrastructure or make an up-front investment; you can spin up a Hadoop cluster and pay on the order of 10 cents an hour for Hadoop cycles. There are customers today who have five to six years of weblogs on S3. With the Elastic MapReduce service, they can run large-scale analytics, with products layered on top, against that data in situ very easily.
- Democratize: In early technology environments, the people who get excited are the really smart people who understand the technology. They really are the rocket scientists, and we owe them a debt of thanks. But if you are going to move access to the data to the people who actually need to do things with it and extract value from it, you have to find ways to connect data professionals and data scientists to it. Make sure that early on you have programmers whose job is to get the infrastructure in place, get the data onto it, and give you access to it.
- Equip the right people with the right tools: The right tools are now starting to emerge, so that one doesn’t have to wrestle with dollar and hash prompts, SSH sessions, and crude, rudimentary command-line interfaces. These tools connect existing skill sets to the data and provide additional capabilities that let you go faster.
- Begin now: This doesn’t require any waiting around, and others aren’t waiting. People have already optimized their businesses to death; the one source of value they have not yet tapped is the value locked in their datasets. Businesses that recognize this are running faster, and if you haven’t started yet, you can be sure your competitors have!
- Once you have begun don’t stop: Once you have a meaningful set of results from the short project you started to test Hadoop’s usefulness, look at how to get to the next stage; keep planning, and plan how to go faster.
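A short first project of the kind described above often amounts to a simple MapReduce job over logs. The sketch below is a minimal, hypothetical Hadoop Streaming-style hit counter over weblog lines; the log layout (space-separated Apache common log format with the requested path in field 7) and the function names are assumptions for illustration, not part of any particular product.

```python
from collections import defaultdict

def map_line(line):
    """Mapper: emit (path, 1) for each request line.

    Assumes Apache common log format, where splitting on whitespace
    puts the requested path at index 6 -- an assumption for this sketch.
    """
    fields = line.split()
    if len(fields) > 6:
        yield fields[6], 1

def reduce_pairs(pairs):
    """Reducer: sum the counts per key, as Hadoop does after the shuffle."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

if __name__ == "__main__":
    logs = [
        '1.2.3.4 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
        '1.2.3.5 - - [10/Oct/2023:13:55:37 +0000] "GET /about.html HTTP/1.1" 200 512',
        '1.2.3.4 - - [10/Oct/2023:13:55:38 +0000] "GET /index.html HTTP/1.1" 200 2326',
    ]
    pairs = [pair for line in logs for pair in map_line(line)]
    print(reduce_pairs(pairs))  # {'/index.html': 2, '/about.html': 1}
```

On a real cluster the same mapper and reducer logic would read from stdin and write to stdout under Hadoop Streaming, with the input living on S3 and the job run on an Elastic MapReduce cluster; the local run here just shows the shape of the computation.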