The concept of pre-computation in Hadoop
The classic example of pre-computation is Google's search suggestions (autocomplete), where Google suggests and completes words and phrases as the user types a search term. This is presumably the result of large-scale data analysis: given the first few characters, what is the most likely search query? Almost everybody has used it; it is fast and very useful. The basic idea is to run the Big Data analysis ahead of time and put the results in a NoSQL or look-up database, so that serving a request is only a quick look-up.
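To make the idea concrete, here is a minimal sketch of the pattern in Python. It is not how Google implements suggestions; the function names, the sample query log and the plain dictionary standing in for the look-up database are all assumptions made for illustration. The expensive work (counting queries and ranking them per prefix) happens once in a batch step, and answering a keystroke is a single key look-up.

```python
from collections import Counter, defaultdict


def build_suggestion_table(query_log, top_k=3, max_prefix_len=10):
    """Batch step: map every prefix to its top_k most frequent full queries."""
    counts = Counter(query_log)
    by_prefix = defaultdict(list)
    for query, freq in counts.items():
        for i in range(1, min(len(query), max_prefix_len) + 1):
            by_prefix[query[:i]].append((freq, query))

    table = {}
    for prefix, candidates in by_prefix.items():
        # Highest frequency first; ties broken alphabetically for stable output.
        candidates.sort(key=lambda fc: (-fc[0], fc[1]))
        table[prefix] = [query for _, query in candidates[:top_k]]
    return table


def suggest(table, typed):
    """Serving step: one dictionary look-up, no analysis at request time."""
    return table.get(typed, [])


# Hypothetical query log; in practice this would be a very large data set.
query_log = [
    "hadoop tutorial", "hadoop tutorial", "hadoop mapreduce",
    "hbase", "hadoop tutorial", "hbase", "hive",
]
table = build_suggestion_table(query_log)
print(suggest(table, "ha"))  # ['hadoop tutorial', 'hadoop mapreduce']
print(suggest(table, "h"))   # ['hadoop tutorial', 'hbase', 'hadoop mapreduce']
```

In a Hadoop setting, the batch step would typically be a MapReduce job over the query logs, and the resulting table would be loaded into a key-value store; the dictionary here only stands in for that store.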
Another example that explains pre-computation in more depth is biometric identity scanning. Some background on biometrics: here it basically means fingerprints, and there are three things we do with them:
- Enrollment: We gather fingerprints from various sources and store them in a database, where each print is associated with the person’s name and other necessary personal information.
- Verification: A person goes to the relevant authority, states who they are and provides credentials. Their prints are then compared algorithmically with the prints stored for that identity in the database, to confirm that the prints and the person match.
- Identification: Finally, a person is identified from biometrics alone, by searching the database for matching prints.
We now have a database full of fingerprints that belong to many different people. A question comes up: a person may hold multiple identities, for example three passports, which means some of them are fake. How do we verify whether the identities are genuine? Doing that verification at low latency with a conventional architecture is very difficult. So what happens if we take a shot at it using NoSQL and MapReduce tools?
What was actually done was to introduce pre-computation. The system scans the fingerprints in all the fingerprint databases and analyzes them ahead of time so that, for a given print, it can very quickly produce a subset of “suggestions”, or close matches, from which the right one can be selected. This effectively narrows down the search space. So, rather than scanning all the fingerprints in all the databases, which can run to a few billion, we can find a match by scanning a few thousand, within a reasonable amount of time. Additionally, the mathematics behind all this is adaptable and can therefore evolve over time.
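The text does not say how the subset of close matches is computed, so the following Python sketch is only one possible illustration. It assumes each fingerprint has already been reduced to a small numeric feature vector, and it uses a simple grid quantization as a hypothetical bucketing scheme. All names, values and the `cell` parameter are invented for the example. The batch step groups enrolled prints into buckets; at query time only the prints in the probe's bucket are compared in detail.

```python
import math
from collections import defaultdict


def bucket_key(features, cell=0.25):
    """Coarse signature: snap each feature onto a grid (assumed scheme)."""
    return tuple(math.floor(x / cell) for x in features)


def precompute_index(enrolled, cell=0.25):
    """Batch step: group enrolled IDs by bucket, ahead of any query."""
    index = defaultdict(list)
    for person_id, features in enrolled.items():
        index[bucket_key(features, cell)].append(person_id)
    return index


def identify(index, enrolled, probe, cell=0.25):
    """Query step: run the detailed comparison only within the probe's bucket."""
    candidates = index.get(bucket_key(probe, cell), [])
    if not candidates:
        return None, 0
    best = min(candidates, key=lambda pid: math.dist(enrolled[pid], probe))
    return best, len(candidates)


# Hypothetical feature vectors; real fingerprint templates are far richer.
enrolled = {
    "alice": (0.10, 0.80, 0.33),
    "bob": (0.12, 0.79, 0.35),
    "carol": (0.90, 0.20, 0.55),
    "dave": (0.60, 0.61, 0.05),
}
index = precompute_index(enrolled)
match, compared = identify(index, enrolled, (0.11, 0.78, 0.36))
print(match, compared)  # bob 2
```

Here only two of the four enrolled prints are compared in detail. A plain grid like this can miss a true match that falls just across a cell boundary, so a real system would need a more robust scheme. The same index also hints at how duplicate identities could be flagged: any bucket holding more than one ID is a candidate pair worth a closer look.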
Hence, pre-computation can be tried quickly and at scale, leading to breakthrough analytics. Secondly, it lets us build applications faster and, through mash-ups, make them more intuitive.