Impala is a SQL query system for Hadoop from Cloudera. The Cloudera positions Impala as a "real-time" query engine for Hadoop and by "real-time" they imply that
rather than running batch oriented jobs like with MapReduce, we can get much faster query results for a certain types of queries using Impala over an SQL based front-end.
It does not rely on the MapReduce infrastructure of Hadoop, instead Impala implements a completely separate engine for processing queries. So this engine is a specialized distributed query engine that is similar to what you can find in some of the commercial pattern related databases. So in essence it bypasses MapReduce.
Asked In: Many Interviews |