Flume is an open source software program developed by Cloudera that acts as a
service for aggregating and moving large amounts of data around a Hadoop cluster
as the data is produced or shortly thereafter. Its primary use case is the gathering of log files from all the machines in a cluster to persist them in a centralized store such as HDFS.
In Flume, we create data flows by building up chains of logical nodes and
connecting them to sources and sinks. For example, say we wish to move data from
an Apache access log into HDFS. You create a source by tailing access log and use a logical node to route this to an HDFS sink.
Asked In: Many Interviews |