Data Engines: Flume

Flume is a tool for handling data in motion (streaming data), and it is notably easy to use. The configuration file is its only user interface: it describes the end-to-end execution of a Flume agent and answers the how, when, and where of a data stream. No coding is required for basic use, which makes Flume attractive to non-programmers who need to handle streaming data. However, a basic conceptual understanding of Flume and Hadoop is still required.

Below are the building blocks of Flume:

The agent is Flume's unit of data transfer. An agent is a component that contains the three elements below, which together make up a small but complete setup for data movement.

1) Source
2) Channel
3) Sink

A source, a channel, and a sink are bound together to form an agent.

We can have multiple agents, either connected sequentially as a pipeline or running concurrently to pull more data.
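As a rough sketch of the pipeline case, two agents can be chained by pointing the Avro sink of the first agent at the Avro source of the second. The agent names (a1, a2), the host collector.example.com, the port, and the log path are all assumptions made for illustration, not values from any particular deployment.

```properties
# Tier 1: agent a1 (hypothetical) reads a local log and forwards events over Avro.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = exec
# Assumed log path, for illustration only.
a1.sources.r1.command = tail -F /var/log/app.log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type = avro
# Assumed host and port of the second agent.
a1.sinks.k1.hostname = collector.example.com
a1.sinks.k1.port = 41414
a1.sinks.k1.channel = c1

# Tier 2: agent a2 (hypothetical) receives the Avro events and logs them.
a2.sources = r1
a2.channels = c1
a2.sinks = k1

a2.sources.r1.type = avro
a2.sources.r1.bind = 0.0.0.0
a2.sources.r1.port = 41414
a2.sources.r1.channels = c1

a2.channels.c1.type = memory

a2.sinks.k1.type = logger
a2.sinks.k1.channel = c1
```

For the concurrent case, several first-tier agents like a1 could point their Avro sinks at the same a2 source, so that one collector gathers data from many machines.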

Types of channels:

1) Memory Channel
2) JDBC Channel
3) File Channel
4) Kafka Channel
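The choice of channel is mostly a trade-off between speed and durability. The sketch below shows a memory channel and a file channel side by side; the channel names, capacities, and directory paths are assumed values for illustration only.

```properties
# Memory channel: fast, but events are lost if the agent process stops.
a1.channels.c1.type = memory
# Assumed sizes: maximum events held, and maximum events per transaction.
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

# File channel: slower, but events survive an agent restart.
a1.channels.c2.type = file
# Assumed local directories; they must be writable by the Flume process.
a1.channels.c2.checkpointDir = /var/flume/checkpoint
a1.channels.c2.dataDirs = /var/flume/data
```

Both channels would also need to be listed in a1.channels (for example, a1.channels = c1 c2) before the agent picks them up.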

Types of sinks:

1) Kafka Sink
2) HBase Sink
3) HDFS Sink
4) Solr Sink
5) Hive Sink
6) ElasticSearch Sink
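Since HDFS is a common destination, here is a minimal sketch of an HDFS sink definition. The NameNode address, the target path, and the roll settings are assumptions chosen for illustration.

```properties
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
# Assumed NameNode address and directory; %Y-%m-%d creates one directory per day.
a1.sinks.k1.hdfs.path = hdfs://namenode.example.com:8020/flume/events/%Y-%m-%d
# Use the agent's local time for the date escapes above instead of a timestamp header.
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Write plain event bodies rather than the default SequenceFile format.
a1.sinks.k1.hdfs.fileType = DataStream
# Roll to a new file every 300 seconds; 0 disables size-based and count-based rolling.
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
```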

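Types of sources:

1) Kafka Source
2) Avro Source
3) Exec Source
4) Taildir Source
5) JMS Source
6) Scribe Source
7) ThriftLegacy Source
8) Syslog UDP Source (syslogudp)
9) Syslog TCP Source (syslogtcp)
10) Sequence Generator Source (seq)
11) NetCat Source (netcat)
12) Multiport Syslog TCP Source (multiport_syslogtcp)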
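To give a feel for how a source is declared, the sketch below configures a Taildir source and a NetCat source. The source names, file paths, file pattern, and port are assumed values for illustration.

```properties
# Taildir source: follows files matching a pattern and remembers its read position.
a1.sources.r1.type = TAILDIR
a1.sources.r1.channels = c1
# Assumed location of the JSON file that stores the last read offsets.
a1.sources.r1.positionFile = /var/flume/taildir_position.json
a1.sources.r1.filegroups = f1
# Assumed directory; the file name part is a regular expression.
a1.sources.r1.filegroups.f1 = /var/log/app/.*log

# NetCat source: listens on a TCP port and turns each received line into an event.
a1.sources.r2.type = netcat
a1.sources.r2.channels = c1
a1.sources.r2.bind = 0.0.0.0
# Assumed port.
a1.sources.r2.port = 44444
```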

Sample Flume agent:

a1.channels = c1
a1.sources = r1
a1.sinks = k1

a1.channels.c1.type = memory

a1.sources.r1.channels = c1
a1.sources.r1.type = avro
# To use a Thrift source instead, replace the line above with the following.
# a1.sources.r1.type = thrift
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 41414

a1.sinks.k1.channel = c1
a1.sinks.k1.type = logger
