Solr: Morphline for ETL

What is morphline?

Morphlines is an open source framework that reduces the time and efforts necessary to build and change Hadoop ETL stream processing applications that extract, transform and load data into Apache Solr, HBase, HDFS, Enterprise Data Warehouses, or Analytic Online Dashboards. It replaces Java programming with simple configuration steps, and correspondingly reduces the cost and integration effort associated with developing and maintaining custom ETL projects.

History: The morphline commands were developed as part of Cloudera Search. Since the launch of Cloudera Search morphline development graduated into the Kite Software Development Kit (Kite SDK) in order to make the technology accessible to a wider range of users and products, beyond Search.

Where Morphine fits in?

Below is an example of pipeline from kitesdk.org.

Dataflow