Twitter Kafka Spark Streaming

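Extract Kafka and start the Kafka server. Kafka 0.10 depends on ZooKeeper, which the commands below expect at localhost:2181. If one is not already running there, start the instance bundled with Kafka first, then start the broker in a second terminal window. Leave both processes running; Kafka will not function without them.

$ tar xf kafka_2.11-0.10.1.0.tgz

$ cd kafka_2.11-0.10.1.0

$ bin/zookeeper-server-start.sh config/zookeeper.properties

$ bin/kafka-server-start.sh config/server.properties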

In a new terminal window, list the existing topics, then create a topic named "twitter" with a single partition and a replication factor of 1.

$ cd Downloads/kafka_2.11-0.10.1.0

$ bin/kafka-topics.sh --zookeeper localhost:2181 --list

$ bin/kafka-topics.sh --zookeeper localhost:2181 --topic twitter --create --replication-factor 1 --partitions 1

Describe the topic that you just created.

$ bin/kafka-topics.sh --zookeeper localhost:2181 --topic twitter --describe
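
Running the create command a second time fails because the topic already exists, which is why the topics are listed first. The list-then-create check can be sketched as a small shell function. The function name ensure_topic and the KAFKA_BIN variable are illustrative assumptions, not part of Kafka; the kafka-topics.sh flags are the ones used above.

```shell
# Hypothetical helper: create a topic only when --list does not already show it.
# KAFKA_BIN is an assumed variable pointing at Kafka's bin directory (default: ./bin).
ensure_topic() {
  topic="$1"
  kafka_bin="${KAFKA_BIN:-bin}"
  zk="localhost:2181"

  # --list prints one topic name per line; grep -x matches the whole line
  # and -F treats the topic name as a literal string.
  if "$kafka_bin/kafka-topics.sh" --zookeeper "$zk" --list | grep -qxF "$topic"; then
    echo "topic $topic already exists"
  else
    "$kafka_bin/kafka-topics.sh" --zookeeper "$zk" --topic "$topic" --create \
      --replication-factor 1 --partitions 1
  fi
}
```

After pasting the function into a shell opened in the Kafka directory, it can be called as:

$ ensure_topic twitter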

Start a console consumer in a separate terminal window.

$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic twitter

Start a console producer in another terminal window.

$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic twitter

Test the Kafka setup by typing a message into the producer window and pressing Enter. The message should appear in the console consumer window.
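
The same round trip can also be checked without typing into two windows. The following is a minimal sketch, assuming ZooKeeper and the broker are running on the ports used above and that the timeout utility from GNU coreutils is installed. The function name kafka_round_trip and the KAFKA_BIN variable are hypothetical.

```shell
# Hypothetical helper: send one uniquely tagged line and confirm it can be read back.
# Assumes ZooKeeper (localhost:2181) and the broker (localhost:9092) are running
# and that the "timeout" utility (GNU coreutils) is available.
kafka_round_trip() {
  topic="${1:-twitter}"
  wait_secs="${2:-15}"
  kafka_bin="${KAFKA_BIN:-bin}"   # assumed variable; defaults to ./bin in the Kafka directory
  msg="round-trip-$$-$(date +%s)"

  # The console producer reads stdin and sends each line as one message.
  echo "$msg" | "$kafka_bin/kafka-console-producer.sh" \
    --broker-list localhost:9092 --topic "$topic" || return 1

  # Re-read the topic from the start and look for the tagged line.
  # timeout stops the consumer, which would otherwise wait for new messages forever.
  if timeout "$wait_secs" "$kafka_bin/kafka-console-consumer.sh" \
       --zookeeper localhost:2181 --topic "$topic" --from-beginning | grep -qF "$msg"; then
    echo "OK: $msg"
  else
    echo "FAILED: $msg not seen within ${wait_secs}s" >&2
    return 1
  fi
}
```

Run it from the Kafka directory. Because the consumer keeps running until timeout stops it, the call can take the full waiting period even when the message is found immediately.

$ kafka_round_trip twitter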

Now set up Spark to consume the messages from Kafka.

Clone the project below.

$ cd ~/workspace

$ git clone https://github.com/abulbasar/spark-scala-examples.git

Open Eclipse, create a Scala project named spark-scala-examples, and run TwitterAnalyzer.scala.

Every message sent from the Kafka producer should now appear in the Spark output as well.

Now, use the following article to start streaming Twitter messages and publish the stream to the kafka topic "twitter".
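
Whatever tool produces the tweets, the publishing side can reuse the console producer from the earlier test, since it sends every line it reads on stdin as one message. The following is a rough sketch under stated assumptions: the tweet source writes one complete tweet per line (for example, one JSON object per line), and the names publish_stream, KAFKA_BIN, and tweet_source are hypothetical placeholders rather than real tools.

```shell
# Hypothetical helper: forward a line-oriented stream on stdin to a Kafka topic.
# Assumes each input line is one complete tweet (for example, one JSON object per line).
publish_stream() {
  topic="${1:-twitter}"
  kafka_bin="${KAFKA_BIN:-bin}"   # assumed variable; defaults to ./bin in the Kafka directory

  # Strip a trailing carriage return, skip blank lines, and flush after every
  # line so messages are not held back in awk's output buffer.
  awk '{ sub(/\r$/, "") } NF { print; fflush() }' |
    "$kafka_bin/kafka-console-producer.sh" --broker-list localhost:9092 --topic "$topic"
}
```

Here tweet_source stands for whichever command emits the tweets:

$ tweet_source | publish_stream twitter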