Twitter Kafka Spark Streaming
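Start the Kafka server. Leave the server process running in its own terminal window; Kafka stops working as soon as it exits. The broker also needs ZooKeeper listening on localhost:2181. If ZooKeeper is not already running, start the instance bundled with Kafka first, in a separate window, with bin/zookeeper-server-start.sh config/zookeeper.properties. The commands below assume the archive was saved to the Downloads directory.
$ cd Downloads
$ tar xf kafka_2.11-0.10.1.0.tgz
$ cd kafka_2.11-0.10.1.0
$ bin/kafka-server-start.sh config/server.properties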
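Before moving on, it can help to confirm from a second terminal that both services accept connections. The following is only a sketch and not part of Kafka: port_open is a hypothetical helper that relies on bash's /dev/tcp redirection, and the ports 2181 (ZooKeeper) and 9092 (broker) are the ones used by the commands in this guide.

```shell
#!/usr/bin/env bash
# port_open HOST PORT: succeed if a TCP connection to HOST:PORT can be opened.
# Hypothetical helper; it uses bash's built-in /dev/tcp redirection.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Report the state of the two services this guide depends on.
for svc in "ZooKeeper:2181" "Kafka broker:9092"; do
  name="${svc%%:*}"
  port="${svc##*:}"
  if port_open localhost "$port"; then
    echo "$name is listening on port $port"
  else
    echo "$name is NOT reachable on port $port"
  fi
done
```

If either line reports that a service is not reachable, check the corresponding server window for errors before creating the topic.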
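In a new terminal window, create a topic named "twitter" with a single partition and a replication factor of 1. The list command shows the topics that already exist; the create command adds the new one.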
$ cd Downloads/kafka_2.11-0.10.1.0
$ bin/kafka-topics.sh --zookeeper localhost:2181 --list
$ bin/kafka-topics.sh --zookeeper localhost:2181 --topic twitter --create --replication-factor 1 --partitions 1
Describe the topic that you just created.
$ bin/kafka-topics.sh --zookeeper localhost:2181 --topic twitter --describe
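If you repeat this walkthrough, the create command above fails because the topic already exists. A minimal sketch of a repeatable variant is shown here. It uses the --if-not-exists flag of kafka-topics.sh; ensure_topic is a hypothetical helper name, and the KAFKA_HOME default is an assumption based on the directory used in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the topic only if it is missing, then describe it.
# KAFKA_HOME is an assumption; point it at your own Kafka directory.
KAFKA_HOME="${KAFKA_HOME:-$HOME/Downloads/kafka_2.11-0.10.1.0}"

ensure_topic() {
  topic="${1:-twitter}"
  # --if-not-exists turns "topic already exists" into a no-op instead of an error.
  "$KAFKA_HOME/bin/kafka-topics.sh" --zookeeper localhost:2181 \
    --create --if-not-exists --topic "$topic" \
    --replication-factor 1 --partitions 1 &&
  "$KAFKA_HOME/bin/kafka-topics.sh" --zookeeper localhost:2181 \
    --describe --topic "$topic"
}
```

Calling ensure_topic twitter can then be run any number of times without changing the result.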
Start a console consumer in a separate terminal window, from the Kafka directory.
$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic twitter
Start a console producer in another terminal window, also from the Kafka directory.
$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic twitter
Test the Kafka setup by typing a message into the producer window and pressing Enter. The same message should appear in the console consumer window.
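The manual check above can also be scripted. The sketch below starts a console consumer in the background, pipes one uniquely tagged message into the console producer, and then looks for that message in the consumer's output. kafka_round_trip, KAFKA_HOME, and WAIT_SECS are hypothetical names introduced for illustration, and the wait time is a guess that may need tuning on a slow machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper: publish one tagged message and check that a console
# consumer started beforehand receives it.
# KAFKA_HOME and WAIT_SECS are assumptions; adjust them to your setup.
KAFKA_HOME="${KAFKA_HOME:-$HOME/Downloads/kafka_2.11-0.10.1.0}"
WAIT_SECS="${WAIT_SECS:-5}"

kafka_round_trip() {
  topic="${1:-twitter}"
  msg="round-trip-$$-$(date +%s)"   # unique payload so old messages cannot match
  out="$(mktemp)"

  # Start the consumer first so it is attached before the message is sent.
  "$KAFKA_HOME/bin/kafka-console-consumer.sh" \
    --zookeeper localhost:2181 --topic "$topic" > "$out" 2>/dev/null &
  consumer_pid=$!
  sleep "$WAIT_SECS"

  # Feed the message to the producer on stdin instead of typing it.
  echo "$msg" | "$KAFKA_HOME/bin/kafka-console-producer.sh" \
    --broker-list localhost:9092 --topic "$topic" > /dev/null 2>&1
  sleep "$WAIT_SECS"

  # Stop the consumer and check whether it saw the message.
  kill "$consumer_pid" 2>/dev/null
  wait "$consumer_pid" 2>/dev/null
  if grep -q "$msg" "$out"; then status=0; else status=1; fi
  rm -f "$out"
  return "$status"
}
```

A typical call would be kafka_round_trip twitter && echo "round trip OK". A non-zero exit status means the message did not reach the consumer within the wait time.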
Now set up Spark to consume the messages from Kafka.
Clone the project below.
$ cd ~/workspace
$ git clone https://github.com/abulbasar/spark-scala-examples.git
Open Eclipse, create a Scala project named spark-scala-examples, and run TwitterAnalyzer.scala.
Every message sent from the Kafka producer should now appear in the Spark output as well.
Now use the following article to start streaming Twitter messages and publish the stream to the Kafka topic "twitter".