Download Kafka from http://kafka.apache.org/downloads. Here I used version 0.10.1.0 (the Scala 2.11 build, kafka_2.11-0.10.1.0).
Extract the binaries
Set KAFKA_HOME and ZOOKEEPER_HOME in /etc/profile
Set environment variables for easy access
Run the following command for the above environment variables to take effect
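The environment setup might look like this (the install paths are examples; adjust them to wherever you extracted the archive):

```shell
# Lines to add to /etc/profile (paths are examples)
export KAFKA_HOME=/opt/kafka_2.11-0.10.1.0
export ZOOKEEPER_HOME=/opt/zookeeper
export PATH=$PATH:$KAFKA_HOME/bin

# Reload the profile so the variables take effect in the current shell
source /etc/profile
```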
Scenario 1: Single Broker, Single Producer and Single Consumer
Modify the config file used by the Kafka broker.
Start kafka broker
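A minimal sketch of these two steps, assuming no standalone Zookeeper is already running (Kafka ships a convenience zookeeper.properties) and that config/server.properties holds your broker settings:

```shell
# Start Zookeeper first (skip if a standalone Zookeeper is already running)
$KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties &

# Start the Kafka broker with the modified config
$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties
```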
Open a separate window for Zookeeper client
Open another terminal
You should see no topics, since we have not created any yet.
Now create a topic. You can create as many partitions as you want, but the replication factor must be less than or equal to the number of brokers.
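The list and create steps might look like this (the topic name demo and 2 partitions are the values used later in this doc; localhost:2181 assumes a local Zookeeper):

```shell
# List existing topics -- should print nothing yet
kafka-topics.sh --zookeeper localhost:2181 --list

# Create a topic "demo" with 2 partitions; replication factor 1
# because we have a single broker
kafka-topics.sh --create --zookeeper localhost:2181 \
    --topic demo --partitions 2 --replication-factor 1
```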
Describe the topic.
In the output, Isr stands for "In-Sync Replica".
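The describe command, assuming the topic name demo; the output lists the leader, replicas and in-sync replicas per partition:

```shell
kafka-topics.sh --describe --zookeeper localhost:2181 --topic demo
```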
Now let's look at the kafka logs
Open config/server.properties and look for log.dirs property. Default value is /tmp/kafka-logs
Check the log.dirs directory, assuming /tmp/kafka-logs is the log directory
For each partition of a topic, you will find one directory. For demo, since we created the topic with 2 partitions, you will see 2 folders. Let's go inside one of them.
Messages are stored in the .log file, and indexes are stored in the .index file. The index file maps a message's offset to its physical position in the .log file.
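Exploring the log directory might look like this (directory names follow the topic-partition pattern):

```shell
# One directory per partition of each topic; expect demo-0 and demo-1
# among the entries, alongside a few checkpoint files
ls /tmp/kafka-logs

# Inside a partition directory: the log segment and its index
ls -l /tmp/kafka-logs/demo-0
```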
Start a producer
Open a new terminal for producer
Note that after starting, the terminal will wait for your message followed by <enter>.
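The console producer command, assuming the broker listens on the default port 9092; each line you type becomes one message:

```shell
kafka-console-producer.sh --broker-list localhost:9092 --topic demo
```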
Start the consumer
Open a new terminal for consumer
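A sketch using the old Zookeeper-based consumer, which matches the /consumers znodes explored later in this doc; with the new consumer you would pass --bootstrap-server localhost:9092 instead:

```shell
kafka-console-consumer.sh --zookeeper localhost:2181 --topic demo --from-beginning
```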
On the producer window, type a message followed by enter. You should be able to view the message in the consumer terminal.
Switch to the terminal where you opened the kafka-logs directory and view the directory contents again. Notice that the .log file now has new data; earlier it was zero bytes.
Check which process and user are accessing the .log file.
Collect the process id from the output of the above command, and view the process details.
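One way to do this with standard tools (the segment file path is an example):

```shell
# Which process and user have the log segment open
lsof /tmp/kafka-logs/demo-0/00000000000000000000.log

# View details of that process (replace <pid> with the PID from lsof)
ps -fp <pid>
```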
Switch to the Zookeeper terminal and explore the consumer offset for the topic and consumer-group combination. Note that on your system the consumer group id may be different.
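Inside the Zookeeper CLI (zookeeper-shell.sh localhost:2181 or zkCli.sh), the exploration might look like this; the group id below is an example, yours will differ:

```shell
ls /consumers
get /consumers/console-consumer-12345/offsets/demo/0
```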
Now, let's increase the number of partitions to 3. Kafka does not allow decreasing the number of partitions of a topic.
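The alter command for this step, assuming the topic name demo:

```shell
kafka-topics.sh --alter --zookeeper localhost:2181 --topic demo --partitions 3
```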
Describe the topic to view the newly created partitions
Topic:demo PartitionCount:3 ReplicationFactor:1 Configs:
Topic: demo Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Topic: demo Partition: 1 Leader: 0 Replicas: 0 Isr: 0
Topic: demo Partition: 2 Leader: 0 Replicas: 0 Isr: 0
Verify that new kafka-logs directories have been created for the demo topic. In total it should show 3 directories.
Check the log file size under each partition to see which log files contain data.
Only the first partition's log should have data; the other 2 do not.
Switch to the producer terminal, publish a few new messages, and check the log files. You might notice that the new messages go to one of the partitions, and all of them end up in that same partition.
Delete A Topic
By default Kafka does not allow topic deletion. Enable it in server.properties by setting delete.topic.enable=true. You have to restart the Kafka brokers for the configuration change to take effect.
When you delete a topic, it is marked for deletion and the flag is saved in Zookeeper. You can switch to Zookeeper terminal and view it.
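The delete command and the corresponding Zookeeper check might look like this:

```shell
# Delete the topic (requires delete.topic.enable=true on the brokers)
kafka-topics.sh --delete --zookeeper localhost:2181 --topic demo

# In the Zookeeper CLI, topics pending deletion show up here
ls /admin/delete_topics
```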
If you want to undelete it before the deletion completes, remove the topic's marker znode from Zookeeper.
After you delete, verify that the topic is gone and that its kafka-logs directories have also been removed.
Find the list of consumer groups
Find the lag by topic partition for a given consumer group
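Both steps use the same tool; the group id below is an example. With the old consumer the groups live in Zookeeper; for new-consumer groups pass --new-consumer --bootstrap-server localhost:9092 instead:

```shell
# List consumer groups
kafka-consumer-groups.sh --zookeeper localhost:2181 --list

# Per-partition current offset, log-end offset and lag for one group
kafka-consumer-groups.sh --zookeeper localhost:2181 --describe \
    --group console-consumer-12345
```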
View the messages in a kafka-log file
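Kafka ships a tool for dumping log segments; the file path is an example:

```shell
kafka-run-class.sh kafka.tools.DumpLogSegments \
    --files /tmp/kafka-logs/demo-0/00000000000000000000.log --print-data-log
```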
Scenario 2: Single Producer, Single Consumer and Multiple Brokers
Start 8 terminals, which we will refer to in this doc by the following names.
Go to $KAFKA_HOME and copy config/server.properties to the following files.
Update 3 properties in each property file, as shown below
Start the brokers on separate terminals.
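One way to prepare and start the three brokers; the broker ids, ports and log dirs are example values:

```shell
cd $KAFKA_HOME
cp config/server.properties config/server-1.properties
cp config/server.properties config/server-2.properties
cp config/server.properties config/server-3.properties

# In each file, make these three properties unique,
# e.g. for server-1.properties:
#   broker.id=1
#   listeners=PLAINTEXT://:9093
#   log.dirs=/tmp/kafka-logs-1

# Start each broker in its own terminal
bin/kafka-server-start.sh config/server-1.properties
bin/kafka-server-start.sh config/server-2.properties
bin/kafka-server-start.sh config/server-3.properties
```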
Open Zookeeper Terminal and launch zookeeper cli
Examine the id, host and port for each broker id. This shows that connecting to Zookeeper is enough to discover the whole Kafka cluster.
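Inside the Zookeeper CLI, the brokers register themselves as ephemeral znodes (the ids shown depend on the broker.id values you chose):

```shell
ls /brokers/ids
get /brokers/ids/1
```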
View the kafka logs
Create a new topic T2 with replication factor 3 and 1 partition. To create the topic, you can use the Shell terminal.
Describe the topic
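The create and describe commands for T2; replication factor 3 works now because three brokers are running:

```shell
kafka-topics.sh --create --zookeeper localhost:2181 \
    --topic T2 --partitions 1 --replication-factor 3

kafka-topics.sh --describe --zookeeper localhost:2181 --topic T2
```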
View the kafka-logs directories after creating the topic. With replication factor 3, a new log directory for T2 should appear under each broker's log.dirs.
Start Kafka console producer in the Producer terminal.
Start a Kafka console consumer in the Consumer terminal.
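A sketch for both terminals; the ports 9093-9095 are example values matching the broker configs sketched earlier:

```shell
# Producer: any one broker is enough, but listing several adds resilience
kafka-console-producer.sh \
    --broker-list localhost:9093,localhost:9094,localhost:9095 --topic T2

# Consumer: the old consumer discovers the brokers via Zookeeper
kafka-console-consumer.sh --zookeeper localhost:2181 --topic T2 --from-beginning
```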
Scenario 3: Single Broker, Single Producer, Multiple Consumers
Goal: create 2 consumers under a common consumer group id.
Create a config file for consumer
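A minimal consumer config; the file name and location are examples, and the group id matches the one used for the two consumers below:

```shell
# Minimal consumer config shared by both consumers
cat > consumer.properties <<'EOF'
group.id=console-consumer-group
EOF
```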
Create a topic with multiple partitions.
Start the producer for the new topic
Start the first consumer with consumer config
Start second consumer for consumer group "console-consumer-group"
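Both consumers run the same command in separate terminals; sharing the config file puts them in the same group, so partitions are split between them. The topic name multi-demo is a placeholder for the multi-partition topic created above:

```shell
# Run this in each of the two consumer terminals
kafka-console-consumer.sh --zookeeper localhost:2181 --topic multi-demo \
    --consumer.config consumer.properties
```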
Now post a few new messages. Observe that each message goes to one of the consumers, but not both. If you shut down one consumer, the broker will reassign its partitions to the active consumer, so you should then see the messages on the other consumer.
Save Kafka Topic To HDFS using Flume
Create flume configuration file called flume.conf
HDFS sink properties
Kafka source properties
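A sketch of flume.conf, assuming Flume's Kafka source (Flume 1.6 style, Zookeeper-based) and a local HDFS; the agent name a1, the topic and the HDFS path are example values:

```properties
# Agent components
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Kafka source properties
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.zookeeperConnect = localhost:2181
a1.sources.r1.topic = demo
a1.sources.r1.channels = c1

# Memory channel buffering between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# HDFS sink properties
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/kafka/%y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1
```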
Start flume agent
View the data saved in HDFS
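Starting the agent and inspecting the output might look like this; the agent name must match the prefix used in flume.conf, and the HDFS path is the example value from the config sketch above this doc uses:

```shell
# Start the Flume agent
flume-ng agent --conf conf --conf-file flume.conf --name a1 \
    -Dflume.root.logger=INFO,console

# After some messages flow through, inspect the files landed in HDFS
hdfs dfs -ls /flume/kafka
```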