分布式集群手操 – Kafka搭建

Kafka分布式集群搭建的示例,该示例搭建环境以Zookeeper集群搭建为基础,且依赖于Zookeeper集群,因此,若未搭建Zookeeper集群请先查阅:《分布式集群手操 – Zookeeper搭建》

准备

下载解压同理,此处不再重复说明。

配置

进入Kafka根目录下的config目录,配置server.properties,主要增加host.nameport配置项以及修改zookeeper.connect配置项,其中host.name为当前主机的主机名,port9092端口,zookeeper.connect为所有主机的主机地址,中间用逗号隔开,注意一个broker.id配置项,它为当前主机在集群中的唯一标识Id,从0开始,必须保证每台主机的broker.idhost.name唯一。

配置文件如下:以hdfs1为例,其他节点需做适当修改

server.properties
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see kafka.server.KafkaConfig for additional details and defaults
############################# Server Basics #############################
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0
############################# Socket Server Settings #############################
# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = security_protocol://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
#listeners=PLAINTEXT://:9092
# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured. Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092
port=9092
host.name=hdfs1
# The number of threads handling network requests
num.network.threads=3
# The number of threads doing disk I/O
num.io.threads=1
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
############################# Log Basics #############################
# A comma seperated list of directories under which to store log files
log.dirs=/home/bigdata/services/tmp/kafka
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1
############################# Log Flush Policy #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
# 1. Durability: Unflushed data may be lost if you are not using replication.
# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to exceessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000
############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
# The minimum age of a log file to be eligible for deletion
log.retention.hours=168
# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000
############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=hdfs1:2181,hdfs2:2181,hdfs3:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000

同时可修改zookeeper.properties文件的dataDir属性项:

zookeeper.properties
1
2
3
4
5
dataDir=/home/bigdata/services/tmp/kafka/zookeeper
# the port at which the clients will connect
clientPort=2181
# disable the per-ip limit on the number of connections since this is a non-production config
maxClientCnxns=0

完成server.properties的配置后,使用scp命令分发Kafka目录文件到其他节点主机,然后修改相应的属性保证唯一。

运行

在Kafka根目录下使用命令启动每台节点上的Kafka

1
nohup ./bin/kafka-server-start.sh config/server.properties &

启动成功后创建Topic进行测试,例如创建名为mytopic的Topic,然后在hdfs1上创建生产者,hdfs2hdfs3上创建消费者。

  • 创建topic
1
./bin/kafka-topics.sh -zookeeper hdfs1:2181,hdfs2:2181,hdfs3:2181 -topic mytopic -replication-factor 1 -partitions 1 -create
  • 查看topic
1
./bin/kafka-topics.sh -zookeeper hdfs1:2181,hdfs2:2181,hdfs3:2181 -list
  • 删除topic*
1
./bin/kafka-topics.sh -zookeeper hdfs1:2181,hdfs2:2181,hdfs3:2181 -topic mytopic -delete
  • 创建生产者
1
./bin/kafka-console-producer.sh -broker-list hdfs1:9092,hdfs2:9092,hdfs3:9092 -topic mytopic
  • 创建消费者
1
./bin/kafka-console-consumer.sh -zookeeper hdfs1:2181,hdfs2:2181,hdfs3:2181 - from-begining  -topic mytopic

如果严格按照以上配置,测试是一次通过的,中间不会产生任何警告或错误。

删除Topic并不会立即生效,而是将其标记为待删除,或者可以直接在Zookeeper的文件目录中删除该Topic目录


分布式集群手操 – Kafka搭建
https://vicasong.github.io/big-data/kafka-distribute-install/
作者
Vica
发布于
2016年9月7日
许可协议