Kafka分布式集群搭建的示例，该示例搭建环境以Zookeeper集群搭建为基础，且依赖于Zookeeper集群，因此，若未搭建Zookeeper集群请先查阅：《分布式集群手操 – Zookeeper搭建》

准备

下载解压同理，此处不再重复说明。

配置

进入Kafka根目录下的config目录，配置server.properties，主要增加host.name和port配置项以及修改zookeeper.connect配置项，其中host.name为当前主机的主机名，port为9092端口，zookeeper.connect为所有主机的主机地址，中间用逗号隔开，注意一个broker.id配置项，它为当前主机在集群中的唯一标识Id，从0开始，必须保证每台主机的broker.id及host.name唯一。

配置文件如下：以hdfs1为例，其他节点需做适当修改

server.properties

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see kafka.server.KafkaConfig for additional details and defaults
############################# Server Basics #############################
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0
############################# Socket Server Settings #############################
# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = security_protocol://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
#listeners=PLAINTEXT://:9092
# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured.  Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092
port=9092
host.name=hdfs1
# The number of threads handling network requests
num.network.threads=3
# The number of threads doing disk I/O
num.io.threads=1
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
############################# Log Basics #############################
# A comma seperated list of directories under which to store log files
log.dirs=/home/bigdata/services/tmp/kafka
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1
############################# Log Flush Policy #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to exceessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000
############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
# The minimum age of a log file to be eligible for deletion
log.retention.hours=168
# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000
############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=hdfs1:2181,hdfs2:2181,hdfs3:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000

同时可修改zookeeper.properties文件的dataDir属性项：

zookeeper.properties

dataDir=/home/bigdata/services/tmp/kafka/zookeeper
# the port at which the clients will connect
clientPort=2181
# disable the per-ip limit on the number of connections since this is a non-production config
maxClientCnxns=0

完成server.properties的配置后，使用scp命令分发Kafka目录文件到其他节点主机，然后修改相应的属性保证唯一。

运行

在Kafka根目录下使用命令启动每台节点上的Kafka

1	`nohup ./bin/kafka-server-start.sh config/server.properties &`

启动成功后创建Topic进行测试，例如创建名为mytopic的Topic，然后在hdfs1上创建生产者，hdfs2和hdfs3上创建消费者。

创建topic

1	`./bin/kafka-topics.sh -zookeeper hdfs1:2181,hdfs2:2181,hdfs3:2181 -topic mytopic -replication-factor 1 -partitions 1 -create`

查看topic

1	`./bin/kafka-topics.sh -zookeeper hdfs1:2181,hdfs2:2181,hdfs3:2181 -list`

删除topic*

1	`./bin/kafka-topics.sh -zookeeper hdfs1:2181,hdfs2:2181,hdfs3:2181 -topic mytopic -delete`

创建生产者

1	`./bin/kafka-console-producer.sh -broker-list hdfs1:9092,hdfs2:9092,hdfs3:9092 -topic mytopic`

创建消费者

1	`./bin/kafka-console-consumer.sh -zookeeper hdfs1:2181,hdfs2:2181,hdfs3:2181 - from-begining -topic mytopic`

如果严格按照以上配置，测试是一次通过的，中间不会产生任何警告或错误。

删除Topic并不会立即生效，而是将其标记为待删除，或者可以直接在Zookeeper的文件目录中删除该Topic目录

big-data

#linux #distribution #big-data #kafka

分布式集群手操 – Kafka搭建

https://vicasong.github.io/big-data/kafka-distribute-install/

作者

Vica

发布于

2016年9月7日

许可协议

分布式集群手操 – Hive搭建上一篇

分布式集群手操 – Zookeeper搭建下一篇