Kafka, the powerhouse of data platforms
Description
Book Introduction
Everything you need to know about Kafka, a highly available, real-time distributed streaming solution for event-driven asynchronous architectures that's gaining traction as a core component of data platforms!

This book is packed with the extensive knowledge, practical experience, and know-how of authors who have operated the "company-wide Kafka" service and built data pipelines and big data platforms at Kakao, Korea's largest mobile platform company!

This book is aimed at current Kafka users as well as companies and managers who have not yet adopted it. It covers, step by step, how to install and configure Kafka, process messages asynchronously, and develop programs.
It also delves into Kafka's internal design and the characteristics of producers and consumers, making it easy to adopt a messaging system.
In addition, it explains with detailed examples how to analyze real-time data using Kafka as a data bus, and it provides sample code along with a step-by-step walkthrough of building a Kafka-based real-time data analysis system. It will be useful to big data analysts and data engineers doing real-time or batch analysis, and to all developers building event-driven asynchronous systems.

Table of Contents
Part 1: Starting with Kafka
Chapter 1: What is Kafka?
1.1 Background of Kafka's birth
1.2 How Kafka works and its principles
1.3 Kafka Features
1.4 Kafka's Expansion and Development
1.5 Summary

Chapter 2: Installing Kafka and ZooKeeper
2.1 ZooKeeper for Kafka Management
2.2 Installing ZooKeeper
__2.2.1 Downloading ZooKeeper
__2.2.2 Running ZooKeeper
2.3 Installing Kafka
__2.3.1 Downloading Kafka
__2.3.2 Kafka Configuration
__2.3.3 Running Kafka
2.4 Checking Kafka Status
__2.4.1 Checking the TCP Port
__2.4.2 Checking Kafka Information Using ZooKeeper Znodes
__2.4.3 Checking Kafka Logs
2.5 Getting Started with Kafka
2.6 Summary

Part 2: Basic Concepts and Operation Guide
Chapter 3: Kafka Design
3.1 Kafka Design Features
__3.1.1 Distributed Systems
__3.1.2 Page Cache
__3.1.3 Batch Processing
3.2 Kafka Data Model
__3.2.1 Understanding Topics
__3.2.2 Understanding Partitions
__3.2.3 Offset and Message Order
3.3 Kafka High Availability and Replication
__3.3.1 Replication Factor and the Role of Leaders and Followers
__3.3.2 Managing Leaders and Followers
3.4 If all brokers are down
3.5 The Role of ZooKeeper Znodes in Kafka
3.6 Summary

Chapter 4: Kafka Producer
4.1 Sending Messages with the Console Producer
4.2 Producer using Java and Python
__4.2.1 Sending a message without waiting for confirmation
__4.2.2 Synchronous sending
__4.2.3 Asynchronous sending
4.3 Producer Usage Examples
4.4 Producer Main Options
4.5 How to send a message
__4.5.1 When fast transmission is required and some message loss is acceptable
__4.5.2 When low message loss and a reasonable transmission speed are required
__4.5.3 When transmission may be slow but messages must never be lost
4.6 Summary

Chapter 5: Kafka Consumer
5.1 Key Consumer Options
5.2 Retrieving Messages with the Console Consumer
5.3 Consumers using Java and Python
5.4 Partitions and Message Order
__5.4.1 Message order in the peter-01 topic with three partitions
__5.4.2 Message order in the peter-02 topic with one partition
5.5 Consumer Group
5.6 Commit and Offset
__5.6.1 Auto Commit
__5.6.2 Manual Commit
__5.6.3 Assigning a specific partition
__5.6.4 Retrieving messages from a specific offset
5.7 Summary

Chapter 6: Kafka Operations Guide
6.1 Essential Kafka Commands
__6.1.1 Creating a Topic
__6.1.2 Listing Topics
__6.1.3 Viewing Topic Details
__6.1.4 Changing Topic Settings
__6.1.5 Changing the Number of Partitions in a Topic
__6.1.6 Changing the Replication Factor of a Topic
__6.1.7 Listing Consumer Groups
__6.1.8 Checking Consumer Status and Offsets
6.2 ZooKeeper Scale-Out
6.3 Kafka Scale-Out
6.4 Kafka Monitoring
__6.4.1 How to configure Kafka JMX
__6.4.2 JMX Monitoring Metrics
6.5 Kafka Manager
__6.5.1 Installing the Kafka Manager
__6.5.2 Registering a Kafka Cluster
__6.5.3 Kafka Manager Menu Description
6.6 Q&A on Kafka Operations
6.7 Summary

Part 3: Extensions and Applications of Kafka
Chapter 7: Building a Data Pipeline Using Kafka
7.1 Data Flow Diagram Using Kafka
7.2 Sending messages using Filebeat
__7.2.1 Installing Filebeat
__7.2.2 Filebeat settings
__7.2.3 Checking message inflow into Kafka topic
7.3 Retrieving messages using NiFi
__7.3.1 Installing NiFi
__7.3.2 NiFi Settings
__7.3.3 Setting up a consumer using NiFi
7.4 Storing messages in Elasticsearch for real-time analysis
__7.4.1 Installing Elasticsearch
__7.4.2 Elasticsearch Configuration
__7.4.3 Sending data to Elasticsearch using NiFi
7.5 Checking data stored in Elasticsearch using Kibana
__7.5.1 Installing Kibana
__7.5.2 Kibana Configuration
7.6 Replaying messages from an existing topic into a new topic
__7.6.1 Adding a Kafka Consumer Using NiFi
__7.6.2 Topic-specific routing using NiFi
7.7 Summary

Chapter 8: Kafka Streams API
8.1 Stream Processing Basics
__8.1.1 Stream Processing and Batch Processing
__8.1.2 Stateful and Stateless Stream Processing
8.2 Kafka Streams
__8.2.1 Features and Concepts of Kafka Streams
__8.2.2 Kafka Streams Architecture
8.3 Configuring Kafka Streams
8.4 Creating a Pipe Example Program
8.5 Creating a Line Splitting Example Program
8.6 Creating a Word Frequency Counting Example Program
8.7 Summary

Chapter 9: Stream Processing with KSQL
9.1 Background of KSQL's Emergence
9.2 KSQL and Kappa Architecture
9.3 KSQL Architecture
__9.3.1 KSQL Server
__9.3.2 KSQL Client
9.4 Installing a KSQL Cluster Using Docker
9.5 Stream Analysis with KSQL
__9.5.1 Data Preparation
__9.5.2 Creating Basic Streams and Tables
__9.5.3 Creating new streams and tables using queries
9.6 Summary

Chapter 10: Other Cloud-Based Messaging Services
10.1 Introducing Google's Pub/Sub Service
10.2 Google's Pub/Sub Service Integration
__10.2.1 Installing the Google SDK
__10.2.2 Using Topics with Google Pub/Sub CLI
10.3 Using the Pub/Sub Python SDK
__10.3.1 Installing the Pub/Sub Python Library
__10.3.2 Creating Google Service Account Credentials
__10.3.3 Using the Python SDK
10.4 Introducing Amazon Kinesis Service
10.5 Amazon Kinesis Integration
__10.5.1 Installing the Amazon CLI
__10.5.2 Using Kinesis with the Amazon CLI
10.6 Using the Amazon Kinesis Java SDK
__10.6.1 Consumer Code Example
__10.6.2 Producer Code Example
10.7 Comparison of Kafka and Cloud Services
10.8 Summary

Appendix: Installing Kafka Using Docker
A.1 Installing Docker
__A.1.1 Installing Docker on Linux
__A.1.2 Installing Docker on Mac
__A.1.3 Installing Docker on Windows
A.2 Installing Kafka using Docker

Publisher's Review
The most significant characteristic of modern computing architecture is that it is 'loosely coupled'.
As the cloud era begins in earnest, computing resources are no longer permanent.
They can disappear without notice, or suddenly multiply severalfold, even dozens of times, through auto-scaling.
Therefore, the components that make up a service can no longer be 'strongly coupled' as before.
To give a simple example: in a modern computing environment, if you stick to the traditional server/client structure and communicate directly, the server you are talking to may disappear at any moment. Recent computing communication therefore exchanges data indirectly through an asynchronous messaging framework instead of communicating directly.


Another feature is the centralization of data within the company.
In previous generations, each service in a company operated its own data pipeline (ETL: Extract, Transform, Load) layer or data analysis system, which fragmented the company's data and made comprehensive analysis very difficult.
In previous services, a certain level of user satisfaction could be achieved by analyzing only the data of each service.
But in today's world where services are highly connected, things have changed dramatically.
For example, to analyze the customers of a company's messenger service, you need to analyze not only their messenger usage but also their activity on the blogging, social networking, and photo services run by the same company; only then can you obtain the customer data needed to deliver real user satisfaction.


In the past, this was difficult because no service bus system could withstand the load of collecting every event generated by many services. Recently, however, with the introduction of Kafka, an event bus built on high throughput, fast horizontal scalability, and fault tolerance, a satisfactory level of analysis has become possible.
Accordingly, a growing number of large companies focusing on data analysis are adopting Kafka and using it as a crucial central data pipeline.

Kafka is used not only for message processing, the role of a traditional messaging system, but also as a pipeline for tracking user website activity and for aggregating application statistics into monitoring data.
In the past, whenever data was needed, the work of requesting it from the organization managing it and waiting for a response was repeated countless times, which lengthened analysis time and lowered efficiency. With Kafka, the analysis environment is changing rapidly: events are stored on the Kafka data bus in the chronological order in which they occur, as in event sourcing, so the teams or individuals who need the data can use it immediately, whenever they need it.


Apache Kafka is an application developed to handle asynchronous communication on a very large scale and very quickly.
Since its release in early 2011, it has been adopted as a dedicated asynchronous framework by numerous companies that require in-depth real-time analysis of user data, such as Netflix, LinkedIn, Airbnb, Microsoft, Uber, Kakao, and Line.


As more and more companies adopt Kafka as a core in-house platform, going beyond a simple message queue service, I hope this book will be of great help to those who want to understand Kafka and use it to build asynchronous systems and data pipelines.


[Structure of this book]
Part 1: Starting with Kafka
In 'Chapter 1, What is Kafka?', we trace the history of Kafka through the changes in data processing systems and the situation at LinkedIn when Kafka was born.
We also take a closer look at the basic operation of messaging systems and the features of Kafka.

In 'Chapter 2, Installing Kafka and ZooKeeper', we learn about ZooKeeper, the reliable coordination application behind Kafka, which is itself a representative distributed application, and examine the relationship between the two.
The ZooKeeper installation process, which users find difficult because few books cover it in detail, is explained with pictures and code, followed by a detailed, step-by-step walkthrough of installing Kafka, running it, and checking its status.


Part 2: Basic Concepts and Operational Guide
In 'Chapter 3, Kafka Design', we cover the design characteristics of Kafka, including distributed systems, page cache, and batch processing, along with the concept of Kafka replication, which delivers high performance and high availability, and the roles of leaders and followers.
We also learn essential Kafka terminology and the ZooKeeper znodes that Kafka uses.

In 'Chapter 4, Kafka Producer', we learn about the producer's main options, run producer example code using the console producer and the Java and Python languages, and cover synchronous and asynchronous message sending.
We also take a closer look at how to send messages without loss, depending on the producer options, as sketched below.
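
To give a feel for what the chapter covers, below is a minimal Java producer sketch showing a synchronous and an asynchronous send; the broker address (localhost:9092) and topic name (peter-topic) are illustrative assumptions, not the book's exact example code.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;

    public class ProducerExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("acks", "1"); // wait for the leader only: reasonable speed, small loss window

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                ProducerRecord<String, String> record =
                        new ProducerRecord<>("peter-topic", "hello kafka"); // assumed topic name

                // Synchronous send: block until the broker acknowledges the record.
                RecordMetadata meta = producer.send(record).get();
                System.out.printf("sync: partition=%d, offset=%d%n", meta.partition(), meta.offset());

                // Asynchronous send: register a callback and continue immediately.
                producer.send(record, (metadata, exception) -> {
                    if (exception != null) exception.printStackTrace();
                    else System.out.printf("async: partition=%d, offset=%d%n",
                            metadata.partition(), metadata.offset());
                });
                producer.flush(); // make sure the async record actually leaves the client
            }
        }
    }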

In 'Chapter 5, Kafka Consumer', we look at the consumer's main options, implement a simple consumer, and discuss what to watch out for when retrieving messages depending on the number of partitions, as well as auto commit, manual commit, and offsets.
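
As an illustration of the auto/manual commit distinction the chapter discusses, here is a minimal Java consumer sketch using manual commits; the broker address, group id, and topic name are illustrative assumptions.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class ConsumerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("group.id", "peter-group");             // assumed consumer group
            props.put("enable.auto.commit", "false");         // commit offsets manually
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("peter-topic")); // assumed topic
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("partition=%d, offset=%d, value=%s%n",
                                record.partition(), record.offset(), record.value());
                    }
                    // Commit only after the batch is processed, so a crash before
                    // this point means re-reading messages rather than losing them.
                    consumer.commitSync();
                }
            }
        }
    }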

In 'Chapter 6, Kafka Operations Guide', we take a detailed look at the essential commands you must know to operate Kafka and at how to monitor it. We also learn how to install and use Kafka Manager, a GUI tool that makes managing Kafka easier.
We have also compiled frequently asked questions and answers about operating Kafka.

Part 3: Extensions and Applications of Kafka
In 'Chapter 7, Building a Data Pipeline Using Kafka', you learn how to send, retrieve, store, and check messages through real-world examples with popular tools such as Filebeat, Elasticsearch, and Kibana. You also walk through an example of configuring a data pipeline with Apache NiFi.

In 'Chapter 8, Kafka Streams API', you learn the concept of stream processing and, through practical examples, how to perform real-time analytics with Kafka alone, without a separate streaming engine such as Spark or Storm.
We also create example programs that use the Kafka Streams API to pipe messages, split lines, and count word frequencies; a sketch of the first of these follows.
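
For a flavor of the 'pipe' example, here is a minimal Kafka Streams sketch in Java that forwards records from one topic to another unchanged; the application id and topic names follow the Kafka Streams quickstart conventions and are assumptions, not the book's exact code.

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;

    public class PipeExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-pipe");      // assumed app id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            // "Pipe": read every record from the input topic and forward it unchanged.
            builder.stream("streams-plaintext-input").to("streams-pipe-output");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            // Close the streams client cleanly on Ctrl+C.
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }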

In 'Chapter 9, Stream Processing with KSQL', we learn about the background and architecture of KSQL and examine stream analysis with KSQL, which enables a variety of analyses of real-time streaming data with simple KSQL queries, without developing a separate application.

In 'Chapter 10, Other Cloud-Based Messaging Services', we explore the overview, integration, and usage of cloud-based messaging services, which companies that find it difficult to adopt Kafka or to operate their own messaging service can use as an alternative.
Finally, we compare Kafka with these cloud-based messaging services.

In 'Appendix, Installing Kafka Using Docker', as a way to use Kafka without installing it directly on a server, we cover installing Kafka with Docker, which has recently grown popular, on Linux, Mac, and Windows.

[For whom this book is intended]
- From beginners who want to learn Kafka to administrators who operate Kafka directly
- Developers concerned about data standardization and real-time processing
- Developers who want to utilize data processing based on event sourcing
- Developers who want to collect, process, and analyze data efficiently
- Architects and developers who build real-time data pipelines and develop applications
- All developers responsible for developing event-driven asynchronous systems

[Contents and Features of this Book]
- Kafka's birth background and operating principles
- Detailed instructions for installing ZooKeeper and Kafka and configuring a cluster
- Example code and usage of Kafka producers and consumers in Java and Python
- Descriptions of the key commands required for Kafka operations
- Guide to adding nodes and scaling out ZooKeeper and Kafka
- Installation and use of Kafka Manager, a convenient GUI tool
- Major Kafka architecture explained with easy-to-understand diagrams
- A complete guide to real-time stream analytics using Kafka
- Examples of configuring a data pipeline with Elasticsearch and Apache NiFi

[Author's Note]
Several years ago, Kakao began using Kafka as a data pipeline for each service department.
Then a proposal came to the infrastructure team I belonged to: operate Kafka for a specific service. As if by fate, I readily accepted the offer and began studying Kafka in earnest on my own.
But at the time, the materials I could find on the Internet were all very basic; most were just installation instructions or quick-start guides.
As I learned about Kafka piece by piece and applied it to my work, I became completely captivated by the charm of high-performance Kafka.
Rather than simply operating Kafka within the team, I had the idea of running a 'company-wide shared Kafka' service that would consolidate the Kafka clusters scattered across departments. With the support and encouragement of many people around me, I quickly put the idea into action.

While studying Kafka on my own and working in the field, I started thinking it would be good to share the knowledge and experience I had accumulated with other developers. At a colleague's recommendation, I began contributing articles on Kafka to the knowledge-sharing site Popit (popit.kr), and that led to writing this book.
- Go Seung-beom (Peter)

If I had to pick the most important part of modern computing architecture design, it would be the data pipeline.
Even if you are not running a popular cloud platform, it is crucial to gather the data generated by various virtual resources and the user activity information (such as click counts, dwell time, shopping carts, purchase information, and purchase times) produced by the services those resources provide, process it appropriately, and then deliver the right information to users.

One of the biggest challenges I've faced over my years of building data platforms has been creating a software platform that can handle massive amounts of data with great agility and is fault-tolerant.
It is like building the backbone of the pipeline.
I had previously built a solution for this using only ZooKeeper, but its performance fell short of my expectations, which worried me. After learning about the open source solution Kafka, however, building this data pipeline became much easier.
I wrote this book to share my experiences from that time with my readers.

Most devices that make decisions using algorithms, whether artificial intelligence or self-driving cars, have a flow that follows a series of processes: gathering external information (Sense), analyzing this information to make a decision (Plan), and applying the decision (Act).
This book focuses specifically on the Sense-Plan stages, where you collect data and perform some analysis.
If you're curious about the Plan-Act aspect of data analysis in the cloud, I recommend reading my book, "Big Data Analysis Using Cloud APIs" (Acorn Publishing, 2015).
- Andrew Gong
Product Details
- Date of publication: April 26, 2018
- Page count, weight, size: 432 pages | 778g | 180*235*21mm
- ISBN13: 9791196203726
- ISBN10: 1196203725
