From Elastic Stack development to operation
Description
Book Introduction
Do you really have to stitch together open source technologies that don't fit well just to process data?
Tired of integrating and operating a patchwork of open source tools?
A solution for developers and operators: all you need now is the Elastic Stack!


Elastic Stack has gone beyond just a search engine to become a powerhouse in data processing systems.
This book systematically outlines how to leverage the Elastic Stack to improve your company's chances of survival in a rapidly changing world.


Want to build an enterprise big data pipeline to process your company's data? Want to build an in-house search engine for fast data retrieval? Want to process and store massive amounts of data, then gain insights with compelling visualizations? Want to pull data from multiple servers, integrate it, and then visualize trends or statistics? The Elastic Stack is the answer.
All you need to do is prepare a physical computer or virtual machine for practice, and this book will take care of the rest.
This book explains the essential knowledge you need to know to design, develop, and operate data-centric applications using the Elastic Stack, with specific examples.

Table of Contents
Part 1 | Elastic Stack Overview

Chapter 1: What is Elastic Stack?
1.1 The Birth of Elasticsearch
1.2 Evolving into the Elastic Stack
1.3 Components of the Elastic Stack
1.3.1 Elasticsearch: A Distributed Search Engine
1.3.2 Kibana: A Visualization and Elasticsearch Management Tool
1.3.3 Logstash: A tool for collecting and cleaning events
1.3.4 Beats: A lightweight collection tool that runs on the edge
1.3.5 Other Solutions
1.4 Uses of Elastic Stack
1.4.1 Specialized Search Engine
1.4.2 Log Integration Analysis
1.4.3 Security Event Analysis
1.4.4 Application Performance Analysis
1.4.5 Machine Learning
1.5 Elastic Stack as part of a big data platform
1.5.1 Integration with Kafka, an enterprise data bus
1.5.2 Integration with the Hadoop ecosystem
1.5.3 Integration with Relational Databases
1.6 Comparison with similar products
1.6.1 Similar Products to Elasticsearch
1.6.2 Similar products to Logstash/Beats
1.6.3 Similar Products to Kibana
1.6.4 Similar products in the Elastic Stack
1.7 Systematic documentation and active community support
1.8 Summary

Chapter 2: Configuring the Windows Practice Environment
2.1 Preparing for Installation
2.2 Installing Elasticsearch
2.2.1 Download Elasticsearch
2.2.2 Running Elasticsearch
2.3 Installing Kibana
2.3.1 Download Kibana
2.3.2 Running Kibana
2.4 Verifying Elastic Stack License
2.5 Download a specific version
2.6 Summary

Part 2 | Elastic Stack Components

Chapter 3 Elasticsearch Basics
3.1 Preparation
3.1.1 Elasticsearch Requests and Responses
3.1.2 Using the Kibana Console
3.1.3 Checking System Status
3.1.4 Loading sample data
3.2 Index and Documents
3.2.1 Document
3.2.2 Index
3.3 Document CRUD
3.3.1 Create/Check/Delete Index
3.3.2 Creating a Document
3.3.3 Reading Documents
3.3.4 Document Modification
3.3.5 Deleting a Document
3.4 Response Message
3.5 Bulk Data
3.6 Mapping
3.6.1 Dynamic Mapping
3.6.2 Explicit Mapping
3.6.3 Mapping Types
3.6.4 String processing using multi-fields
3.7 Index Template
3.7.1 Checking the Template
3.7.2 Template Settings
3.7.3 Template Priority
3.7.4 Dynamic Templates
3.8 Analyzer
3.8.1 Analyzer Configuration
3.8.2 Tokenizer
3.8.3 Filter
3.8.4 Custom Analyzer
3.9 Summary

Chapter 4 Elasticsearch: Search
4.1 Query Context and Filter Context
4.2 Query Strings and Query DSL
4.2.1 Query String
4.2.2 Query DSL
4.3 Similarity Score
4.3.1 Understanding the Score Algorithm (BM25)
4.3.2 IDF calculation
4.3.3 TF calculation
4.4 Query
4.4.1 Full-Text Queries and Term-Level Queries
4.4.2 Match Queries
4.4.3 Match Phrase Query
4.4.4 Term Queries
4.4.5 Terms Queries
4.4.6 Multi-Match Queries
4.4.7 Range Queries
4.4.8 Logical Queries
4.4.9 Pattern Search
4.5 Summary

Chapter 5 Elasticsearch: Aggregation
5.1 Request-response format of aggregation
5.2 Metric Aggregation
5.2.1 Finding the mean/median
5.2.2 Checking the number of unique values in a field
5.2.3 Aggregation within search results
5.3 Bucket Aggregation
5.3.1 Histogram Aggregation
5.3.2 Range Aggregation
5.3.3 Terms Aggregation
5.4 Combination of Aggregations
5.4.1 Bucket Aggregation and Metric Aggregation
5.4.2 Sub-bucket Aggregation
5.5 Pipeline Aggregation
5.5.1 Parent Aggregation
5.5.2 Sibling Aggregation
5.6 Summary

Chapter 6 Logstash
6.1 Introduction to Logstash
6.1.1 Logstash Features
6.2 Installing Logstash
6.2.1 JDK Installation
6.2.2 Download Logstash
6.2.3 Running Logstash
6.3 Pipeline
6.3.1 Input
6.3.2 Filter
6.3.3 Output
6.3.4 Codec
6.4 Multiple Pipelines
6.4.1 Writing Multiple Pipelines
6.5 Monitoring
6.5.1 How to Use the API
6.5.2 Activating the monitoring function
6.6 Summary

Chapter 7 Beats
7.1 Introducing Beats
7.2 Installing Beats
7.3 Filebeat
7.3.1 Filebeat Architecture
7.3.2 Filebeat Download
7.3.3 Running Filebeat
7.3.4 Filebeat Settings
7.3.5 Modules
7.4 Monitoring
7.5 Using Other Beats
7.6 Summary

Chapter 8 Kibana
8.1 Introducing Kibana
8.1.1 Index Patterns
8.2 Discover
8.2.1 Query bar, filter bar, and time picker
8.3 Visualization
8.3.1 Bar Graph
8.3.2 Heatmap
8.3.3 Time Series Visual Builder (TSVB)
8.4 Dashboard
8.4.1 Creating a Dashboard
8.5 Canvas
8.5.1 Elasticsearch SQL
8.6 Maps
8.6.1 Windows Web Server Deployment
8.6.2 Applying Custom Tile Maps and GeoJSON
8.6.3 Clusters and Grids
8.6.4 Applying GeoJSON
8.6.5 Tile Map Service
8.7 Summary

Part 3 | Practical Uses of the Elastic Stack

Chapter 9: Creating an Index Using Kaggle CSV Files
9.1 How the practice is conducted
9.2 Preparing for the Lab
9.3 Importing CSV Files
9.4 Importing Files Using Kibana Data Visualizer
9.4.1 Data Discover
9.5 Importing Files Using Logstash
9.5.1 Reading CSV Files
9.5.2 Parsing as date/time type
9.5.3 Parsing with Ruby Filters
9.6 Index Mapping
9.6.1 Storing Logstash Data Using Index Mapping
9.6.2 Saving Logstash Data Using a Template
9.7 A quick analysis in Kibana
9.7.1 What film genre has been produced the most in the last 10 years?
9.7.2 How profitable were Hollywood films with high ratings? How much did their budgets cost?
9.8 Summary

Chapter 10: Analyzing Korean Twitter Data Using Logstash
10.1 How the practice is conducted
10.2 Preparing for the Lab
10.2.1 Twitter Developer Information
10.3 Running Logstash
10.4 Search and Korean Morphological Analyzer
10.5 Reindexing
10.6 Real-time data visualization
10.7 Summary

Chapter 11: Analyzing Public Data Using Python Clients
11.1 How to proceed with the lab
11.2 Preparation for the lab
11.2.1 Obtaining a Seoul Open Data Authentication Key
11.2.2 Installing Python 3.8
11.2.3 Installing QGIS
11.2.4 Added map visualization types
11.3 Analyzing population by administrative district in Seoul
11.3.1 Importing Public Data into Logstash
11.3.2 Creating User Vector Layers Using QGIS
11.3.3 Which areas in Seoul have the most single-person households?
11.3.4 Where in Seoul are there many foreigners?
11.4 Creating a Seoul Public Wi-Fi Map
11.4.1 Retrieving public data using open APIs
11.4.2 Running Open APIs Using Python
11.4.3 Python Elasticsearch Client
11.4.4 Python Client App
11.4.5 Indexing using the bulk API
11.4.6 Visualizing Seoul Public Wi-Fi Locations
11.5 Summary

Part 4 | Elastic Operations

Chapter 12: Configuring a Linux Practice Environment
12.1 Installing Ubuntu in VirtualBox
12.2 Installing Elasticsearch
12.2.1 Installation using wget
12.2.2 Installation using the Linux Package Manager
12.3 Installing Kibana
12.3.1 Installation using wget
12.3.2 Installation using the Linux Package Manager
12.4 Summary

Chapter 13 Cluster and Node Configuration
13.1 HTTP Layer and Transport Layer
13.2 Node
13.2.1 Master Node
13.2.2 Data Node
13.2.3 Ingest Node
13.2.4 Other nodes
13.2.5 Dedicated Node
13.3 Node Configuration and System Configuration
13.3.1 Small Clusters
13.3.2 Large Clusters
13.3.3 Hot/Warm/Cold Node Configuration
13.4 Cluster Backup
13.4.1 Registering a Repository
13.4.2 Taking a Snapshot
13.4.3 Restoring a Snapshot
13.5 Shards
13.5.1 Primary and Replica Shards
13.5.2 Shard Allocation Process
13.5.3 Shard Health Monitoring
13.6 Shard Count and Size Configuration Guide
13.6.1 Shard Count Guide
13.6.2 Shard Size Guide
13.7 Settings
13.7.1 Cluster Configuration
13.7.2 Node Settings
13.7.3 Index Settings
13.8 Summary

Chapter 14 Building an Operational Cluster
14.1 Production Cluster Considerations for Performance and Stability
14.1.1 Node Configuration Planning
14.1.2 Hardware Selection
14.1.3 Selecting an Elasticsearch Version
14.2 Cluster Configuration
14.2.1 Node Installation
14.2.2 Development Mode and Operation Mode
14.2.3 Setting the operating mode
14.2.4 Verifying Execution and Configuration
14.3 Security feature settings
14.3.1 Creating a Certificate
14.3.2 Encrypting Inter-Node Communication
14.3.3 HTTP Client Communication Encryption
14.3.4 Starting the Cluster and Configuring Built-in Users
14.3.5 Encrypting Communication Between Kibana and Elasticsearch
14.3.6 Encrypting Communication Between Kibana and the Browser
14.4 User Registration and Management
14.4.1 Defining User Roles
14.4.2 Adding Users and Assigning Roles
14.5 Summary


Publisher's Review
What this book covers

- Introduction to core usage of Elasticsearch, Logstash, Beats, and Kibana
- Concrete project cases that demonstrate how to make proper use of the characteristics of the Elastic Stack
- A practical summary of the points to keep in mind during operation

Features of this book

- Rich examples that anyone who wants to quickly become familiar with the Elastic Stack can easily follow.
- How to use search engines, as well as how to build a big data pipeline
- Monitoring the Elastic Stack and visualizing the collected data in various ways
- The entire process, from actual data collection to visualization, is implemented using only components of the Elastic Stack.
- Application examples of Korean morphological analyzers
- Examples of map visualization using public data and GIS data
- Introduction to cluster/shard configuration and security enhancement methods required for the operation process.

Target audience for this book

- Field developers who must perform data processing, search, transformation, analysis, and visualization tasks.
- Server operators who must ensure high availability, stability, and security of big data.
- Architects who want to design or implement a big data platform using a single open source technology.
- Developers or operators interested in the Elastic Stack technologies: Elasticsearch, Logstash, Beats, and Kibana.

Structure of this book

In Part 1, "Elastic Stack Overview," we'll explore the history, purpose, and components of the Elastic Stack, and learn how to install Elasticsearch and Kibana.
Chapter 1, "What is the Elastic Stack," covers the history of Elastic, its components, how to utilize specialized search services and log monitoring services using the Elastic Stack, and the status and role of the Elastic Stack in big data platforms.
And we will look at the differences between Elastic Stack and other solutions.
In Chapter 2, 'Configuring the Windows Lab Environment', we will learn how to install Elasticsearch and Kibana version 7.10.1 in a Windows environment.

Part 2, 'Elastic Stack Components', takes a closer look at the components of the Elastic Stack: Elasticsearch, Kibana, Logstash, and Beats.
In Chapter 3, 'Elasticsearch Basics', we will learn about Elasticsearch indexes and documents, and understand document CRUD and index structure.
You will also learn how to store documents in Elasticsearch while learning about mappings, index templates, analyzers, and more.
In Chapter 4, 'Elasticsearch: Search', we will learn about the BM25 algorithm used to score search results, the differences between full-text and term-level queries, and how to search by directly running representative queries.
In Chapter 5, "Elasticsearch: Aggregation," you'll learn about metric aggregation for obtaining statistical information, bucket aggregation for dividing documents, and pipeline aggregation using multiple aggregations.
In Chapter 6, 'Logstash', you will learn how to install Logstash and JDK, how to write pipelines, how to use plugins, etc.
We will also look at Logstash monitoring.
In Chapter 7, 'Beats', you will learn about Beats and walk through the steps to install and run one, using Filebeat as the example.
We will also learn how to easily set up and monitor Beats using modules.
In Chapter 8, 'Kibana', we will learn how to use the visualization menus: Discover, Visualization, Dashboard, Canvas, and Maps.
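
To give a flavor of the BM25 scoring covered in Chapter 4, here is a minimal sketch of the textbook BM25 score for a single term. It is not taken from the book: the function name `bm25_term_score` and its parameter names are hypothetical, the defaults `k1=1.2` and `b=0.75` are the commonly cited ones, and the exact constant factors Elasticsearch applies may differ by version.

```python
import math


def bm25_term_score(freq, doc_len, avg_doc_len, doc_count, docs_with_term,
                    k1=1.2, b=0.75):
    """Textbook BM25 score of one term in one document (illustrative only)."""
    # IDF: terms that appear in fewer documents get a higher weight.
    idf = math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # TF: grows with term frequency but saturates; documents longer than
    # average are penalized through the length-normalization term.
    tf = freq / (freq + k1 * (1 - b + b * doc_len / avg_doc_len))
    return (k1 + 1) * idf * tf


if __name__ == "__main__":
    # Same term frequency, but the rarer term scores higher.
    rare = bm25_term_score(1, 100, 100, doc_count=1000, docs_with_term=10)
    common = bm25_term_score(1, 100, 100, doc_count=1000, docs_with_term=500)
    print(rare > common)
```

The IDF and TF parts correspond to the "IDF calculation" and "TF calculation" sections listed in the table of contents.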

In Part 3, 'Practical Applications of Elastic Stack,' we will proceed with several projects that can be implemented with Elastic Stack based on what we learned in Parts 1 and 2.
In Chapter 9, "Building an Index Using Kaggle CSV Files," we will upload CSV movie files downloaded from Kaggle to Elastic Stack and then analyze the data.
In this process, the data is cleaned using Logstash Ruby filters and other plugins, and an index is created using mappings and index templates.
In Chapter 10, "Analyzing Korean Twitter Data with Logstash," you will learn how to obtain social data using the Logstash Twitter plugin and how to analyze Korean data with the Nori analyzer.
We will also cover how to reindex your data and visualize real-time data in Kibana.
Chapter 11, "Analysis of Public Data Using Python Clients," covers how to develop Elasticsearch client apps using Python.
In this chapter, you will learn how to use public open APIs and how to use QGIS to create vector layers that can be visualized on maps.
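
Chapter 11 indexes documents through the bulk API. As a rough sketch that is not taken from the book, the request body for Elasticsearch's `_bulk` endpoint is newline-delimited JSON: an action line followed by a source line for each document, ending with a newline. The helper name `build_bulk_body`, the index name `seoul_wifi`, and the sample fields below are hypothetical.

```python
import json


def build_bulk_body(index_name, docs):
    """Build an NDJSON body for the _bulk endpoint from (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        # Action line: index this document under the given index and id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk body must be terminated by a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample_docs = [
        ("1", {"district": "Jung-gu", "location": {"lat": 37.56, "lon": 126.99}}),
        ("2", {"district": "Mapo-gu", "location": {"lat": 37.55, "lon": 126.91}}),
    ]
    print(build_bulk_body("seoul_wifi", sample_docs), end="")
```

In practice a client library would usually send this body for you; the sketch only shows the shape of the payload.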

In Part 4, 'Elastic Operations', you will learn how to install the Elastic Stack in a Linux environment and about the nodes, shards, indices, and their settings, which are the basics of a cluster.
We will also practice configuring a cluster with three actual nodes.
In Chapter 12, "Configuring the Linux Lab Environment," we will learn how to install Elasticsearch and Kibana version 7.10.1 in a Linux environment.
Chapter 13, "Cluster and Node Configuration," explains the role of nodes and provides a guide to cluster configuration.
We will also learn how to configure hot/warm nodes and take backups, and explore shards and shard optimization.
Finally, you will learn how to set up nodes and clusters.
In Chapter 14, "Building an Operational Cluster," you'll learn how to select hardware for operations, configure a cluster, and enable security features.
Finally, we will look at how to distinguish the user roles required for operation.

Development environment for using this book

- Elasticsearch 7.10.1
- Kibana 7.10.1
- Logstash 7.10.1
- Filebeat 7.10.1
- Windows 10 (used in Parts 2 and 3)
- Linux Ubuntu 18.04 (used in Part 4)
- JDK 8
- Python 3.8

[Author's Note]

I remember when we first introduced Git at our company in 2008.
Our department, which was a Linux development team, decided to introduce Git, a new configuration management tool created by Linus Torvalds (the founder of Linux), instead of the paid configuration management tool we had been using.
However, at that time, Git was in its early stages and had few user-friendly features, and everyone was unfamiliar with how to use it, making it difficult to use.
With the product release just around the corner, whenever commits between developers got tangled, I had to stay up all night sorting out the Git history.
Looking back now, I think the problem was that I only learned the usage shown in the tutorial and applied it directly to real-world situations without understanding basic Git concepts such as remote repositories and the staging area.
It was difficult to use the tools properly and difficult to resolve problems when they occurred.

Since then, I have used many frameworks, libraries, and programs, but I always felt a sense of frustration in my heart.
There was also the difficulty of reading development documentation written in English, and a culture where time spent on anything other than writing code was considered a waste of time.
Because I didn't fully understand the framework and was using it in a haphazard manner, I felt like I wasn't able to utilize its functions to their fullest potential.
Sometimes when I ran into problems or had trouble writing code, I sought help from Google and Stack Overflow.
I felt grateful for the code I found on the Internet and the people who kindly answered my questions, but at the same time, I felt disappointed in myself, wondering, "Why was it so hard for me to write code?" and "Why couldn't I apply it?"
Of course, because I didn't have basic knowledge, my understanding and application skills couldn't be good.

This was the point I focused on the most while writing this book.
Through this book, I wanted to teach readers the basics and fundamentals of a framework called the Elastic Stack.
I hoped that readers would acquire at least a basic understanding of the framework they were using and begin writing code.
The important parts have been explained repeatedly to ensure complete understanding, and less important parts have been boldly omitted due to space limitations.
In case there are any parts that require further explanation, I have provided links to online documentation.
Most of the explanations in this book are available in the official documentation or Elastic blog, but it would be better for readers to first understand the big picture through this book rather than learning it piecemeal from the internet.
Understanding the basics or skeleton of a framework will make it easier to search smarter, understand someone's code and explanations, or questions and answers, and remember them for a long time.

Another point I paid attention to while writing the book was how it would differ from existing Elastic books.
When I first planned the book, I wanted to write about practical applications of the Elastic Stack, a kind of book that did not exist on the market at all.
However, in accordance with the publisher's opinion that we should focus more on the basics, we changed the book to cover the explanation, use, and overall operation of the Elastic Stack.
Instead, unlike existing books that cover only Elasticsearch in depth, this book devotes less space to Elasticsearch and more to the rest of the Elastic Stack, including Logstash, Beats, and Kibana.
Where we could not explain everything, we left out the details, emphasized the fundamentals, and showed use cases instead.
Finally, I'd like to thank Shay Banon, the creator of Elastic, the countless contributors who have made Elastic better, and, going deeper, the developers who created Lucene, distributed systems, and other efforts that have shaped the computing landscape we live in today.

- Kim Jun-young

When I first encountered the Elastic Stack (then called the ELK Stack) while building a log monitoring system, I found it somewhat messy and confusing.
At that time, Elasticsearch was understood either as a tool that could perform fast full-text searches through indexing, or as an add-on that helped with indexing while the authoritative data was kept in a relational database.
But after digging a little deeper, I realized that it was possible to store the original data and increase reliability by configuring replicas, and that it could be used like a database.
Fast search and aggregation performance seemed like a breakthrough for the performance issues of the monitoring solution I was developing at the time, and after using Logstash and Kibana myself, I realized that I would be better off leveraging the Elastic Stack rather than implementing it myself with other technologies.
Seeing how the Elastic Stack, which I had simply built, could easily handle logs of a level I had never imagined before, I had a gut feeling that the era of the Elastic Stack had truly arrived.

Building a data platform requires implementing diverse functions, including data collection, refinement, storage, search, and visualization. Furthermore, it's necessary to research and integrate various software solutions, taking into account performance, flexibility, and other factors.
Elastic Stack, now a well-known technology, has risen to the status of a truly all-weather software stack that can constitute a complete data platform in itself, boasting all of these elements, as well as ease of deployment, ease of use, scalability, excellent performance, high fault tolerance, and high availability.

However, as it contains many features and is easy to deploy, a lot of basic knowledge about the overall data platform, clustering, distributed processing, indexing, etc. is required to use it more properly.
If you use it without a deep understanding, you may encounter unexpected problems in terms of performance, security, or other operational aspects.
Moreover, the rapid updates and additional features that come with them, which are both advantages and disadvantages of the Elastic Stack, can make it confusing for users to know which features to use and where to start.

In this book, I have tried to provide accurate information about the structure and operating principles of the Elastic Stack without misleading readers.
Although it may not cover all the features of the vast Elastic Stack, I wanted to highlight the most essential ones so that readers can experience a wider range of Elastic Stack features.
Furthermore, rather than presenting only the high-performance search engine Elasticsearch with supporting software around it, I wanted to highlight the appeal of each product in the stack, each of which fully plays its own role.
Not only does the Elastic Stack have a very active domestic community, but there are also many open source contributions from Korean users.
I hope this book fully conveys the charm of Elastic, and furthermore, I hope it will increase the number of contributors to the Elastic community and open source.
- Jeong Sang-un

[Editor's Note]

For the purpose of review, I read the entire text of this book, from the first draft to the proofreading, over and over again. Each time, I had the fascinating experience of seeing the different core functions and uses of the Elastic Stack clearly come to mind.
Usually, if you read a technical book multiple times, it gets a bit boring, but with this book, I felt like I was learning something important every time I read it.
Developers who have never encountered Elastic before, as well as those who think they are somewhat familiar with it, will find themselves empathizing with the reviewer's words when they pick up this book.


Instead of scattering important and unimportant information throughout the book to explain everything comprehensively, this book focuses on the 20% that can achieve 80% of the effect, strictly following the Pareto principle.
Therefore, the sense of speed and cohesion in the development of the text is quite high.
Moreover, it goes beyond the descriptions of APIs and functions in the catalog, and provides excellent examples to help you quickly learn the core concepts, and throughout the book, it transforms abstract theories into tangible, concrete examples.
For those who are tired of collecting fragmented knowledge from the Internet while wondering when, where, and how to best utilize the Elastic Stack, this book will be an oasis that will quench your thirst.

As mentioned in the main text, the Elastic Stack is not particularly difficult to use, from installation to actual use, so you might think you can just start using it right away.
However, just because it is easy to get started doesn't mean it is easy to master.
The Elastic Stack provides powerful capabilities for building entire big data pipelines, so there's a lot to learn, and active learning (learning as you use it and using it as you learn) is essential.
One of the key features of this book, which is very helpful in the active learning process, is its goal-oriented explanations.
Instead of unconditionally explaining that there are various functions without context, it clearly explains which Elastic functions should be used to do various tasks.


For example, when two features are similar, you need to know exactly how they differ and how to use each one in order to meet your functional or non-functional goals (such as performance and storage savings). This book explains the intent and essence of each and, by comparing them, presents the right choice for each situation.
It also delves into the internals of issues that must be considered in actual operation beyond the development stage (essential and hard to compromise on), using rich illustrations and scenarios to clearly explain the reasoning and give specific guidelines.

The hidden protagonists of this book are the three projects in the Elastic Stack utilization section.
When working on a big data-related project, you might have wondered how to complete everything from data collection to visualization at once with only the Elastic Stack, instead of using various open sources and connecting them in a complex manner. This book shows the entire process of visualizing it on a map while neatly processing real data from Kaggle/Twitter/public data, not just examples for the sake of examples.


This book covers everything from connecting the individual components of the Elastic Stack and using filters to make those connections smooth, to the programming required to implement actual business logic. Therefore, I believe it is suitable as a use case or practical example for properly utilizing the Elastic Stack.
This book, filled with carefully selected content based on the authors' extensive practical experience, is intended to serve as a lever for readers to actively utilize the rich features and powerful performance of the Elastic Stack in solving real-world problems.
Product Details
- Publication date: August 19, 2021
- Page count, weight, size: 592 pages | 185*240*29mm
- ISBN13: 9791189909321
- ISBN10: 1189909324
