Understanding Deep Learning Properly
Description
Book Introduction
Beyond complex models, it's time to delve into the essence of deep learning.

This book intuitively unravels the core of complex deep learning technology while maintaining a balance between theory and practice.
First, we will explain the basic concepts that support deep learning step by step.
After covering the basics of supervised learning, neural network architectures, model training, and optimization, we examine representative models for image, text, and graph data: CNNs, transformers, and graph neural networks.
Next, we cover generative models such as GANs, VAEs, and diffusion models, as well as reinforcement learning, and finally, we discuss the theoretical reasons for the effectiveness of deep learning and examine ethical issues.
This will be the most solid starting point for all readers who want to properly understand deep learning.

Table of Contents
Translator's Preface xiii
Beta Reader Review xiv
Preface xvii
Acknowledgements xix

CHAPTER 01 Introduction 1
1.1 Supervised Learning 2
1.2 Unsupervised Learning 8
1.3 Reinforcement Learning 12
1.4 Ethics 14
1.5 Structure of this book 17
1.6 Recommended Books 18
1.7 How to Read This Book 19
References 21

CHAPTER 02 Supervised Learning 23
2.1 Overview of Supervised Learning 24
2.2 Linear Regression Example 25
2.3 Summary 30
Notes 30
Practice Problems 31

CHAPTER 03 Shallow Neural Networks 33
3.1 Neural Network Example 33
3.2 Universal Approximation Theorem 37
3.3 Multivariate Input and Output 38
3.4 Shallow Neural Networks: General Case 43
3.5 Terminology 44
3.6 Summary 45
Notes 46
Practice Problems 50
References 53

CHAPTER 04 Deep Neural Networks 55
4.1 Neural Network Combination 55
4.2 Building a Deep Neural Network through Network Combination 58
4.3 Deep Neural Networks 59
4.4 Matrix Notation 63
4.5 Shallow Neural Networks vs. Deep Neural Networks 65
4.6 Summary 67
Notes 68
Practice Problems 71
References 74

CHAPTER 05 Loss Function 75
5.1 Maximum likelihood 76
5.2 How to construct a loss function 80
5.3 Example 1: Univariate Regression Analysis 80
5.4 Example 2: Binary Classification 86
5.5 Example 3: Multiclass Classification 88
5.6 Multiple Outputs 91
5.7 Cross-entropy loss 92
5.8 Summary 94
Notes 95
Practice Problems 97
References 101

CHAPTER 06 Model Fitting 103
6.1 Gradient Descent 103
6.2 Stochastic Gradient Descent 110
6.3 Momentum 113
6.4 Adaptive Moment Estimation 115
6.5 Training Algorithm Hyperparameters 118
6.6 Summary 119
Notes 120
Practice Problems 124
References 127

CHAPTER 07 Gradients and Initialization 129
7.1 Problem Definition 129
7.2 Differential Calculus 131
7.3 Simple Example 133
7.4 Backpropagation Algorithm 137
7.5 Parameter Initialization 143
7.6 Training Code Example 147
7.7 Summary 149
Notes 149
Practice Problems 153
References 157

CHAPTER 08 Performance Measurement 159
8.1 Training a Simple Model 159
8.2 Causes of Error 161
8.3 Reducing Errors 166
8.4 Double Descent 170
8.5 Hyperparameter Selection 174
8.6 Summary 175
Notes 176
Practice Problems 181
References 183

CHAPTER 09 Regularization 185
9.1 Explicit regularization 185
9.2 Implicit Regularization 189
9.3 Empirical Methods for Improving Performance 192
9.4 Summary 202
Notes 203
Practice Problems 212
References 214

CHAPTER 10 Convolutional Networks 219
10.1 Invariance and Equivariance 220
10.2 Convolutional Networks for One-Dimensional Inputs 221
10.3 Convolutional Networks for Two-Dimensional Inputs 229
10.4 Downsampling and Upsampling 230
10.5 Application 233
10.6 Summary 239
Notes 240
Practice Problems 246
References 249

CHAPTER 11 Residual Neural Networks 253
11.1 Sequential Processing 253
11.2 Residual Connections and Residual Blocks 256
11.3 Gradient Explosion in Residual Neural Networks 260
11.4 Batch Normalization 262
11.5 General Residual Neural Networks 264
11.6 Why Neural Networks with Residual Connections Perform Better 271
11.7 Summary 272
Notes 272
Practice Problems 280
References 282

CHAPTER 12 Transformers 285
12.1 Text Data Processing 285
12.2 Dot-Product Self-Attention 286
12.3 Extensions to Dot-Product Self-Attention 292
12.4 Transformer Layer 295
12.5 Transformers for Natural Language Processing 296
12.6 Example of an Encoder Model: BERT 300
12.7 Example Decoder Model: GPT-3 303
12.8 Example of an Encoder-Decoder Model: Machine Translation 308
12.9 Transformers for Long Sequence Processing 310
12.10 Transformers for Image Processing 311
12.11 Summary 316
Notes 316
Practice Problems 328
References 330

CHAPTER 13 Graph Neural Networks 337
13.1 What is a Graph? 337
13.2 Graph Representation 340
13.3 Graph Neural Networks, Tasks, and Loss Functions 344
13.4 Graph Convolutional Networks 346
13.5 Graph Classification Example 349
13.6 Inductive vs. Transductive Models 350
13.7 Node Classification Example 352
13.8 Graph Convolutional Network Layer 355
13.9 Edge Graph 359
13.10 Summary 360
Notes 361
Practice Problems 370
References 373

CHAPTER 14 Unsupervised Learning 377
14.1 Unsupervised Learning Model Classification 378
14.2 Characteristics of a Good Generative Model 380
14.3 Performance Quantification 381
14.4 Summary 384
Notes 384
References 386

CHAPTER 15 Generative Adversarial Networks 387
15.1 Using the Discriminator as a Signal 387
15.2 Stability Improvements 393
15.3 Progressive Growing, Mini-Batch Discrimination, and Truncation 399
15.4 Conditional Generation 402
15.5 Image Translation 405
15.6 StyleGAN 410
15.7 Summary 412
Notes 413
Practice Problems 419
References 421

CHAPTER 16 Normalizing Flows 427
16.1 One-Dimensional Example 427
16.2 General Case 430
16.3 Invertible Neural Network Layers 433
16.4 Multi-scale flow 442
16.5 Application 443
16.6 Summary 447
Notes 448
Practice Problems 453
References 456

CHAPTER 17 Variational Autoencoders 461
17.1 Latent Variable Model 461
17.2 Nonlinear Latent Variable Models 463
17.3 Training 465
17.4 ELBO Properties 468
17.5 Variational Approximation 470
17.6 Variational Autoencoders 471
17.7 The Reparameterization Trick 474
17.8 Application 475
17.9 Summary 480
Notes 481
Practice Problems 486
References 488

CHAPTER 18 Diffusion Models 493
18.1 Overview 493
18.2 Encoder (Forward Process) 494
18.3 Decoder (Reverse Process) 501
18.4 Training 502
18.5 Reparameterizing the Loss Function 507
18.6 Implementation 510
18.7 Summary 516
Notes 516
Practice Problems 521
References 524

CHAPTER 19 Reinforcement Learning 527
19.1 Markov Decision Processes, Returns, and Policies 528
19.2 Expected return 532
19.3 Tabular Reinforcement Learning 536
19.4 Fitted Q-Learning 541
19.5 Policy Gradient Method 545
19.6 The Actor-Critic Method 551
19.7 Offline Reinforcement Learning 552
19.8 Summary 554
Notes 555
Practice Problems 561
References 564

CHAPTER 20 Why Is Deep Learning Effective? 567
20.1 The Case Against Deep Learning 567
20.2 Factors Affecting Fit Performance 569
20.3 Characteristics of the Loss Function 575
20.4 Generalization Determinants 579
20.5 Do we really need so many parameters? 584
20.6 Should Neural Networks Be Deep? 587
20.7 Summary 590
Practice Problems 591
References 592

CHAPTER 21 Deep Learning and Ethics 597
21.1 Value Alignment 598
21.2 Intentional misuse 606
21.3 Other Social, Ethical, and Professional Issues 608
21.4 Case Study 611
21.5 The Value-Neutral Ideal of Science 612
21.6 Responsible AI Research from a Collective Action Problem Perspective 614
21.7 The Way Forward 615
21.8 Summary 617
Practice Problems 618
References 620

APPENDIX A Notation 627
A.1 Scalars, Vectors, Matrices, and Tensors 627
A.2 Variables and Parameters 627
A.3 Set 628
A.4 Function 628
A.5 Minimization and Maximization 629
A.6 Probability Distribution 629
A.7 Asymptotic Notation 630
A.8 Other 630

APPENDIX B Mathematical Concepts 631
B.1 Function 631
B.2 Binomial coefficient 634
B.3 Vectors, Matrices, and Tensors 635
B.4 Special forms of matrices 639
B.5 Matrix Calculus 641

APPENDIX C Probability 643
C.1 Random Variables and Probability Distributions 643
C.2 Expected value 647
C.3 Normal Probability Distribution 652
C.4 Sampling 656
C.5 Distance between probability distributions 657

Index 661


From the Book
Deep neural networks can process inputs that are very large, of variable length, and diverse in internal structure.
The output can be a single real number (regression), several numbers (multivariate regression), or probabilities over two or more classes (binary and multiclass classification, respectively).
As we will see in the next section, the output of deep neural networks can also be very large and have internal structures of variable length.

--- p.6

Often we want to make more than one prediction with the same model, in which case the target output y becomes a vector.
For example, predicting the melting and boiling points of molecules (a multivariate regression problem, Figure 1.2b) or predicting the object class at every point in an image (a multivariate classification problem, Figure 1.4a).
Although we can define a multivariate probability distribution and use the neural network to model its parameters as a function of the input, we typically treat each prediction independently.

--- p.91
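
A minimal sketch (not from the book) of what treating each prediction independently means in practice: under a unit-variance Gaussian assumption per output, the joint loss for a multivariate regression is simply a sum of independent per-output squared-error terms. The example data below are hypothetical.

```python
import numpy as np

def independent_multivariate_loss(y_pred, y_true):
    """Sum an independent squared-error term over each output dimension.

    Treating each prediction independently means the joint negative
    log-likelihood factorizes into a per-dimension sum (here under a
    unit-variance Gaussian assumption for every output).
    """
    per_sample = np.sum((y_pred - y_true) ** 2, axis=-1)  # sum over outputs
    return per_sample.mean()                              # average over samples

# Hypothetical example: predict two quantities (e.g., melting and boiling
# points) for a batch of four molecules.
y_pred = np.array([[0.2, 1.1], [0.5, 0.9], [1.3, 2.0], [0.1, 0.4]])
y_true = np.array([[0.0, 1.0], [0.4, 1.0], [1.5, 2.1], [0.0, 0.5]])
print(independent_multivariate_loss(y_pred, y_true))
```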

One of the main problems with the gradient descent algorithm is that the final destination is entirely determined by the starting point.
Stochastic gradient descent (SGD) solves this problem by adding some noise to the gradient at each step.
This solution still moves downhill on average, but at any given moment the chosen direction may not necessarily be the steepest downhill.
It may not even be downhill at all. The SGD algorithm may temporarily move uphill, or even jump from one "valley" of the loss function to another (Figure 6.5b).

--- p.111
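
A minimal NumPy sketch of the idea in this excerpt; the toy linear-regression data, learning rate, and batch size are illustrative assumptions. Each step estimates the gradient from a random mini-batch, so the direction is noisy but still points downhill on average.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression problem (illustrative assumption).
X = rng.normal(size=(256, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=256)

def batch_gradient(w, Xb, yb):
    """Gradient of the mean squared error on one mini-batch."""
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

w = np.zeros(3)
learning_rate, batch_size = 0.1, 16
for step in range(500):
    idx = rng.choice(len(X), size=batch_size, replace=False)  # random mini-batch
    w -= learning_rate * batch_gradient(w, X[idx], y[idx])    # noisy descent step

print(w)  # ends up close to true_w despite the noisy steps
```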

Bias occurs because the model cannot properly represent the true basis function.
Therefore, this error can be reduced by making the model more flexible.
This can usually be solved by increasing the capacity of the model.
For neural networks, the capacity of the model can be increased by adding more hidden units/hidden layers.

In a simple model, increasing the capacity is equivalent to adding hidden units so that the interval [0, 1] is split into more linear regions.
Figure 8.7a–c shows that this indeed reduces the bias.
Increasing the number of linear regions to ten makes the model flexible enough to fit the actual function closely.

--- p.168
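
A minimal sketch of the capacity argument above (not the book's code; the random parameters are purely illustrative): a shallow ReLU network with D hidden units defines a piecewise-linear function of a scalar input with up to D joints, so adding hidden units creates more linear regions over [0, 1].

```python
import numpy as np

def shallow_relu_network(x, theta, phi):
    """y = phi_0 + sum_d phi_d * ReLU(theta_d0 + theta_d1 * x)."""
    pre_activations = theta[:, 0] + theta[:, 1] * x[:, None]   # shape (len(x), D)
    return phi[0] + np.maximum(pre_activations, 0.0) @ phi[1:]

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 200)

# Each hidden unit contributes one "joint" where its ReLU switches on,
# so more hidden units allow more linear regions (and lower bias).
for num_hidden in (3, 10):
    theta = rng.normal(size=(num_hidden, 2))
    phi = rng.normal(size=num_hidden + 1)
    y = shallow_relu_network(x, theta, phi)
    print(num_hidden, "hidden units ->", y.shape, "piecewise-linear output")
```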

In generative self-supervised learning, a subset of each data sample is masked and the auxiliary task of predicting the missing portion is performed (Figure 9.12c).
For example, we can use a collection of unlabeled images with the auxiliary task of inpainting (filling in) missing parts of those images (Figure 9.12c).
Similarly, we mask out some words from a large corpus of text, train a network to predict the missing words, and then fine-tune it for the real-world language task of interest (see Chapter 12).

In contrastive self-supervised learning, pairs of samples that have something in common are compared with pairs that have nothing in common.
For images, the auxiliary task is to identify whether pairs of images are variant versions of each other or are unrelated.
For text, the auxiliary task is to check whether two sentences in the original document are contextually connected.
Sometimes auxiliary tasks require identifying the precise relationship between connected pairs (e.g., finding the relative positions of two patches in the same image).

--- p.201
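
A minimal sketch of the generative (masked-prediction) auxiliary task described above; the mask rate and the tiny linear "inpainting" model are illustrative assumptions, not the book's implementation. Part of each unlabeled sample is hidden, and the model is scored only on how well it reconstructs the hidden part.

```python
import numpy as np

rng = np.random.default_rng(2)

def mask_batch(x, mask_rate=0.25):
    """Hide a random subset of each sample; return the masked batch and mask."""
    mask = rng.random(x.shape) < mask_rate
    return np.where(mask, 0.0, x), mask

def masked_reconstruction_loss(x, W):
    """Auxiliary objective: reconstruct only the masked-out entries."""
    x_masked, mask = mask_batch(x)
    x_hat = x_masked @ W                       # hypothetical linear "inpainter"
    return np.mean(((x_hat - x) ** 2)[mask])   # error on the missing parts only

x = rng.normal(size=(32, 8))          # unlabeled data: no labels are needed
W = 0.1 * rng.normal(size=(8, 8))     # parameters to be trained on this task
print(masked_reconstruction_loss(x, W))
```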

StyleGAN is a more modern GAN that divides the variability of a dataset into meaningful components and treats each component as a latent variable.
In particular, StyleGAN injects variation into the output image at multiple scales and, at each scale, separates style from noise.
For facial images, coarse-scale changes include facial shape and head pose, medium-scale changes include the shape and details of facial features, and fine-scale changes include hair and skin color.
The style component represents aspects of the image that are salient to humans, while the noise component represents unimportant variations, such as the precise location of hair, beard, freckles, and skin pores.

The GANs we have seen so far start with a latent variable z drawn from a standard base distribution.
This is passed through a series of convolutional layers to produce the output image.
However, latent variable inputs to the generator (i) can be applied at various points in the neural network and (ii) can modify the current representation at these points in various ways.
StyleGAN carefully selects scales and separates style from noise (Figure 15.19).
--- p.410
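
A heavily simplified toy sketch of points (i) and (ii) above; this is not the actual StyleGAN architecture, and every shape and operation here is an illustrative assumption. A per-scale style vector modulates the intermediate representation, and fresh per-pixel noise is injected at each scale of an upsampling generator.

```python
import numpy as np

rng = np.random.default_rng(3)

def upsample(h):
    """Nearest-neighbor 2x upsampling of an (H, W, C) feature map."""
    return h.repeat(2, axis=0).repeat(2, axis=1)

def toy_multiscale_generator(styles, channels=4):
    h = rng.normal(size=(4, 4, channels))       # starting representation
    for style in styles:                        # one style vector per scale
        h = upsample(h)
        h = h * (1.0 + style)                   # (i)+(ii): latent injected here, modulating the representation
        h = h + 0.1 * rng.normal(size=h.shape)  # per-pixel noise at this scale (unimportant variation)
    return h

styles = 0.5 * rng.normal(size=(3, 4))          # coarse, medium, and fine scales
image = toy_multiscale_generator(styles)
print(image.shape)                              # (32, 32, 4)
```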

Publisher's Review
A new classic covering deep learning theory and the latest trends.

Thanks to more than 25 years of persistent research by Yoshua Bengio, Geoffrey Hinton, and Yann LeCun, deep learning has revolutionized science and had a significant impact on society as a whole.
Yet, there are still few people who 'properly understand' deep learning.

This book covers a wide range of topics, from the fundamentals of deep learning to cutting-edge architectures like transformers and diffusion models, systematically and intuitively unraveling complex topics.
Up-to-date references, practical examples, and rich visual aids make learning easier and help readers deepen their understanding.
In particular, this book goes beyond a simple technical guide and presents a broad perspective, ranging from the fundamental question of "Why is deep learning effective?" to AI ethics.

Chapter 1 introduces deep learning, and Chapters 2 through 9 cover the overall supervised learning pipeline.
We will explain the architecture of shallow and deep neural networks, and explore how to train them, measure their performance, and improve them.
Chapters 10 to 13 cover representative deep neural network structures, including convolutional networks, residual connections, transformers, and graph neural networks, and explain how these structures are used in supervised learning, unsupervised learning, and reinforcement learning.
Chapters 14 to 18 cover unsupervised learning, focusing on deep generative models such as generative adversarial networks (GANs), variational autoencoders (VAEs), normalizing flows, and diffusion models.


Chapter 19 briefly introduces deep reinforcement learning, and Chapter 20 explores key concepts such as double descent, grokking, and the lottery ticket hypothesis, addressing fundamental questions such as "Why does deep learning generalize well?", "Why do neural networks need to be so deep?", and "Why do we need so many parameters?"
Finally, Chapter 21 discusses deep learning and ethics.
The appendix summarizes key background knowledge, including notation, mathematical concepts, and probability, to help you follow the concepts without interrupting the learning flow.
Additionally, all figures in the book can be viewed in color via QR codes, which further aids learning.


This book is neither a purely theoretical text nor a hands-on practice manual.
It contains no proofs and very little code.
Instead, it provides a deep and clear understanding of the core concepts of deep learning, along with a conceptual foundation for finding your own solutions when faced with new problems where existing recipes do not apply.
Each chapter is structured to enable you to fundamentally understand and solve problems encountered in real life.


This is the birth of a new classic in deep learning for those who want to understand deep learning from the beginning, in depth, and properly.

Recommended in these situations

● When you want to organize the concepts and theories of deep learning
● When you need explanations and visual aids to effectively convey a concept
● When you want to accurately understand the structure and operating principles of the implemented model

Key Contents

● Basic principles of deep learning and neural network structure
● Model training, optimization, and performance evaluation techniques
● Key structures such as CNN, transformer, and graph neural network
● Generative models such as GANs, VAEs, and diffusion models
● Concept and application of reinforcement learning
● Generalization ability and operating mechanism of deep learning
● AI ethics and the social responsibility of technology
Product Details
- Publication date: August 28, 2025
- Pages and size: 696 pages | 188 × 245 × 33 mm
- ISBN13: 9791194587262
