LLM: Building from the Ground Up
Description
Book Introduction
By following the code line by line, you will complete your own GPT!
A practical guide to implementing GPT from the ground up and mastering the principles of LLMs hands-on.


Difficult concepts are explained with illustrations, and you learn LLMs by building them yourself.
This book is a practical, introductory LLM book that lets you learn the structure and operating principles of large-scale language models from start to finish.
Rather than simply explaining the concepts, it starts with text preprocessing, tokenization, and embedding, and then builds self-attention, multi-head attention, and transformer blocks step by step.
Next, it integrates these components into a working GPT model and directly covers key elements of modern architecture design, such as the number of model parameters, training stabilization techniques, activation functions, and regularization methods.
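
For a taste of what you will build, here is a minimal single-head self-attention sketch in PyTorch. It is an illustrative example only (the class name and dimensions are chosen here for brevity), not code taken verbatim from the book:

import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Minimal scaled dot-product self-attention (illustrative sketch, single head)."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.W_query = nn.Linear(d_in, d_out, bias=False)
        self.W_key   = nn.Linear(d_in, d_out, bias=False)
        self.W_value = nn.Linear(d_in, d_out, bias=False)

    def forward(self, x):                                    # x: (batch, seq_len, d_in)
        queries, keys, values = self.W_query(x), self.W_key(x), self.W_value(x)
        scores  = queries @ keys.transpose(1, 2)              # (batch, seq_len, seq_len)
        weights = torch.softmax(scores / keys.shape[-1] ** 0.5, dim=-1)
        return weights @ values                               # (batch, seq_len, d_out)

# Example: a batch of 2 sequences, 6 tokens each, 16-dimensional embeddings
x = torch.randn(2, 6, 16)
print(SelfAttention(16, 16)(x).shape)                         # torch.Size([2, 6, 16])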


The book also provides in-depth guidance on the pretraining and fine-tuning process.
You will pretrain on unlabeled data, tune the model for downstream tasks such as text classification, and practice instruction fine-tuning so the model follows human instructions.
It also covers cutting-edge topics such as parameter-efficient fine-tuning (PEFT) based on LoRA, and presents a wide range of ways to connect LLMs to real-world services and research.
All concepts are implemented in PyTorch code and designed to be practiced on a standard laptop.
By following the implementation process in this book, you will naturally understand what happens inside an LLM and gain a hands-on grasp of how the mechanisms of large-scale language models work.
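
To give a sense of how compact the LoRA idea is, here is a minimal sketch of a LoRA-wrapped linear layer in PyTorch. This is an assumption-laden illustration (the class name, rank, and alpha values are chosen here for brevity), not the book's implementation:

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA sketch: a frozen pretrained linear layer plus a trainable low-rank update."""
    def __init__(self, linear, rank=8, alpha=16):
        super().__init__()
        self.linear = linear
        for p in self.linear.parameters():                    # freeze the pretrained weights
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(linear.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, linear.out_features))
        self.scaling = alpha / rank

    def forward(self, x):
        return self.linear(x) + (x @ self.A @ self.B) * self.scaling

# Example: wrap a 768-dimensional projection so only A and B are updated during fine-tuning
layer = LoRALinear(nn.Linear(768, 768))
print(layer(torch.randn(4, 768)).shape)                       # torch.Size([4, 768])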

Table of Contents
Chapter 1: Understanding Large-Scale Language Models
1.1 What is an LLM?
1.2 LLM Application
1.3 LLM Construction Phase
1.4 Introduction to Transformer Structure
1.5 Leveraging Large Datasets
1.6 A Closer Look at the GPT Structure
1.7 Building a Large-Scale Language Model
1.8 Summary

Chapter 2: Handling Text Data
2.1 Understanding Word Embeddings
2.2 Tokenizing Text
2.3 Converting a token to a token ID
2.4 Adding Special Context Tokens
2.5 Byte pair encoding
2.6 Sampling data with a sliding window
2.7 Creating Token Embeddings
2.8 Encoding word positions
2.9 Summary

Chapter 3: Implementing the Attention Mechanism
3.1 Problems with modeling long sequences
3.2 Capturing Data Dependencies with Attention Mechanisms
3.3 Paying attention to different parts of the input with self-attention
__3.3.1 A simple self-attention mechanism without trainable weights
__3.3.2 Calculating attention weights for all input tokens
3.4 Implementing self-attention with trainable weights
__3.4.1 Calculating attention weights step by step
__3.4.2 Implementing a Self-Attention Python Class
3.5 Hiding future words with causal attention
__3.5.1 Applying the causal attention mask
__3.5.2 Additional masking of attention weights with dropout
__3.5.3 Implementing the Causal Attention Class
3.6 Extending single-head attention to multi-head attention
__3.6.1 Stacking Multiple Single-Head Attention Layers
__3.6.2 Implementing multi-head attention with weight splitting
3.7 Summary

Chapter 4: Implementing a GPT Model from Scratch
4.1 Implementing the LLM Structure
4.2 Normalizing activations with layer normalization
4.3 Implementing a feedforward network using the GELU activation function
4.4 Adding a shortcut connection
4.5 Connecting the Attention and Linear Layers to the Transformer Block
4.6 Creating a GPT Model
4.7 Creating Text
4.8 Summary

Chapter 5: Pretraining with Unlabeled Data
5.1 Evaluating the Text Generation Model
__5.1.1 Generating Text Using GPT
__5.1.2 Calculating Text Generation Loss
__5.1.3 Calculating the loss on the training and validation sets
5.2 LLM Training
5.3 Decoding strategies to control randomness
__5.3.1 Temperature Scaling
__5.3.2 Top-k sampling
__5.3.3 Modifying the text generation function
5.4 Loading and Saving Models with PyTorch
5.5 Loading Pretrained Weights from OpenAI
5.6 Summary

Chapter 6: Fine-Tuning for Classification
6.1 Various fine-tuning methods
6.2 Preparing the dataset
6.3 Creating a Data Loader
6.4 Initializing the Model with Pretrained Weights
6.5 Adding a Classification Head
6.6 Calculating classification loss and accuracy
6.7 Fine-tuning the model with supervised learning data
6.8 Using LLM as a Spam Classifier
6.9 Summary

Chapter 7: Fine-Tuning to Follow Instructions
7.1 Introduction to Instruction Fine Tuning
7.2 Preparing the Dataset for Supervised Learning Fine-Tuning
7.3 Creating a Training Batch
7.4 Creating a Data Loader for the Instruction Dataset
7.5 Loading a Pre-Trained LLM
7.6 Fine-tuning LLM from instruction data
7.7 Extracting and Saving Responses
7.8 Evaluating the Fine-Tuned LLM
7.9 Conclusion
__7.9.1 What's next?
__7.9.2 Stay up to date on rapidly developing fields
__7.9.3 Conclusion
7.10 Summary

Appendix A: Introduction to PyTorch
A.1 What is PyTorch?
__A.1.1 Three Core Components of PyTorch
__A.1.2 What is deep learning?
__A.1.3 Installing PyTorch
A.2 Understanding Tensors
__A.2.1 Scalars, Vectors, Matrices, and Tensors
__A.2.2 Tensor data type
__A.2.3 Frequently Used PyTorch Tensor Operations
A.3 Viewing the model as a computational graph
A.4 Automatic differentiation made easy
A.5 Creating a Multilayer Neural Network
A.6 Setting up an efficient data loader
A.7 General Training Loop
A.8 Saving and Loading Models
A.9 Optimizing Training Performance with GPUs
__A.9.1 PyTorch Computations Using GPU Devices
__A.9.2 Single GPU Training
__A.9.3 Multi-GPU Training
A.10 Summary

Appendix B: References and Further Reading

Appendix C: Practice Problems and Answers

Appendix D: Adding Additional Features to Your Training Loop
D.1 Learning rate warmup
D.2 Cosine Decay
D.3 Gradient Clipping
D.4 Modified training function

Appendix E: Parameter-Efficient Fine-Tuning Using LoRA
E.1 Introduction to LoRA
E.2 Preparing the Dataset
E.3 Initializing the model
E.4 Parameter-Efficient Fine-Tuning Using LoRA


[Workbook Table of Contents]
Chapter 1: Understanding Large-Scale Language Models
Chapter 2: Handling Text Data
Chapter 3: Implementing the Attention Mechanism
Chapter 4: Implementing a GPT Model from Scratch
Chapter 5: Pretraining with Unlabeled Data
Chapter 6: Fine-Tuning for Classification
Chapter 7: Fine-Tuning to Follow Instructions

Appendix A: Introduction to PyTorch
Appendix D: Adding Additional Features to Your Training Loop
Appendix E: Parameter-Efficient Fine-Tuning Using LoRA


Publisher's Review
“What I cannot create, I do not understand.” - Richard Feynman

The best way to understand LLM is to implement it yourself from the ground up.


As the name suggests, LLM (Large Language Model) is a very large model.
But just because it's big doesn't mean you have to think of an LLM as a black box.
As Feynman said, the best way to truly understand something is to make it.
Learn how to build an LLM step by step through this book.
You will develop a base model yourself without relying on any existing LLM library, extend it into a text classifier, and ultimately create a chatbot that follows conversational instructions.
We'll cover every step, from planning and coding your model build to training and fine-tuning it.
By the end of this book, you will have a solid, fundamental understanding of how LLMs like ChatGPT work.

[Contents of this book]

■ Planning and developing an LLM similar to GPT-2
■ Fine-tuning the LLM for text classification
■ Loading pretrained weights
■ Developing an LLM that follows human instructions
■ Building a complete training pipeline (see the sketch below)
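
As a rough illustration of what such a training pipeline boils down to, here is a minimal next-token training loop sketch in PyTorch. The model and dataloader here are hypothetical placeholders; the book develops the real versions step by step:

import torch
import torch.nn as nn

def train_one_epoch(model, dataloader, optimizer, device="cpu"):
    """Illustrative sketch: one epoch of next-token prediction training."""
    model.train()
    for input_ids, target_ids in dataloader:                  # batches of token IDs
        input_ids, target_ids = input_ids.to(device), target_ids.to(device)
        logits = model(input_ids)                             # (batch, seq_len, vocab_size)
        loss = nn.functional.cross_entropy(
            logits.flatten(0, 1),                             # (batch * seq_len, vocab_size)
            target_ids.flatten()                              # (batch * seq_len,)
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()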

A workbook is also included, so you can understand the material more easily and clearly!

The learning method in this book—building your own models—is the best way to learn the fundamentals of how large-scale language models work.
Although we've included clear explanations, pictures, and code for this, it can feel daunting because the subject matter is complex.
I have prepared a workbook to help you understand more easily and clearly.
This workbook follows the structure of LLM: Building from the Ground Up, covering the key concepts of each chapter and challenging you with multiple-choice quizzes, questions about code and key concepts, and questions that call for deep thinking and long-form answers.
Of course, answers to the questions are also included.
Use it in whatever way suits you, such as before or after reading each chapter, or for periodic review, so that the knowledge you have learned truly becomes your own.

[Author's Preface]

I believe that being confident in writing code related to fundamental concepts is crucial to success in this field.
This helps us not only fix bugs and improve performance, but also experiment with new ideas.
When I first started working on LLMs a few years ago, I had a hard time learning how to implement them.
I had to dig through many research papers and incomplete code repositories to get the full picture.
This book provides step-by-step tutorials detailing the key components and development stages of an LLM.
I hope it will help you understand LLMs more easily.

I firmly believe that the best way to understand LLM is to implement it yourself from the ground up.
You'll find this to be fun, too! Happy reading and coding!

[Translator's Preface]

I agree with the author that the best way to understand LLM is to implement it yourself from the ground up.
This learning method is especially effective in computer science and machine learning.
As an engineer, I'm still more curious about how tools work.
Perhaps those of you who have picked up this book are similar to me.
By following this book and building your LLM tower, brick by brick, using PyTorch, you'll gain a clear understanding of the state of AI and LLM today.

As we dissect the structure of the LLM, we find ourselves nodding along to the argument that the text an LLM produces only appears to involve inference: it is less a product of actual reasoning than a kind of post-hoc rationalization.
What is reasoning, essentially? How do humans differentiate themselves from machines? There are still so many questions we haven't solved.
There is no doubt that this field will continue to be filled with interesting and mysterious things.

[Beta Reader Review]

It's rare to find an explanation of transformer architecture and LLM that's this easy to understand.
I found it especially helpful that it provided step-by-step exercises.
I recommend this book to all AI developers because it provides a solid foundation.
- Kim Byeong-gyu | Ibricks AI Service Developer

The GPT structure and learning process are explained along with actual implementation steps, giving me a sense of how models like ChatGPT work. For me, who had absolutely no knowledge of LLM, this book was a valuable guide, systematically building a foundation from basic concepts to practical application.
- Kim Jun-ho | SSG.com Backend Developer

I believe this book significantly reduces the fear and barrier to entry for LLM development, as it combines conceptual explanations with practical exercises.
- Kim Min-seon | Data Management, Korea Water Resources Corporation

All sources are available for direct practice in Google Colab, which I found particularly satisfying as it provided an environment where I could practice while reading.
Above all, I was impressed by the absence of errors in the provided source.
This book is designed to be understood through a combination of theory and practice, so I highly recommend it to anyone who wants to understand the fundamental principles of how LLM works.
- Kim Jong-yeol | Team Leader, Ecosystem Solutions Division

Starting from the basic concepts of LLM, we naturally progress to more advanced content through hands-on practice with actual code.
This is an excellent guide that helps you understand the principles deeply by building up from the basics step by step.
- Chu Sang-won | GOTROOT Pentester

You can gain academic inspiration by getting a glimpse into the idea-forming process of a research paper, and by implementing abstract research-level concepts from the ground up, you can clearly understand the more complex parts of the paper.
Even though it covers a wide range of content, the absence of code errors allows for immersion in the content, and the intuitive code implementation centered around PyTorch is also very helpful for practical use.
Heo Min | AI Development and Strategic Planning, Information Strategy Team, Hankuk University of Foreign Studies

I was impressed by the structure, which was designed to be easy to follow even for those unfamiliar with PyTorch. This is the ultimate practical guide, covering the entire LLM development process, from fundamental theory to actual model implementation and practical fine-tuning techniques.

- Kang Kyung-mok | Manager (Team Leader) at Korea Somebet (affiliate of Harim Group) and Doctor of Business Administration

This is a great book that goes beyond simply teaching you how to use LLM, and helps you understand it by implementing it yourself.

- Lee Jin | Data Scientist, Kyungdong Navien Development Team

This book explains the self-attention mechanism of the Transformer step-by-step. If you want to develop an intuition for the internal workings of the LLM and GPT models, this is a must-read.

- Amazon Reader Reviews

The author provides all the code, making it easy to follow.
You can easily modify the code and learn a lot. If you want to learn how an LLM works, it's the best investment you can make.
- Amazon Reader Reviews
Product Details
- Date of issue: September 22, 2025
- Page count and size: 560 pages | 183*235*23 mm
- ISBN13: 9791140715848
