Statistical Machine Learning with Python
Description
Book Introduction
Machine learning refers to data-science models that perform prediction, classification, dimensionality reduction, generation, and reproduction using only the given data.
In statistics, inferences such as estimation, testing, and prediction are made using a considerable amount of statistical and mathematical knowledge, based on assumptions about the data.
Machine learning, however, satisfies the basic conditions of statistics through data splitting, sample weighting, resampling, and randomization, so that sound statistical inferences can be drawn from the given data alone, without any assumptions about them.
Understanding the fundamentals of statistics is therefore essential to understanding the foundations of machine learning and, building on that, to developing your own high-performance machine learning models.

A fundamental assumption of statistics is that the given data are a random sample from an unknown population and that such random samples can be drawn repeatedly.
A random sample is one selected at random from the unknown population; in simple terms, the given data are chosen so that they represent the unknown population well.
The second assumption, repeated random sampling, is what makes theoretical reasoning in statistics possible and provides the foundation for mathematical statistics and probability theory.

In real-world problems, however, only one dataset is observed.
Machine learning implements the statistical notions of random sampling and repeated random sampling through splitting and resampling based on shuffling the data, and uses them to carry out a wide range of statistical inferences.
No further mathematical or statistical knowledge is required.
Unlike classical statistics, it does not make the unreasonable assumption that the model is correct and only its parameters are unknown; instead, whether the model is adequate and its parameter estimates are sound can be checked simply by splitting the data.
Resampling allows more precise statistical inference and, in particular, enables the ensemble method known as bagging, as the sketch below illustrates.
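As a concrete illustration of splitting and resampling, here is a minimal scikit-learn sketch; the dataset (load_breast_cancer) and the tree-based estimator are illustrative assumptions of this page, not examples taken from the book.

    # Minimal sketch of splitting and resampling with scikit-learn.
    # Dataset and estimator are illustrative choices, not the book's.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    # Splitting: shuffle, then hold out a test set so the fitted model
    # can be checked on data it has never seen.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, shuffle=True, random_state=0)

    # Resampling: bagging fits many trees on bootstrap resamples of the
    # training data and aggregates their votes.
    bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                            bootstrap=True, random_state=0)
    bag.fit(X_train, y_train)
    print("Test accuracy:", bag.score(X_test, y_test))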
Assigning a weight to each sample according to its importance is called sample weighting.
Sample weights are used in statistical techniques based on K-nearest neighbors and in boosting, a state-of-the-art family of models, and they enter the loss function, the objective function used to estimate parameters in machine learning.
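As a minimal sketch of this idea (the synthetic data and the doubled weight on class 1 are assumptions made only for this illustration), most scikit-learn estimators accept a per-sample weight in fit(), which scales each observation's contribution to the loss:

    # Minimal sketch of sample weighting on synthetic data.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

    # Up-weight class 1 so each of its samples contributes twice as
    # much to the loss as a class-0 sample.
    weights = np.where(y == 1, 2.0, 1.0)

    clf = LogisticRegression().fit(X, y, sample_weight=weights)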
Randomization is an important means of checking whether the model has learned unnecessary noise.
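One common form of such a check, sketched below on synthetic data (an assumption of this page, not necessarily the book's own procedure), is to refit the model on randomly permuted labels: a model that scores as well on shuffled labels as on the real ones is learning noise rather than signal.

    # Minimal sketch of a randomization (label-permutation) check.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import permutation_test_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

    # Score on the real labels and on 30 random permutations of y;
    # the real score should sit far above the permuted scores.
    score, perm_scores, p_value = permutation_test_score(
        LogisticRegression(), X, y, n_permutations=30, random_state=0)
    print(f"real: {score:.3f}  shuffled mean: {perm_scores.mean():.3f}  "
          f"p-value: {p_value:.3f}")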

Therefore, if you read this book with the four keywords splitting, sample weighting, resampling, and randomization in mind, you will come away with a firm grasp of how statistical fundamentals and machine learning methodology fit together.
You will then be able to continue without much difficulty on your journey through AI models, from the statistical machine learning that is the subject of this book to deep learning, reinforcement learning, XAI, and, when necessary, time series analysis.
With this in mind, reading Chapter 1 carefully and running the accompanying code will let you understand the four keywords above through hands-on experience.

I did my best to write a good book, but it may still have shortcomings.
I ask for your understanding, and any corrections made after publication will be posted in the data room of the Free Academy website (www.freeaca.com).
Finally, I would like to thank Jinse Park, who drew the conceptual diagrams for this book, and my loving wife and daughter, who have supported me with endless love and encouragement.

Table of Contents
Chapter 1 Principles of Statistics and Machine Learning

1.1 What Is Good Data?
1.2 The Role of the Model and the Error Term
1.3 Splitting, Weighting, and Resampling Data
1.4 Statistical Machine Learning, Deep Learning, and Reinforcement Learning
1.5 AI Models and Loss Functions
1.6 Data Analysis Procedure and Model Summary
1.7 AI Knowledge Required for Data Scientists

Chapter 2 Preprocessing and Optimization

2.1 Conversion to Real Data
2.2 Data Characteristics
2.3 Case Analysis
2.4 Handling Imbalanced Data
2.5 Selection of Feature Variables
2.6 Loss Function and Optimization

Chapter 3 Data Visualization

3.1 AutoViz
3.2 Bamboolib
3.3 Plotly

Chapter 4 K-Nearest Neighbors

4.1 Application of KNN
4.2 Kernel Density Function Estimation

Chapter 5 Logistic Regression Classification

5.1 Adaptive Linear Neurons
5.2 Logistic Regression
5.3 Regularization Against Overfitting
5.4 Logistic Regression with Scikit-learn

Chapter 6 Discriminant Analysis and Naive Bayes Models

6.1 Discriminant Analysis
6.2 Naive Bayes Model
6.3 LDA and Naive Bayes Models Using Scikit-learn

Chapter 7 Classification and Regression Trees

7.1 Regression Tree
7.2 Classification Tree
7.3 Decision Trees Using Scikit-learn

Chapter 8 Support Vector Machines

8.1 Support Vector Machine
8.2 Kernel SVM
8.3 SVM Using Scikit-learn

Chapter 9 Dimensionality Reduction

9.1 Singular Value Decomposition
9.2 Probabilistic PCA
9.3 Kernel PCA
9.4 Factor Analysis
9.5 Dimensionality Reduction through Linear Discriminant Analysis
9.6 Dimensionality Reduction for Visualization
9.7 Dimensionality Reduction with Scikit-learn

Chapter 10 Error Analysis, Data Partitioning, and Hyperparameter Tuning

10.1 Error Analysis
10.2 Data Partitioning
10.3 Hyperparameter Tuning
10.4 Cross-Validation

Chapter 11 Regression Analysis

11.1 Linear Regression Model
11.2 Quantile Regression
11.3 Robust Regression
11.4 SVM Regression and Kernel SVM Regression
11.5 Regularized Linear Regression Model
11.6 Regression Analysis Using Scikit-learn

Chapter 12 Clustering

12.1 K-Means Clustering
12.2 Hierarchical Clustering
12.3 DBSCAN and HDBSCAN
12.4 Clustering Using Scikit-learn

Chapter 13 Ensemble Learning

13.1 Bagging, Pasting, and Random Forest
13.2 Characteristics of Statistical Machine Learning for Ensemble Learning
13.3 AdaBoost
13.4 Gradient Boosting
13.5 XGBoost
13.6 LightGBM
13.7 CatBoost
13.8 Application Cases

Chapter 14 Comparison and Characteristics of XGBoost, LightGBM, and CatBoost

14.1 Comparison with Traditional Statistical Models: Regression
14.2 Importance and Effectiveness of Feature Variables in XGBoost, LightGBM, and CatBoost
14.3 Comparison with Traditional Statistical Models: Classification

Chapter 15 Bagging and Boosting

15.1 Decision Tree
15.2 Random Forest
15.3 Gradient Boosting
15.4 Classification

Chapter 16 Characteristics and Tuning of Hyperparameters in XGBoost, LightGBM, and CatBoost

16.1 Convergence Speed Comparison
16.2 Comparison and Tuning of Hyperparameters
16.3 Handling Imbalanced Data

Chapter 17 Metamodels and Model Automation

17.1 Metamodel
17.2 Model Automation

Chapter 18 Sentiment Analysis

18.1 Sentiment Analysis
18.2 Case Study Using Python

References
Solutions to Practice Problems

Product Details
- Publication date: October 25, 2023
- Pages and size: 532 pages | 188*257*35 mm
- ISBN13: 9791158085346
- ISBN10: 1158085346
