The Complete Guide to Python Machine Learning
Description
Book Introduction
You can master machine learning through detailed theoretical explanations and Python practice!

"The Complete Guide to Python Machine Learning" breaks away from theory-focused machine learning books, allowing you to learn machine learning through hands-on implementation of various practical examples.
The practical examples are built on challenging practice datasets from Kaggle and the UCI Machine Learning Repository, and the book covers in detail the latest algorithms and techniques widely used by data scientists on Kaggle, including XGBoost, LightGBM, and stacking.

This revised second edition updates the practice code to the latest versions of all libraries used in the book, including scikit-learn 1.0.2, and adds practice in applying Bayesian optimization to tune the many kinds of hyperparameters in XGBoost and LightGBM models.
It also adds a new chapter on matplotlib and seaborn, visualization libraries widely used in machine learning data analysis.

Table of Contents
Chapter 1: Understanding Python-Based Machine Learning and Its Ecosystem

01.
The concept of machine learning
___Classification of machine learning
___Data Wars
___Comparing Python and R-based Machine Learning
02.
Key packages that make up the Python machine learning ecosystem
___Installing software for Python machine learning
03.
NumPy
___NumPy ndarray overview
___ndarray data type
___Conveniently creating ndarray - arange, zeros, ones
___reshape( ) to change the dimensions and size of ndarray
___Selecting a dataset from NumPy's ndarray - Indexing
___Sorting a matrix - sort( ) and argsort( )
___Linear Algebra Operations - Matrix Inner Product and Transpose
04.
Data Handling - Pandas
___Getting Started with Pandas - Loading Files into DataFrames, Basic API
___Converting DataFrame to List, Dictionary, and NumPy ndarray
___Creating and modifying column data sets in DataFrame
___Delete DataFrame data
___Index object
___Data Selection and Filtering
___Sorting, Aggregation function, GroupBy application
___Handling missing data
___Processing data with apply and lambda expressions
05.
Summary

Chapter 2: Machine Learning with Scikit-Learn

01.
Introduction and Features of Scikit-learn
02.
Building Your First Machine Learning Model - Predicting Iris Species
03.
Learn the basic framework of scikit-learn
___Understanding Estimator and the fit( ), predict( ) methods
___Main modules of scikit-learn
___Built-in example data sets
04.
Introducing the Model Selection Module
___Separate training/test data sets - train_test_split()
___Cross-validation
___GridSearchCV - Cross-validation and optimal hyperparameter tuning in one go
05.
Data preprocessing
___data encoding
___Feature scaling and normalization
___StandardScaler
___MinMaxScaler
___Things to keep in mind when scaling training and test data
06.
Predicting Titanic Survivors with Scikit-learn
07.
Summary

Chapter 3: Evaluation

01.
Accuracy
02.
Confusion matrix
03.
Precision and recall
___Precision/Recall Tradeoff
___Blind spots in precision and recall
04.
F1 score
05. ROC Curve and AUC
06.
Pima Indian Diabetes Prediction
07.
Summary

Chapter 4: Classification

01.
Overview of Classification
02.
Decision tree
___Features of the decision tree model
___Decision Tree Parameters
___Visualization of decision tree models
___Decision Tree Overfitting
___Decision Tree Practice - Human Activity Recognition Dataset
03.
ensemble learning
___Ensemble Learning Overview
___Voting Types - Hard Voting and Soft Voting
___Voting Classifier
04.
Random Forest
___Overview and Practice of Random Forests
___Random Forest Hyperparameters and Tuning
05. GBM (Gradient Boosting Machine)
___GBM Overview and Practice
___Introducing GBM Hyperparameters
06. XGBoost (eXtra Gradient Boost)
___XGBoost Overview
___Installing XGBoost
___Python wrapper XGBoost hyperparameters
___Applying Python wrapper XGBoost - Wisconsin Breast Cancer Prediction
___Overview and application of scikit-learn wrapper XGBoost
07.
LightGBM
___Installing LightGBM
___LightGBM hyperparameters
___Hyperparameter tuning method
___Hyperparameter comparison of Python wrapper LightGBM and scikit-learn wrapper XGBoost, LightGBM
___LightGBM Application - Wisconsin Breast Cancer Prediction
08.
Hyperparameter tuning using Bayesian optimization-based HyperOpt
___Bayesian Optimization Overview
___Using HyperOpt
___XGBoost hyperparameter optimization using HyperOpt
09.
Classification Practice - Kaggle Santander Customer Satisfaction Prediction
___Data preprocessing
___XGBoost model training and hyperparameter tuning
___LightGBM model training and hyperparameter tuning
10.
Classification Practice - Kaggle Credit Card Fraud Detection
___Understanding Undersampling and Oversampling
___Initial data processing and model training/prediction/evaluation
___Model training/prediction/evaluation after data distribution transformation
___Model training/prediction/evaluation after removing outlier data
___Model training/prediction/evaluation after applying SMOTE oversampling
11.
Stacking ensemble
___Basic Stacking Model
___CV set-based stacking
12.
Summary

Chapter 5: Regression

01.
Introduction to Regression
02.
Understanding Regression through Simple Linear Regression
03.
Minimizing Costs - Introducing Gradient Descent
04.
Predicting Boston Housing Prices Using Scikit-Learn LinearRegression
___LinearRegression Class - Ordinary Least Squares
___Regression Evaluation Index
___Implementing Boston Housing Price Regression Using LinearRegression
05.
Understanding Polynomial Regression and Overfitting/Underfitting
___Understanding Polynomial Regression
___Understanding Underfitting and Overfitting Using Polynomial Regression
___Bias-Variance Tradeoff
06.
Regularized Linear Models - Ridge, Lasso, ElasticNet
___Overview of Regularized Linear Models
___Ridge regression
___Lasso regression
___ElasticNet Regression
___Data Transformation for Linear Regression Models
07.
logistic regression
08.
Regression tree
09.
Regression Exercise - Predicting Bike Rental Demand
___Data cleansing and processing and data visualization
___Log transformation, feature encoding, and model training/prediction/evaluation
10.
Regression Practice - Kaggle House Prices: Advanced Regression Techniques
___Data Preprocessing
___Linear regression model training/prediction/evaluation
___Regression tree model training/prediction/evaluation
___Final prediction by blending the predictions of the regression models
___Regression prediction using stacking ensemble models
11.
Summary

Chapter 6: Dimensionality Reduction

01.
Dimension Reduction Overview
02. PCA (Principal Component Analysis)
___PCA Overview
03. LDA (Linear Discriminant Analysis)
___LDA Overview
04. SVD(Singular Value Decomposition)
___SVD Overview
___Transformation using scikit-learn TruncatedSVD class
05. NMF (Non-Negative Matrix Factorization)
___NMF Overview
06.
Summary

Chapter 7: Clustering

01.
Understanding the K-Means Algorithm
___Introducing the scikit-learn KMeans class
___Clustering the Iris Dataset Using K-Means
___Generate data for testing clustering algorithms
02.
Cluster Evaluation
___Overview of Silhouette Analysis
___Cluster evaluation using the iris data set
___Optimizing the number of clusters by visualizing the average silhouette coefficient per cluster
03.
Mean Shift
___Overview of Mean Shift
04. GMM (Gaussian Mixture Model)
___Introduction to GMM (Gaussian Mixture Model)
___Clustering the Iris Dataset Using GMM
___Comparison of GMM and K-Means
05. DBSCAN
___DBSCAN Overview
___Applying DBSCAN - Iris Dataset
___Applying DBSCAN - make_circles( ) data set
06.
Clustering Practice - Customer Segmentation
___Definition and techniques of customer segmentation
___Dataset loading and data cleansing
___RFM-based data processing
___RFM-based customer segmentation
07.
Summary

Chapter 8: Text Analysis

___NLP or text analysis?
01.
Understanding Text Analysis
___Text Analysis Process
___Python-based NLP and text analysis package
02.
Text Preprocessing - Text Normalization
___Cleansing
___Text tokenization
___Remove stop words
___Stemming and Lemmatization
03.
Bag of Words - BOW
___BOW feature vectorization
___Scikit-learn's implementation of Count and TF-IDF vectorization: CountVectorizer, TfidfVectorizer
___Sparse matrices for BOW vectorization
___Sparse matrix - COO format
___Sparse matrix - CSR format
04.
Text Classification Practice - Classifying 20 Newsgroups
___Text Normalization
___Feature vectorization transformation and machine learning model training/prediction/evaluation
___Using scikit-learn pipelines and combining them with GridSearchCV
05.
Sentiment Analysis
___Introduction to Sentiment Analysis
___Supervised Learning-Based Sentiment Analysis Practice - IMDB Movie Reviews
___Introduction to Unsupervised Learning-Based Sentiment Analysis
___Sentiment Analysis Using SentiWordNet
___Sentiment Analysis Using VADER
06.
Topic Modeling - 20 Newsgroups
07.
Introduction and Practice of Document Clustering (Opinion Review Dataset)
___Document clustering concept
___Performing document clustering using the Opinion Review dataset
___Extracting key words by cluster
08.
Document similarity
___Method for Measuring Document Similarity - Cosine Similarity
___Angle between two vectors
___Measuring Document Similarity Using the Opinion Review Dataset
09.
Korean Text Processing - Naver Movie Rating Sentiment Analysis
___Difficulties in Korean NLP Processing
___Introducing KoNLPy
___Data loading
10.
Text Analysis Practice - Kaggle Mercari Price Suggestion Challenge
___Data preprocessing
___Feature encoding and feature vectorization
___Building and Evaluating a Ridge Regression Model
___Building a LightGBM regression model and evaluating the final predictions using an ensemble
11.
Summary

Chapter 9: Recommender Systems

01.
Overview and Background of Recommender Systems
___Overview of the recommendation system
___An Essential Element of Online Stores: Recommendation Systems
___Types of Recommendation Systems
02.
Content-based filtering recommendation system
03.
Nearest Neighbor Collaborative Filtering
04.
Latent Factor Collaborative Filtering
___Understanding Latent Factor Collaborative Filtering
___Understanding Matrix Factorization
___Matrix factorization using stochastic gradient descent
05.
Content-Based Filtering Practice - TMDB 5000 Movie Dataset
___Movie content-based filtering using genre attributes
___Data loading and processing
___Genre Content Similarity Measurement
___Movie recommendations using genre content filtering
06.
Item-based nearest neighbor collaborative filtering practice
___Data processing and conversion
___Calculating similarity between movies
___Personalized movie recommendations using item-based nearest neighbor collaborative filtering
07.
Latent Factor Collaborative Filtering Practice Using Matrix Factorization
08.
Python Recommender System Package - Surprise
___Introducing the Surprise Package
___Building a Recommendation System Using Surprise
___Introducing the main modules of Surprise
___Surprise Recommendation Algorithm Class
___Baseline Score
___Cross-validation and hyperparameter tuning
___Building a personalized movie recommendation system using Surprise
09.
Summary

Chapter 10: Visualization

01.
Getting Started with Visualization - An Overview of Matplotlib and Seaborn
02.
Matplotlib
___Understanding Matplotlib's pyplot module
___Understanding Two Key Elements of pyplot - Figure and Axes
___Using Figure and Axes
___Creating subplots with multiple plots
___Drawing a line graph using pyplot's plot( ) function
___Setting the axis names, rotating the axis tick values, and setting the legend
___Visualize individual graphs by subplot using multiple subplots
03.
Seaborn
___Chart/Graph Types for Visualization
___Types of visualization charts based on the type of information
___Histogram
___count plot
___barplot
___Using the hue argument of the barplot( ) function to further subdivide the visualized information
___box plot
___violin plot
___Visualizing various Seaborn graphs using subplots
___Scatter plot
___Correlation Heatmap
04.
Summary


Publisher's Review
Features of this book

◎ In-depth explanations of core machine learning algorithms, including classification, regression, dimensionality reduction, and clustering.
◎ Presentation of optimal machine learning model configuration methods, including data preprocessing, machine learning algorithm application, hyperparameter tuning, and performance evaluation.
◎ Detailed explanations and usage methods for the latest machine learning techniques, such as XGBoost, LightGBM, and stacking.
◎ Learn practical machine learning application development by solving challenging Kaggle problems (e.g., predicting customer satisfaction at Santander Bank, detecting credit card fraud, predicting house prices with advanced regression techniques, and suggesting prices for the Mercari marketplace).
◎ Provides basic theory and various practical examples for text analysis and NLP (text classification, sentiment analysis, topic modeling, document clustering and similarity, sentiment analysis of Naver movie ratings using KoNLPy, etc.).
◎ Provides instructions for building various recommendation systems directly with Python code.
Product Details
- Date of issue: April 21, 2022
- Page count, weight, size: 724 pages | 188*240*29mm
- ISBN13: 9791158393229
- ISBN10: 1158393229
