Learning Speech Recognition with Python
Description
Book Introduction
From the basics of speech recognition to hands-on deep learning with PyTorch,
a book for learning speech recognition with Python is now available!


Speech recognition is a technology that identifies spoken content from an audio signal. In other words, it lets a computer take human speech as a signal, recognize it, and process it. Products built on speech recognition, such as AI speakers and smartphone voice assistants, have already become part of our daily lives.
Speech recognition is also used in many other settings, such as the front end of a speech translation system or an automatic meeting-minutes system.
In particular, because it allows machines to be operated hands-free, speech recognition is expected to grow in areas such as integration with vehicle navigation systems and input methods for people with physical disabilities.

This book traces the history and development trends of speech recognition technology to date, and offers hands-on practice with the latest speech recognition systems using Python and PyTorch.
For each technique, it explains the purpose and gives an overview of the algorithm, walks through the source code in detail, and provides the source code so that readers can implement it themselves.
In particular, it makes clear the purpose for which each artificial intelligence technique was created and whether the technique achieves that purpose.
This book will be a must-read for developers and undergraduates who want to implement speech recognition properly.

- Download the source code: https://github.com/bjpublic/python_speech_recognition

Table of Contents
Chapter 1: What Is Speech Recognition?

Section 1: What is speech recognition and where is it used?
Section 2: What is speech recognition? - Principles of speech recognition -
Section 3: Purpose and Structure of This Book

Chapter 2: Basic Knowledge of Speech Recognition

Section 1: Speech Recognition and Probability
Section 2: Defining the Speech Recognition Problem as a Formula
Section 3: Text Types and Pronunciation Dictionary - Phonemes, Hiragana, Characters, Words -
Section 4: Two Speech Recognition Experiments
Section 5: Speech Recognition Experiment Process

Chapter 3: Basics of Speech Processing and Feature Extraction

Section 1: Preparing Data
Section 2: Reading the Audio File
Section 3: Frequency Decomposition of Speech Using the Fourier Transform
Section 4: Generating a Spectrogram of Speech with the Short-Time Fourier Transform
Section 5: Log Mel Filter Bank Features
Section 6: Mel-Frequency Cepstrum Features (MFCC)
Section 7: Calculating the Mean and Standard Deviation of Features

Chapter 4: First Steps in Speech Recognition: DP Matching

Section 1: Alignment Problems Inseparable from Speech Recognition
Section 2: DP Matching
Section 3: Implementing DP Matching

Chapter 5: GMM-HMM-Based Speech Recognition

Section 1: Distribution and Frequency Perspectives, Not Templates
Section 2: Parameter Estimation Using the Normal Distribution and Maximum Likelihood Estimation
Section 3: Gaussian Mixture Model (GMM) and the EM Algorithm
Section 4: Hidden Markov Model (HMM)
Section 5: Implementing GMM-HMM

Chapter 6: DNN-HMM-Based Speech Recognition

Section 1: From ‘Distribution’ to ‘Deep Neural Network’
Section 2: Deep Neural Network
Section 3: DNN-HMM Hybrid System Combining DNN and HMM
Section 4: Implementing DNN-HMM with Python and PyTorch
Section 5: HMM-Based Large-Vocabulary Continuous Speech Recognition

Chapter 7: End-to-End Model-Based Continuous Speech Recognition

Section 1: From Hybrid System to Full Neural Network Model
Section 2: Recurrent Neural Network
Section 3: Connectionist Temporal Classification (CTC)
Section 4: Implementing CTC with Python and PyTorch
Section 5: Attention Encoder-Decoder Model
Section 6: Implementing an Attention Model with Python and PyTorch
Section 7: Other Techniques and Speech Recognition Models
Section 8: References


Publisher's Review
Speech recognition basics and step-by-step practice
for implementing machine learning speech recognition models in Python

Speech recognition technology is advancing day by day.
In the past, I used a recorder when I needed to keep a record of an important meeting or lecture, but these days I use a speech recognition note-taking app.
Speech recognition has already become part of our daily lives: Apple Siri, Google Assistant, and Samsung Bixby on our phones, the smart speakers found in many homes, and the navigation apps we use in our cars.
Thanks to deep learning, speech recognition has advanced rapidly in recent years and recognition quality has improved. However, more data and better models are still needed before recognition can be considered perfect.
Many practical problems remain to be solved, such as recognizing speech in noisy environments or when several people are speaking at the same time.

This book covers the history of speech recognition technology from the past to the present.
It shows how earlier speech recognition techniques have influenced the technology in use today.
It also provides theoretical background and practical code for each technique, including the probability and signal processing theory widely used in speech recognition, and helps you build a state-of-the-art deep learning speech recognition system with Python and PyTorch.


The author wrote this book so that readers who are new to speech recognition can go beyond a surface understanding and grasp the overall flow and characteristics of the technology.
I recommend this book to anyone interested in how speech recognition technology will develop, anyone who wants to apply it quickly in their work, anyone interested in speech processing and machine learning in general, and anyone who wants to study speech recognition seriously.
Product Details
- Date of issue: January 9, 2023
- Page count, size: 336 pages | 188 × 245 × 30 mm
- ISBN13: 9791165921828
- ISBN10: 1165921820
