Fundamentals of Speech Recognition (English Edition)

Price: ¥41.00

Authors: Lawrence Rabiner, Biing-Hwang Juang
Publisher: Tsinghua University Press
Series: University Computer Education Series (facsimile reprint edition)
Tags: Video/Audio/Streaming Media

ISBN: 9787302036401    Publication date: 1999-09-01    Binding: Paperback
Format: 23 cm    Pages: 507    Word count:

內(nèi)容簡介

  This book is intended for engineers, scientific and technical professionals, linguists, and programmers. It presents the fundamental concepts, approaches, and methods behind modern speech-recognition systems. The book consists of nine chapters: 1. Fundamentals of speech recognition; 2. The speech signal: production, perception, and acoustic-phonetic characterization; 3. Signal processing and analysis methods for speech recognition; 4. Pattern-comparison techniques; 5. Speech-recognition system design and implementation issues; 6. Theory and implementation of hidden Markov models; 7. Speech recognition based on connected word models; 8. Large-vocabulary continuous speech recognition; 9. Task-oriented applications of automatic speech recognition. It can serve both as a reference for researchers and as supplementary reading for graduate students taking courses on digital speech signal processing.

About the Authors

No author biography is currently available for Fundamentals of Speech Recognition (English Edition).

Table of Contents

     CONTENTS
    LIST OF FIGURES
    LIST OF TABLES
   PREFACE
   1 FUNDAMENTALS OF SPEECH RECOGNITION
    1.1 Introduction
    1.2 The Paradigm for Speech Recognition
    1.3 Outline
    1.4 A Brief History of Speech-Recognition Research
   2 THE SPEECH SIGNAL: PRODUCTION, PERCEPTION, AND ACOUSTIC-PHONETIC CHARACTERIZATION
    2.1 Introduction
    2.1.1 The Process of Speech Production and Perception in Human Beings
    2.2 The Speech-Production Process
    2.3 Representing Speech in the Time and Frequency Domains
    2.4 Speech Sounds and Features
    2.4.1 The Vowels
    2.4.2 Diphthongs
    2.4.3 Semivowels
    2.4.4 Nasal Consonants
    2.4.5 Unvoiced Fricatives
    2.4.6 Voiced Fricatives
    2.4.7 Voiced and Unvoiced Stops
    2.4.8 Review Exercises
    2.5 Approaches to Automatic Speech Recognition by Machine
    2.5.1 Acoustic-Phonetic Approach to Speech Recognition
    2.5.2 Statistical Pattern-Recognition Approach to Speech Recognition
    2.5.3 Artificial Intelligence (AI) Approaches to Speech Recognition
    2.5.4 Neural Networks and Their Application to Speech Recognition
    2.6 Summary
   3 SIGNAL PROCESSING AND ANALYSIS METHODS FOR SPEECH RECOGNITION
    3.1 Introduction
    3.1.1 Spectral Analysis Models
    3.2 The Bank-of-Filters Front-End Processor
    3.2.1 Types of Filter Bank Used for Speech Recognition
    3.2.2 Implementations of Filter Banks
    3.2.3 Summary of Considerations for Speech-Recognition Filter Banks
    3.2.4 Practical Examples of Speech-Recognition Filter Banks
    3.2.5 Generalizations of Filter-Bank Analyzer
    3.3 Linear Predictive Coding Model for Speech Recognition
    3.3.1 The LPC Model
    3.3.2 LPC Analysis Equations
    3.3.3 The Autocorrelation Method
    3.3.4 The Covariance Method
    3.3.5 Review Exercise
    3.3.6 Examples of LPC Analysis
    3.3.7 LPC Processor for Speech Recognition
    3.3.8 Review Exercises
    3.3.9 Typical LPC Analysis Parameters
    3.4 Vector Quantization
    3.4.1 Elements of a Vector Quantization Implementation
    3.4.2 The VQ Training Set
    3.4.3 The Similarity or Distance Measure
    3.4.4 Clustering the Training Vectors
    3.4.5 Vector Classification Procedure
    3.4.6 Comparison of Vector and Scalar Quantizers
    3.4.7 Extensions of Vector Quantization
    3.4.8 Summary of the VQ Method
    3.5 Auditory-Based Spectral Analysis Models
    3.5.1 The EIH Model
    3.6 Summary
   4 PATTERN-COMPARISON TECHNIQUES
    4.1 Introduction
    4.2 Speech (Endpoint) Detection
    4.3 Distortion Measures--Mathematical Considerations
    4.4 Distortion Measures-Perceptual Considerations
    4.5 Spectral-Distortion Measures
    4.5.1 Log Spectral Distance
    4.5.2 Cepstral Distances
    4.5.3 Weighted Cepstral Distances and Liftering
    4.5.4 Likelihood Distortions
    4.5.5 Variations of Likelihood Distortions
    4.5.6 Spectral Distortion Using a Warped Frequency Scale
    4.5.7 Alternative Spectral Representations and Distortion Measures
    4.5.8 Summary of Distortion Measures--Computational Considerations
    4.6 Incorporation of Spectral Dynamic Features into the Distortion Measure
    4.7 Time Alignment and Normalization
    4.7.1 Dynamic Programming--Basic Considerations
    4.7.2 Time-Normalization Constraints
    4.7.3 Dynamic Time-Warping Solution
    4.7.4 Other Considerations in Dynamic Time Warping
    4.7.5 Multiple Time-Alignment Paths
    4.8 Summary
   5 SPEECH RECOGNITION SYSTEM DESIGN AND IMPLEMENTATION ISSUES
    5.1 Introduction
    5.2 Application of Source-Coding Techniques to Recognition
    5.2.1 Vector Quantization and Pattern Comparison Without Time Alignment
    5.2.2 Centroid Computation for VQ Codebook Design
    5.2.3 Vector Quantizers with Memory
    5.2.4 Segmental Vector Quantization
    5.2.5 Use of a Vector Quantizer as a Recognition Preprocessor
    5.2.6 Vector Quantization for Efficient Pattern Matching
    5.3 Template Training Methods
    5.3.1 Casual Training
    5.3.2 Robust Training
    5.3.3 Clustering
    5.4 Performance Analysis and Recognition Enhancements
    5.4.1 Choice of Distortion Measures
    5.4.2 Choice of Clustering Methods and kNN Decision Rule
    5.4.3 Incorporation of Energy Information
    5.4.4 Effects of Signal Analysis Parameters
    5.4.5 Performance of Isolated Word-Recognition Systems
    5.5 Template Adaptation to New Talkers
    5.5.1 Spectral Transformation
    5.5.2 Hierarchical Spectral Clustering
    5.6 Discriminative Methods in Speech Recognition
    5.6.1 Determination of Word Equivalence Classes
    5.6.2 Discriminative Weighting Functions
    5.6.3 Discriminative Training for Minimum Recognition Error
    5.7 Speech Recognition in Adverse Environments
    5.7.1 Adverse Conditions in Speech Recognition
    5.7.2 Dealing with Adverse Conditions
    5.8 Summary
   6 THEORY AND IMPLEMENTATION OF HIDDEN MARKOV MODELS
    6.1 Introduction
    6.2 Discrete-Time Markov Processes
    6.3 Extensions to Hidden Markov Models
    6.3.1 Coin-Toss Models
    6.3.2 The Urn-and-Ball Model
    6.3.3 Elements of an HMM
    6.3.4 HMM Generator of Observations
    6.4 The Three Basic Problems for HMMs
    6.4.1 Solution to Problem 1-Probability Evaluation
    6.4.2 Solution to Problem 2--"Optimal" State Sequence
    6.4.3 Solution to Problem 3--Parameter Estimation
    6.4.4 Notes on the Reestimation Procedure
    6.5 Types of HMMs
    6.6 Continuous Observation Densities in HMMs
    6.7 Autoregressive HMMs
    6.8 Variants on HMM Structures--Null Transitions and Tied States
    6.9 Inclusion of Explicit State Duration Density in HMMs
    6.10 Optimization Criterion-ML, MMI, and MDI
    6.11 Comparisons of HMMs
    6.12 Implementation Issues for HMMs
    6.12.1 Scaling
    6.12.2 Multiple Observation Sequences
    6.12.3 Initial Estimates of HMM Parameters
    6.12.4 Effects of Insufficient Training Data
    6.12.5 Choice of Model
    6.13 Improving the Effectiveness of Model Estimates
    6.13.1 Deleted Interpolation
    6.13.2 Bayesian Adaptation
    6.13.3 Corrective Training
    6.14 Model Clustering and Splitting
    6.15 HMM System for Isolated Word Recognition
    6.15.1 Choice of Model Parameters
    6.15.2 Segmental K-Means Segmentation into States
    6.15.3 Incorporation of State Duration into the HMM
    6.15.4 HMM Isolated-Digit Performance
    6.16 Summary
   7 SPEECH RECOGNITION BASED ON CONNECTED WORD MODELS
    7.1 Introduction
    7.2 General Notation for the Connected Word-Recognition Problem
    7.3 The Two-Level Dynamic Programming (Two-Level DP) Algorithm
    7.3.1 Computation of the Two-Level DP Algorithm
    7.4 The Level Building (LB) Algorithm
    7.4.1 Mathematics of the Level Building Algorithm
    7.4.2 Multiple Level Considerations
    7.4.3 Computation of the Level Building Algorithm
    7.4.4 Implementation Aspects of Level Building
    7.4.5 Integration of a Grammar Network
    7.4.6 Examples of LB Computation of Digit Strings
    7.5 The One-Pass (One-State) Algorithm
    7.6 Multiple Candidate Strings
    7.7 Summary of Connected Word Recognition Algorithms
    7.8 Grammar Networks for Connected Digit Recognition
    7.9 Segmental K-Means Training Procedure
    7.10 Connected Digit Recognition Implementation
    7.10.1 HMM-Based System for Connected Digit Recognition
    7.10.2 Performance Evaluation on Connected Digit Strings
    7.11 Summary
   8 LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION
    8.1 Introduction
    8.2 Subword Speech Units
    8.3 Subword Unit Models Based on HMMs
    8.4 Training of Subword Units
    8.5 Language Models for Large Vocabulary Speech Recognition
    8.6 Statistical Language Modeling
    8.7 Perplexity of the Language Model
    8.8 Overall Recognition System Based on Subword Units
    8.8.1 Control of Word Insertion/Word Deletion Rate
    8.8.2 Task Semantics
    8.8.3 System Performance on the Resource Management Task
    8.9 Context-Dependent Subword Units
    8.9.1 Creation of Context-Dependent Diphones and Triphones
    8.9.2 Using Interword Training to Create CD Units
    8.9.3 Smoothing and Interpolation of CD PLU Models
    8.9.4 Smoothing and Interpolation of Continuous Densities
    8.9.5 Implementation Issues Using CD Units
    8.9.6 Recognition Results Using CD Units
    8.9.7 Position Dependent Units
    8.9.8 Unit Splitting and Clustering
    8.9.9 Other Factors for Creating Additional Subword Units
    8.9.10 Acoustic Segment Units
    8.10 Creation of Vocabulary-Independent Units
    8.11 Semantic Postprocessor for Recognition
    8.12 Summary
   9 TASK ORIENTED APPLICATIONS OF AUTOMATIC SPEECH RECOGNITION
    9.1 Introduction
    9.2 Speech-Recognizer Performance Scores
    9.3 Characteristics of Speech-Recognition Applications
    9.3.1 Methods of Handling Recognition Errors
    9.4 Broad Classes of Speech-Recognition Applications
    9.5 Command-and-Control Applications
    9.5.1 Voice Repertory Dialer
    9.5.2 Automated Call-Type Recognition
    9.5.3 Call Distribution by Voice Commands
    9.5.4 Directory Listing Retrieval
    9.5.5 Credit Card Sales Validation
    9.6 Projections for Speech Recognition
