Modern Speech Communications
Principles and Applications for Fixed and Wireless Channels

Preliminary Contents

Part I: Transmission Issues

1      The Propagation Environment 5
1.1      Introduction to Communications Issues 5
1.2      AWGN Channel 7
1.2.1      Background 7
1.2.2      Practical Gaussian Channels 8
1.2.3      Gaussian Noise 9
1.2.4      Shannon-Hartley Law 11
1.3      The Cellular Concept 12
1.4      Radio Wave Propagation 16
1.4.1      Background 16
1.4.2      Narrow-band fading Channels 19
1.4.3      Propagation Pathloss Law 20
1.4.4      Slow Fading Statistics 24
1.4.4.1      Fast Fading Statistics 25
1.4.4.2      Doppler Spectrum 30
1.4.4.3      Simulation of Narrowband Channels 32
1.4.4.3.1      Frequency-domain fading simulation 34
1.4.4.3.2      Time-domain fading simulation 35
1.4.4.3.3      Box-Müller Algorithm of AWGN generation 35
1.4.5      Wideband Channels 36
1.4.5.1      Modelling of Wideband Channels 36
1.5      Shannon's Message for Wireless Channels 42
2      Modulation and Transmission 47
2.1      The Wireless Communications Scene 47
2.2      Modulation Issues 49
2.2.1      Choice of Modulation 49
2.2.2      Quadrature Amplitude Modulation 52
2.2.2.1      Background 52
2.2.2.2      Modem Schematic 53
2.2.2.2.1      Gray Mapping and Phasor Constellation 53
2.2.2.2.2      Nyquist Filtering 56
2.2.2.2.3      Modulation and demodulation 59
2.2.2.2.4      Data recovery 61
2.2.2.3      QAM Constellations 62
2.2.2.4      16QAM BER versus SNR Performance over AWGN Channels 65
2.2.2.4.1      Decision Theory 65
2.2.2.4.2      QAM modulation and transmission 69
2.2.2.4.3      16-QAM Demodulation in AWGN 69
2.2.2.5      Reference Assisted Coherent QAM for Fading Channels 73
2.2.2.5.1      PSAM System Description 73
2.2.2.5.2      Channel Gain Estimation in PSAM 76
2.2.2.5.3      PSAM performance 78
2.2.2.6      Differentially detected QAM 81
2.2.3      Adaptive Modulation 84
2.2.3.1      Background to Adaptive Modulation 84
2.2.3.2      Optimisation of Adaptive Modems 88
2.2.3.3      Adaptive Modulation Performance 91
2.2.3.4      Equalisation Techniques 92
2.2.4      Orthogonal Frequency Division Modulation 93
2.3      Packet Reservation Multiple Access 96
2.4      Flexible Transceiver Architecture 100
3      Convolutional Channel Coding 103
3.1      Brief Channel Coding History 103
3.2      Convolutional Encoding 104
3.3      State and Trellis Transitions 107
3.4      The Viterbi Algorithm 110
3.4.1      Error-free hard-decision Viterbi decoding 111
3.4.2      Erroneous hard-decision Viterbi decoding 115
3.4.3      Error-free soft-decision Viterbi decoding 116
4      Block-based Channel Coding 121
4.1      Introduction 121
4.2      Finite Fields 122
4.2.1      Definitions 122
4.2.2      Galois Field Construction 127
4.2.3      Galois Field Arithmetic 128
4.3      RS and BCH Codes 131
4.3.1      Definitions 131
4.3.2      RS Encoding 132
4.3.3      RS Encoding Example 135
4.3.4      Circuits for Cyclic Encoders 139
4.3.4.1      Polynomial Multiplication 139
4.3.4.2      Shift Register Encoding Example 140
4.3.5      RS Decoding 143
4.3.5.1      Formulation of the Key-Equations 143
4.3.5.2      Peterson-Gorenstein-Zierler Decoder 150
4.3.5.3      PGZ Decoding Example 152
4.3.5.4      Berlekamp-Massey algorithm 158
4.3.5.5      Berlekamp-Massey Decoding Example 166
4.3.5.6      Forney Algorithm 170
4.3.5.7      Forney Algorithm Example 175
4.3.5.8      Error Evaluator Polynomial Computation 177
4.4      RS and BCH Codec Performance 180
4.5      Summary and Conclusions 184

Part II: Speech Signals and Waveform Coding

5      Speech Signals and Coding 189
5.1      Motivation 189
5.2      Basic Characterisation of Speech Signals 190
5.3      Classification of Speech Codecs 195
5.3.1      Waveform Coding 196
5.3.1.1      Time-domain waveform Coding 196
5.3.1.2      Frequency-domain Waveform Coding 197
5.3.2      Vocoders 197
5.3.3      Hybrid Coding 199
5.4      Waveform Coding 199
5.4.1      Digitisation of Speech 199
5.4.2      Quantisation Characteristics 201
5.4.3      Quantisation Noise and Rate-Distortion Theory 203
5.4.4      Non-uniform Quantisation: Companding 206
5.4.5      Logarithmic Compression 208
5.4.5.1      μ-Law Compander 210
5.4.5.2      A-law Companding 212
5.4.6      Optimum Non-uniform Quantisation 213
6      Predictive Coding 221
6.1      Forward Predictive Coding 221
6.2      DPCM Codec Schematic 222
6.3      Predictor Design 224
6.3.1      Problem Formulation 224
6.3.2      Covariance Coefficient Computation 226
6.3.3      Predictor Coefficient Computation 228
6.4      Adaptive One-word-memory Quantization 233
6.5      DPCM Performance 236
6.6      Backward-Adaptive Prediction 238
6.6.1      Background 238
6.6.2      Stochastic model processes 240
6.7      The 32 kbps G.721 ADPCM Codec 244
6.7.1      Functional Description 244
6.7.2      Adaptive Quantiser 246
6.7.3      Quantiser scale factor adaptation 246
6.7.4      Adaptation speed control 248
6.7.5      Adaptive prediction and signal reconstruction 249
6.8      Speech Quality 251
6.9      G.726 and G.727 ADPCM Coding 253
6.9.1      Motivation 253
6.9.2      Embedded ADPCM coding 254
6.9.3      Performance of the Embedded G.727 ADPCM Codec 256
6.10      Rate-Distortion in Predictive Coding 263

Part III: Analysis by Synthesis Coding

7      Analysis-by-synthesis Principles 273
7.1      Motivation 273
7.2      Analysis-by-synthesis codec structure 274
7.3      The Short-term Synthesis Filter 276
7.4      Long-Term Prediction 280
7.4.1      Open-loop Optimisation of LTP parameters 280
7.4.2      Closed-loop Optimisation of LTP parameters 287
7.5      Excitation Models 292
7.6      Adaptive Postfiltering 295
7.7      Lattice-based Linear Prediction 299
8      Speech Spectral Quantization 307
8.1      Log-area ratios 307
8.2      Line Spectral Frequencies 311
8.2.1      Derivation of Line Spectral Frequencies 311
8.2.2      Determination of Line Spectral Frequencies 316
8.2.3      Chebyshev-description of Line Spectral Frequencies 318
8.3      Spectral Vector Quantization 325
8.3.1      Background 325
8.3.2      Speaker-adaptive Vector Quantisation of LSFs 325
8.3.3      Stochastic VQ of LPC Parameters 328
8.3.3.1      Background 328
8.3.3.2      The Stochastic VQ Algorithm 330
8.3.4      Robust Vector Quantisation Schemes for LSFs 333
8.3.5      LSF Vector-quantisers used in standard codecs 336
9      RPE Coding 339
9.1      Theoretical Background 339
9.2      The RPE-LTP GSM Speech encoder 348
9.2.1      Pre-processing 348
9.2.2      STP analysis filtering 350
9.2.3      LTP analysis filtering 351
9.2.4      Regular Excitation Pulse Computation 352
9.3      The RPE-LTP Speech Decoder 353
9.4      Bit-sensitivity of the GSM Codec 358
9.5      A Tool-box Based Speech Transceiver 359
10      Forward-Adaptive CELP Coding 363
10.1      Background 363
10.2      The Original CELP Approach 365
10.3      Fixed Codebook Search 369
10.4      CELP Excitation Models 371
10.4.1      Binary Pulse Excitation 371
10.4.2      Transformed Binary Pulse Excitation 373
10.4.2.1      Excitation Generation 373
10.4.2.2      TBPE Bit Sensitivity 375
10.4.3      Dual-rate Algebraic CELP Coding 379
10.4.3.1      ACELP Codebook Structure 379
10.4.3.2      Dual-rate ACELP Bit Allocation 381
10.4.3.3      Dual-rate ACELP Codec Performance 382
10.5      CELP Optimization 383
10.5.1      Introduction 383
10.5.2      Calculation of the Excitation Parameters 385
10.5.2.1      Full Codebook Search Theory 385
10.5.2.2      Sequential Search Procedure 388
10.5.2.3      Full Search Procedure 389
10.5.2.4      Sub-Optimal Search Procedures 390
10.5.2.5      Quantization of the Codebook Gains 393
10.5.3      Calculation of the Synthesis Filter Parameters 396
10.5.3.1      Bandwidth Expansion 396
10.5.3.2      Least Squares Techniques 397
10.5.3.3      Optimization via Powell's Method 401
10.5.3.4      Simulated Annealing and the Effects of Quantization 403
10.6      CELP Error-sensitivity 407
10.6.1      Introduction 407
10.6.2      Improving the Spectral Information Error Sensitivity 408
10.6.2.1      LSF Ordering Policies 408
10.6.2.2      The Effect of FEC on the Spectral Parameters 411
10.6.2.3      The Effect of Interpolation 412
10.6.3      Improving the Error Sensitivity of the Excitation Parameters 414
10.6.3.1      The Fixed Codebook Index 414
10.6.3.2      The Fixed Codebook Gain 415
10.6.3.3      Adaptive Codebook Delay 416
10.6.3.4      Adaptive Codebook Gain 417
10.6.4      Matching Channel Coders to the Speech Coder 417
10.6.5      Error Resilience Conclusions 422
10.7      Dual-mode Speech Transceiver 423
10.7.1      The Transceiver Scheme 423
10.7.2      Re-configurable Modulation 424
10.7.3      Source-matched Error Protection 427
10.7.3.1      Low-quality 3.1 kBd Mode 427
10.7.3.2      High-quality 3.1 kBd Mode 432
10.7.4      Packet Reservation Multiple Access 434
10.7.5      3.1 kBd System Performance 437
10.7.6      3.1 kBd System Summary 441
10.8      Multi-slot PRMA Transceiver 442
10.8.1      Background and Motivation 442
10.8.2      PRMA-assisted Multi-slot Adaptive Modulation 443
10.8.3      Adaptive GSM-like Schemes 445
10.8.4      Adaptive DECT-like Schemes 447
10.8.5      Summary of Adaptive Multi-slot PRMA 448
11      Standard CELP Codecs 451
11.1      Background 451
11.2      The US DoD FS 1016 4.8 kbits/s CELP Codec 452
11.2.1      Introduction 452
11.2.2      LPC Analysis and Quantization 453
11.2.3      The Adaptive Codebook 455
11.2.4      The Fixed Codebook 456
11.2.5      Error Concealment Techniques 458
11.2.6      Decoder Post-Filtering 459
11.2.7      Conclusion 459
11.3      The IS-54 DAMPS speech codec 460
11.4      The JDC speech codec 464
11.5      The Qualcomm Variable Rate CELP Codec 468
11.5.1      Introduction 468
11.5.2      Codec Schematic and Bit Allocation 469
11.5.3      Codec Rate Selection 470
11.5.4      LPC Analysis and Quantization 471
11.5.5      The Pitch Filter 473
11.5.6      The Fixed Codebook 474
11.5.7      Rate 1/8 Filter Excitation 475
11.5.8      Decoder Post-Filtering 476
11.5.9      Error Protection and Concealment Techniques 477
11.5.10      Conclusion 478
11.6      Japanese Half-Rate Speech Codec 478
11.6.1      Introduction 478
11.6.2      Codec Schematic and Bit Allocation 479
11.6.3      Encoder Pre Processing 482
11.6.4      LPC Analysis and Quantization 482
11.6.5      The Weighting Filter 484
11.6.6      Excitation Vector 1 484
11.6.7      Excitation Vector 2 485
11.6.8      Quantization of the Gains 489
11.6.9      Channel Coding 490
11.6.10      Decoder Post Processing 491
11.7      The half-rate GSM codec 493
11.7.1      Half-rate GSM codec outline 493
11.7.2      Half-rate GSM codec spectral quantisation 496
11.7.3      Error protection 497
11.8      The 8 kbits/s G.729 Codec 499
11.8.1      Introduction 499
11.8.2      Codec Schematic and Bit Allocation 499
11.8.3      Encoder Pre-Processing 501
11.8.4      LPC Analysis and Quantization 501
11.8.5      The Weighting Filter 505
11.8.6      The Adaptive Codebook 506
11.8.7      The Fixed Algebraic Codebook 507
11.8.8      Quantization of the Gains 511
11.8.9      Decoder Post Processing 513
11.8.10      G.729 Error Concealment Techniques 515
11.8.11      G.729 Bit-sensitivity 517
11.8.12      Turbo-coded OFDM G.729 Speech Transceiver 518
11.8.12.1      Background 518
11.8.12.2      System Overview 519
11.8.12.3      Turbo Channel Encoding 519
11.8.12.4      OFDM in the FRAMES Speech/Data Sub-Burst 521
11.8.12.5      Channel model 522
11.8.12.6      Turbo-coded G.729 OFDM Parameters 523
11.8.12.7      Turbo-coded G.729 OFDM Performance 524
11.8.12.8      Turbo-coded G.729 OFDM Summary 525
11.8.13      G.729 Summary 527
11.9      The Reduced Complexity G.729 Annex A Codec 527
11.9.1      Introduction 527
11.9.2      The Perceptual Weighting Filter 528
11.9.3      The Open Loop Pitch Search 529
11.9.4      The Closed Loop Pitch Search 529
11.9.5      The Algebraic Codebook Search 530
11.9.6      The Decoder Post Processing 531
11.9.7      Conclusions 531
11.10      The Enhanced GSM codec 532
11.10.1      Codec Outline 532
11.10.2      Operation of the EFR-GSM Encoder 534
11.10.2.1      Spectral quantisation in the EFR-GSM Codec 534
11.10.2.2      Adaptive Codebook Search 537
11.10.2.3      Fixed Codebook Search 538
11.11      The IS-136 speech codec 539
11.11.1      IS-136 codec outline 539
11.11.2      IS-136 Bit Allocation Scheme 541
11.11.3      Fixed Codebook Search 543
11.11.4      IS-136 channel coding 544
11.12      The ITU G.723.1 dual rate codec 545
11.12.1      Introduction 545
11.12.2      G.723.1 Encoding Principle 546
11.12.3      Vector-Quantisation of the LSPs 548
11.12.4      Formant-based Weighting Filter 550
11.12.5      The 6.3 kbps high-rate G.723.1 excitation 550
11.12.6      The 5.3 kbps low-rate G.723.1 excitation 553
11.12.7      G.723.1 Bit Allocation 554
11.12.8      G.723.1 Error Sensitivity 556
11.13      Summary of Standard CELP-based Codecs 559
12      Backward-Adaptive CELP Coding 563
12.1      Introduction 563
12.2      Motivation and Background 564
12.3      Backward-Adaptive G.728 Schematic 568
12.4      Backward-Adaptive G.728 Coding 571
12.4.1      Error Weighting 571
12.4.2      Windowing 571
12.4.3      Codebook Gain Adaption 576
12.4.4      Codebook Search 580
12.4.5      Excitation Vector Quantization 583
12.4.6      Adaptive Postfiltering 585
12.4.6.1      Adaptive Long-term Postfiltering 586
12.4.6.2      Adaptive Short-term Postfiltering 588
12.4.7      Complexity and Performance of the G.728 Codec 589
12.5      Reduced-Rate 16-8 kbps G.728-Like Codec I 590
12.6      The Effects of Long Term Prediction 594
12.7      Closed-Loop Codebook Training 601
12.8      Reduced-Rate 16-8 kbps G.728-Like Codec II 607
12.9      Programmable-Rate 8-4 kbps CELP Codecs 609
12.9.1      Motivation 609
12.9.2      Improvements Due to Increasing Codebook Sizes 609
12.9.3      Forward Adaption of the Short Term Synthesis Filter 610
12.9.4      Forward Adaption of the Long Term Predictor 613
12.9.4.1      Initial Experiments 613
12.9.4.2      Quantization of Jointly Optimized Gains 615
12.9.4.3      Voiced/Unvoiced Switched Codebooks 619
12.9.5      Low Delay Codecs at 4-8 kbits/s 621
12.9.6      Low Delay ACELP Codec 625
12.10      Error Sensitivity Issues 629
12.10.1      The Error Sensitivity of the G.728 Codec 630
12.10.2      The Error Sensitivity of Our 4-8 kbits/s Low Delay Codecs 632
12.10.3      The Error Sensitivity of Our Low Delay ACELP Codec 638
12.11      A Low-Delay Multimode Speech Transceiver 638
12.12      Background 638
12.12.1      8-16 kbps Codec Performance 640
12.12.2      Transmission Issues 642
12.12.2.1      Higher-quality Mode 642
12.12.3      Lower-quality Mode 644
12.13      Speech Transceiver Performance 644
12.14      Conclusion 645

Part IV: Wideband and Very Low-Rate Coding

13      Wideband Speech Coding 649
13.1      Subband-ADPCM Wideband Coding 649
13.1.1      Introduction and Specifications 649
13.1.2      G.722 Codec Outline 650
13.1.3      Principles of Subband Coding 654
13.1.4      Quadrature Mirror Filtering 656
13.1.4.1      Analysis Filtering 656
13.1.4.2      Synthesis Filtering 660
13.1.4.3      Practical QMF Design Constraints 661
13.1.5      G.722 Adaptive Quantisation and Prediction 668
13.1.6      G.722 Coding Performance 670
13.2      Wideband Transform-Coding at 32 kbps 671
13.2.1      Background 671
13.2.2      Transform-Coding Algorithm 671
13.3      Subband-Split Wideband CELP Codecs 675
13.3.1      Background 675
13.3.2      Subband-based Wideband CELP coding 676
13.3.2.1      Motivation 676
13.3.2.2      Low-band Coding 678
13.3.2.3      Highband Coding 678
13.3.2.4      Bit allocation Scheme 679
13.4      Fullband Wideband ACELP Coding 680
13.4.1      Wideband ACELP Excitation 680
13.4.2      Wideband 32 kbps ACELP Coding 684
13.4.3      Wideband 9.6 kbps ACELP Coding 685
13.5      Conclusions 687
14      Introduction to Very Low Rate Speech Coding 691
14.1      Sub-4.8 kbps Coding Techniques 691
14.1.1      Analysis-by-Synthesis Coding 693
14.1.2      Speech Coding at 2400bps 696
14.1.2.1      Background to 2400bps Speech Coding 696
14.1.2.2      Frequency Selective Harmonic Coder 699
14.1.2.3      Sinusoidal Transform Coder 700
14.1.2.4      Multiband Excitation Coders 701
14.1.2.5      Sub-band Linear Prediction Coder 703
14.1.2.6      Mixed Excitation Linear Prediction Coder 705
14.1.2.7      Waveform Interpolation Coder 706
14.1.3      Speech Coding Below 2400bps 708
14.2      Linear Predictive Coding model 712
14.2.1      Short Term Prediction 712
14.2.2      Long Term Prediction 714
14.2.3      Final Analysis-by-Synthesis Model 715
14.3      Speech Quality Measurements 716
14.3.1      Objective Speech Quality Measures 717
14.3.2      Subjective Speech Quality Measures 717
14.3.3      2400bps Selection Process 718
14.4      Speech Database 721
15      Linear Predictive Vocoder 723
15.1      Overview of a Linear Predictive Vocoder 723
15.2      Line Spectrum Frequencies Quantization 724
15.2.1      Line Spectrum Frequencies Scalar Quantization 725
15.2.2      Line Spectrum Frequencies Vector Quantization 726
15.3      Pitch Detection 730
15.3.1      Voiced-Unvoiced Decision 733
15.3.2      Oversampled Pitch Detector 735
15.3.3      Pitch Tracking 738
15.3.3.1      Computational Complexity 743
15.3.4      Integer Pitch Detector 744
15.4      Unvoiced Frames 745
15.5      Voiced Frames 747
15.5.1      Placement of Pulses 747
15.5.2      Pulse Energy 747
15.6      Adaptive Post Filter 748
15.7      Results for Linear Predictive Vocoder 751
16      Wavelets 759
16.1      Conceptual Introduction to Wavelets 759
16.1.1      Fourier Theory 759
16.1.2      Wavelet Theory 761
16.1.3      Detecting Discontinuities 762
16.2      Introduction to Wavelet Mathematics 764
16.2.1      Multiresolution Analysis 765
16.2.2      Polynomial Spline Wavelets 766
16.2.3      Pyramidal Algorithm 767
16.2.4      Boundary Effects 769
16.3      Preprocessing the Wavelet Transform Signal 771
16.3.1      Spurious Pulses 771
16.3.2      Normalization 774
16.3.3      Candidate Glottal Pulses 774
16.4      Voiced-Unvoiced Decision 774
16.5      Wavelet Based Pitch Detector 777
16.5.1      Dynamic Programming 778
16.5.2      Autocorrelation Simplification 782
17      Zinc Function Excitation 787
17.1      Overview of Interpolated Zinc Function Prototype Excitation 787
17.1.1      Coding Scenarios 787
17.1.1.1      U-U-U Encoder Scenario 788
17.1.1.2      U-U-V Encoder Scenario 788
17.1.1.3      V-U-U Encoder Scenario 788
17.1.1.4      U-V-U Encoder Scenario 791
17.1.1.5      V-V-V Encoder Scenario 791
17.1.1.6      V-U-V Encoder Scenario 791
17.1.1.7      U-V-V Encoder Scenario 791
17.1.1.8      V-V-U Encoder Scenario 792
17.1.1.9      U-V Decoder Scenario 793
17.1.1.10      U-U Decoder Scenario 793
17.1.1.11      V-U Decoder Scenario 793
17.1.1.12      V-V Decoder Scenario 793
17.2      Zinc Function Modelling 794
17.2.1      Error Minimization 794
17.2.2      Computational Complexity 795
17.2.3      Phases of the Zinc Functions 797
17.3      Pitch Detection 797
17.3.1      Voiced-Unvoiced Boundaries 798
17.3.2      Pitch Prototype Selection 798
17.4      Voiced Speech 800
17.4.1      Energy Scaling 803
17.4.2      Quantization 804
17.5      Interpolation 806
17.5.1      Amplitude Parameter Interpolation 808
17.5.2      Position Parameter Interpolation 808
17.5.3      Removal of Position Parameter Interpolation 809
17.5.4      Line Spectrum Frequencies Pitch Synchronous Interpolation 810
17.5.5      Interpolation Example 811
17.6      Unvoiced Speech 811
17.7      Adaptive Post Filter 814
17.8      Results for Interpolated Zinc Function Prototype Excitation Coder 814
18      Mixed Multiband Excitation 821
18.1      Overview of Mixed Multiband Excitation 821
18.2      Finite Impulse Response Filter 826
18.2.1      Computational Complexity 828
18.3      Mixed Multiband Excitation Encoder 828
18.3.1      Voicing Strengths 830
18.4      Mixed Multiband Excitation Decoder 833
18.5      Adaptive Post Filter 837
18.6      Results for Mixed Multiband Excitation Coder 837
18.6.1      Results for a Mixed Multiband Excitation and Linear Predictive Coder 837
18.6.2      Results for a Mixed Multiband Excitation and Zinc Function Prototype Excitation Coder 845
19      Comparison of Speech Transceivers 851
19.1      Background to Speech Quality Evaluation 851
19.2      Objective Speech Quality Measures 852
19.2.1      Introduction 852
19.2.2      Signal to Noise Ratios 853
19.2.3      Articulation Index 854
19.2.4      Cepstral Distance 855
19.2.5      Cepstral Example 858
19.2.6      Logarithmic likelihood ratio 861
19.2.7      Euclidean Distance 862
19.3      Subjective Measures 862
19.3.1      Quality Tests 863
19.4      Comparison of Quality Measures 864
19.4.1      Background 864
19.4.2      Intelligibility tests 865
19.5      Subjective Speech Quality of Various Codecs 867
19.6      Speech Codec Bit-sensitivity 869
19.7      Transceiver Speech Performance 873
20      Zinc Function Excitation 879