Line spectral pairs (LSP) or line spectral frequencies (LSF) are used to represent linear prediction coefficients (LPC) for transmission over a channel.[1] LSPs have several properties (e.g. lower sensitivity to quantization noise) that make quantizing them preferable to quantizing the LPCs directly. For this reason, LSPs are very useful in speech coding.
LSP representation was developed by Fumitada Itakura[2] at Nippon Telegraph and Telephone (NTT) in 1975.[3] From 1975 to 1981, he studied problems in speech analysis and synthesis based on the LSP method.[4] In 1980, his team developed an LSP-based speech synthesizer chip. LSP is an important technology for speech synthesis and coding, and in the 1990s it was adopted by almost all international speech coding standards as an essential component, contributing to the enhancement of digital speech communication over mobile channels and the internet worldwide.[3] LSPs are used in the code-excited linear prediction (CELP) algorithm, developed by Bishnu S. Atal and Manfred R. Schroeder in 1985.
Mathematical foundation
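The LP polynomial $A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k}$ can be expressed as $A(z) = \tfrac{1}{2}[P(z) + Q(z)]$, where:
- $P(z) = A(z) + z^{-(p+1)} A(z^{-1})$
- $Q(z) = A(z) - z^{-(p+1)} A(z^{-1})$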
By construction, P is a palindromic polynomial and Q an antipalindromic polynomial; physically P(z) corresponds to the vocal tract with the glottis closed and Q(z) with the glottis open.[5] It can be shown that:
- The roots of P and Q lie on the unit circle in the complex plane.
- The roots of P alternate with those of Q as we travel around the circle.
- As the coefficients of P and Q are real, the roots occur in conjugate pairs.
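The construction of P and Q can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes the usual definitions P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z), and it stores A(z) as a coefficient list in powers of z^-1. The function name `lsp_polynomials` and the example coefficients are hypothetical.

```python
def lsp_polynomials(A):
    """Return the coefficient lists of P and Q in powers of z^-1.

    A is [1, c_1, ..., c_p], the coefficients of A(z) in powers of z^-1
    (with the convention A(z) = 1 - sum a_k z^-k, c_k = -a_k).
    Both results have degree p + 1.
    """
    ext = list(A) + [0.0]        # A(z), padded to degree p + 1
    rev = [0.0] + list(A)[::-1]  # z^-(p+1) * A(1/z): the reversed coefficients
    P = [x + y for x, y in zip(ext, rev)]
    Q = [x - y for x, y in zip(ext, rev)]
    return P, Q


# Illustrative second-order filter (an assumption, not taken from the sources).
A = [1.0, -0.5, 0.25]
P, Q = lsp_polynomials(A)

print(P)                             # palindromic: reads the same reversed
print(Q)                             # antipalindromic: reversal flips the sign
print(P == P[::-1])
print(Q == [-q for q in Q[::-1]])
# Averaging P and Q recovers A(z); the extra z^-(p+1) term cancels.
print([0.5 * (x + y) for x, y in zip(P, Q)])
```

The symmetry of the two coefficient lists is what forces the roots onto the unit circle for a stable A(z); the code only demonstrates the symmetry, not the root locations.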
The Line Spectral Pair representation of the LP polynomial consists simply of the location of the roots of P and Q (i.e. $\omega$ such that $z = e^{i\omega}$). As they occur in pairs, only half of the actual roots (conventionally between 0 and $\pi$) need be transmitted. The total number of coefficients for both P and Q is therefore equal to p, the number of original LP coefficients (not counting $a_0 = 1$).
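A small worked case may make the counting concrete. The polynomial A(z) below is an illustrative choice with p = 2 (it does not come from the sources), and the definitions P(z) = A(z) + z^{-3} A(z^{-1}) and Q(z) = A(z) - z^{-3} A(z^{-1}) are assumed.

```latex
\begin{aligned}
A(z) &= 1 - 0.5\,z^{-1} + 0.25\,z^{-2} \\
P(z) &= 1 - 0.25\,z^{-1} - 0.25\,z^{-2} + z^{-3}
      = (1 + z^{-1})\,(1 - 1.25\,z^{-1} + z^{-2}) \\
Q(z) &= 1 - 0.75\,z^{-1} + 0.75\,z^{-2} - z^{-3}
      = (1 - z^{-1})\,(1 + 0.25\,z^{-1} + z^{-2}) \\
1 - 2\cos\omega\,z^{-1} + z^{-2} = 0
     &\;\Rightarrow\; \cos\omega_1 = 0.625,\quad \cos\omega_2 = -0.125 \\
\omega_1 &\approx 0.896\ \text{rad},\qquad \omega_2 \approx 1.696\ \text{rad}
\end{aligned}
```

The factors (1 + z^{-1}) and (1 - z^{-1}) give fixed roots at z = -1 and z = 1 that carry no information. Each remaining quadratic factor contributes one conjugate pair, so only the two angles in (0, π) need to be sent, matching p = 2.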
A common algorithm for finding these[6] is to evaluate the polynomial at a sequence of closely spaced points around the unit circle, observing when the result changes sign; when it does, a root must lie between the points tested. Because the roots of P are interspersed with those of Q, a single pass is sufficient to find the roots of both polynomials.
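The sign-change search can be sketched as follows. This is a simplified illustration under stated assumptions, not the routine from the cited source code. It assumes P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z), rewrites each polynomial on the unit circle as a real cosine or sine sum so that a sign is well defined, and refines each bracket by bisection. The function names, the grid size and the example filter are all hypothetical.

```python
import math


def lsp_polynomials(A):
    """P and Q coefficient lists (powers of z^-1) for A = [1, c_1, ..., c_p]."""
    ext = list(A) + [0.0]
    rev = [0.0] + list(A)[::-1]
    return ([x + y for x, y in zip(ext, rev)],
            [x - y for x, y in zip(ext, rev)])


def on_circle(coeffs, omega, antisymmetric):
    """Real-valued form of the polynomial at z = exp(i*omega).

    Multiplying by exp(i*omega*n/2) turns a palindromic polynomial into a
    cosine sum and an antipalindromic one into i times a sine sum.
    """
    n = len(coeffs) - 1
    trig = math.sin if antisymmetric else math.cos
    return sum(c * trig((n / 2 - k) * omega) for k, c in enumerate(coeffs))


def bisect_root(f, lo, hi, iters=50):
    """Shrink a bracket [lo, hi] on which f changes sign."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lpc_to_lsf(A, grid=512):
    """Scan (0, pi) once, recording sign changes of both P and Q."""
    P, Q = lsp_polynomials(A)
    funcs = [lambda w: on_circle(P, w, False),
             lambda w: on_circle(Q, w, True)]
    lsf = []
    # Grid points lie strictly inside (0, pi), which skips the fixed
    # roots at z = 1 and z = -1.
    prev_w = math.pi * 0.5 / grid
    prev = [f(prev_w) for f in funcs]
    for j in range(1, grid):
        w = math.pi * (j + 0.5) / grid
        cur = [f(w) for f in funcs]
        for f, a, b in zip(funcs, prev, cur):
            if (a > 0) != (b > 0):
                lsf.append(bisect_root(f, prev_w, w))
        prev_w, prev = w, cur
    return sorted(lsf)


# Illustrative filter (an assumption): A(z) = 1 - 0.5 z^-1 + 0.25 z^-2
print(lpc_to_lsf([1.0, -0.5, 0.25]))
```

This sketch evaluates both polynomials at every grid point for simplicity. An implementation that exploits the interleaving would search for only one polynomial's root at a time and switch to the other after each root is found. A grid that is too coarse can miss two roots of the same polynomial that fall within one step.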
To convert back to LPCs, we need to evaluate $A(z) = \tfrac{1}{2}[P(z) + Q(z)]$ by "clocking" an impulse through it N times (where N is the order of the filter), yielding the original filter, A(z).
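The reverse conversion can be sketched by rebuilding P and Q from their roots and averaging them. Note that this sketch multiplies the polynomial factors directly rather than clocking an impulse through a filter; both should yield the same coefficients. It assumes the LSFs are sorted, lie in (0, π), and alternate between P and Q starting with P, which is the usual convention when P is the palindromic polynomial of a stable filter. The names `poly_mul` and `lsf_to_lpc` are hypothetical.

```python
import math


def poly_mul(x, y):
    """Multiply two polynomials given as coefficient lists."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out


def lsf_to_lpc(lsf):
    """Rebuild [1, c_1, ..., c_p] (powers of z^-1) from sorted LSFs.

    Assumption: roots alternate P, Q, P, ... beginning with P.
    """
    p = len(lsf)
    # Fixed roots at z = -1 and z = +1; where they sit depends on parity of p.
    if p % 2 == 0:
        P, Q = [1.0, 1.0], [1.0, -1.0]
    else:
        P, Q = [1.0], [1.0, 0.0, -1.0]
    for i, w in enumerate(lsf):
        # Each conjugate pair exp(+-i*w) gives 1 - 2 cos(w) z^-1 + z^-2.
        factor = [1.0, -2.0 * math.cos(w), 1.0]
        if i % 2 == 0:
            P = poly_mul(P, factor)
        else:
            Q = poly_mul(Q, factor)
    # A(z) = (P(z) + Q(z)) / 2; the z^-(p+1) coefficients cancel.
    return [0.5 * (a + b) for a, b in zip(P, Q)][:p + 1]


# Illustrative LSFs (an assumption), chosen so that A(z) = 1 - 0.5 z^-1 + 0.25 z^-2
lsf = [math.acos(0.625), math.acos(-0.125)]
print(lsf_to_lpc(lsf))  # approximately [1.0, -0.5, 0.25]
```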
Properties
Line spectral pairs have several interesting and useful properties. When the roots of P(z) and Q(z) are interleaved, stability of the filter is ensured if and only if the roots are monotonically increasing. Moreover, the closer two roots are, the more resonant the filter is at the corresponding frequency. Because LSPs are not overly sensitive to quantization noise and stability is easily ensured, LSPs are widely used for quantizing LPC filters. Line spectral frequencies can also be interpolated.
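The ordering test and the interpolation mentioned above can be sketched as follows. This is a minimal illustration: the function names and the sample vectors are hypothetical, and linear interpolation is only one possible scheme.

```python
import math


def is_valid_lsf(lsf):
    """True if 0 < w_1 < w_2 < ... < w_p < pi (strictly increasing)."""
    bounds = [0.0] + list(lsf) + [math.pi]
    return all(lo < hi for lo, hi in zip(bounds, bounds[1:]))


def interpolate_lsf(lsf_a, lsf_b, t):
    """Linear blend of two LSF vectors, with t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(lsf_a, lsf_b)]


# Two illustrative frames (made-up values, in radians).
frame_a = [0.3, 0.9, 1.5, 2.4]
frame_b = [0.5, 1.1, 1.9, 2.8]

halfway = interpolate_lsf(frame_a, frame_b, 0.5)
print(halfway, is_valid_lsf(halfway))

# A vector whose entries are out of order fails the check.
print(is_valid_lsf([0.9, 0.3, 1.5, 2.4]))
```

A weighted average of two strictly increasing vectors is itself strictly increasing, so blending two valid frames in this way keeps the ordering. Quantization, by contrast, can break the ordering, which is why a check like `is_valid_lsf` is useful after decoding.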
Sources
- Speex manual and source code (lsp.c)
- "The Computation of Line Spectral Frequencies Using Chebyshev Polynomials"/ P. Kabal and R. P. Ramachandran. IEEE Trans. Acoustics, Speech, Signal Processing, vol. 34, no. 6, pp. 1419–1426, Dec. 1986.
Includes an overview in relation to LPC.
- "Line Spectral Pairs" chapter as an online excerpt (pdf) / "Digital Signal Processing - A Computer Science Perspective" (ISBN 0-471-29546-9) Jonathan Stein.
References
- Sahidullah, Md.; Chakroborty, Sandipan; Saha, Goutam (Jan 2010). "On the use of perceptual Line Spectral pairs Frequencies and higher-order residual moments for Speaker Identification". International Journal of Biometrics. 2 (4): 358–378. doi:10.1504/ijbm.2010.035450.
- Zheng, F.; Song, Z.; Li, L.; Yu, W. (1998). "The Distance Measure for Line Spectrum Pairs Applied to Speech Recognition" (PDF). Proceedings of the 5th International Conference on Spoken Language Processing (ICSLP'98) (3): 1123–6.
- "List of IEEE Milestones". IEEE. Retrieved 15 July 2019.
- "Fumitada Itakura Oral History". IEEE Global History Network. 20 May 2009. Retrieved 2009-07-21.
- Tony Robinson: Speech Analysis. http://svr-www.eng.cam.ac.uk/~ajr/SpeechAnalysis/node51.html#SECTION000713000000000000000
- e.g. lsf.c in http://www.ietf.org/rfc/rfc3951.txt