ABSTRACT
Voice over IP (VoIP) speech quality estimation is crucial to providing optimal Quality of Service (QoS). This paper seeks to provide speech quality estimation models with better prediction accuracy by considering a richer set of input features than the current International Telecommunication Union Telecommunication Standardization Sector (ITU-T) recommendations. It addresses a transitional phase in which wideband (WB) networks are becoming available but must, for the time being, co-exist with the existing narrowband (NB) setups. Quality estimation becomes a challenge in such a mixed context. The ITU-T recommendation (termed the E-Model) has recently been extended to deal with the mixed context. However, it evaluates speech degradation in the WB scenario based solely on codec-related distortions, which are only a subset of the factors affecting speech quality on a VoIP network. The extension is derived from speech signals evaluated by human subjects: an expensive and difficult-to-reproduce exercise. This paper innovates by also considering a number of other network distortion types, producing generalised models that predict the quality degradation with higher accuracy. To this end, an extensive set of speech samples is subjected to a wide variety of distortions. The degraded signals are evaluated by the best currently available algorithmic approximation of human evaluation of speech to produce quality scores. Using the distortions as the input features and the quality scores as the targets, we employ Genetic Programming to produce parsimonious models that show considerable prediction gain compared to the E-Model. Unlike some existing approaches, where the models are tailored to individual telephony codecs, the evolved models generalise across a variety of modern codecs.
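For orientation, the narrowband E-Model baseline that the abstract compares against can be sketched as follows. This is a minimal illustration under stated assumptions, not the evolved models of this paper and not the WB extension: the function names are hypothetical, the default rating of 93.2 is assumed to be the value obtained when all other E-Model inputs are left at their G.107 defaults, and the codec parameters in the usage lines are illustrative placeholders rather than values taken from the paper.

```python
"""Illustrative sketch of narrowband E-Model (ITU-T G.107) quantities.

All names here are hypothetical helpers, not an existing library API.
"""

# Assumption: rating when every other E-Model input is at its G.107 default.
DEFAULT_R = 93.2


def effective_impairment(ie: float, bpl: float, ppl: float,
                         burst_r: float = 1.0) -> float:
    """Effective equipment impairment Ie,eff.

    ie      codec impairment at zero packet loss
    bpl     packet-loss robustness factor of the codec
    ppl     packet-loss percentage (0..100)
    burst_r burst ratio (1.0 means random, non-bursty loss)
    """
    if ppl <= 0.0:
        # Without loss only the codec impairment remains.
        return ie
    return ie + (95.0 - ie) * ppl / (ppl / burst_r + bpl)


def rating(ie_eff: float) -> float:
    """R factor when only codec and loss impairments are considered."""
    return DEFAULT_R - ie_eff


def r_to_mos(r: float) -> float:
    """Map an R factor to an estimated MOS (narrowband scale)."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


if __name__ == "__main__":
    # Placeholder codec parameters, chosen only to show the trend.
    for loss in (0.0, 1.0, 5.0, 10.0):
        ie_eff = effective_impairment(ie=11.0, bpl=19.0, ppl=loss)
        print(f"loss={loss:4.1f}%  MOS~{r_to_mos(rating(ie_eff)):.2f}")
```

Note that the only inputs in this sketch are codec and packet-loss terms; the models described in the abstract take a wider set of network distortions as input features.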
INDEX TERMS
Primary Classification:
I. Computing Methodologies
I.2 ARTIFICIAL INTELLIGENCE
I.2.2 Automatic Programming
Subjects: Program synthesis
Additional Classification:
I. Computing Methodologies
I.2 ARTIFICIAL INTELLIGENCE
I.2.6 Learning
Subjects: Induction
I.5 PATTERN RECOGNITION
I.5.1 Models
Subjects: Structural
General Terms: Algorithms, Experimentation, Measurement, Performance, Reliability, Standardization
Keywords: Ie,WB,eff; E-Model; PESQ-WB; VoIP; genetic programming; speech quality; symbolic regression