MohamedAli.Kammoun - Publications List

Journal articles

2008

Mohamed Ali KAMMOUN, Ahmed BEN HAMIDA (2008) Unit Selection Algorithm Using Bi-grams Model For Corpus-Based Speech Synthesis WASET International Journal of Signal Processing 5: 2. 120-125 November

Abstract: In this paper, we present a novel statistical approach to corpus-based speech synthesis. Classically, phonetic information is defined and treated as an acoustic reference to be respected, and many studies have addressed acoustic unit classification. This type of classification separates units according to their symbolic characteristics. Indeed, target cost and concatenation cost were classically defined for unit selection. In corpus-based speech synthesis systems that use large text corpora, cost functions have been limited to a juxtaposition of symbolic criteria, and the acoustic information of the units is not exploited in the definition of the target cost. In this manuscript, we take into consideration the phonetic information of the units that corresponds to their acoustic information. This is achieved by defining a probabilistic linguistic bi-gram model used for unit selection. The selected units are extracted from the English TIMIT corpus.
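
As a rough illustration of how a bi-gram model can rank candidate unit sequences, the Python sketch below estimates bi-gram probabilities over phone-label sequences and picks the most probable candidate. The abstract does not give the paper's actual cost function, so the add-one smoothing, the "<s>" start marker, and the names train_bigrams, log_prob and select_units are all assumptions made for illustration.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical start-of-sequence marker


def train_bigrams(sequences):
    """Count bi-grams and their left contexts over phone-label sequences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for seq in sequences:
        padded = [START] + list(seq)
        vocab.update(seq)
        contexts.update(padded[:-1])              # every label that has a successor
        bigrams.update(zip(padded, padded[1:]))   # (previous, current) pairs
    return contexts, bigrams, vocab


def log_prob(seq, contexts, bigrams, vocab_size):
    """Add-one smoothed bi-gram log-probability of one label sequence."""
    padded = [START] + list(seq)
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return total


def select_units(candidates, contexts, bigrams, vocab_size):
    """Return the candidate sequence with the highest bi-gram log-probability."""
    return max(candidates, key=lambda c: log_prob(c, contexts, bigrams, vocab_size))
```

In a full synthesizer this score would presumably be combined with target and concatenation costs; here it is used alone to keep the sketch short.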

2006

Mohamed Ali KAMMOUN, Dorra GARGOURI, Mondher FRIKHA, Ahmed BEN HAMIDA (2006) Cepstrum vs. LPC: A Comparative Study for Speech Formant Frequencies Estimation GESTS International Transactions on Communication and Signal Processing 9: 1. 87-102 October

Abstract: This paper discusses the accurate measurement of formant frequencies using the cepstral and LPC methods. Each algorithm was implemented in Matlab and applied in order to evaluate the precision of both techniques. The cepstral algorithm is a frequency-domain method based on picking peaks from the cepstrally smoothed frequency spectrum of the speech signal. Cepstral smoothing is a nonparametric method that attempts to remove the effect of glottal pulsing to obtain the spectral envelope corresponding to the vocal tract response. The obtained result, i.e. the cepstrum, was then used to estimate the smoothed spectrum. Formant frequencies are estimated from the smoothed speech spectrum by adding constraints on the formant frequency ranges. The four highest peaks are typically classified as the first four formants. The LPC algorithm, by contrast, estimates formant frequencies from the all-pole model of the vocal tract transfer function. The approach relies on the source-filter model, which assumes that the speech signal can be considered the output of a linear system. The frequency response of the filter has different spectral characteristics depending on the shape of the vocal tract. The spectral peaks in the spectrum are the resonances of the vocal tract and are commonly referred to as formants. Linear prediction analysis is the traditional method used to compute the model of the vocal tract. The obtained result, i.e. the prediction coefficients, was then used to estimate formant frequencies. The results show that there is a wide range in the estimated values of formant frequencies for male and female speakers. The presented work provides a comparison between the two techniques based on the coefficient of deviation, standard deviation and physiological results, in order to evaluate each method.

Dorra GARGOURI, Mohamed Ali KAMMOUN, Ahmed BEN HAMIDA (2006) Source-Filter Models for Formants Estimation WSEAS Transactions on Signal Processing 2: 5. 618-625 May

Abstract: One of the key aspects of a speech signal is its formant structure, and the analysis of formant frequencies in speech signals is of great importance. Unfortunately, there is no straightforward method that allows a good evaluation of these frequencies. In this paper, we present a comparative study of three speech analysis techniques based on the prediction of the first three formant frequencies from linear prediction coefficients (LPC), the cepstrum, and linear prediction based cepstral coefficients (LPCC). These techniques are applied to four vowels extracted from the TIMIT database and pronounced by twenty different speakers. The methods have been implemented in MATLAB and applied to the problem of the measurement accuracy of formant frequencies. In order to evaluate the techniques, we compare the formant frequencies of vowels obtained by each method with typical formant frequencies. Results showed that cepstral analysis gives good results for the first formant, while the linear prediction based techniques are better suited to the higher-frequency formants.

Conference papers

2008

Mohamed Ali KAMMOUN, Ahmed BEN HAMIDA (2008) SYNTHESE DE LA PAROLE PAR CORPUS : UTILISATION DU MODELE BI-GRAMS [Corpus-Based Speech Synthesis: Using the Bi-grams Model] In: La cinquième Conférence Internationale d’Electrotechnique et d’Automatique Edited by: JTEA'08. 1231-1236 JTEA'08

Abstract: This article presents a corpus-based speech synthesis system. The study is devoted to the introduction of phonetic criteria for unit selection. More precisely, phonetic information is defined and treated as acoustic targets to be respected. Many studies have addressed the acoustic classification of units. This type of classification separates units acoustically according to their symbolic characteristics. Target cost functions are therefore limited to a juxtaposition of symbolic criteria, and the acoustic information of the units is not explicitly exploited in the definition of the target cost. In this manuscript, we therefore seek to take explicit account of the phonetic information of the units that corresponds to the acoustic information. This is achieved by defining a Bi-grams linguistic model. The selected units are extracted from the TIMIT corpus. At the end of the paper, we propose an implementation of an acoustic processing module based on the TD-PSOLA algorithm.

2006

Dorra GARGOURI, Mohamed Ali KAMMOUN, Ahmed BEN HAMIDA (2006) A Comparative Study of Formant Frequencies Estimation Techniques In: 5th WSEAS International Conference on Signal Processing Edited by: WSEAS. 15-19 WSEAS

Abstract: This paper presents two formant estimation techniques based on LPC and cepstral analysis. These methods are implemented in Matlab and applied to the problem of accurate measurement of formant frequencies. The first algorithm estimates formant frequencies from the all-pole model of the vocal tract transfer function. The approach relies on the source-filter model, which assumes that the speech signal can be considered the output of a linear system. The spectral peaks in the spectrum are the resonances of the vocal tract and are commonly referred to as formants. The cepstral algorithm picks formant frequencies from the smoothed spectrum. The approach relies on decomposing the speech signal by homomorphic deconvolution into two components: the first component represents the excitation, while the second is intended to represent the vocal tract resonances. The result, called the cepstrum, is then used to estimate the smoothed spectrum. Formant picking is achieved by locating the spectral maxima of the envelope. Results show the efficiency of the LP-based technique and the limitation of the cepstral technique in estimating high-frequency formants.

2005

Mondher FRIKHA, Sajiaa BEN MASSAOUD, Mohamed Ali KAMMOUN, Dorra GARGOURI, Mongi LAHYANI, Ahmed BEN HAMIDA (2005) Optimizing Some HMM Model Parameters in an Isolated Speech Recognition System In: Third International Conference on Systems, Signals & Devices Edited by: SSD'05. SSD'05

Abstract: Hidden Markov models (HMMs) are stochastic models that have been applied with great success in speech recognition over the last three decades. It has been shown that the performance of an HMM-based recognizer may be degraded by a poor choice of acoustic feature parameters in the acoustic front-end module. For this reason, we propose in this paper a speech recognition system based on word-level HMMs built on the HTK platform (Hidden Markov Model Toolkit, ver. 3.2), and we investigate its performance using an acoustic front-end module based on Mel Frequency Cepstral Coefficients (MFCC). To improve recognition rates, we varied the number of states in each HMM in our experiments. The system's recognition rates are evaluated with different kinds of MFCC-derived coefficients. Results showed that the best recognition rate, 99.77%, is obtained with MFCCs appended with the 0th-order cepstral parameter and the first- and second-order regression coefficients, 1 Gaussian mixture and 6 states.

Dorra GARGOURI, Mondher FRIKHA, Mohamed Ali KAMMOUN, Ahmed BEN HAMIDA (2005) A Comparative Study of an All Pole Speech Analysis for formant Extraction In: Third International Conference on Systems, Signals & Devices Edited by: SSD'05. SSD'05

Abstract: The analysis of formant frequencies in speech signals is indispensable for research. Unfortunately, no method is fully effective at providing good estimates of these frequencies. This paper presents a comparative study of two speech parameterization techniques based on the prediction of the first three formant frequencies from linear prediction coefficients (LPC) and linear prediction based cepstral coefficients (LPCC). These techniques are applied to four vowels extracted from the TIMIT database and pronounced by twenty-six different speakers. The methods have been implemented in MATLAB and applied to the problem of the measurement accuracy of the formant frequencies. The presented work provides a comparison between the formant frequencies of vowels obtained by our methods and typical formant frequencies, in order to evaluate each technique.

Med Ali KAMMOUN, Dorra GARGOURI, Mondher FRIKHA, Ahmed BEN HAMIDA (2005) Linear Prediction Method Evaluation in Speech Formant Frequencies Estimation In: Third International Conference on Systems, Signals & Devices Edited by: SSD'05. SSD'05

Abstract: In this paper, an improved method is presented to estimate the first four formant frequencies from LPC analysis. The proposed method, which computes prediction coefficients, has been implemented in Matlab and applied to the problem of accurate measurement of formant frequencies. The algorithm estimates formant frequencies from the all-pole model of the vocal tract transfer function. The approach relies on the source-filter model, which assumes that the speech signal can be considered the output of a linear system. In fact, the vocal tract shape is considered as the "filter" that filters the excitation to produce the speech signal. The frequency response of the filter has different spectral characteristics depending on the shape of the vocal tract. The spectral peaks in the spectrum are the resonances of the vocal tract and are commonly referred to as formants. Linear prediction analysis is the traditional method used to compute the model of the vocal tract. The obtained result, i.e. the prediction coefficients, was then used to estimate formant frequencies. Results showed that there is a narrow range in the estimated values of formant frequencies for male and female speakers. This evaluation of the LP method validates the use of the technique for the accurate estimation of formant frequencies.
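
The all-pole approach described in this abstract can be sketched in Python as follows. The original work was implemented in Matlab and the abstract does not say how formants were read off the prediction coefficients; solving for the roots of the prediction polynomial is one common option and is what this sketch assumes. The function names, the autocorrelation method with the Levinson-Durbin recursion, and the 90 Hz and 400 Hz thresholds are illustrative assumptions, not the authors' settings.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC via Levinson-Durbin.

    Returns [1, a1, ..., ap] for A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    Assumes the frame is not all zeros.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient from the current prediction error
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def formants_from_lpc(frame, fs, order, min_freq=90.0, max_bandwidth=400.0):
    """Estimate formant frequencies (Hz) from the roots of the LPC polynomial."""
    roots = np.roots(lpc_coefficients(frame, order))
    roots = roots[np.imag(roots) > 0]                  # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2.0 * np.pi)       # pole angle -> frequency
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi   # pole radius -> bandwidth
    keep = (freqs > min_freq) & (bandwidths < max_bandwidth)
    return sorted(float(f) for f in freqs[keep])
```

For real speech, the frame would normally be pre-emphasized and windowed before analysis, and the order chosen relative to the sampling rate; both steps are left out here to keep the sketch short.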

Dorra GARGOURI, Mohamed Ali KAMMOUN, Mohamed Ali ZERZRI, Ahmed BEN HAMIDA (2005) Formants Estimation Techniques for Speech Analysis In: International Conference on Machine Intelligence ACIDCA-ICMI'2005

Abstract: Measuring formant frequencies in speech signals is indispensable for research but technically problematic. Accurate measurement of formant frequencies is important in many studies of speech perception and production. Unfortunately, no method is fully effective at providing good estimates of these frequencies. This paper presents a comparative study of two speech parameterization techniques for automatically estimating the lowest three formants of voiced speech. The first technique is based on cepstral analysis and the second on linear prediction based cepstral coefficients (LPCC); both are applied to the problem of the measurement accuracy of the formant frequencies. The presented work provides a comparison between the formant frequencies of vowels obtained by our methods and typical formant frequencies, in order to evaluate each technique.

2004

D GARGOURI, M FRIKHA, M W LAFFET, M A KAMMOUN, BEN HAMIDA A (2004) Cepstral analysis for formants frequencies determination dedicated to speaker identification In: Industrial Technology, 2004. IEEE ICIT '04. 2004 IEEE International Conference on Edited by: IEEE ICIT'04. 1298-1302 IEEE ICIT'04

Abstract: In this paper, we present a speech parameterization technique based on cepstral analysis for extracting the first four formants F1, F2, F3 and F4, with a biomedical application in mind. This analysis, assumed to be linear, performs deconvolution of the speech signal. It separates the contribution of the vocal tract, i.e. the formant frequencies, from that of the vocal cords, which are responsible for the fundamental frequency. Applied to some vowels extracted from the TIMIT database, the technique identifies inter-speaker variations of the formant frequencies according to sex and region. Inter-speaker variability is a major phenomenon in speech recognition, because a speaker remains recognizable by the timbre of his or her voice despite variation that can sometimes be significant. The results show the variability of the formant frequencies of a vowel pronounced by various speakers. Several scenarios were tested, namely: 1) a vowel pronounced by four men and four women who lived in the same region, 2) a vowel pronounced by four women of the same region, and 3) a vowel pronounced by eight men who lived in different regions.

M A KAMMOUN, D GARGOURI, M FRIKHA, BEN HAMIDA A (2004) Cepstral method evaluation in speech formant frequencies estimation In: Industrial Technology, 2004. IEEE ICIT '04. 2004 IEEE International Conference on Edited by: IEEE ICIT'04. 1612-1616 IEEE ICIT'04

Abstract: This paper presents a technique for formant estimation using cepstral envelope analysis. The proposed method, which computes the cepstrum, has been implemented in Matlab and applied to the problem of accurate measurement of formant frequencies. The algorithm picks formant frequencies from the smoothed spectrum. The approach relies on decomposing the speech signal into two components: the first component represents the excitation, while the second is intended to represent the vocal tract resonances. This decomposition is achieved by applying homomorphic deconvolution to the speech signal. The obtained result, i.e. the cepstrum, was then used to estimate the smoothed spectrum. Formant picking is achieved by locating the spectral maxima of the smoothed envelope. Results showed that there is a wide range in the estimated values of formant frequencies for male and female speakers. This evaluation of the cepstral method confirms the limitation of the technique in the estimation of formant frequencies.
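
The cepstral smoothing and peak-picking steps described in this abstract can be sketched in Python as follows. The original work was implemented in Matlab; the function names, the Hamming window, the FFT size, the lifter length n_lifter, and the choice to keep the lowest-frequency peaks are all assumptions made for illustration, not the authors' settings.

```python
import numpy as np


def cepstral_envelope(frame, n_lifter, n_fft=1024):
    """Cepstrally smoothed log-magnitude spectrum of one speech frame."""
    frame = np.asarray(frame, dtype=float)
    spectrum = np.fft.rfft(frame * np.hamming(len(frame)), n_fft)
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # small offset avoids log(0)
    cepstrum = np.fft.irfft(log_mag, n_fft)      # real cepstrum
    # Keep only low quefrencies (vocal tract) and drop the rest (excitation);
    # the mirrored part keeps the liftered cepstrum symmetric.
    lifter = np.zeros(n_fft)
    lifter[:n_lifter] = 1.0
    lifter[n_fft - n_lifter + 1:] = 1.0
    return np.fft.rfft(cepstrum * lifter).real


def pick_formants(envelope, fs, n_fft, max_formants=4):
    """Return frequencies (Hz) of the first local maxima of the envelope."""
    peaks = [k for k in range(1, len(envelope) - 1)
             if envelope[k - 1] < envelope[k] >= envelope[k + 1]]
    return [k * fs / n_fft for k in peaks[:max_formants]]
```

The lifter length controls the trade-off: it has to stay below the pitch period (in samples) to suppress the glottal pulsing, yet be long enough to resolve neighbouring formants, which is one plausible source of the limitation the abstract reports.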
