Abstract: In this paper, we present a novel statistical approach to corpus-based speech synthesis. Classically, phonetic information is defined and treated as an acoustic reference to be respected. Accordingly, many studies have addressed acoustic unit classification, which separates units according to their symbolic characteristics. Target cost and concatenation cost have classically been defined for unit selection.
In corpus-based speech synthesis systems that use large text corpora, cost functions have been limited to a juxtaposition of symbolic criteria, and the acoustic information of units is not exploited in the definition of the target cost.
In this paper, we take into account the phonetic information of units together with their corresponding acoustic information. This is achieved by defining a probabilistic linguistic bigram model used for unit selection. The selected units are extracted from the English TIMIT corpus.
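The abstract does not give the form of the bigram model, so the following is only a minimal sketch under stated assumptions: unit labels are treated as phone symbols (the TIMIT-style labels below are illustrative), bigram probabilities are estimated from add-one smoothed counts, and a Viterbi search picks one candidate per target position so that the summed bigram log-probability is maximal. The names `train_bigram` and `select_units` are hypothetical, and a real unit-selection system would combine this score with acoustic target and concatenation costs.

```python
import math
from collections import Counter


def train_bigram(sequences, alpha=1.0):
    """Add-alpha smoothed bigram log-probabilities estimated from label sequences."""
    history_counts = Counter()
    pair_counts = Counter()
    vocab = set()
    for seq in sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]  # sentence boundary markers
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            history_counts[prev] += 1
            pair_counts[(prev, cur)] += 1
    vocab_size = len(vocab)

    def log_prob(prev, cur):
        return math.log((pair_counts[(prev, cur)] + alpha)
                        / (history_counts[prev] + alpha * vocab_size))

    return log_prob


def select_units(candidates, log_prob):
    """Viterbi search: one candidate label per position, maximizing the bigram score."""
    # best maps each candidate of the current position to (score, best path ending there).
    best = {c: (log_prob("<s>", c), [c]) for c in candidates[0]}
    for position in candidates[1:]:
        new_best = {}
        for cur in position:
            score, path = max(
                ((s + log_prob(prev, cur), p) for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            new_best[cur] = (score, path + [cur])
        best = new_best
    # Close the sequence with the end-of-sentence transition.
    return max(
        ((s + log_prob(prev, "</s>"), p) for prev, (s, p) in best.items()),
        key=lambda item: item[0],
    )


# Hypothetical toy data: phone-label sequences and candidate units per target position.
training = [["sh", "iy", "hh"], ["sh", "iy", "hh"], ["sh", "ix", "hh"]]
lp = train_bigram(training)
score, path = select_units([["sh"], ["iy", "ix"], ["hh"]], lp)
print(path)  # ['sh', 'iy', 'hh']
```

Because the pairs `(sh, iy)` and `(iy, hh)` occur twice in the toy training data while `(sh, ix)` and `(ix, hh)` occur once, the search prefers `iy` for the middle position.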
Abstract: This paper discusses the accurate measurement of formant frequencies using
cepstral and LPC methods. Each algorithm was implemented in Matlab and applied with
the aim of evaluating the precision of both techniques.
The proposed cepstral algorithm is a frequency-domain method based on picking peaks from the
cepstrally smoothed spectrum of the speech signal. Cepstral smoothing is a
nonparametric method that attempts to remove the effect of glottal pulsing to obtain the spectral
envelope corresponding to the vocal tract response. The resulting cepstrum is
then used to estimate the smoothed spectrum. Formant frequencies are estimated from the
smoothed speech spectrum by imposing constraints on the formant frequency ranges. The four
highest peaks are typically classified as the first four formants.
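The cepstral smoothing and peak-picking steps can be sketched as follows. This is a pure-Python illustration, not the authors' Matlab code: the frame length, the Hann window, the lifter length `n_lifter=30`, the 90 Hz lower limit and the log floor are all assumed values, and the helper names are hypothetical. The formant range constraints are reduced here to a single `[f_min, f_max]` interval.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(spectrum):
    """Inverse FFT via the conjugation identity."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([complex(s).conjugate() for s in spectrum])]


def cepstral_envelope(frame, n_lifter=30, floor=1e-10):
    """Cepstrally smoothed log-magnitude spectrum (bins 0..n/2) of one frame."""
    n = len(frame)
    # Periodic Hann window to limit spectral leakage.
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, s in enumerate(frame)]
    # Log magnitude spectrum; the floor avoids log(0).
    log_mag = [math.log(max(abs(v), floor)) for v in fft(windowed)]
    # Real cepstrum: inverse transform of the log magnitude spectrum.
    cepstrum = [c.real for c in ifft(log_mag)]
    # Low-quefrency lifter: keep |q| < n_lifter (envelope), zero the rest (excitation).
    liftered = [c if (q < n_lifter or q > n - n_lifter) else 0.0 for q, c in enumerate(cepstrum)]
    # Back to the frequency domain: the smoothed log spectrum.
    return [v.real for v in fft(liftered)][: n // 2 + 1]


def pick_formants(envelope, fs, n_fft, n_formants=4, f_min=90.0, f_max=None):
    """Frequencies of the n_formants highest local maxima inside [f_min, f_max], sorted."""
    f_max = fs / 2 if f_max is None else f_max
    peaks = []
    for k in range(1, len(envelope) - 1):
        freq = k * fs / n_fft
        if f_min <= freq <= f_max and envelope[k - 1] < envelope[k] >= envelope[k + 1]:
            peaks.append((envelope[k], freq))
    highest = sorted(peaks, reverse=True)[:n_formants]
    return sorted(freq for _, freq in highest)


fs, n_fft = 8000, 512
# Synthetic frame (an assumption, not TIMIT data): tones at 750 Hz and 2250 Hz.
frame = [math.cos(2 * math.pi * 750 * i / fs) + math.cos(2 * math.pi * 2250 * i / fs)
         for i in range(n_fft)]
envelope = cepstral_envelope(frame)
peaks = pick_formants(envelope, fs, n_fft, n_formants=2)
print(peaks)  # two frequencies close to 750 Hz and 2250 Hz
```

For real voiced speech, `n_lifter` should stay below the pitch period in samples so that the glottal harmonics are removed from the envelope; the two-tone frame only checks that the smoothing and peak picking behave as intended.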
The LPC algorithm, on the other hand, estimates formant frequencies from the all-pole model of the vocal
tract transfer function. The approach relies on the source-filter model, which assumes that the
speech signal can be considered the output of a linear system. The frequency response of
the filter has different spectral characteristics depending on the shape of the vocal tract. The
peaks in the spectrum are the resonances of the vocal tract and are commonly referred
to as formants. Linear prediction analysis is the traditional method used to compute the
model of the vocal tract. The resulting prediction coefficients are then used to
estimate formant frequencies.
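A minimal sketch of the LPC route, assuming the autocorrelation method and the Levinson-Durbin recursion (the abstract does not state which solver was used). Formants are taken here as the local maxima of the all-pole envelope 1/|A(e^jω)|; solving for the roots of A(z) is a common alternative. The check at the end analyses the impulse response of a known fourth-order all-pole filter, a hypothetical test signal rather than TIMIT speech, and all function names are hypothetical.

```python
import cmath
import math


def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] (autocorrelation method of LPC)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + a_1 z^-1 + ... + a_p z^-p."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k  # prediction-error power after stage i
    return a, err


def lpc_envelope_peaks(a, fs, n_points=2048):
    """Frequencies (Hz) of the local maxima of |1 / A(e^jw)| sampled on [0, fs/2]."""
    mags = []
    for i in range(n_points + 1):
        w = math.pi * i / n_points
        mags.append(1.0 / abs(sum(c * cmath.exp(-1j * w * j) for j, c in enumerate(a))))
    return [i * fs / (2 * n_points) for i in range(1, n_points)
            if mags[i - 1] < mags[i] >= mags[i + 1]]


def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pc in enumerate(p):
        for j, qc in enumerate(q):
            out[i + j] += pc * qc
    return out


# Hypothetical check: resonances at 500 and 1500 Hz, 100 Hz bandwidth each.
fs = 8000
true_a = [1.0]
for freq, bw in [(500, 100), (1500, 100)]:
    radius = math.exp(-math.pi * bw / fs)
    true_a = poly_mul(true_a, [1.0, -2 * radius * math.cos(2 * math.pi * freq / fs), radius ** 2])
# Impulse response of 1 / A(z).
h = []
for n in range(1000):
    past = sum(true_a[j] * h[n - j] for j in range(1, len(true_a)) if n >= j)
    h.append((1.0 if n == 0 else 0.0) - past)
a, _ = levinson_durbin(autocorrelation(h, 4), 4)
formants = lpc_envelope_peaks(a, fs)
print([round(f) for f in formants])  # two values near 500 and 1500
```

On real speech one would typically pre-emphasize and window each frame (for example with a Hamming window) and use a higher order, a common rule of thumb being 2 + fs/1000 with fs in Hz.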
The results show a wide range in the estimated values of formant
frequencies for male and female speakers. The presented work provides a comparison between
the two techniques based on the coefficient of deviation, the standard deviation and physiological
results, in order to evaluate each method.
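The abstract does not define the "coefficient of deviation"; assuming it denotes the coefficient of variation (sample standard deviation divided by the mean), the dispersion of one formant across speakers can be computed as below. The F1 values are hypothetical, not results from the paper.

```python
import statistics


def dispersion(values):
    """Mean, sample standard deviation and coefficient of variation (std / mean)."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return mean, std, std / mean


# Hypothetical F1 estimates (Hz) of one vowel across five speakers.
f1_values = [500, 520, 480, 510, 490]
mean, std, cv = dispersion(f1_values)
print(mean, round(std, 2), round(cv, 4))  # 500 15.81 0.0316
```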
Abstract: One of the key aspects of a speech signal is its formant structure, and the analysis of the formant
frequencies of speech signals is of great importance. Unfortunately, there is no straightforward method that
allows a good estimation of these frequencies. In this paper, we present a comparative study of three speech
analysis techniques for predicting the first three formant frequencies: linear prediction coefficients (LPC),
the cepstrum, and linear prediction based cepstral coefficients (LPCC).
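LPCC can be obtained from the prediction coefficients without any further FFT, using the standard cepstral recursion for an all-pole model. With A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, the cepstrum of 1/A(z) satisfies c_n = -a_n - Σ_{k=1}^{n-1} (k/n) c_k a_{n-k}, where a_n = 0 for n > p. The sketch below assumes this sign convention and omits the gain term c_0; the function name is hypothetical.

```python
def lpc_to_cepstrum(a, n_ceps):
    """LPCC c_1..c_n_ceps of 1/A(z), with a = [1, a_1, ..., a_p] (gain term c_0 omitted)."""
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = -a[n] if n <= p else 0.0  # a_n is zero beyond the model order
        # Only terms with 1 <= n - k <= p contribute.
        for k in range(max(1, n - p), n):
            acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]


# Single-pole check: A(z) = 1 - 0.5 z^-1.
ceps = lpc_to_cepstrum([1.0, -0.5], 5)
print([round(v, 6) for v in ceps])
```

For a single pole, A(z) = 1 - ρ z^-1, the result should equal ρ^n / n, which is what the example checks for ρ = 0.5.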
These techniques are applied to four vowels extracted from the TIMIT database and pronounced by twenty
different speakers. The methods were implemented in MATLAB and applied to the problem of
accurately measuring formant frequencies.
To evaluate the techniques, we compare the formant frequencies of vowels obtained by each method
with typical formant frequencies. The results show that cepstral analysis gives good results for the
first formant, while the linear prediction based techniques perform better for the higher-frequency
formants.
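One simple way to express the comparison with typical formant frequencies is the relative deviation of each estimate from its reference value. The metric and the numbers below are assumptions for illustration; the abstract specifies neither.

```python
def relative_errors(estimated, reference):
    """Percentage deviation of each estimated formant from its reference value."""
    return [100.0 * abs(e - r) / r for e, r in zip(estimated, reference)]


# Hypothetical F1, F2, F3 values (Hz), not results from the paper or from published tables.
estimated = [680.0, 1260.0, 2450.0]
reference = [700.0, 1200.0, 2500.0]
errors = relative_errors(estimated, reference)
print([round(x, 2) for x in errors])  # [2.86, 5.0, 2.0]
```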
Abstract: This paper presents two formant estimation techniques based on LPC and cepstral analysis. These methods are implemented in Matlab and applied to the problem of accurately measuring formant frequencies.
The first algorithm estimates formant frequencies from the all-pole model of the vocal tract transfer function.
The approach relies on the source-filter model, which assumes that the speech signal can be considered the
output of a linear system. The peaks in the spectrum are the resonances of the vocal tract and are
commonly referred to as formants.
The cepstral algorithm picks formant frequencies from the smoothed spectrum. The approach relies on decomposing the speech signal by homomorphic deconvolution into two components: the first component
represents the excitation, while the second represents the vocal tract resonances. The result,
called the cepstrum, is then used to estimate the smoothed spectrum. Formants are picked by locating the
spectral maxima of the envelope.
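The homomorphic split can be illustrated on the real cepstrum: low quefrencies carry the vocal tract envelope, while the excitation appears at higher quefrencies, with a peak at the pitch period for voiced speech. In this sketch the cutoff of 20 samples, the 60 to 400 Hz pitch search range and the synthetic frame are assumptions, and the function names are hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def real_cepstrum(frame, floor=1e-10):
    """Real cepstrum of a Hann-windowed frame: inverse DFT of log|X(k)|."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, s in enumerate(frame)]
    log_mag = [math.log(max(abs(v), floor)) for v in fft(windowed)]
    # log|X| is real and even, so its inverse DFT equals fft(log_mag) / n and is real.
    return [v.real / n for v in fft(log_mag)]


def split_cepstrum(cepstrum, cutoff):
    """Low quefrencies -> vocal tract part; the remainder -> excitation part."""
    n = len(cepstrum)
    vocal_tract = [c if (q < cutoff or q > n - cutoff) else 0.0 for q, c in enumerate(cepstrum)]
    excitation = [c - v for c, v in zip(cepstrum, vocal_tract)]
    return vocal_tract, excitation


def pitch_period(excitation, fs, f0_min=60.0, f0_max=400.0):
    """Quefrency (samples) of the largest excitation peak within the assumed F0 range."""
    q_lo, q_hi = int(fs / f0_max), int(fs / f0_min)
    return max(range(q_lo, q_hi + 1), key=lambda q: excitation[q])


# Hypothetical voiced frame: 100 Hz impulse train through one resonance at 500 Hz.
fs, n_fft, period = 8000, 1024, 80
radius = math.exp(-math.pi * 100 / fs)
a1, a2 = -2 * radius * math.cos(2 * math.pi * 500 / fs), radius ** 2
frame = []
for n in range(n_fft):
    pulse = 1.0 if n % period == 0 else 0.0
    y1 = frame[n - 1] if n >= 1 else 0.0
    y2 = frame[n - 2] if n >= 2 else 0.0
    frame.append(pulse - a1 * y1 - a2 * y2)
ceps = real_cepstrum(frame)
vocal_tract, excitation = split_cepstrum(ceps, cutoff=20)
estimated_period = pitch_period(excitation, fs)
print(estimated_period, fs / estimated_period)  # expected near 80 samples, about 100 Hz
```

Transforming `vocal_tract` back to the frequency domain gives the smoothed envelope used for formant picking, while `excitation` carries the pitch information that the smoothing discards.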
The results show the efficiency of the LP-based technique and the limitations of the cepstral technique in
estimating high-frequency formants.
Abstract: Hidden Markov models (HMMs) are stochastic models that have been applied with great success in speech recognition over the last three decades. It has been shown that the performance of an HMM-based recognizer can be degraded by a poor choice of the acoustic feature parameters in the acoustic front-end module. For this reason, we propose in this paper a speech recognition system based on word-level HMMs built with HTK (Hidden Markov Model Toolkit, version 3.2), and we investigate its performance using an acoustic front-end module based on Mel-frequency cepstral coefficients (MFCC). To improve recognition rates, we varied the number of states in each HMM in our experiments. The system's recognition rates are evaluated with different kinds of MFCC-derived coefficients. The results show that the best recognition rate, 99.77%, is obtained with MFCC appended with the 0th-order cepstral parameter and the first- and second-order regression coefficients, 1 Gaussian mixture and 6 states.
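The first- and second-order regression coefficients mentioned above follow the formula given in the HTK Book, d_t = Σ_{θ=1}^{Θ} θ (c_{t+θ} - c_{t-θ}) / (2 Σ_{θ=1}^{Θ} θ²), with the first and last frames replicated at the edges; acceleration coefficients apply the same formula to the deltas. In HTK notation the best configuration corresponds to the parameter kind MFCC_0_D_A, with the DELTAWINDOW and ACCWINDOW settings both defaulting to 2. The pure-Python re-implementation below is a sketch for illustration, not HTK's code, and the toy input and function names are hypothetical.

```python
def regression(frames, window=2):
    """HTK-style regression (delta) coefficients with edge-frame replication."""
    n_frames, dim = len(frames), len(frames[0])
    denom = 2.0 * sum(theta * theta for theta in range(1, window + 1))
    deltas = []
    for t in range(n_frames):
        row = []
        for i in range(dim):
            total = 0.0
            for theta in range(1, window + 1):
                # Replicate the first/last frame when the window runs past the edges.
                ahead = frames[min(t + theta, n_frames - 1)][i]
                behind = frames[max(t - theta, 0)][i]
                total += theta * (ahead - behind)
            row.append(total / denom)
        deltas.append(row)
    return deltas


def append_deltas(frames, delta_window=2, acc_window=2):
    """Static + delta + acceleration vectors (e.g. 13 -> 39 values per frame)."""
    deltas = regression(frames, delta_window)
    accels = regression(deltas, acc_window)
    return [s + d + a for s, d, a in zip(frames, deltas, accels)]


# Toy input (hypothetical): 10 frames of a 2-dimensional feature that ramps linearly.
frames = [[float(t), 2.0 * t] for t in range(10)]
features = append_deltas(frames)
print(features[5])  # [5.0, 10.0, 1.0, 2.0, 0.0, 0.0]
```

On a linearly increasing feature the interior deltas equal the slope and the accelerations are zero, which the printed frame shows; edge frames differ because of the replication.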
Abstract: The analysis of formant frequencies in speech signals is indispensable for research. Unfortunately, there is no fully effective method for estimating these frequencies well. This paper presents a comparative study of two speech parameterization techniques based on the prediction of the first three formant frequencies from linear prediction coefficients (LPC) and linear prediction based cepstral coefficients (LPCC).
These techniques are applied to four vowels extracted from the TIMIT database and pronounced by twenty-six different speakers. The methods were implemented in MATLAB and applied to the problem of accurately measuring formant frequencies. The presented work provides a comparison between the formant frequencies of vowels obtained by our methods and typical formant frequencies, in order to evaluate each technique.
Abstract: In this paper, an improved method is presented to estimate the first four formant frequencies from LPC analysis. The method, which computes prediction coefficients, was implemented in Matlab and applied to the problem of accurately measuring formant frequencies.
The algorithm estimates formant frequencies from the all-pole model of the vocal tract transfer function. The approach relies on the source-filter model, which assumes that the speech signal can be considered the output of a linear system. The vocal tract shape acts as the “filter” that shapes the excitation to produce the speech signal. The frequency response of the filter has different spectral characteristics depending on the shape of the vocal tract. The peaks in the spectrum are the resonances of the vocal tract and are commonly referred to as formants. Linear prediction analysis is the traditional method used to compute the model of the vocal tract. The resulting prediction coefficients are then used to estimate formant frequencies.
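The source-filter view can be made concrete by synthesizing a vowel-like signal: an impulse train (the glottal excitation) is passed through a cascade of second-order all-pole sections (the vocal tract). Each section places a pole pair at radius r = exp(-πB/fs) and angle 2πF/fs for a formant F with bandwidth B. The formant values, F0 and helper names below are hypothetical; such a synthetic signal is mainly useful for checking an LPC formant estimator against known resonances.

```python
import math


def resonator(freq, bandwidth, fs):
    """Denominator [1, a1, a2] of one second-order all-pole section."""
    r = math.exp(-math.pi * bandwidth / fs)  # pole radius from bandwidth
    return [1.0, -2.0 * r * math.cos(2.0 * math.pi * freq / fs), r * r]


def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pc in enumerate(p):
        for j, qc in enumerate(q):
            out[i + j] += pc * qc
    return out


def vocal_tract(formants, fs):
    """Cascade of resonators: A(z) as a single coefficient list."""
    a = [1.0]
    for freq, bandwidth in formants:
        a = poly_mul(a, resonator(freq, bandwidth, fs))
    return a


def all_pole_filter(a, excitation):
    """y[n] = e[n] - a1 y[n-1] - ... - ap y[n-p]."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for j in range(1, len(a)):
            if n - j >= 0:
                acc -= a[j] * y[n - j]
        y.append(acc)
    return y


def impulse_train(f0, fs, n_samples):
    """Idealized glottal excitation: one unit pulse per pitch period."""
    period = round(fs / f0)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


# Hypothetical vowel-like parameters: three (frequency, bandwidth) pairs in Hz, F0 = 120 Hz.
fs = 8000
a = vocal_tract([(600, 100), (1200, 100), (2500, 150)], fs)
speech = all_pole_filter(a, impulse_train(120, fs, 800))
print(len(a), len(speech))  # 7 800
```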
The results show a narrow range in the estimated values of formant frequencies for male and female speakers. This evaluation validates the use of the LP method for accurately estimating formant frequencies.
Abstract: Measuring formant frequencies in speech signals is indispensable for research and technically problematic. Accurate measurement of formant frequencies is important in many studies of speech perception and production. Unfortunately, there is no fully effective method for estimating these frequencies well. This paper presents a comparative study of two speech parameterization techniques for automatically estimating the lowest three formants of voiced speech. The first technique is based on cepstral analysis and the second on linear prediction based cepstral coefficients (LPCC); both are applied to the problem of accurately measuring formant frequencies.
The presented work provides a comparison between the formant frequencies of vowels obtained by our methods and typical formant frequencies, in order to evaluate each technique.
Abstract: In this paper, we present a speech parameterization technique based on cepstral analysis for extracting the first four formants F1, F2, F3 and F4, with a view to a biomedical application. Such analysis, assumed linear, performs deconvolution of the speech signal: it separates the contribution of the vocal tract, i.e. the formant frequencies, from that of the vocal cords, which is responsible for the fundamental frequency. Applied to vowels extracted from the TIMIT database, the technique identifies inter-speaker variations of the formant frequencies according to sex and region. Inter-speaker variability is a major phenomenon in speech recognition, because a speaker remains recognizable by the timbre of his or her voice despite variations that can sometimes be significant. The results reveal the variability of the formant frequencies of a vowel pronounced by various speakers. Several scenarios were tested: 1) a vowel pronounced by four men and four women who lived in the same region, 2) a vowel pronounced by four women from the same region, and 3) a vowel pronounced by eight men who lived in different regions.
Abstract: This paper presents a formant estimation technique based on cepstral envelope analysis. The method, which computes the cepstrum, was implemented in Matlab and applied to the problem of accurately measuring formant frequencies. The algorithm picks formant frequencies from the smoothed spectrum. The approach relies on decomposing the speech signal into two components: the first represents the excitation, while the second represents the vocal tract resonances. This decomposition is achieved by applying homomorphic deconvolution to the speech signal. The resulting cepstrum is then used to estimate the smoothed spectrum, and formants are picked by locating the spectral maxima of the smoothed envelope. The results show a wide range in the estimated values of formant frequencies for male and female speakers. This evaluation confirms the limitations of the cepstral method for estimating formant frequencies.