I am from Finland. I finished my PhD work in the laboratory of Acoustics and Audio Signal Processing at Helsinki University of Technology (HUT) in 2001. In 2000-2001 I worked as a post-doc at Lucent Bell Laboratories and later Agere Systems (Murray Hill, NJ, USA). After an extended post-doc period at HUT I joined the Digital Signal Processing group at Philips Research, Eindhoven, The Netherlands, in 2004.
Abstract: A method to detect the distance of a speaker from a single microphone in a room environment
is proposed. Several features, related to statistical parameters of speech source excitation signals, are
introduced and are shown to depend on the distance between source and receiver. Those features are
used to train a pattern recognizer for distance detection. The method is tested using a database of speech
recordings in four rooms with different acoustical properties. Performance is shown to be independent
of the signal gain and level, but depends on the reverberation time and the characteristics of the room.
Overall, the system performs well, especially for close distances and for rooms with low reverberation
time, and it appears to be robust to small distance mismatches. Finally, a listening test is conducted in
order to compare the results of the proposed method to the performance of human listeners.
Abstract: A stereo audio signal is often modeled as a mixture of instantaneously mixed primary components and uncorrelated ambience components. This paper focuses on the estimation
of the primary-to-ambience energy ratio (PAR). This measure
is useful for signal decomposition in stereo and multichannel audio coding, format conversion, and spatial audio enhancement. The conventional approaches for the estimation
of the ratio are based on the ratio of eigenvalues, which requires equal energies of the ambience signals. This often
leads to an inaccurate estimate of PAR. An alternative measure is proposed which reduces those estimation errors but
requires a priori information about the primary component
signal. The performance of the method is demonstrated with
synthetic signals and a large collection of stereo audio data.
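
The conventional eigenvalue-based estimate mentioned in this abstract can be illustrated with a minimal sketch. It is not the method proposed in the paper. It assumes the textbook signal model of one primary source panned into both channels plus uncorrelated ambience of equal energy in each channel, so that the larger eigenvalue of the 2x2 channel covariance matrix is the primary energy plus the per-channel ambience energy, and the smaller eigenvalue is the per-channel ambience energy alone. The function name `estimate_par_eig` and all signal parameters are hypothetical.

```python
import math
import random


def estimate_par_eig(left, right):
    """Eigenvalue-based PAR estimate, assuming zero-mean channels and
    equal-energy, uncorrelated ambience (an assumption of this sketch)."""
    n = len(left)
    # Entries of the 2x2 channel covariance matrix.
    c_ll = sum(l * l for l in left) / n
    c_rr = sum(r * r for r in right) / n
    c_lr = sum(l * r for l, r in zip(left, right)) / n
    # Closed-form eigenvalues of the symmetric matrix [[c_ll, c_lr], [c_lr, c_rr]].
    half_trace = 0.5 * (c_ll + c_rr)
    root = math.sqrt((0.5 * (c_ll - c_rr)) ** 2 + c_lr ** 2)
    lam_max = half_trace + root
    lam_min = half_trace - root
    # Primary energy = lam_max - lam_min; total ambience energy = 2 * lam_min.
    return (lam_max - lam_min) / (2.0 * lam_min)


# Synthetic test signal with a known ratio (all values are illustrative).
random.seed(0)
n_samples = 50_000
gain_l, gain_r = 0.8, 0.6   # panning gains of the primary source
sigma = 0.5                 # ambience standard deviation, equal in both channels
source = [random.gauss(0.0, 1.0) for _ in range(n_samples)]
left = [gain_l * s + random.gauss(0.0, sigma) for s in source]
right = [gain_r * s + random.gauss(0.0, sigma) for s in source]

true_par = (gain_l ** 2 + gain_r ** 2) / (2.0 * sigma ** 2)
est = estimate_par_eig(left, right)
print(f"true PAR = {true_par:.2f}, estimated PAR = {est:.2f}")
```

When the two ambience signals have different energies, the eigenvalue mapping used here no longer holds, which is the source of the estimation error the abstract refers to.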
Abstract: Many applications demand the automatic induction of the
tempo of a musical excerpt. The tempo estimation systems
follow a general scheme that consists of two main steps: the
creation of a feature list and the detection of periodicities
on this list. In this study, we propose a new method for the
implementation of the first step, along with the addition of a
final step that will enhance the tempo estimation procedure.
The proposed method for the extraction of the feature list is
based on Gammatone subspace analysis and Linear Prediction Error Filters (LPEFs). As a final step on the system, the
application of a model that approximates the tempo perception by human listeners is proposed. The results of the evaluation indicate that the proposed method compares favourably
with other state-of-the-art tempo estimation methods, using
only one frame of the musical excerpt, whereas most methods in the literature require processing the whole piece.
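
The general two-step scheme described in this abstract (a feature list followed by periodicity detection) can be sketched generically as follows. This is only an illustration of the scheme, not the Gammatone/LPEF method of the paper, and it omits the perceptual final step. The feature list here is a hypothetical synthetic onset train, and periodicity is detected with a plain autocorrelation, which is a substitute technique chosen for brevity. The function name `estimate_tempo` and the BPM search range are assumptions.

```python
def estimate_tempo(feature, frame_rate, bpm_min=60.0, bpm_max=200.0):
    """Return the tempo (BPM) whose lag maximizes the autocorrelation
    of the feature list within the given BPM search range."""
    # Convert the BPM range to a range of lags in feature frames.
    lag_min = int(round(60.0 * frame_rate / bpm_max))
    lag_max = int(round(60.0 * frame_rate / bpm_min))

    def autocorr(lag):
        # Unnormalized autocorrelation of the feature list at one lag.
        return sum(feature[i] * feature[i + lag]
                   for i in range(len(feature) - lag))

    best_lag = max(range(lag_min, lag_max + 1), key=autocorr)
    return 60.0 * frame_rate / best_lag


# Step 1 (stand-in): a synthetic feature list with one onset every 0.5 s,
# sampled at 100 feature frames per second.
frame_rate = 100.0
feature = [1.0 if i % 50 == 0 else 0.0 for i in range(1000)]

# Step 2: periodicity detection on the feature list.
tempo = estimate_tempo(feature, frame_rate)
print(f"estimated tempo: {tempo:.1f} BPM")
```

The unnormalized autocorrelation slightly favours shorter lags, which in this example resolves the ambiguity between the beat period and its double in favour of the faster tempo.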