Aki Härmä


Journal articles

2011	Eleftheria Georganti, Tobias May, Steven van de Par, Aki Härmä, John Mourjopoulos (2011) Speaker Distance Detection using a Single Microphone IEEE Trans. Audio, Speech and Language Processing Abstract: A method to detect the distance of a speaker from a single microphone in a room environment is proposed. Several features, related to statistical parameters of speech source excitation signals, are introduced and are shown to depend on the distance between source and receiver. Those features are used to train a pattern recognizer for distance detection. The method is tested using a database of speech recordings in four rooms with different acoustical properties. Performance is shown to be independent of the signal gain and level, but depends on the reverberation time and the characteristics of the room. Overall, the system performs well especially for close distances and for rooms with low reverberation time and it appears to be robust to small distance mismatches. Finally, a listening test is conducted in order to compare the results of the proposed method to the performance of human listeners. Notes: Accepted for publication
	Aki Härmä (2011) Classification of time-frequency regions in stereo audio Journal of the Audio Engineering Society 59: 10. October Abstract: Notes:
2006	P Somervuo, A Härmä, S Fagerlund (2006) Parametric representations of bird sounds for automatic species recognition IEEE Trans. Speech Audio Processing 14: 6. 2252 - 2263 Abstract: Notes:
2004	A Härmä, J Jakka, M Tikander, M Karjalainen, T Lokki, J Hiipakka, G Lorho (2004) Augmented reality audio for mobile and wearable appliances J. Audio Engineering Society 52: 6. 618-639 June Abstract: Notes:
2002	A Härmä, M Juntunen (2002) A Method for Parametrization of Time-Varying Sounds IEEE Signal Processing Letters 9: 5. 151-153 May Abstract: Notes:
2001	A Härmä, U K Laine (2001) A comparison of warped and conventional linear predictive coding IEEE Trans. Speech Audio Processing 9: 5. 579-588 July Abstract: Notes:
	A Härmä (2001) Linear predictive coding with modified filter structures IEEE Trans. Speech Audio Processing 9: 8. 769-777 November Abstract: Notes:
2000	A Härmä, M Karjalainen, L Savioja, V Välimäki, U K Laine, J Huopaniemi (2000) Frequency-warped signal processing for audio applications J. Audio Eng. Soc. 48: 11. 1011-1031 November Abstract: Notes:
	A Härmä (2000) Implementation of frequency-warped recursive filters Signal Processing 80: 3. 543-548 February Abstract: Notes:
Book chapters

2009	A Härmä (2009) Ambient human-to-human communication In: Handbook of Ambient Intelligence and Smart Environments Edited by:Hideyuki Nakashima, Hamid Aghajan and Juan Carlos Augusto. 795-823 Springer Abstract: Notes:
1998	A Härmä (1998) Fraktaalit, Kaaos ja äänisignaalit In: Akustiikan ja äänenkäsittelytekniikan laboratorion raportti Edited by:V Välimäki. TKK/Akustiikka Abstract: Notes:
	A Härmä (1998) Äänisignaalien säröt ja vääristymät : tekniset mitat ja psykoakustiikka In: Akustiikan ja äänenkäsittelytekniikan laboratorion raportti Edited by:V Välimäki. TKK/Akustiikka Abstract: Notes:
1997	A Härmä (1997) Sisäkorvan simpukan laskennallinen mallintaminen In: Akustiikan laskennallinen mallintaminen – Akustiikan ja äänenkäsittelytekniikan laboratorion raportti Edited by:M Karjalainen. TKK/Akustiikka Abstract: Notes:
1996	A Härmä (1996) Kuulon taajuusresoluutio ja sen mallintaminen In: Digitaaliaudion signaalinkäsittelymenetelmiä – Akustiikan ja äänenkäsittelytekniikan laboratorion raportti 41 Edited by:M Karjalainen. TKK/Akustiikka Abstract: Notes:
Conference papers

2013	A Koutrouvelis, A Härmä, A Mouchtaris (2013) Compressive Sensing in Footstep Sounds, Hand Tremors and Speech Using K-SVD Dictionaries In: DSP 2013, 18th International Conference on Digital Signal Processing Crete, Greece: Abstract: Notes:
2012	Aki Härmä (2012) Detection of audio events by boosted learning of local time-frequency patterns In: Proc. 45th AES Convention on Apps. of Time-Frequency Processing in Audio Espoo, Finland: Abstract: Notes:
	Aki Härmä, Ralph van Dinther, Thomas Svedström, Munhum Park, Jeroen Koppens (2012) Personalization of Headphone Spatialization Based on the Relative Localization Error in an Auditory Gaming Interface In: 132nd AES Convention Preprint #8644 Budapest, Hungary: Abstract: Notes:
2011	A Härmä (2011) Stereo audio classification for audio enhancement In: Proc. IEEE Int. Conf. Acoust. Speech Signal Processing (ICASSP'2011) Prague, Czech Republic: Abstract: Notes:
	Aki Härmä (2011) Estimation of the Energy Ratio Between Primary and Ambience Components in Stereo Audio Data In: Proc. 19th European Signal Processing Conf. (EUSIPCO 2011) Barcelona, Spain: Abstract: Stereo audio signal is often modeled as a mixture of instantaneously mixed primary components and uncorrelated ambience components. This paper focuses on the estimation of the primary-to-ambience energy ratio, PAR. This measure is useful for signal decomposition in stereo and multichannel audio coding, format conversion, and spatial audio enhancement. The conventional approaches for the estimation of the ratio are based on the ratio of eigenvalues which requires equal energies of the ambience signals. This often leads to an inaccurate estimate of PAR. An alternative measure is proposed which reduces those estimation errors but requires a priori information about the primary component signal. The performance of the method is demonstrated with synthetic signals and a large collection of stereo audio data. Notes:

	Aki Härmä, Munhum Park (2011) Extraction of Voice from the Center of the Stereo Image In: AES 130th Convention preprint 8435 London, UK: Abstract: Notes:
	Georgina Tryfou, Aki Härmä, Athanasios Mouchtaris (2011) Tempo Estimation Based on Linear Prediction and Perceptual Modelling In: 12th International Society for Music Information Retrieval Conference (ISMIR 2011) Miami, FL, USA: Abstract: Many applications demand the automatic induction of the tempo of a musical excerpt. Tempo estimation systems follow a general scheme that consists of two main steps: the creation of a feature list and the detection of periodicities in this list. In this study, we propose a new method for the implementation of the first step, along with the addition of a final step that enhances the tempo estimation procedure. The proposed method for the extraction of the feature list is based on Gammatone subspace analysis and Linear Prediction Error Filters (LPEFs). As a final step in the system, the application of a model that approximates the tempo perception of human listeners is proposed. The results of the evaluation indicate that the proposed method compares favourably with other state-of-the-art tempo estimation methods while using only one frame of the musical excerpt, whereas most methods in the literature require processing of the whole piece. Notes:
2010	Bastian Reineke, Bert den Brinker, Aki Härmä (2010) Time-domain bandwidth extension of speech In: Proceedings of the 31st Symposium on Information Theory in the Benelux 81-88 Rotterdam, The Netherlands: Abstract: Notes:
	Aki Härmä (2010) Classification of Time-Frequency Regions in Stereo Audio In: AES 128th Convention preprint 7980 London, UK: Abstract: Notes:
	Munhum Park, Aki Härmä, Steven van de Par, Georgina Tryfou (2010) Comparison of the Width of Sound Sources in 2-Channel and 3-Channel Sound Reproduction In: AES 128th Convention Paper 8071 London, UK: Abstract: Notes:
	Tommi Määttä, Aki Härmä, Hamid Aghajan (2010) On efficient fusion of multi-view data for activity detection In: ACM/IEEE Int. Conf. Distributed Smart Cameras (ICDSC 2010) Atlanta, GA, USA: Abstract: Notes:
2009	Jorge Peregrín Emparanza, Pavan Dadlani, Boris de Ruyter, Aki Härmä (2009) Ambient Telephony: Designing a Communication System for Enhancing Social Presence in Home Mediated Communication In: Proc. Int. Conf. Affective Computing and Intelligent Interaction Amsterdam, The Netherlands: Abstract: Notes:
	Aki Härmä, Kien Pham (2009) Conversation detection in ambient telephony In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2009) Taiwan: Abstract: Notes:
	Eleftheria Georganti, Tobias May, Steven van de Par, Aki Härmä, John Mourjopoulos (2009) Single-Channel Sound Source Distance Estimation Based on Statistical and Source-Specific Features In: AES 126th Convention Munich, Germany: Abstract: Notes:
	Tommi Määttä, Hamid Aghajan, Aki Härmä (2009) Home-to-home communication using 3D shadows In: Immersive Telecommunications (IMMERSCOM 2009) Berkeley, CA, USA: Abstract: Notes:

2008	Aki Härmä, Steven van de Par, Werner de Bruijn (2008) On the use of directional speakers to create a sound source close to the listener In: AES 124th Convention Amsterdam, The Netherlands: Abstract: Notes:
2007	Aki Härmä, Steven van de Par, Werner de Bruijn (2007) Spatial audio rendering using sparse and distributed arrays In: AES 122nd Convention Vienna, Austria: Abstract: Notes:
	Timo Haapsaari, Werner de Bruijn, Aki Härmä (2007) Comparison of Different Sound Capture and Reproduction Techniques in a Virtual Acoustic Window In: AES 122nd Convention Vienna, Austria: Abstract: Notes:
	A Härmä (2007) Ambient telephony: scenarios and research challenges In: INTERSPEECH 2007 Antwerp, Belgium: Abstract: Notes:
	Aki Härmä, Steven van de Par (2007) Spatial track transition effects for headphone listening In: Proc. 10th DAFx Conference Bordeaux, France: Abstract: Notes:
2006	A Härmä (2006) Online acoustic measurements in a networked audio system In: AES 120th Convention Paris, France: Abstract: Notes:
2005	S Fagerlund, A Härmä (2005) Parametrization of inharmonic bird sounds for automatic recognition In: Proc. EUSIPCO’2005 Antalya, Turkey: Abstract: Notes:
	S Vesa, A Härmä (2005) Automatic estimation of reverberation time from binaural signals In: Proc. ICASSP’2005 Philadelphia, USA: Abstract: Notes:
	A Härmä, A C den Brinker (2005) Manipulation of the vocal tract filter in real-time speech modification In: 4th Philips Conference on Digital Signal Processing Veldhoven, the Netherlands: Abstract: Notes:
	A Härmä, M F McKinney, J Skowronek (2005) Automatic surveillance of the acoustic activity in our living environment In: Proc. Int. Conf. Multimedia and Expo (ICME’2005) Amsterdam, The Netherlands: Abstract: Notes:

	A Härmä, A van Leest, R Thaden (2005) Volume control in networked audio systems In: Proc. IEEE IWAENC 2005 IEEE Eindhoven, The Netherlands: Abstract: Notes:
	Aki Härmä, Peter Voorwinden (2005) Time-frequency datagrams for the estimation of acoustic paths in networked audio systems In: 4th Philips Conference on Digital Signal Processing Veldhoven, the Netherlands: Abstract: Notes:
	Aki Härmä, Janto Skowronek, Martin F McKinney (2005) Acoustic monitoring of the patterns of activity in the office and the garden In: Proc. Measuring Behaviour 2005 Wageningen, The Netherlands: Abstract: Notes:
2004	M Tikander, A Härmä, M Karjalainen (2004) Acoustic positioning and head tracking based on binaural signals In: AES 116th Convention Berlin, Germany: Abstract: Notes:
	T Lokki, H Nironen, S Vesa, L Savioja, A Härmä, M Karjalainen (2004) Application Scenarios of Wearable and Mobile Augmented Reality Audio In: AES 116th Convention Berlin, Germany: Abstract: Notes:
	A Härmä, P Somervuo (2004) Classification of the harmonic structure in bird vocalization In: IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP’04) Montreal, Canada: Abstract: Notes:
	M Karjalainen, M Tikander, A Härmä (2004) Head-tracking and subject positioning using binaural headset microphones and common modulation anchor sources In: IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP’04) Montreal, Canada: Abstract: Notes:
	T Lokki, H Nironen, S Vesa, L Savioja, A Härmä (2004) Problem of far-end user’s voice in binaural telephony In: 18th International Congress on Acoustics (ICA 2004) Kyoto, Japan: Abstract: Notes:
	P Somervuo, A Härmä (2004) Bird song recognition based on syllable pair histograms In: IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP’04) Montreal, Canada: Abstract: Notes:
	Aki Härmä, Christof Faller (2004) Spatial decomposition of time-frequency regions : subbands or sinusoids In: AES 116th Convention Berlin, Germany: Abstract: Notes:

2003	M Tikander, A Härmä, M Karjalainen (2003) Binaural Positioning System for Wearable Augmented Reality Audio In: Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA’03) New Paltz, New York, USA: Abstract: Notes:
	A Härmä (2003) Automatic recognition of bird species based on sinusoidal modeling of syllables In: IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP 2003) 535-538 Hong Kong: Abstract: Notes:
	A Härmä, J Jakka, M Tikander, M Karjalainen, T Lokki, H Nironen, S Vesa (2003) Techniques and applications of wearable augmented reality audio In: AES 114th Convention Paper Amsterdam, The Netherlands: Abstract: Notes:
	P Somervuo, A Härmä (2003) Analyzing bird song syllables on the Self-Organizing Map In: Proc. of Workshop on Self-Organizing Maps (WSOM ’03) Hibikino, Japan: Abstract: Notes:
2002	G Schuller, A Härmä (2002) Low Delay Audio Compression using Predictive Coding In: Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing Orlando, Fl, USA: Abstract: Notes: Submitted
	A Härmä (2002) Coding principles for virtual acoustic openings In: Proc. AES 22nd International Conference on Virtual, Synthetic and Entertainment Audio Espoo, Finland: Abstract: Notes:
	A Härmä, Tapio Lokki, Ville Pulkki (2002) Drawing quality maps of the sweet spot and its surroundings in multichannel reproduction and coding In: Proc. AES 21st International Conference St. Petersburg, Russia: Abstract: Notes:
2001	T Paatero, M Karjalainen, A Härmä (2001) Modeling and Equalization of Audio Systems Using Kautz Filters In: Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing 434-437 Salt Lake City, Utah, USA: Abstract: Notes:
	A Härmä, T Paatero (2001) Discrete representation of signals on a logarithmic frequency scale In: Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA’01) 135-138 New Paltz, NY, USA: Abstract: Notes:
	H Penttinen, M Karjalainen, A Härmä (2001) Morphing Instrument Body Models In: Proceedings of the COST-G6 Conference on Digital Audio Effects (DAFx01) Limerick, Ireland: Abstract: Notes:

2000	A Härmä (2000) Evaluation of a warped linear predictive coding scheme In: Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing 897-900 Istanbul, Turkey: Abstract: Notes:
	A Härmä, M Karjalainen, L Savioja, V Välimäki, U K Laine, J Huopaniemi (2000) Frequency-warped signal processing for audio applications In: AES 108th Convention, preprint 5171 (T-5) Paris, France: Abstract: Notes:
	M Vaalgamaa, A Härmä, U K Laine (2000) Subjective evaluation of LSF quantization in conventional and warped LP based audio coding In: Signal Processing X : Theories and Applications 2065-2068 Tampere, Finland: Abstract: Notes:
	A Härmä, M Juntunen, P Kaipio (2000) Time-varying autoregressive modeling of speech and audio signals In: Signal Processing X : Theories and Applications 2037-2040 Tampere, Finland: Abstract: Notes:
	H Penttinen, A Härmä, M Karjalainen (2000) Digital guitar body mode modulation with one driving parameter In: Proc. COST-G6 Conf. Digital Audio Effects (DAFx’00) Verona, Italy: Abstract: Notes:
1999	A Härmä (1999) Low-level auditory modeling of temporal effects In: 16th Int. Joint Conf. on Artificial Intelligence, Workshop on Computational Auditory Scene Analysis Edited by:H G Okuno. 1-9 Stockholm, Sweden: Abstract: Notes:
	A Härmä (1999) On the utilization of overshoot effects in low-delay audio coding In: Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing 893-896 Phoenix, Arizona: Abstract: Notes:
	A Härmä, U K Laine (1999) Warped Low-Delay CELP for Wide-band Audio Coding In: Proc. AES 17th Int. Conference : High-Quality Audio Coding 207-215 Florence, Italy: Abstract: Notes:
	A Härmä, K Palomäki (1999) HUTear – a free Matlab toolbox for modeling of human hearing In: Proc. Matlab DSP Conference 1999 96-99 Espoo, Finland: Comsol Oy Abstract: Notes: http://www.acoustics.hut.fi/software/HUTear/
	M Vaalgamaa, A Härmä, U K Laine (1999) Audio coding with auditory time-frequency noise shaping and irrelevancy reducing vector quantization In: Proc. AES 17th Int. Conference : High-Quality Audio Coding 182-188 Florence, Italy: Abstract: Notes:

1998	A Härmä (1998) Implementation of recursive filters having delay free loops In: Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing 1261-1264 Seattle, Washington: Abstract: Notes:
	A Härmä, M Vaalgamaa, U K Laine (1998) A Warped Linear Predictive Stereo Codec Using Temporal Noise Shaping In: Proc. Nordic Signal Proc. Symposium, NORSIG’98 229-232 Denmark: Abstract: Notes:
	M Gröhn, J Backman, A Härmä (1998) Signal modulation approach to data sonification In: Proc. SPIE’98 53-62 Abstract: Notes:
	A Härmä, U K Laine, M Karjalainen (1998) Backward adaptive warped lattice for wideband stereo coding In: Signal Processing IX : Theories and Applications, EUSIPCO’98 729-732 Rhodes, Greece: Abstract: Notes:
	K Palomäki, A Härmä, U K Laine (1998) Warped linear predictive audio coding in video conferencing application In: Signal Processing IX : Theories and Applications, EUSIPCO’98 1433-1436 Rhodes, Greece: Abstract: Notes:
1997	A Härmä, U K Laine, M Karjalainen (1997) An Experimental Audio Codec based on Warped Linear Prediction of Complex Valued Signals In: Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing 323-327 Munich, Germany: Abstract: Notes:
	A Härmä, U K Laine, M Karjalainen (1997) WLPAC – A Perceptual Audio Codec in a Nutshell In: AES 102nd Conv. preprint 4420 Munich, Germany: Abstract: Notes:
	M Karjalainen, A Härmä, U K Laine (1997) Realizable Warped IIR Filters and Their Properties In: Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing 2205-2209 Munich: Abstract: Notes:
	M Karjalainen, A Härmä, J Huopaniemi, U K Laine (1997) Warped Filters and Their Audio Applications In: IEEE Workshop Appl. Signal Proc. Acoust. and Audio New Paltz, New York: Abstract: Notes:
	A Härmä (1997) Kuulonmukaiset taajuusasteikot In: Akustiikkapäivät 97 Abstract: Notes:
1996	U K Laine, A Härmä (1996) Bark-FAMlet filterbanks In: Proc. Nordic Acoustical Meeting 277-284 Helsinki, Finland: Abstract: Notes:
	A Härmä, U K Laine, M Karjalainen (1996) Warped Linear Prediction in Audio Coding In: Proc. IEEE Nordic Signal Proc. Symposium, NORSIG’96 447-450 Espoo, Finland: Abstract: Notes:
	M Karjalainen, A Härmä, U K Laine (1996) Realizable Warped IIR Filter Structures In: Proc. of the IEEE Nordic Signal Proc. Symposium, NORSIG 96 483-486 Espoo, Finland: Abstract: Notes:
Masters theses

1998	A Härmä (1998) Audio coding with warped predictive methods (Licentiate's Thesis) Helsinki University of Technology Abstract: Notes: p. 104
1997	A Härmä (1997) Perceptual aspects and warped techniques in audio coding (Master's Thesis) Helsinki University of Technology Abstract: Notes: p. 88
PhD theses

2001	A Härmä (2001) Frequency-warped autoregressive modeling and filtering Helsinki University of Technology Abstract: Notes:
