14th International Congress of Phonetic Sciences (ICPhS-14)

San Francisco, CA, USA
August 1-7, 1999

Multi-Lingual Automatic Phoneme Clustering

Philippe Boula de Mareüil, Cristóbal Corredor-Ardoy, Martine Adda-Decker

Spoken Language Processing Group, LIMSI-CNRS, Orsay, France

In this article, we describe an approach to automatic multi-lingual phoneme classification. The classes were obtained by agglomerative hierarchical clustering, using a similarity measure based on the likelihood of the acoustic frames given the Hidden Markov Models. The method was applied to French, English, German and Spanish (IDEAL corpus), as well as to Italian and Portuguese (SPEECHDAT corpus). The analysis of the clusters showed that the approach remains robust despite the acoustic mismatch between the two corpora. For 90 clusters, the classes obtained correspond, to a large extent, to well-defined linguistic groups. A qualitative analysis of the results is given.
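
As an illustration of the kind of procedure summarized above, the following is a minimal sketch of agglomerative hierarchical clustering driven by a likelihood-based dissimilarity. It is not the authors' implementation: the abstract does not specify the exact similarity formula or the linkage rule, so the symmetrized log-likelihood loss and the average linkage used here are assumptions, and the function names (`symmetrize`, `agglomerate`), the phoneme labels, and the scores are all invented for the example.

```python
from itertools import combinations


def symmetrize(loglik):
    """Turn cross log-likelihoods into a symmetric dissimilarity.

    loglik[a][b] is assumed to be the average per-frame log-likelihood of
    the frames labelled `a` when scored by the model of `b` (hypothetical
    input format, not taken from the paper).
    """
    names = sorted(loglik)
    dist = {}
    for a, b in combinations(names, 2):
        # Average log-likelihood lost when each unit's frames are scored
        # by the other unit's model instead of its own.
        loss_a = loglik[a][a] - loglik[a][b]
        loss_b = loglik[b][b] - loglik[b][a]
        dist[frozenset((a, b))] = 0.5 * (loss_a + loss_b)
    return names, dist


def agglomerate(names, dist, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [frozenset([n]) for n in names]

    def linkage(c1, c2):
        # Average linkage (an assumption): mean pairwise dissimilarity.
        pairs = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > n_clusters:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


# Invented scores for four language-tagged phonemes (illustration only).
loglik = {
    "fr:a": {"fr:a": -60.0, "es:a": -62.0, "fr:s": -80.0, "es:s": -81.0},
    "es:a": {"fr:a": -63.0, "es:a": -61.0, "fr:s": -82.0, "es:s": -79.0},
    "fr:s": {"fr:a": -85.0, "es:a": -84.0, "fr:s": -65.0, "es:s": -66.5},
    "es:s": {"fr:a": -83.0, "es:a": -86.0, "fr:s": -67.0, "es:s": -64.0},
}

names, dist = symmetrize(loglik)
clusters = agglomerate(names, dist, n_clusters=2)
for cluster in clusters:
    print(sorted(cluster))
```

With these toy scores, the two vowels and the two fricatives end up in separate cross-language clusters. The stopping criterion is simply a target number of clusters, in the spirit of the 90-cluster setting mentioned above.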

