14th International Congress of Phonetic Sciences (ICPhS-14)

San Francisco, CA, USA
August 1-7, 1999

Investigating Automatic Language Discrimination via Vowel System and Consonantal System Modeling

Nathalie Parlangeau-Vallès (1), François Pellegrino (1,2), Régine André-Obrecht (1)

(1) IRIT, Toulouse, France; (2) DDL, Lyon, France

This paper presents an approach to Automatic Language Identification (ALI) based on separate modeling of the vowel and consonantal systems. The objective is to capture phonetic and phonological features that the standard phonotactic approach does not take into account. For each language, two Gaussian Mixture Models (GMMs) are trained: one on automatically detected vowel segments and one on non-vowel segments. Since the vowel detection is unsupervised and language independent, no labeled data are required. The GMMs are initialized with a data-driven variant of the LBG vector quantization algorithm, the LBG-Rissanen algorithm. Experiments show that this algorithm effectively takes the structure of the vowel system into account.
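The LBG initialization mentioned above can be sketched as repeated binary splitting of a codebook followed by k-means (Lloyd) refinement; the resulting codewords would then serve as initial GMM means. This is a minimal sketch under stated assumptions: the abstract does not give the Rissanen criterion the authors use to choose the codebook size, so the stopping rule here is a hypothetical relative-distortion threshold (`min_gain`) swapped in as a stand-in, and all function names and parameter values are illustrative.

```python
def _sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def _nearest(x, codebook):
    """Index of the codeword closest to x (the first one wins on ties)."""
    return min(range(len(codebook)), key=lambda j: _sqdist(x, codebook[j]))


def lloyd(data, codebook, iters=20):
    """Refine a codebook with k-means (Lloyd) passes; return (codebook, distortion)."""
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for x in data:
            cells[_nearest(x, codebook)].append(x)
        # An empty cell keeps its previous codeword.
        new = [_mean(c) if c else cw for c, cw in zip(cells, codebook)]
        if new == codebook:  # converged
            break
        codebook = new
    distortion = sum(_sqdist(x, codebook[_nearest(x, codebook)]) for x in data)
    return codebook, distortion


def lbg(data, eps=0.01, min_gain=0.5, max_size=64):
    """LBG binary splitting with a HYPOTHETICAL stopping rule.

    Splitting stops when doubling the codebook removes less than `min_gain`
    of the current distortion. This threshold stands in for the
    Rissanen criterion of LBG-Rissanen, which is not reproduced here.
    """
    codebook = [_mean(data)]
    distortion = sum(_sqdist(x, codebook[0]) for x in data)
    while len(codebook) < max_size and distortion > 0:
        # Split every codeword c into c + eps and c - eps (per component).
        split = [[c + s * eps for c in cw] for cw in codebook for s in (1, -1)]
        new_codebook, new_distortion = lloyd(data, split)
        if (distortion - new_distortion) / distortion < min_gain:
            break  # doubling no longer pays off under this stand-in rule
        codebook, distortion = new_codebook, new_distortion
    return codebook
```

Because each step doubles the codebook, the candidate sizes are powers of two; in the LBG-Rissanen variant, an information criterion (presumably of the MDL type, given the name) rather than a fixed threshold decides where to stop, which is what makes the model size data-driven.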
   On a closed-set identification task with five languages from the OGI MLTS corpus, we reach 85% correct identification on 45-second utterances from male speakers.
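A closed-set decision with one vowel GMM and one non-vowel GMM per language might look like the sketch below. The abstract does not state how the two model scores are combined, so summing the vowel-model log-likelihood over vowel frames and the non-vowel-model log-likelihood over the remaining frames, with equal weights, is an assumption; the diagonal-covariance form, the `DiagGMM` class and the `identify` function are hypothetical, and the feature front end is left unspecified.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class DiagGMM:
    weights: List[float]          # mixture weights, assumed to sum to 1
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # diagonal covariance, one vector per component

    def frame_loglik(self, x: Sequence[float]) -> float:
        """log p(x) under the mixture, combined with log-sum-exp for stability."""
        terms = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll -= 0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            terms.append(ll)
        m = max(terms)
        return m + math.log(sum(math.exp(t - m) for t in terms))

    def loglik(self, frames: Sequence[Sequence[float]]) -> float:
        """Total log-likelihood, treating frames as independent."""
        return sum(self.frame_loglik(x) for x in frames)


def identify(vowel_frames, other_frames,
             models: Dict[str, Tuple[DiagGMM, DiagGMM]]):
    """Closed-set decision: return the language whose (vowel, non-vowel) GMM pair
    scores highest, plus all scores. Equal weighting of the two terms is assumed."""
    scores = {
        lang: vowel_gmm.loglik(vowel_frames) + other_gmm.loglik(other_frames)
        for lang, (vowel_gmm, other_gmm) in models.items()
    }
    return max(scores, key=scores.get), scores
```

Keeping the two scores separate until the final sum is what lets the vowel system and the consonantal system each be modeled by its own GMM; a weighted combination would be a natural variant if one subsystem proved more discriminative.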


Bibliographic reference. Parlangeau-Vallès, Nathalie / Pellegrino, François / André-Obrecht, Régine (1999): "Investigating automatic language discrimination via vowel system and consonantal system modeling", in ICPhS-14, 141-144.