14th International Congress of Phonetic Sciences (ICPhS-14), San Francisco, CA, USA
This paper presents an approach to Automatic Language
Identification (ALI) based on a differentiated modeling of vowel
and consonantal systems. The objective is to consider phonetic
and phonological features that are not taken into account in the
standard phonotactic approach. For each language, two
Gaussian Mixture Models (GMMs) are trained, one on automatically
detected vowel segments and one on non-vowel segments. Since this
vocalic detection is unsupervised and language independent, no
labeled data are required. GMMs are initialized using a data-driven
variant of the LBG vector quantization algorithm: the
LBG-Rissanen algorithm. Experiments show that this algorithm
is effective at taking the vowel system structure into
account.
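
As an illustration of the two-model scheme described above, the following minimal sketch scores an utterance against per-language vowel and non-vowel GMMs and picks the best-scoring language. It is not the authors' implementation: the function names, the toy model parameters, the diagonal covariances, and the choice to simply add the vowel and non-vowel log-likelihoods are all assumptions made for illustration. Vowel detection and LBG-Rissanen initialization are outside its scope, so the segments are assumed to be already split into the two classes.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as [(weight, mean, var), ...]."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def score_language(vowels, non_vowels, models):
    """Sum the log-likelihoods of both segment classes (assumed combination rule)."""
    vowel_gmm, non_vowel_gmm = models
    return (
        sum(gmm_log_likelihood(x, vowel_gmm) for x in vowels)
        + sum(gmm_log_likelihood(x, non_vowel_gmm) for x in non_vowels)
    )


def identify(vowels, non_vowels, language_models):
    """Closed-set decision: return the language whose models score highest."""
    return max(
        language_models,
        key=lambda lang: score_language(vowels, non_vowels, language_models[lang]),
    )


# Hypothetical toy models: one 2-D Gaussian per class per language.
TOY_MODELS = {
    "lang_a": ([(1.0, [0.0, 0.0], [1.0, 1.0])], [(1.0, [3.0, 3.0], [1.0, 1.0])]),
    "lang_b": ([(1.0, [2.0, 2.0], [1.0, 1.0])], [(1.0, [-3.0, -3.0], [1.0, 1.0])]),
}
```

Calling `identify(vowel_vectors, non_vowel_vectors, TOY_MODELS)` returns the key of the best-matching language. In a real system the feature vectors would be acoustic parameters extracted from the detected segments.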
With 5 languages from the OGI MLTS corpus and in a closed-set
identification task, we reach 85% correct identification for
45-second utterances when only male speakers are considered.
Bibliographic reference. Parlangeau-Vallès, Nathalie / Pellegrino, François / André-Obrecht, Régine (1999): "Investigating automatic language discrimination via vowel system and consonantal system modeling", In ICPhS-14, 141-144.