14th International Congress of Phonetic Sciences (ICPhS-14)
San Francisco, CA, USA
We studied vowel classification and speaker normalization performance with neural nets based on Adaptive Resonance Theory (ART). ART was developed by S. Grossberg as a theory of human cognitive information processing. It results from an attempt to understand how biological systems retain plasticity throughout life without compromising the stability of previously learned patterns. We implemented some of these ideas in a supervised neural network called CategoryART. The network was trained on formant frequency values extracted at the midpoint of vowels selected from the TIMIT speech corpus; separate training and test sets were used. Of the 630 speakers in this database, 438 are male and 192 are female. Only the 13 monophthong vowel categories (iy, ih, ey, eh, ae, aa, ow, ah, ao, uw, uh, ux, er) were used. Formant frequency values were determined by LPC analysis. To make the formant frequency values of male and female speakers comparable, normalized frequency values were calculated by a simple algorithm in a preprocessing stage. In addition to the neural net, we used a Gaussian classifier, which attained 57% correct classification on average. The neural network did not perform as well as the Gaussian classifier and achieved only 50% correct classification.
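To illustrate the kind of Gaussian classifier the abstract mentions, the following is a minimal sketch and not the authors' implementation. It assumes a diagonal covariance per vowel class and uses made-up (F1, F2) values in Hz; the abstract does not state the covariance structure, the number of formants, or the normalization algorithm. The function names (`fit_gaussians`, `log_score`, `classify`) are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussians(samples, labels):
    """Estimate per-class feature means, variances, and priors.

    Assumption: diagonal covariance (features treated as independent).
    """
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        dims = len(rows[0])
        means = [sum(r[d] for r in rows) / n for d in range(dims)]
        # A small floor keeps each variance strictly positive for tiny classes.
        variances = [
            max(sum((r[d] - means[d]) ** 2 for r in rows) / n, 1e-6)
            for d in range(dims)
        ]
        model[label] = (means, variances, n / len(samples))
    return model


def log_score(x, means, variances, prior):
    """Log prior plus log density of x under a diagonal Gaussian."""
    score = math.log(prior)
    for value, mean, var in zip(x, means, variances):
        score -= 0.5 * (math.log(2 * math.pi * var) + (value - mean) ** 2 / var)
    return score


def classify(model, x):
    """Return the label whose Gaussian gives x the highest log score."""
    return max(model, key=lambda label: log_score(x, *model[label]))


# Made-up (F1, F2) values in Hz, for illustration only; not TIMIT data.
train_x = [(270, 2290), (300, 2350), (730, 1090), (700, 1150), (300, 870), (320, 900)]
train_y = ["iy", "iy", "aa", "aa", "uw", "uw"]
model = fit_gaussians(train_x, train_y)
predicted = classify(model, (280, 2300))
```

With these toy values, the point (280, 2300) lies inside the `iy` cluster and is labeled accordingly. A full-covariance variant would replace the per-feature variances with a covariance matrix per class.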
Bibliographic reference. Weenink, David / Pols, Louis C. W. (1999): "Multi-speaker vowel classification with adaptive neural nets", In ICPhS-14, 1633-1636.