14th International Congress of Phonetic Sciences (ICPhS-14)
San Francisco, CA, USA
The syllable serves as an important interface between the lower-level (phonetic and phonological) and the higher-level (morphological and lexical) representational tiers of language. Reliable segmentation of spontaneous speech into syllabic units has been shown to be useful for speech recognition. An automatic method is described for delineating the temporal boundaries of syllabic units in continuous speech using a Temporal Flow Model (TFM) and modulation-filtered spectral features. The TFM is a neural network architecture that supports arbitrary connectivity across layers, provides for both feed-forward and recurrent links, and allows variable propagation delays along links. Two TFM configurations, global and tonotopic, have been developed and trained on a phonetically transcribed corpus of telephone and address numbers spoken over the telephone by several hundred speakers varying in dialect, age and gender. The networks detected the boundaries of syllabic units with an accuracy of approximately 84%.
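The abstract does not give the TFM's update equations. The following is a minimal forward-pass sketch, under our own assumptions, of what arbitrary connectivity, recurrent links and per-link propagation delays mean computationally. The class name `TemporalFlowSketch`, the sigmoid units, the integer frame delays and the `(src, dst, weight, delay)` link format are all hypothetical. This is not the authors' implementation and it includes no training procedure.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TemporalFlowSketch:
    """Hypothetical TFM-style network (assumption, not the published model).

    Any node may feed any non-input node; each link carries its own weight
    and an integer propagation delay measured in input frames.
    """

    def __init__(self, n_nodes, n_inputs, links, bias=None):
        # links: list of (src, dst, weight, delay) tuples
        for src, dst, _, delay in links:
            if dst < n_inputs:
                raise ValueError("input nodes cannot receive links")
            if delay < 0:
                raise ValueError("delays must be non-negative")
            # Zero-delay links must point "forward" so each frame can be
            # computed in a single pass over nodes in index order.
            if delay == 0 and src >= dst:
                raise ValueError("zero-delay links must go from lower to higher node index")
        self.n_nodes = n_nodes
        self.n_inputs = n_inputs
        self.links = links
        self.bias = np.zeros(n_nodes) if bias is None else np.asarray(bias, dtype=float)

    def run(self, inputs):
        """inputs: array of shape (T, n_inputs); returns activations (T, n_nodes)."""
        inputs = np.asarray(inputs, dtype=float)
        T = inputs.shape[0]
        act = np.zeros((T, self.n_nodes))
        for t in range(T):
            # Clamp input nodes to the current feature frame.
            act[t, :self.n_inputs] = inputs[t]
            for node in range(self.n_inputs, self.n_nodes):
                net = self.bias[node]
                for src, dst, w, d in self.links:
                    # A link contributes only once its delay has elapsed.
                    if dst == node and t - d >= 0:
                        net += w * act[t - d, src]
                act[t, node] = sigmoid(net)
        return act


# Usage: two hypothetical feature channels, one hidden node with a
# self-recurrent link, and one output node read as per-frame boundary evidence.
links = [(0, 2, 1.0, 0), (1, 2, 1.0, 2), (2, 2, 0.5, 1), (2, 3, 2.0, 1)]
net = TemporalFlowSketch(n_nodes=4, n_inputs=2, links=links)
activations = net.run(np.zeros((5, 2)))
```

Requiring zero-delay links to point from lower to higher node index is a simplifying choice that keeps each frame computable in one pass. Every link with a delay of at least one frame, including self-loops and links back to earlier nodes, acts as a recurrent connection.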
Bibliographic reference. Shastri, Lokendra / Chang, Shuangyu / Greenberg, Steven (1999): "Syllable detection and segmentation using temporal flow neural networks", In ICPhS-14, 1721-1724.