14th International Congress of Phonetic Sciences (ICPhS-14)

San Francisco, CA, USA
August 1-7, 1999

Independent Automatic Segmentation of Speech by Pronunciation Modeling

Nicole Beringer, Florian Schiel

Department of Phonetics, University of Munich, Germany

In this paper we present an iterative automatic segmentation system that does not require any domain-dependent training data. The input to the system is the canonical pronunciation and the speech signal of the utterance to be segmented, together with a set of phonological pronunciation rules. The output is a string of phonetic labels (SAM-PA) and the corresponding segment boundaries in the speech signal.
   The system consists of three main parts:
   In the first stage, a set of general phonological rules is applied to the canonical pronunciation of an utterance, yielding a graph that contains the canonical form and its presumed variants.
   In the second, HMM-based stage, the speech signal of the utterance is time-aligned to this graph using a Viterbi search. The outcome of this stage is the time-aligned transcription of the input utterance.
   In the third stage, a new set of statistically weighted rules is derived, using this "raw" application of the phonological rules as the baseline.
   The procedure is repeated iteratively until the segmentation no longer changes.
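
The first and third stages described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The two rules, the function names `expand` and `reweight`, and the relative-frequency weighting are assumptions made for illustration; the abstract does not say how the rule weights are estimated. The variants are enumerated as a flat set rather than stored as a graph, and the HMM-based Viterbi alignment of the second stage is left out: the variant it would select is simply passed in by hand.

```python
from collections import Counter

# Hypothetical rules: (name, pattern, replacement) over SAM-PA-like symbols.
RULES = [
    ("schwa_deletion", ("@", "n"), ("n",)),
    ("t_deletion", ("t",), ()),
]


def expand(symbols, rules):
    """Map each pronunciation variant to the names of the rules that produced it.

    If a variant can be reached by several rule combinations, only the
    first one found is kept.
    """
    variants = {}

    def walk(i, out, used):
        if i == len(symbols):
            variants.setdefault(tuple(out), tuple(used))
            return
        # Option 1: keep the canonical symbol.
        walk(i + 1, out + [symbols[i]], used)
        # Option 2: apply any rule whose pattern matches at position i.
        for name, pattern, replacement in rules:
            if tuple(symbols[i:i + len(pattern)]) == pattern:
                walk(i + len(pattern), out + list(replacement), used + [name])

    walk(0, [], [])
    return variants


def reweight(expansions, chosen):
    """Estimate a weight per rule as (times applied) / (times applicable).

    expansions: one dict from expand() per utterance.
    chosen: the variant selected for each utterance (in the real system,
            by the alignment stage; here supplied by the caller).
    """
    applicable, applied = Counter(), Counter()
    for variants, pick in zip(expansions, chosen):
        for name in {n for used in variants.values() for n in used}:
            applicable[name] += 1
        for name in set(variants[pick]):
            applied[name] += 1
    return {name: applied[name] / applicable[name] for name in applicable}


if __name__ == "__main__":
    canonical = ["h", "a:", "b", "@", "n"]  # German "haben"
    exp = expand(canonical, RULES)
    for variant, used in exp.items():
        print(" ".join(variant), used)
    # Pretend the aligner picked the reduced form once and the canonical form once.
    picks = [("h", "a:", "b", "n"), tuple(canonical)]
    print(reweight([exp, exp], picks))
```

In an iterative setup, the weights returned by `reweight` would be fed back into the next alignment pass, and the loop would stop once the chosen variants and boundaries stay the same between passes.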


Bibliographic reference. Beringer, Nicole / Schiel, Florian (1999): "Independent automatic segmentation of speech by pronunciation modeling", in ICPhS-14, 1653-1656.