14th International Congress of Phonetic Sciences (ICPhS-14)
San Francisco, CA, USA
In this paper we present an iterative automatic segmentation
system which does not require any domain-dependent training
data. Input to the system is the canonical pronunciation and the
speech signal of an utterance to be segmented, as well as a set
of phonological pronunciation rules. The output is a string of
phonetic labels (SAMPA) and the corresponding segment
boundaries of the speech signal.
The system consists of three main parts:
In a first stage, a set of general phonological rules is applied to the canonical pronunciation of an utterance, yielding a graph that contains the canonical form and its presumed variants.
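This first stage can be sketched as follows. The rule format here is a simplification assumed for illustration (single-phone optional rewrites such as schwa elision); the paper's actual rule formalism is not specified in this abstract.

```python
from itertools import product

def apply_rules(canonical, rules):
    """Expand a canonical phone sequence into presumed pronunciation variants.

    canonical: list of phone symbols (e.g. SAMPA strings)
    rules: list of (target, replacement) pairs; each matching phone may
           either stay canonical or be rewritten (optional rules).
    """
    # For each position, collect the canonical phone plus any rule outputs.
    alternatives = []
    for phone in canonical:
        options = {(phone,)}
        for target, replacement in rules:
            if phone == target:
                options.add(tuple(replacement))
        alternatives.append(sorted(options))
    # The cross-product of per-phone options enumerates all paths through
    # the (implicit) pronunciation graph.
    variants = set()
    for choice in product(*alternatives):
        variants.add(tuple(p for group in choice for p in group))
    return variants

# Toy example: optional schwa elision and b -> p rewrite give four
# variants of /hab@n/ (with/without schwa, with b or p).
variants = apply_rules(["h", "a", "b", "@", "n"], [("@", []), ("b", ["p"])])
```

In a real system the variants would be kept as a compact graph rather than enumerated, since the number of paths grows multiplicatively with the number of applicable rules.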
In a second, HMM-based stage, the speech signal of the utterance in question is time-aligned to this graph using a Viterbi search. The outcome of this stage is the time-aligned phonetic transcription of the input utterance.
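The core of such a forced alignment is a Viterbi dynamic program over frames and phones. The sketch below assumes precomputed per-frame log-likelihoods (in practice these come from the HMM acoustic models) and aligns a single phone sequence; aligning against the full variant graph works the same way with more states.

```python
import math

def align(phones, frame_scores):
    """Viterbi-style forced alignment of a phone sequence to frames.

    frame_scores[t][p]: assumed log-likelihood of phone p at frame t.
    Each phone must occupy at least one frame, in order. Returns the
    start frame of each phone on the best path.
    """
    T, N = len(frame_scores), len(phones)
    NEG = -math.inf
    # dp[t][i]: best score at frame t with phone i active
    dp = [[NEG] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]  # 1 = phone i was entered at frame t
    dp[0][0] = frame_scores[0][phones[0]]
    for t in range(1, T):
        for i in range(N):
            stay = dp[t - 1][i]
            enter = dp[t - 1][i - 1] if i > 0 else NEG
            if enter > stay:
                dp[t][i], back[t][i] = enter, 1
            else:
                dp[t][i], back[t][i] = stay, 0
            dp[t][i] += frame_scores[t][phones[i]]
    # Trace back the phone start frames (segment boundaries).
    starts = [0] * N
    i = N - 1
    for t in range(T - 1, 0, -1):
        if back[t][i]:
            starts[i] = t
            i -= 1
    return starts
```

The returned start frames are exactly the segment boundaries the system outputs alongside the phonetic labels.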
In a third stage, this "raw" application of the phonological rules serves as a baseline from which a new set of statistically weighted rules is derived.
The procedure is repeated iteratively until the segmentation no longer changes.
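The overall iteration can be summarized as a simple fixed-point loop. The stage functions here are placeholders for the three stages described above, not the authors' implementation:

```python
def iterative_segmentation(canonical, signal, rules,
                           apply_rules, viterbi_align, reweight):
    """Iterate rule application, alignment, and rule re-weighting
    until the segmentation is stable (hypothetical stage interfaces)."""
    previous = None
    while True:
        graph = apply_rules(canonical, rules)        # stage 1: variant graph
        segmentation = viterbi_align(signal, graph)  # stage 2: Viterbi alignment
        rules = reweight(rules, segmentation)        # stage 3: weighted rules
        if segmentation == previous:                 # fixed point reached
            return segmentation
        previous = segmentation
```

Convergence is guaranteed only if the re-weighted rules eventually reproduce the same alignment; in practice such loops are usually also capped at a maximum number of iterations.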
Bibliographic reference. Beringer, Nicole / Schiel, Florian (1999): "Independent automatic segmentation of speech by pronunciation modeling", In ICPhS-14, 1653-1656.