15th International Congress of Phonetic Sciences (ICPhS-15)

Barcelona, Spain
August 3-9, 2003

A Grapheme-to-Phoneme Transcription Algorithm Based on the SAMPA Alphabet Extension for the Polish Language

Mikolaj Wypych (1), Emilia Baranowska (2), Grażyna Demenko (2)

(1) Poznan University of Technology, Poland
(2) Adam Mickiewicz University, Poland

This paper describes the automatic generation of broad and narrow phonetic transcriptions of Polish in the grapheme-to-phoneme (G2P) module of a concatenative TTS system under development at Adam Mickiewicz University and Poznan University of Technology, Poland. Existing phonetic notations and transcription rules for Polish were verified against (1) the literature describing the Polish phonological system and (2) the results of acoustic segmentation of a few hundred utterances produced by 50 speakers. A modified, computer-readable SAMPA alphabet was adopted in both a basic version and an extended allophonic version. The rule-based transcription method, supported by an additional dictionary of exceptions, is described in detail. Finally, the paper presents a rule compiler, a rule applier, and a dedicated development environment. The resulting G2P implementation, called PolPhone, is freely available for academic purposes.
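
As a rough illustration of how a rule-based method with an exception dictionary can work, the sketch below checks a word against a small exception list first and otherwise applies longest-match grapheme rewrite rules. It is not PolPhone's rule format or implementation: the names `RULES`, `EXCEPTIONS` and `transcribe` are hypothetical, the rule table is a tiny subset of Polish SAMPA correspondences, and context-sensitive phenomena such as voicing assimilation and final devoicing are deliberately ignored.

```python
# Toy grapheme-to-SAMPA table (illustrative subset, NOT the PolPhone rule set).
RULES = {
    "sz": "S", "cz": "tS", "rz": "Z", "ch": "x",
    "w": "v", "ł": "w", "ż": "Z", "y": "I", "c": "ts",
}

# Hypothetical exception dictionary: whole words that the rules would get wrong.
# In "zamarza" the letters r and z are pronounced separately, not as the digraph "rz".
EXCEPTIONS = {
    "zamarza": ["z", "a", "m", "a", "r", "z", "a"],
}


def transcribe(word):
    """Return a list of SAMPA symbols for a single word."""
    word = word.lower()
    # Step 1: the exception dictionary overrides the rules.
    if word in EXCEPTIONS:
        return list(EXCEPTIONS[word])
    # Step 2: scan left to right, preferring two-letter rules over one-letter rules.
    phones = []
    i = 0
    while i < len(word):
        for size in (2, 1):
            chunk = word[i:i + size]
            if len(chunk) == size and chunk in RULES:
                phones.append(RULES[chunk])
                i += size
                break
        else:
            # No rule matched: assume the letter maps to the same SAMPA symbol.
            phones.append(word[i])
            i += 1
    return phones


print(" ".join(transcribe("szafa")))    # S a f a
print(" ".join(transcribe("rzeka")))    # Z e k a
print(" ".join(transcribe("zamarza")))  # z a m a r z a
```

Without the exception entry, the digraph rule would turn "zamarza" into `z a m a Z a`, which is the kind of case an exception dictionary is meant to catch.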

