14th International Congress of Phonetic Sciences (ICPhS-14)

San Francisco, CA, USA
August 1-7, 1999

Query Language for Research in Phonetics

Ulrich Heid, Andreas Mengel

Institut für maschinelle Sprachverarbeitung, University of Stuttgart, Germany

With the growing availability of spoken language corpora, more and more data-driven research in phonetics is possible. The downside of having huge speech corpora is that they have to be segmented and labeled before they can be exploited. As labeling and annotation are time-consuming and costly, there is an interest in standardization that would support the exchange and reuse of labeled data. The MATE project proposes standards for an integrated and consistent multi-level annotation of speech and especially dialogue corpora. These proposals are based on the existing TEI (Text Encoding Initiative) standard. All label information is represented in XML; thus there is a uniform representation of the different linguistic levels of description. This makes the implementation of tools easier and provides uniform access to the data, e.g. phonetic segmentation, prosodic labeling, grammatical annotation, dialogue act classification, etc.
   For the retrieval of information across multiple levels, a special query language and a query processor were developed. The query language was designed for specifying linguistic items, contexts and constellations of phenomena to be found in spoken (dialogue) data. The basic concepts of this query language (called Q4M) are operators that let the user address both hierarchical (i.e. theory-dependent) structures and physical (i.e. phenomenological) relations of linguistic objects.
   The query processor is integrated into a software environment that allows the user to view results and to reformulate the query for further refinement and exploration of the results. Thus, with the help of Q4M it will be easier, for example, to identify speech segments of variable length for extraction and use in concatenative speech synthesis systems, or to investigate the interplay of speech acts and intonation and to test relevant hypotheses.
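
The distinction between hierarchical and physical relations can be illustrated with a minimal sketch in Python. It is not Q4M and does not use the MATE or TEI markup: the element names (`word`, `phone`, `accent`), the attributes (`orth`, `label`, `type`, `start`, `end`), the sample data and both helper functions are hypothetical, and "physical relation" is assumed here to mean overlap in time.

```python
import xml.etree.ElementTree as ET

# Hypothetical multi-level annotation. Element and attribute names are
# illustrative assumptions, not the MATE or TEI markup.
DOC = """
<dialogue>
  <word id="w1" orth="so" start="0.00" end="0.20">
    <phone id="p1" label="z" start="0.00" end="0.08"/>
    <phone id="p2" label="o:" start="0.08" end="0.20"/>
  </word>
  <word id="w2" orth="gut" start="0.20" end="0.55">
    <phone id="p3" label="g" start="0.20" end="0.28"/>
    <phone id="p4" label="u:" start="0.28" end="0.47"/>
    <phone id="p5" label="t" start="0.47" end="0.55"/>
  </word>
  <accent id="a1" type="H*" start="0.30" end="0.45"/>
</dialogue>
"""


def span(element):
    """Return the (start, end) time interval of an annotated element."""
    return float(element.get("start")), float(element.get("end"))


def overlaps(a, b):
    """Physical relation (assumed): the two intervals share some time."""
    a_start, a_end = span(a)
    b_start, b_end = span(b)
    return a_start < b_end and b_start < a_end


def phones_in_word(root, orth):
    """Hierarchical relation: phones dominated by a word with this spelling."""
    return [
        phone.get("label")
        for word in root.iter("word")
        if word.get("orth") == orth
        for phone in word.iter("phone")
    ]


def phones_under_accent(root, accent_type):
    """Physical relation: phones overlapping in time with a given accent,
    regardless of where the accent sits in the hierarchy."""
    accents = [a for a in root.iter("accent") if a.get("type") == accent_type]
    return [
        phone.get("label")
        for phone in root.iter("phone")
        if any(overlaps(phone, accent) for accent in accents)
    ]


root = ET.fromstring(DOC)
print(phones_in_word(root, "gut"))
print(phones_under_accent(root, "H*"))
```

The first query follows the element nesting only, while the second ignores nesting and compares time stamps, which is why the accent can be stored on a separate level from the words and phones.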

