14th International Congress of Phonetic Sciences (ICPhS-14)
San Francisco, CA, USA
With the growing availability of spoken language corpora, more and
more data-driven research in phonetics is possible. The downside of
having huge speech corpora is that they have to be segmented and
labeled before they can be exploited. As labeling and annotation are
time-consuming and costly, there is an interest in standardization
which would support the exchange and reuse of labeled data. The
MATE project proposes standards for an integrated and consistent
multi-level annotation of speech and especially dialogue corpora.
These proposals are based on the existing TEI standard (Text
Encoding Initiative). All label information is represented in XML;
thus there is a uniform representation of the different linguistic levels
of description (e.g. phonetic segmentation, prosodic labeling,
grammatical annotation, dialogue acts). This makes the implementation
of tools easier and provides uniform access to the data.
For the retrieval of information across multiple levels, a special query language and a query processor were developed. The query language was designed for the purpose of specifying linguistic items, contexts and constellations of phenomena to be found in spoken (dialogue) data. Basic concepts of this query language (called Q4M) are operators that let the user address both hierarchical (i.e. theory-dependent) structures and physical (i.e. phenomenological) relations of linguistic objects.
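The distinction between hierarchical and physical relations can be illustrated with a minimal Python sketch. It does not show Q4M syntax or the MATE markup, neither of which is given here. The element names, the attributes (`act`, `label`, `start`, `end`), and the functions `dominates`, `overlaps` and `words_in_act_with_tone` are all hypothetical, and the physical relation is assumed to be temporal overlap between time-aligned labels.

```python
import xml.etree.ElementTree as ET

# Hypothetical multi-level annotation. Element and attribute names are
# invented for illustration; they are NOT the MATE markup.
DOC = """
<dialogue>
  <utterance id="u1" act="question">
    <word id="w1" start="0.00" end="0.30">where</word>
    <word id="w2" start="0.30" end="0.55">is</word>
    <word id="w3" start="0.55" end="0.90">it</word>
  </utterance>
  <tone id="t1" label="H*" start="0.05" end="0.25"/>
  <tone id="t2" label="L-H%" start="0.60" end="0.90"/>
</dialogue>
"""


def span(elem):
    """Return the (start, end) time of an annotated element in seconds."""
    return float(elem.get("start")), float(elem.get("end"))


def dominates(parent, child):
    """Hierarchical relation: child is nested somewhere below parent."""
    return parent is not child and any(e is child for e in parent.iter())


def overlaps(a, b):
    """Physical (temporal) relation: the two intervals share some time."""
    a_start, a_end = span(a)
    b_start, b_end = span(b)
    return a_start < b_end and b_start < a_end


def words_in_act_with_tone(root, act, tone_label):
    """Find words dominated by an utterance of the given dialogue act
    that overlap in time with a tone carrying the given label."""
    tones = [t for t in root.iter("tone") if t.get("label") == tone_label]
    hits = []
    for utt in root.iter("utterance"):
        if utt.get("act") != act:
            continue
        for word in utt.iter("word"):
            if dominates(utt, word) and any(overlaps(word, t) for t in tones):
                hits.append(word.get("id"))
    return hits


root = ET.fromstring(DOC)
print(words_in_act_with_tone(root, "question", "L-H%"))
```

The query combines both kinds of relation: the word must belong structurally to a question utterance, and it must coincide in time with a tone label that lives on a separate annotation level.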
The query processor is integrated into a software environment that allows the user to view results and to reformulate the query for further refinement and exploration of results. Thus, with the help of Q4M it will be easier, for example, to identify speech segments of variable length for the extraction and use in concatenative speech synthesis systems, or to investigate the interplay of speech acts and intonation and to test relevant hypotheses.
Bibliographic reference. Heid, Ulrich / Mengel, Andreas (1999): "Query language for research in phonetics", In ICPhS-14, 1225-1228.