Disambiguation of Nominalizations in the Extraction of Linguistic Data from Corpus Text
Many words and syntactic constructions of natural language have several readings, i.e. they are ambiguous. Such ambiguities can cause problems in the automatic extraction of data from texts, as used, for example, in the creation of dictionaries and grammars, in question answering, or in information retrieval. This project aims to make data extraction aware of ambiguities in order to increase the quality of the extracted data.
Data extraction is often faced with the following types of ambiguities:
In this project, we work on lexical ambiguities in German nominalizations of verbs, especially those with the suffix '-ung', such as 'Teilung' (division), 'Anwendung' (application), etc. Many such nouns are ambiguous between an event reading ('Teilung durchführen', to carry out a division) and a state reading ('Teilung besteht', a division exists), or between an event reading ('Messung vornehmen', to take a measurement) and a (result) object reading ('Messung (= Meßergebnis, measurement result) liegt vor', a measurement is available), or between all three interpretations. As the examples show, a given interpretation (and thus the disambiguation) of an '-ung' nominalization may be supported or enforced by lexical or grammatical indicators from the context. Examples of such indicators are verbs embedding the nominalization (see the examples above), adjectival modifiers, or certain types of prepositional phrases in the nominalization's context.
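The role of embedding verbs as indicators can be illustrated with a minimal Python sketch. It is not the project's implementation: the lexicon `VERB_INDICATORS`, the function `narrow_readings`, and the reading labels are hypothetical names introduced here, and the sketch assumes that the noun and its embedding verb have already been lemmatized and linked by a parser. It also treats each verb as enforcing exactly one reading, which is a simplification of "supported or enforced".

```python
from typing import Optional, Set

# Hypothetical indicator lexicon: lemma of the embedding verb -> reading it
# points to. The four entries come from the examples in the text; a real
# lexicon would be much larger and would also cover adjectival modifiers
# and prepositional phrases.
VERB_INDICATORS = {
    "durchführen": "event",   # 'Teilung durchführen'
    "vornehmen": "event",     # 'Messung vornehmen'
    "bestehen": "state",      # 'Teilung besteht'
    "vorliegen": "object",    # 'Messung liegt vor'
}


def narrow_readings(noun: str, verb_lemma: Optional[str],
                    possible: Set[str]) -> Set[str]:
    """Narrow the possible readings of an '-ung' noun using its embedding verb.

    Returns a single reading if the verb indicates one that the noun allows;
    otherwise returns all possible readings, i.e. the ambiguity is kept.
    """
    if not noun.endswith("ung"):
        raise ValueError(f"not an '-ung' nominalization: {noun!r}")
    indicated = VERB_INDICATORS.get(verb_lemma)
    if indicated in possible:
        return {indicated}
    # Unknown verb, no verb, or an indicator that conflicts with the noun's
    # known readings: leave the ambiguity unresolved rather than guess.
    return set(possible)


if __name__ == "__main__":
    print(narrow_readings("Teilung", "durchführen", {"event", "state"}))
    print(narrow_readings("Messung", "vorliegen", {"event", "object"}))
    print(narrow_readings("Teilung", "sehen", {"event", "state"}))
```

Returning the full set when no indicator applies keeps the ambiguity explicit instead of forcing a decision, so a later component can still decide whether it matters for the extraction.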
We aim to make data extraction aware of ambiguities: some must be resolved to obtain high-quality extraction results; for others, it is sufficient to recognize that they have no impact on the extraction. Classifying and resolving these ambiguities requires at least the context of the sentence; in some cases (not analyzed in this project), more context is necessary.
In detail, our objectives include the development of the following components:
This project thus intends to contribute to the syntactic and semantic representation and handling of ambiguities in large corpora, and to the disambiguation of German nominalizations.
PI: Ulrich Heid (01.07.2006 - 30.09.2011), Jonas Kuhn
Researchers: Andre Blessing, Kerstin Eckart, Ina Rösiger