Data-driven Dependency Parsing - Context Factors in Dependency Classification
Principal Investigator: Jonas Kuhn
Researchers: Anders Björkelund and Wolfgang Seeker
Former members: Bernd Bohnet
Student researchers: Xiang Yu
A dependency parser assigns dependency relations among the words of a sentence, e.g., grammatical relations or thematic roles. This is a formally simple characterisation of different linguistic analysis tasks, which captures the key structural processing issues uniformly for typologically different languages. It thus lends itself as the basis for systematic studies of the role that different context factors play in each local dependency classification decision. D8 complements ongoing activities in the SFB 732 with an area that has lately been receiving considerable attention in cutting-edge research in Natural Language Processing (NLP). In search of the theoretical underpinnings of data-driven parsing and the role of linguistic factors, we systematically compare competing algorithmic strategies and machine learning techniques, analyze the influence of typological and representational characteristics of the target structures, and study the effect of hard linguistic constraints interacting with soft constraints derived from corpus data.
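To make the notion of a dependency structure concrete, the following is a minimal sketch in Python. It does not reproduce any D8 system: the toy sentence, the relation labels (`nsubj`, `obj`, `root`), and the helper names `is_tree` and `arcs` are assumptions chosen for illustration only. Each word points to exactly one head, and an artificial ROOT node at index 0 heads the sentence.

```python
from typing import List, Tuple

# Hypothetical toy sentence; index 0 is an artificial ROOT node.
words = ["ROOT", "She", "reads", "books"]
# heads[i] is the index of the head of word i (ROOT itself has no head).
heads = [-1, 2, 0, 2]
# labels[i] is the relation between word i and its head (illustrative labels).
labels = ["", "nsubj", "root", "obj"]


def is_tree(heads: List[int]) -> bool:
    """Return True if every word reaches ROOT (index 0) without a cycle."""
    for i in range(1, len(heads)):
        seen = set()
        node = i
        while node != 0:
            # A repeated node means a cycle; an out-of-range head is invalid.
            if node in seen or not (0 <= heads[node] < len(heads)):
                return False
            seen.add(node)
            node = heads[node]
    return True


def arcs(words: List[str], heads: List[int],
         labels: List[str]) -> List[Tuple[str, str, str]]:
    """List (head word, relation, dependent word) triples, skipping ROOT."""
    return [(words[heads[i]], labels[i], words[i])
            for i in range(1, len(words))]


if __name__ == "__main__":
    print(is_tree(heads))
    for head, relation, dependent in arcs(words, heads, labels):
        print(f"{head} -{relation}-> {dependent}")
```

In this representation, each local classification decision mentioned above corresponds to choosing one entry of `heads` and `labels`; the well-formedness check is an example of a hard structural constraint that a parser's output is usually required to satisfy.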
Since Anders Björkelund joined the project, D8 has also broadened its scope to include coreference resolution. Coreference resolution is a natural extension of the standard processing pipeline that relies on syntactic information.
Throughout the project, we have shown how underspecification in the pipeline alleviates error propagation between components that are traditionally strictly separated (Bohnet and Nivre, 2012; Seeker and Kuhn, 2013). We have also carefully analyzed the interaction between morphology and syntax in the processing pipeline, and corroborated linguistically motivated theories (Seeker and Kuhn, 2011; Seeker and Kuhn, 2013).
We also participated in the 2012 CoNLL shared task on multilingual coreference resolution. In the shared task, we showed that lexicalized, language-aware, high-dimensional feature representations and system combinations are competitive with the state of the art. Our system ranked second among 16 participants (Björkelund and Farkas, 2012).
Recently, we also participated in the SPMRL 2013 Shared Task, which targeted parsing of 9 morphologically rich languages using both constituent and dependency syntax. Our contribution, a collaboration with D2 and former members of D4, obtained the best results for both syntactic paradigms (Björkelund et al., 2013).
Other successful collaborations that we have carried out include