Sonderforschungsbereich 732:

Project A8-N (2016-2018)

Investigating the Interaction between Speech and Language processing for Spoken Language Understanding

PI: Ngoc Thang Vu

This project investigates the interaction between different levels of information in spoken language (both segmental and supra-segmental aspects) and in text (e.g. syntax) on non-canonical, conversational-style speech data, and the impact of this interaction on spoken language understanding (SLU) systems. We will develop an innovative architecture named ‘focus listener’ and consider sentiment analysis as a case study. The final system will be evaluated on German political radio interviews.

SLU research has mostly kept the technologies for automatic speech recognition (ASR) and natural language processing (NLP) separate (Issar and Ward, 1993; Wang et al., 2005; Mori et al., 2008; Tur and Mori, 2011). Nevertheless, interactions between supra-segmental aspects of speech (prosody) and properties automatically derivable from written language (e.g. part-of-speech tags or syntax), as well as their joint impact on semantics and information structure, are well documented in more theoretical research (Gussenhoven, 1984; Ladd, 1996; Steinhauer, 1999; Gobl and Ni, 2003; Gussenhoven, 2004; Baumann and Grice, 2006; Grice and Baumann, 2007; Féry and Kügler, 2008; Büring, 2011; Reckling and Kügler, 2011; Büring, 2012). This project considers linguistic information at several levels, and the interaction between these levels, in order to answer the following research questions:

  • What is the impact of combining syntactic information and prosody on segmenting speech into semantically relevant units like propositions?
  • How can we jointly model ASR and NLP tasks such as dependency parsing, using speech lattices, for non-canonical speech data?
  • How does prosody influence SLU tasks such as sentiment analysis?

To address these questions, we propose an innovative architecture for SLU systems, called ‘focus listener’, which combines these three aspects to further improve SLU performance.