The Significance of Minor Nuances

Computers are using algorithms to learn to understand what it means to understand something

A team of computational linguists try to "teach" computers the principle of language comprehension.

She is intelligent, beautiful and empathetic, “she” being Ava, a “female” android from the film Ex Machina. Even though it looks and acts like a human, that’s not what it is. What distinguishes this humanoid machine from other robots is its capacity for independent speech and ability to listen to others and understand what is being said. Whilst Hollywood has been conjuring up images of artificially intelligent entities on screen for decades, we are still a long way from building them in reality. There is one crucial thing missing which is preventing the development of computers that can process languages without errors: they first need to be taught to understand human language. Scientists at the University of Stuttgart are currently working on this.

Computational linguist Sebastian Padó, a professor at the Institute for Natural Language Processing, is currently conducting research into the principle of language comprehension. The one aspect of language that computers have thus far been struggling to grasp involves context and contiguities. They are simply unable to recognise unspoken but implied meaning. Humans, by contrast, can immediately contextualise what they are hearing and relate it to what they already know. “Computers have no personal knowledge about language”, says Professor Padó. “For a computer, a sentence consists of nothing more than a sequence of symbols, with which it can initially do nothing.” The professor and his team want to change that in future.

Computers are intended to learn how to understand human language.

To this end, they are primarily concerned with so-called distributional meaning descriptions, with which they are attempting to teach meaning to computers. “We do that by telling the computer to look at how we are using the words we are speaking”, Padó explains. Essentially, that works a bit like a human learning to master a new language.

Inching Forward Step by Step

That is what computers should also be capable of in the future. To this end, the team are inching their way towards the meaning of a word one step at a time by including the preceding and following parts of the sentence. The process can be described as follows: When a person hears an unknown word such as “the gurmel” for example, he or she initially has no idea of its meaning but, if the word is followed by “is standing in the byre”, then one may assume that it could be some sort of animal. If this is then followed by “and says ‘moo’” then the word very probably means ‘cow’. “Our research involves a similar line of thought”, Padó explains: “We feed the computer with an enormous volume of text and get it to analyse which words are used in which context.”
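The idea of analysing which words are used in which context can be illustrated with a small sketch. It is only a toy under stated assumptions, not the team's actual software: the five-sentence corpus, the function name context_counts and the window size of two words are all invented for this example.

```python
from collections import Counter, defaultdict

# Toy corpus invented for illustration; "gurmel" is the made-up word from the text.
CORPUS = [
    "the gurmel is standing in the byre",
    "the gurmel says moo",
    "the cow is standing in the byre",
    "the cow says moo",
    "the dog says woof",
]


def context_counts(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    counts[word][tokens[j]] += 1
    return counts


counts = context_counts(CORPUS)
# Context words that the unknown word shares with a known one.
shared = set(counts["gurmel"]) & set(counts["cow"])
print(sorted(shared))
```

In this toy corpus, "gurmel" and "cow" end up with identical context counts, whereas "dog" never appears near "moo". A real system would draw on a far larger body of text, as the article describes.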

This principle is used, for example, in search engines that can use the approach not only to search for the specified search terms, but also related words. Automatic translation algorithms work in a very similar manner, by performing the same kind of analysis on large volumes of pre-translated text. This enables them to discover how words are used in similar ways across different languages and to suggest possible translations.

Algorithms to Analyse Texts

To ensure that the computers succeed in this, the computational linguists had to develop specific algorithms that perform the appropriate analysis on the texts. “This part of our work combines linguistics with computer science and machine learning”, says Padó. This means that the algorithms are motivated not only by concepts from computer science and mathematics, but also by linguistic theories. For example, Padó’s computer programmes analyse individual nouns with the aid of the associated adjectives and verbs to learn their meaning. As the professor explains: “Using our algorithms, the computer perceives that the word ‘gurmel’ tends to be associated with adjectives such as ‘large’ and ‘pied’ and with verbal constructions such as ‘says moo’ or ‘ruminate’. By considering all of this additional information together, the computer figures out that the word refers to a cow.”
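One common way to "consider all of this information together" is to treat the associated adjectives and verbs as a vector of counts and to compare vectors by cosine similarity. The sketch below assumes that approach purely for illustration; the article does not say which measure the Stuttgart team uses, and the counts, the FEATURES table and the function names are hypothetical.

```python
import math

# Invented co-occurrence counts for illustration only: how often each noun
# appears with a given adjective or verbal construction in a hypothetical corpus.
FEATURES = {
    "gurmel": {"large": 4, "pied": 3, "says moo": 5, "ruminate": 2},
    "cow": {"large": 6, "pied": 2, "says moo": 7, "ruminate": 4},
    "dog": {"large": 3, "loyal": 5, "bark": 8},
    "car": {"large": 2, "fast": 6, "drive": 9},
}


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(word, features):
    """Return the other word whose feature vector is closest to that of `word`."""
    others = (w for w in features if w != word)
    return max(others, key=lambda w: cosine(features[word], features[w]))


print(most_similar("gurmel", FEATURES))
```

With these made-up counts, the unknown word lands closest to "cow", because the two share almost all of their adjectives and verbs, while "dog" and "car" overlap only on "large".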

Computer programmes analyse individual nouns with the aid of the associated adjectives and verbs to learn their meaning.

Supplementing the Tried and Trusted with New Knowledge

The field of linguistics can also garner new knowledge from the combination of linguistic theories and algorithms, as the analyses help to confirm existing linguistic theories – or else to disprove them. “What our text corpus reveals to us time and time again”, says Padó, “is that the reality of the situation is not as simple as the theory would have us believe.” Whilst linguists tend to work on the assumption that human speech is grammatically correct and therefore made up of well-constructed sentences, the reality looks rather different.

“There are major differences between language usage in books and newspaper articles on the one hand and comments posted on the Internet on the other”, says Padó. “Usually the latter do not consist of complete and grammatically correct sentences.” This notwithstanding, the computational linguistic methods can also be used for linguistic analysis, as Padó explains: “For example, we search large volumes of text to determine the presence and frequency of specific words. This allows us to track the usage of neologisms such as ‘Brexit’ or changes in meaning over time”, a Sisyphean task that a human could barely manage. “We, on the other hand, can simply write a programme that can analyse the entire text corpus within a few minutes. This provides linguists with an effective tool.”
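Counting how often a word occurs over time takes only a few lines of code. The following is a minimal sketch, assuming a corpus stored as (year, text) pairs; the four sample documents and the function name yearly_frequency are invented for illustration and do not come from the researchers.

```python
import re
from collections import Counter

# Invented mini-corpus of (year, text) pairs, for illustration only.
DOCUMENTS = [
    (2014, "The referendum debate has barely begun."),
    (2016, "Brexit dominates the headlines; Brexit talks are planned."),
    (2016, "Markets react to the Brexit vote."),
    (2018, "Negotiators discuss Brexit again."),
]


def yearly_frequency(documents, term):
    """Count whole-word, case-insensitive occurrences of `term` per year."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    freq = Counter()
    for year, text in documents:
        freq[year] += len(pattern.findall(text))
    return dict(sorted(freq.items()))


print(yearly_frequency(DOCUMENTS, "Brexit"))
# prints {2014: 0, 2016: 3, 2018: 1}
```

Years in which the word does not occur are kept with a count of zero, so the moment a neologism first appears is visible in the result.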

Making Texts and Knowledge Accessible

In the final analysis, the findings from Professor Padó’s research have ramifications for us all, for example, because they may help to improve translation systems or voice-controlled programmes. For Professor Padó, however, such areas of application are already part of the status quo. He is far more interested in the fact that the majority of knowledge generated within human society is still recorded in text form. “So, language is the key to knowledge”, says Padó.

“Our objective is to make texts accessible to automatic processes which will allow us to access the data they represent. That goes far beyond any simple Google search.” Despite significant advances achieved over the past few years, the ultimate destination of his research remains an open question. “We’re still years away from being able to construct a machine that can speak, listen and learn language as well as humans can.” Until then we shall have to make do with Hollywood’s film versions of artificial intelligence and of computers that understand language.

Constanze Trojan

  • Prof. Sebastian Padó, Institute for Natural Language Processing (IMS), phone +49 711 685-81400
