Siri doesn’t understand you

GUIs and VUIs

Since their emergence in the early 1970s, graphical user interfaces have come to characterise most people’s interactions with computers in most contexts. Now voice interfaces are becoming normalised. These interfaces seem impressive in some respects and comically awful in many others. The promise of graphical user interfaces was an intuitive way to control complicated devices. Voice interfaces promise, sometimes implicitly, something more.

Voice interfaces suggest that the device being controlled now operates at a new level of intelligence, one approaching that of an interlocutor. This suggestion is encoded in films like Her, and in the ‘easter egg’ responses to certain questions found in Siri, Cortana and others. It’s important to point out that, despite such appearances, voice interfaces are just that: interfaces. Siri, Cortana, Google Assistant: none of them understand you. A look at how linguistic communication works will explain why not.

Chomsky

A prevailing notion of what it is to possess a language, present for example in Chomsky’s theorising, is that speakers have a lexicon, knowledge (perhaps tacit) of the semantic properties of the words in that lexicon, and knowledge of procedures for combining those words, namely innate grammatical rules.

On this view, understanding in communication amounts to interlocutors sharing the same knowledge of words and procedures: each knows what the other means by knowing what those words, deployed in the order they appear, mean. So, for instance, ‘John drove Mary to the bank’ means that John drove Mary to the bank, as opposed to Mary having driven John there.

This is understood in virtue of the sentence’s grammatical form and the semantic properties of its words. If that were the whole story, there would seem to be no reason in principle why a machine could not achieve the very same linguistic abilities as human beings.

Ambiguity, whether grammatical or lexical (to do with words), could serve to discredit this view. Returning to the same example, we may be at a loss to know whether John was helping Mary with her finances or taking her on a fishing trip, that is, which kind of ‘bank’ was involved.

We may also be unsure whether Mary was a passenger in John’s car or whether, as shepherds drive their flocks, she was herded by him to their destination. On this view, however, such confusions are thought to be resolved, on the whole, on the basis of fairly easily identifiable contextual conditions.
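To make the ambiguity concrete, here is a toy sketch in Python. It assumes a hypothetical lexicon in which ‘drove’ and ‘bank’ each carry two senses and every other word carries one; it is not a model of any real parser, nor of Chomsky’s grammar. Holding word order fixed, a lexicon-plus-rules approach can list the possible readings, but it has nothing with which to choose among them.

```python
from itertools import product

# Hypothetical toy lexicon (an assumption for illustration only):
# each word maps to the senses a speaker might know for it.
LEXICON = {
    "John": ["John"],
    "drove": ["drove (in a car)", "herded"],
    "Mary": ["Mary"],
    "to": ["to"],
    "the": ["the"],
    "bank": ["bank (financial)", "bank (river)"],
}


def readings(sentence):
    """Return every combination of word senses the lexicon allows.

    Word order is kept fixed, so the 'grammar' here settles who did
    what to whom; it says nothing about which sense of a word is meant.
    """
    words = sentence.split()
    sense_lists = [LEXICON[word] for word in words]
    # product() yields one tuple per combination of senses.
    return [" ".join(combo) for combo in product(*sense_lists)]


if __name__ == "__main__":
    for reading in readings("John drove Mary to the bank"):
        print(reading)
```

Run as-is, it prints four readings. Nothing in the lexicon or the word order picks one of them out; on the view described above, that work is left to context.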

[Image: A different issue in sense and nonsense]

Interpretation

More often than not, we interpret an interlocutor by discerning, as far as we can from their act of speaking, their reasons for saying what they do. Among the desiderata for fully understanding a communicative offering are a grasp of the literal meaning or meanings of a sentence and a sensitivity to context. What this sensitivity gives the listener are reasons for supposing their interlocutor to mean one thing rather than another.

Communication often involves novelty: interpreting seemingly new or even nonsensical utterances can still make for sensible, rational communicative exchanges overall. Utterances can be unclear for various reasons, among them ambiguity, self-reference and context, and working out their meaning requires attention to time, place, speaker intention and the speaker’s context. These are pragmatic considerations in the sense that utterances must be treated as events occurring at particular times and places; understanding them therefore means taking extra-linguistic factors into account.

Lexical ambiguity is displayed by words like ‘bank’, ‘sole’ and ‘book’, among countless others, which can vary considerably in meaning. Grammatical ambiguity can be seen in sentences such as Groucho Marx’s “I shot an elephant in my pajamas. How he got into my pajamas I’ll never know.”

Since these ambiguities belong to the language itself, they are ‘semantic’ ambiguities. They can persist even in grammatically well-formed sentences. Each of the following, for instance, is structurally or lexically ambiguous:

I saw the Mourne Mountains flying to Cork
Students hate annoying professors
The first years were too hot to eat
Children must be carried on the escalators

We understand these and other such examples by interpreting what the speaker could mean, not primarily what the words, in their order, suggest. Another example: “I have two Spitfires in the fridge.” Even if we don’t know that ‘Spitfire’ is a brand of ale, we are unlikely to suppose the speaker is claiming to have WWII-era aircraft somehow packed into the fridge. We know enough about the person, and the world, to rule out such meanings as highly unlikely.

Sorry, I Cannot Take Requests Right Now

If we try to build artificially intelligent interlocutors by basing a program or machine on a lexicon and grammatical rules, we will not, in principle, be able to reproduce human-like communicative understanding, since the gamut of human communicative understanding far outstrips those simple resources.

We do not deduce the scope of a present statement from prior ones, or the likelihood of a subsequent one. Instead, we ask ‘what could this person mean?’, and in answering we deploy all sorts of extra-linguistic resources, including wondering about our communicative partner’s situation. Linguistic analysis, whilst certainly able to help disambiguate superficially muddled sentence forms, nonetheless remains one instrument within a broader arsenal of communicative weaponry.

Understanding communicative speech requires the more subtle, dynamic and incomplete apparatus of interpretation in context. In effect, interpretation is regulated by norms of a different kind from those that can be defined as rules. They include the logical, but also the wider social, psychological, aesthetic and moral aspects of communicative exchanges. Each of these normative arenas (and others besides; the list isn’t meant to be exhaustive) routinely comes into play in even the most banal communicative exchanges. In short, to be a communicator, one has to be an interpreter.

Reasons

Answering the question “Why would she say that?” calls for a serious amount of interpretation. This cannot be reduced to routines or proceduralised, since it involves evaluating types of reasons. Moreover, the interpretive approach taken can appear, disappear, morph or scatter depending on details of the very exchange in which it emerges. It’s not rules, but hunches, heuristics, biases and insights that rule the day.

The interpretive basis for communicative understanding sketched here has the means to deal with novelty, nonsense, the banal and the sublime. It can do so precisely because it is tied not to procedures and predefined contents but to reasons within thickly conceived contexts of action.

Alexa and other such technologies aren’t doing any of this. When you use voice control, your input is matched against a lexicon by rules, and a response is generated from grammatical and semantic (plus some heuristic) data. This is ingenious technology, for sure, but it is in principle a database-searching algorithm. It is tempting for many to suppose that there is more to it. In various technical senses, depending on the technology, there can be more in terms of complexity. But at no stage is interpretation on the cards. Your Google Assistant will ‘decode’ your ambiguity. Cortana will sense your irony in terms of probability. Alexa will add your emergency to a shopping list.

It might be tempting for some, and exciting for others, to think that technology has advanced to a stage of complexity at which artificial others are a real possibility. On the surface, voice interfaces can seem to bolster this position. But with a little analysis of how communication works, and of how these interfaces do what they appear to do, the illusion fades.

Just as ancient civilisations looked to storms, the stars and the tides as signs of the gods, contemporary civilisations might look to the superficial call-and-response of machines as evidence of our own creative powers. In either case, it is hubris: the tides weren’t meant for the ancients, and Siri’s ‘Good Morning!’ is empty.
