AI Linguistics

I Know What You Mean

Jan 30, 2024 | 13 min read | by Yugen Omer Korat

In my first article, I mentioned natural language pragmatics as the field that studies how speakers resolve context dependency, that is, how we manage to understand each other despite how messy language is. In this post, we will look at some specific examples and at how accurately ChatGPT can actually handle them. This will serve a double purpose: to give you some idea of the challenges speakers face when interpreting natural language, and to find out how well LLM technology can deal with them.

Before we start, a quick note: in our current reality these examples might seem almost trivial, but I assure you that this achievement is built on top of decades of failed attempts, owing to the sheer number of parameters involved and the complexity of the task. The goal here is to appreciate some examples of exactly what modern LLMs achieve that wasn’t possible until recently.

Let’s kick it off with one of the most commonly used examples in the pragmatics literature: the versatility of the phrase do the/a X. Depending on what noun phrase X is, the phrase can take on any number of meanings, even ones you haven’t encountered before (called ad-hoc or nonce senses), yet speakers don’t seem to even pause when interpreting it. Let’s see what ChatGPT has to say about it. (Forgive the imperfect formatting; this is the best I can do with LinkedIn's interface.)

Do the lawn/laundry/dishes: This phrase is a common way of referring to performing household chores.
Do the lawn: This typically means to take care of the lawn, which can include mowing, weeding, watering, or other maintenance tasks.
Do the laundry: This refers to the process of washing, drying, and often folding clothes and other textiles.
Do the dishes: This involves washing, drying, and putting away dishes, cutlery, and cooking utensils after meals.
Do a Napoleon for the camera: This phrase is less common and more idiomatic. It refers to posing for a photograph in a manner reminiscent of Napoleon Bonaparte, the famous French military leader and emperor. A typical "Napoleon pose" might involve placing one hand inside the coat or across the chest, a pose that Napoleon is often depicted as having in paintings and statues. This expression suggests striking a dramatic, historical, or authoritative pose when being photographed.

The important thing to notice is that in order to interpret these phrases one needs to have some knowledge about the world, and also about how language behaves, and to tie the two together to form a coherent meaning. No one is surprised that C (ChatGPT for short) is able to do it, but I do encourage you to take note of how specific and accurate its explanation of the Napoleon pose is, including the pose’s social and cultural connotations.

But this is too easy, right? To crank up the challenge level, I went to one of my all-time favorite papers, “Making Sense of Nonce Sense” (1983), by Herbert Clark, in which he cites many examples of ad-hoc uses of language that break the standard conventions, and asked C to explain them to me. Here is what I got:

"The bank's buzzier guessips": (Y: I realize this kind of wordplay might sound a bit cringy nowadays, but look again at the year of publication. And it does the job just fine of demonstrating ad hoc language.) This phrase seems to refer to people at the bank who are involved in gossip or rumors. "Buzzier" might imply that these individuals are more active in spreading or discussing rumors. "Guessips" is likely a play on words combining "guess" and "gossips," indicating speculative talk. [...] "buzzier" implies that these individuals are more engaged in the lively exchange of speculative information or rumors, creating a more dynamic or 'noisy' atmosphere of gossip.
"Gold plunges to new lows": This is a financial news headline. It means that the price of gold has fallen significantly, reaching lower levels than previously recorded.
"I stopped by Perry's for a quick crab": This phrase suggests that the speaker visited a place (possibly a restaurant or eatery named "Perry's") to have a quick meal or snack, specifically crab.

Again, notice how specific the explanations are. They require simultaneous familiarity with both the abstract properties of the concepts being used (gossip, plunging, speed, etc.) and the real-world context of the situation: that generating buzz (or hype, in more contemporary language) may take busywork or create noise, that gossip might involve guesswork, that gold is strongly characterized by its price and that prices can go up and down, and that crab is often consumed as food. It’s much like a combinatorial mix-and-match: you have to project two concepts in exactly the right magnitude and direction so that they can combine with each other in a way that matches what we know about the world. So, for example, in the case of quick crab, “quick” has to be projected onto the dimension of “preparation speed” and “crab” onto the dimension of “a dish”.
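To make the mix-and-match idea a bit more concrete, here is a minimal toy sketch in Python. Everything in it is an assumption made up for illustration: the sense inventory (SENSES), the context scores (CONTEXT_FIT), and the function best_reading are hypothetical, and a real LLM does nothing this explicit. It only shows what it means to try combinations of senses and keep the one that fits the situation.

```python
from itertools import product

# Hypothetical sense inventory: each sense is (gloss, kind of thing it applies to).
SENSES = {
    "quick": [("fast-moving", "animal"), ("briefly prepared or eaten", "dish")],
    "crab": [("a crustacean", "animal"), ("a crab meal", "dish")],
}

# Hypothetical context scores: "stopped by Perry's for ..." favors things you can order.
CONTEXT_FIT = {"animal": 0.1, "dish": 0.9}

def best_reading(modifier, noun):
    """Try every pair of senses and keep the compatible pair that fits the context best."""
    best, best_score = None, -1.0
    for (m_gloss, m_kind), (n_gloss, n_kind) in product(SENSES[modifier], SENSES[noun]):
        if m_kind != n_kind:
            continue  # the modifier sense must apply to the kind of thing the noun sense denotes
        score = CONTEXT_FIT[n_kind]
        if score > best_score:
            best, best_score = (m_gloss, n_gloss), score
    return best

print(best_reading("quick", "crab"))
```

With only two words and two senses each there are four pairs to check. The number of pairs multiplies with every extra word and sense, which is why enumerating them by hand does not scale.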

If this sounds confusing, here is a small primer: in LLMs, the vector that represents a piece of text encodes information about its value along many dimensions, each measured separately. The unique meaning of the text results from the combination of these values, but we can isolate aspects of the meaning by looking at a subset of these dimensions. This is an oversimplification, but you can imagine three groups of dimensions: one responsible for encoding concepts that have to do with altitude, another that encodes chemical concepts, and yet another that encodes financial ones. When the model realizes that the string “gold plunges to new lows” is unlikely to describe altitude, it reinterprets it in the financial domain. It is as if the model isolates the dimensions responsible for encoding altitude and projects them into the space defined by the vectors encoding financial concepts, and this is how we get the idea that the price went down.
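For readers who prefer code, here is a toy sketch of that picture in Python. It is a deliberate oversimplification built on assumptions: real embeddings have hundreds or thousands of unlabeled dimensions, while here there are just three hand-labeled axes, and the vectors for "gold" and "plunge" are invented numbers, not learned ones. The helper names dominant_domain and project_across are hypothetical.

```python
# Hypothetical labeled axes; real LLM dimensions are not interpretable like this.
AXES = ["altitude", "chemistry", "finance"]

# Hand-made toy vectors (assumptions, not learned embeddings).
gold = [0.0, 0.6, 0.8]     # a metal, but in this toy mostly discussed as an asset
plunge = [-1.0, 0.0, 0.0]  # a sharp downward movement, encoded on the altitude axis

def dominant_domain(context):
    """Return the index of the axis on which the context vector is strongest."""
    return max(range(len(context)), key=lambda i: abs(context[i]))

def project_across(vec, source, target):
    """Move the component of vec on the source axis onto the target axis."""
    out = list(vec)
    out[target] += out[source]
    out[source] = 0.0
    return out

target = dominant_domain(gold)  # the context word picks the domain
reading = project_across(plunge, AXES.index("altitude"), target)
print(AXES[target], reading)
```

The sign of the movement survives the move: the "down" that started on the altitude axis ends up as a negative value on the finance axis, which is the toy version of "the price went down".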

I realize that this explanation is far from satisfactory, but going into any more detail would stray too far from the goal of this article. If you want to learn more, there’s content coming up about the representation of metaphors in vector space models, in which I go deeper into the idea of cross-domain projection.

But even with a cursory understanding of the issue, I’m pretty sure you realize how complex this task is when considered in terms of the number of dimensions to choose from, the magnitude by which each concept has to be projected along its dimension, and the amount of real-world information the resulting new meaning has to be compared against in order to evaluate it. This is a combinatorial nightmare, which is why it is really quite remarkable that the machine that achieves it was created the way it was, without any facts or knowledge being explicitly introduced beyond the text itself.

As discussed in my first article, there are far too many such nuances to language and the world to be able to hard-code or enumerate them. Making these connections requires a model with strong generalization abilities trained on substantial amounts of data, which is what we have here.

Interestingly, note that it didn’t get the wordplay behind “buzzier” (“busy” + “buzzy”) right on its first attempt, which shows how non-trivial this task is. It did get it right when I asked more specifically.

In my attempts to challenge C, I scoured “Defining Pragmatics” (2010) by Mira Ariel, a classic pragmatics textbook that cites many conversation snippets which exemplify interesting inferences people make when using language. Here is what I found (page 62).

Prompt: I will give you a dialogue snippet. explain to me what is the implicit meaning of "we don't have laborers", what is inferred from it:

M: You don’t build by yourselves

S: No, we don’t have laborers, we have 10, 15 laborers (Lotan: 7).

Response: The implicit meaning of "No, we don’t have laborers, we have 10, 15 laborers" in this context seems to suggest a contradiction. Initially, the statement "we don't have laborers" implies a lack of workers. However, the follow-up clarification of having "10, 15 laborers" contradicts this, indicating that they do have some labor force, albeit perhaps not as many as might be expected or required for the work being discussed.

Compare this to Ariel’s paraphrase: “we don’t have enough laborers to build by ourselves”. Notice also C’s reasoning process, which is very reminiscent of how linguists describe the process of inference: bending the standard meaning of a phrase by the smallest amount that accounts for everything that was said. That is, we have to bend the meaning of “we don’t have laborers” by the smallest amount that still accounts for the fact that they have 10–15 laborers. This is the same idea as projection along the right dimensions I discussed above.
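The "smallest bend" idea can also be sketched as a tiny selection problem. The Python below is only an illustration under made-up assumptions: the candidate readings, their "distance from literal" costs, and the laborer counts each reading tolerates are all invented for this example, and minimal_reading is a hypothetical helper, not a model of how C or a human actually infers.

```python
# Hypothetical candidate readings of "we don't have laborers". Each entry is
# (reading, made-up cost of departing from the literal meaning,
#  laborer counts the reading is compatible with).
CANDIDATES = [
    ("we have no laborers at all", 0, range(0, 1)),
    ("we don't have enough laborers to build by ourselves", 1, range(1, 50)),
    ("we don't want to talk about laborers", 3, range(0, 1000)),
]

def minimal_reading(stated_counts):
    """Pick the cheapest reading that is consistent with every stated count."""
    consistent = [
        (cost, text)
        for text, cost, allowed in CANDIDATES
        if all(n in allowed for n in stated_counts)
    ]
    return min(consistent)[1]

print(minimal_reading([10, 15]))
```

The literal reading is the cheapest but is ruled out by "we have 10, 15 laborers", so the next cheapest consistent reading wins, which in this toy setup matches Ariel’s paraphrase.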

There are, though, examples of C missing the intended meaning. These are few and far between, but I did recently find a decent example of one:

Q: what is a list sales

A: A "list sale" generally refers to the sale or rental of a mailing list.

I was thinking of A-list sales, but it didn’t occur to me to capitalize the “A”, and apparently it wasn’t obvious enough what I meant. But this is a misunderstanding that could easily happen between humans as well, so the fact that this is my best example of failed pragmatics with C means a lot. I know there are lots of examples online of awkward AI moments, but in my experience these are negligible exceptions in my overall flow of communication with LLMs. Not to mention that many of them result from intentional attempts to break C, which is not what I am concerned with here. Still, I haven’t seen a quantitative analysis of these, so take this with a grain of salt.

To wrap things up, I also challenged C with a passage from the science fiction novel "The Towers of Titan" (Ben Bova, 1961), which represents the more nuanced and metaphorical sort of implicature. I asked C to break down its pragmatics (= implicitly communicated information), and here are the results for you to draw your own conclusions from. Consider writing down the implicatures from this passage before reading C’s response, and then compare.

Bennett's smile broke into an open grin. "Self-pity is a terrible thing. Do you know what those kids think of you? You're Dr. Sidney Lee, the foremost xeno-anthropologist of the human race.

Response:

Contrast Between Perception and Self-Image: Bennett's comment suggests a disparity between how Dr. Lee perceives himself (possibly with self-pity) and how he is admired by others (viewed as a leading figure in his field).
Recognition and Esteem: Dr. Sidney Lee is highly regarded in his professional field, which is acknowledged by Bennett and contrasts with Lee's own potential feelings of self-pity.
Encouragement and Reassurance: Bennett's statement serves as a form of encouragement or reassurance to Dr. Lee, perhaps to lift his spirits or change his perspective about himself.

In conclusion, we’ve looked at some examples of actual language use that require a combination of linguistic and world knowledge to be understood, and that cannot be broken down into some elegant set of rules. You have to be really familiar with all of it so that you aren’t surprised by whatever novel combination of concepts is thrown at you, like “guess” and “gossip”. Moreover, you have to be able to project concepts across domains, like altitude and precious metals, to find the interpretation that makes the most sense. To us it might seem trivial, but that’s because we have grown up solving these problems on a daily basis until they became second nature. From a computational point of view, however, it wasn’t clear until recently whether such problems could even be solved with the algorithms we have and with access to text data alone, given the complexity of the abstractions they require.

In future articles, we’ll take a closer look at the math behind vector space models and how it might give rise to this kind of understanding. But in the meantime, enjoy this excerpt from “The Scarlet Letter” (Nathaniel Hawthorne, 1850), followed by C’s understanding of it:

Literature, its exertions and objects, were now of little moment in my regard. I cared not, at this period, for books; they were apart from me. Nature,—except it were human nature,—the nature that is developed in earth and sky, was, in one sense, hidden from me; and all the imaginative delight, wherewith it had been spiritualized, passed away out of my mind. A gift, a faculty if it had not departed, was suspended and inanimate within me. There would have been something sad, unutterably dreary, in all this, had I not been conscious that it lay at my own option to recall whatever was valuable in the past. It might be true, indeed, that this was a life which could not with impunity be lived too long; else, it might have made me permanently other than I had been without transforming me into any shape which it would be worth my while to take. But I never considered it as other than a transitory life. There was always a prophetic instinct, a low whisper in my ear, that, within no long period, and whenever a new change of custom should be essential to my good, a change would come.

Response:

Change in Interest and Values: The narrator's interest in literature and books has diminished, indicating a shift in their personal values or life focus.
Emotional State: There is an underlying sadness or sense of loss, suggested by the lack of interest in what used to bring joy or spiritual fulfillment.
Disconnection from Nature: The narrator feels disconnected from nature, except for human nature, implying a sense of alienation or loss of connection with the external world.
Inner Potential and Suspended Faculties: The mention of a "gift" or "faculty" being suspended indicates that the narrator recognizes an inner potential or ability that is currently inactive or unutilized.
Self-awareness and Control: The narrator is aware that their current state is transitory and believes they have the power to change or recall past values when necessary.
Sense of Inevitability About Change: There's an implication of inevitability regarding a change in lifestyle or mindset, guided by a "prophetic instinct" or internal guidance.
Temporal Perception of Current State: The life being described is seen as temporary, not a permanent transformation of the narrator's identity.
Consideration of Life Choices: The narrator reflects on the sustainability of their current way of life, suggesting a contemplation of long-term consequences.
Anticipation of Positive Transformation: There is an anticipation or hope for a positive change that aligns better with the narrator's well-being or personal growth.


Yugen is a co-founder and CTO of Marvin Labs. He was a postdoctoral researcher at Háskóli Íslands (the University of Iceland), and holds a PhD in Computational Linguistics from Stanford University and an MA and a BA from Tel Aviv University.