AI Linguistics

Is ChatGPT really that big a deal? Yes, it is!

Dec 5, 2023 | 4 min read | by Yugen Omer Korat

When we learn grammar at school, we get drilled with the old-fashioned idea that there are “rules” defining what a “correct” sentence is, but in reality, language is all over the place. Deciphering its ambiguity takes cognitive, social, and rhetorical reasoning that can only be acquired by trial and error through noisy channels. That’s very different from writing a computer program.

One of my favorite examples: if you try to explain when to use the present simple tense (go on, try it before reading on), you might say it is for describing habitual actions. But it is also the tense commentators use to narrate a live game (“he shoots, he scores!”), which is about as far from habitual as it gets. So which is it?

The field of linguistics that sorts out this mess is called “Pragmatics”, and it studies how language is used in practice, even when it goes against the "rules" we were taught in grade school. Pragmatics textbooks are full of examples of how many ways there are to say the same thing, how many interpretations a single sentence can have, and how context-sensitive language is. And trust me, it’s more than you think.

So how do you know which form to use and what it means? It depends on the context! And as it turns out, building a system of rules that works out the correct meaning of a sentence from its context is a colossal task, far bigger than most freshmen initially imagine. Despite heaps of literature on the subject, no one has managed to turn a set of formal rules into an algorithm that generates coherent discourse.

Enter the first word embeddings, which many of you will recognize from features like Gmail’s good ole’ autocomplete. However, you will have noticed that such technologies never delivered the holy grail of fully coherent discourse with all its intricacies. At best, they captured slices of it some of the time, and even the best chat models were typically more trouble than they were worth. Until now, that is.

Technically, ChatGPT hasn’t been a paradigm shift. Transformers have existed since 2017, and the generation component built on top of them is, at its simplest, an iterative argmax over logits, which is a well-known technique. However, training a model with well over a hundred billion parameters (GPT-3, which ChatGPT builds on, has 175B) was a huge shot in the dark, because there was no way to know for sure whether the number of parameters was the bottleneck. For all we knew, it could have been another vain attempt.
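To make “iterative argmax over logits” concrete, here is a minimal Python sketch of greedy decoding. The vocabulary, the logit table, and all function names are hypothetical. The hard-coded table only looks at the previous token and merely stands in for a transformer, which would compute the logits from the entire preceding context. Note also that production chat models often sample from the softmax of the logits rather than always taking the argmax.

```python
# Hypothetical toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = ["<s>", "the", "ball", "goes", "in", "</s>"]

# Hypothetical next-token logits, indexed by the previous token only.
# This bigram table stands in for a transformer's forward pass.
LOGITS = [
    # <s>  the  ball goes  in  </s>
    [0.0, 3.0, 1.0, 0.5, 0.2, 0.1],  # after <s>
    [0.0, 0.1, 2.5, 0.3, 0.2, 0.1],  # after "the"
    [0.0, 0.2, 0.1, 2.0, 0.5, 0.3],  # after "ball"
    [0.0, 0.3, 0.1, 0.1, 1.8, 0.4],  # after "goes"
    [0.0, 0.9, 0.2, 0.1, 0.1, 1.5],  # after "in"
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],  # after </s> (never reached here)
]


def next_token_logits(tokens: list[int]) -> list[float]:
    """Stand-in for a model forward pass: a score for every possible next token."""
    return LOGITS[tokens[-1]]


def argmax(scores: list[float]) -> int:
    """Index of the highest score (the first one wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def greedy_decode(max_len: int = 10) -> list[str]:
    """Repeatedly append the argmax token until </s> or max_len new tokens."""
    tokens = [VOCAB.index("<s>")]
    for _ in range(max_len):
        next_id = argmax(next_token_logits(tokens))
        tokens.append(next_id)
        if VOCAB[next_id] == "</s>":
            break
    return [VOCAB[t] for t in tokens]


print(" ".join(greedy_decode()))  # prints "<s> the ball goes in </s>"
```

The loop itself is trivial; everything interesting lives inside the function that produces the logits, which is exactly where the scale bet described above comes in.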

Unfortunately, the only way to truly grasp the full significance of this contribution is by banging your head against the wall that is natural language pragmatics and realizing firsthand how vast the number of considerations is. There is no way I can possibly do it justice, so until you’ve done that yourself, you’ll have to take my word for it :)

In future posts, though, I will introduce ideas from linguistic theory that are relevant to designing more effective AI systems, communicating with them, and interpreting their output, a subject that gets little attention in today’s discourse despite its importance. These ideas illustrate the challenges posed by context dependence, and hopefully I’ve kindled a spark of curiosity in some of you to learn even more!

In the next post, we’ll talk about a unique application of ChatGPT to text mining: using a well-established tool from theoretical linguistics to convert free text into a tabular format, where each column stands for a different semantic component of the sentence.

The whole series on AI in Linguistics

by Yugen Omer Korat

Yugen is a co-founder and the CTO of Marvin Labs. He was a postdoctoral researcher at the University of Iceland (Háskóli Íslands), and he holds a PhD in Computational Linguistics from Stanford University and an MA and a BA from Tel Aviv University.