Demonstrating Why AI Can’t Do High-Quality Translation
Surprise, surprise. For all the excitement and anxiety about AI soon matching or exceeding human intelligence, it turns out that in at least one respect, dealing with semantic ambiguity, AI has made very little progress since the 1950s.
ChatGPT and similar chatbots based on the latest and greatest Large Language Models still don’t get it—they fail the semantic ambiguity test.
In my previous post about Yehoshua Bar-Hillel, a machine translation pioneer in the early 1950s, I recounted the stories he told about taming lions and computers at the First Annual Symposium of the American Society for Cybernetics in 1967. In his talk, Bar-Hillel also listed three requirements for “machine intelligence”: the ability to manipulate language, background knowledge about the world, and reasoning and computing abilities, all at the level of a high school graduate. The required effort to achieve these prerequisites of artificial intelligence, Bar-Hillel said, “would be incomparably greater than that required for putting man on Venus.”
Fifteen years earlier, in his opening address to the machine translation conference which he organized at MIT in June 1952, Bar-Hillel was much more optimistic: “Even if it should turn out that none of the possible machine-brain partnerships would be more effective than a human translator, in the sense that they will be neither quicker nor cheaper nor more exact than the human translator, under conditions existing today or in the near future, I would strongly advocate a continuation of this research. Electronic machines will doubtless become cheaper, human brains probably more expensive.”
By 1955, however, Bar-Hillel turned very pessimistic. In “A Demonstration of the Nonfeasibility of Fully Automatic High-Quality Translation” he wrote that high-quality translation by computers “is just a dream that will not come true in the foreseeable future.” The machine translation researchers who did not recognize the “practical futility of this aim,” Bar-Hillel wrote, misled their sponsors by “not being satisfied with a partly automated translation system whose principles are well understood today,” instead asking them to wait for “the real thing which was believed, and made to believe, to be just around the corner.”
Just around the corner, indeed: as OpenAI said in a recent announcement, superintelligent AI, “the most impactful technology humanity has ever invented,” could arrive this decade and “could lead to the disempowerment of humanity or even human extinction.”
The example Bar-Hillel used in his paper to demonstrate the futility of pursuing the dream of high-quality machine translation was the following sentence:
The box was in the pen.
And here is the linguistic context from which this sentence is taken:
Little John was looking for his toy box. Finally he found it. The box was in the pen. John was very happy.
Bar-Hillel explains this example of semantic ambiguity:
Assume, for simplicity's sake, that pen in English has only the following two meanings: (1) a certain writing utensil, (2) an enclosure where small children can play. I now claim that no existing or imaginable program will enable an electronic computer to determine that the word pen in the given sentence within the given context has the second of the above meanings, whereas every reader with a sufficient knowledge of English will do this "automatically."
In discussing proposed solutions by other machine translation researchers, Bar-Hillel says: “It is an old prejudice, but nevertheless a prejudice, that taking into consideration a sufficiently large linguistic environment as such will suffice to reduce the semantical ambiguity of a given word.”
Well, that was the state of natural language processing (NLP) in the 1950s. I was absolutely sure, however, that today’s Large Language Models (LLMs), trained on an untold number of texts and capable of processing them in parallel, would get the “linguistic context” and would always get it right.
To better understand the state of machine translation BC (Before ChatGPT), I asked Google Translate (which I assume is not yet powered by the latest LLMs) to render Bar-Hillel’s example in two languages, Hebrew and Russian. Google Translate rendered “the pen” in the sense of a writing instrument—in both languages. “The box was in the writing instrument” is what was lost in translation.
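For readers who want to reproduce this test programmatically rather than through the Google Translate interface, here is a minimal sketch using the Cloud Translation API’s v2 “basic” client. It assumes the google-cloud-translate package is installed and that API credentials are configured in your environment; it is not a record of my own setup, just one convenient way to run the same query.

```python
# A minimal sketch for reproducing the test with the Cloud Translation API
# (v2 "basic" client). Assumes `pip install google-cloud-translate` and
# GOOGLE_APPLICATION_CREDENTIALS pointing at a valid service-account key.
from google.cloud import translate_v2 as translate

client = translate.Client()
sentence = "The box was in the pen."

for lang in ("he", "ru"):  # Hebrew and Russian
    result = client.translate(sentence, source_language="en", target_language=lang)
    print(lang, result["translatedText"])
```

Whichever word the service chooses for “pen” shows up directly in the Hebrew and Russian output.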
But what about the recent “revolution” in what AI can do now (and what AGI will do “just around the corner”)?
Bard, Google’s answer to ChatGPT, was just slightly “better” than Google Translate. For Bard, the box was in the pen in the sense of an animal enclosure.
I provided Bard with the full context for the sentence from Bar-Hillel’s paper but still got “animal enclosure.” When I protested, Bard helpfully provided this:
That's correct. The Hebrew word מכלאה (mikhla'ah) means "enclosure, pen, fold". It can be used to refer to a small enclosure for animals, such as a chicken coop or a pigpen. It can also be used to refer to a larger enclosure, such as a zoo or a wildlife sanctuary. In the context of your sentence, the word מכלאה is being used to refer to a small enclosure for animals. The box was found in this enclosure, which is why the translation uses the Hebrew word מכלאה.
The use of “pen” as a small storage space (it is the second sense of the word in the Merriam-Webster dictionary, with the writing instrument listed as the fourth sense), and specifically one for children’s toys, is clear from the full linguistic context I provided to Bard. Still, thinking that maybe this sense was more prevalent in the 1950s than it is today, and that maybe Bard had been lazy and looked only at recent texts, I asked Bard for examples from literature of “pen” as “an enclosure where small children can play.”
In seconds, I got three images, the covers of The Secret Garden, Little Women and The Adventures of Tom Sawyer, and relevant quotes from these books. And this explanation:
In these examples, the word "pen" is used to refer to a small, enclosed area where children can play. The pen is usually surrounded by a fence or some other type of barrier, and it is often filled with toys, games, or other activities that children enjoy.
The knowledge is there but Bard did not connect the dots.
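For readers who want to run a similar probe themselves, here is a minimal sketch against a chat model via the OpenAI Python client. This is not the Bard session described above: the client library, the model name, and the wording of the question are all my assumptions, and any chat interface would do. The point is simply to hand the model the full linguistic context and force it to choose among the senses of “pen.”

```python
# A minimal sketch for probing a chat model with Bar-Hillel's full context.
# Uses the OpenAI Python client (v1+); the model name is an assumption, and
# OPENAI_API_KEY must be set in the environment.
from openai import OpenAI

client = OpenAI()

context = ("Little John was looking for his toy box. Finally he found it. "
           "The box was in the pen. John was very happy.")
question = ('In the passage above, does "pen" mean a writing instrument, '
            "an enclosure for animals, or an enclosure where small children can play?")

response = client.chat.completions.create(
    model="gpt-4o",  # assumed; substitute any chat model you have access to
    messages=[{"role": "user", "content": f"{context}\n\n{question}"}],
)
print(response.choices[0].message.content)
```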
Why? Here’s what Bar-Hillel said in 1955:
What makes an intelligent human reader grasp this meaning so unhesitatingly is … his knowledge that the relative sizes of pens, in the sense of writing implements, toy boxes, and pens, in the sense of playpens, are such that when someone writes under ordinary circumstances and in something like the given context, "The box was in the pen," he almost certainly refers to a playpen and most certainly not to a writing pen. This knowledge is not at the disposal of the electronic computer and none of the dictionaries or programs for the elimination of polysemy puts this knowledge at its disposal.
Today’s AI still doesn’t have knowledge of the world at the level of a three-year-old. LLMs are a small step for man, but are far from a giant leap for mankind.
One man, Terry Winograd, made a small step in the late 1960s on the very long and difficult journey of natural language processing. Like Bar-Hillel, he became, within a few years of taming a computer to participate in a conversation, a “high-profile deserter from the world of AI,” in the words of John Markoff.
In the next post, I will discuss Winograd and the Winograd Schema Challenge, a new “Turing test” designed to check AI’s competence in dealing with semantic ambiguity and how much today’s AI programs understand—or don’t understand—the world. Can ChatGPT pass the test?
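As a preview of what such a test looks like, here is Winograd’s original example rendered as a tiny probe. The two sentences and their intended referents come from the published schema; the question wording and the print-only harness below are my own sketch, meant for pasting into any chatbot you like.

```python
# Winograd's original schema: changing one word ("feared" vs. "advocated")
# flips which noun phrase the pronoun "they" refers to. The harness below
# only prints the prompts and the expected answers.
variants = {
    "feared": "the city councilmen",
    "advocated": "the demonstrators",
}

for verb, expected_referent in variants.items():
    sentence = (f"The city councilmen refused the demonstrators a permit "
                f"because they {verb} violence.")
    prompt = f'{sentence} In this sentence, who does "they" refer to?'
    print(prompt)
    print("  expected answer:", expected_referent)
```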