Jon,
Thanks for that link. It shows why hopes that LLMs will magically lead to AGI (some kind of intelligence that competes with or goes beyond the human level) are hopelessly MISGUIDED. On a math test, they can get an A+ if they're lucky enough to find the answers in their petabytes of random stuffing. But if they can't find a correct solution, they're lucky to earn a C. Even worse, LLMs are so stupid that they can't say whether their own results are good, bad, or indifferent.
The major strength of generative AI technology is in providing an English-like (more generally, a natural-language-like) interface to the AI reasoning technology of the past 60 years. That is extremely valuable, since the complex reasoning methods of GOFAI (Good Old Fashioned AI) require years of study to learn and use correctly.
But the hope that devoting billions of $$$ to raw computing horsepower will produce AGI is hopelessly misguided. A good state-of-the-art laptop running GOFAI methods with a modest amount of LLM processing can outperform the biggest and most expensive LLM systems on the planet. And it will do so with guaranteed accuracy: if it can't solve a problem, it will say so. It won't produce garbage and claim that it's accurate.
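To make that division of labor concrete, here is a minimal, self-contained sketch of the kind of hybrid system described above: a natural-language front end (stubbed out where an LLM would go) feeding a symbolic back end that either derives an answer or reports that it cannot. The knowledge base, the translate stub, and all names are illustrative assumptions, not any particular system.

    # Sketch of a hybrid NL-front-end / symbolic-back-end architecture.
    # The back end is a toy propositional Horn-clause reasoner: it never
    # guesses, so "not derivable" is a definite answer, not a hallucination.

    # Each rule is (body, head): if every atom in body is proved, head follows.
    RULES = [
        (frozenset({"mammal"}), "warm_blooded"),
        (frozenset({"dog"}), "mammal"),
    ]
    FACTS = {"dog"}

    def translate(question: str) -> str:
        """Stub for the LLM front end: map English to a query atom.
        A real system would call a language model here (an assumption,
        not an existing API)."""
        return {
            "Is a dog warm-blooded?": "warm_blooded",
            "Is a dog a reptile?": "reptile",
        }[question]

    def prove(goal: str, facts: set, rules) -> bool:
        """Forward chaining to a fixed point: sound and complete for this
        propositional Horn fragment."""
        derived = set(facts)
        changed = True
        while changed:
            changed = False
            for body, head in rules:
                if head not in derived and body <= derived:
                    derived.add(head)
                    changed = True
        return goal in derived

    for q in ["Is a dog warm-blooded?", "Is a dog a reptile?"]:
        verdict = "PROVED" if prove(translate(q), FACTS, RULES) else "NOT DERIVABLE"
        print(f"{q} -> {verdict}")

The point of the sketch is the second query: the reasoner answers NOT DERIVABLE rather than inventing a plausible-sounding proof.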
Following Jon Awbrey's note is an excerpt that I extracted from the link that Jon cited.
John
From: "Jon Awbrey" <jawbrey@att.net>
John, Alex, ...
I haven't found a use myself for the new spawn of chatbots, but
the following is typical of reports I read from those who do
attempt to use them for research and not just entertainment.
Peter Smith • Another Round with ChatGPT
Cheers,
Jon
_____________________________
Another round with ChatGPT
ChatGPT is utterly unreliable when it comes to reproducing even very simple mathematical proofs. It is like a weak C-grade student, producing scripts that look like proofs but mostly are garbled or question-begging at crucial points. Or at least, that’s been my experience when asking for (very elementary) category-theoretic proofs. Not at all surprising, given what we know about its capabilities or lack of them.
But this did surprise me (though maybe it shouldn’t have done so: I’ve not really been keeping up with discussions of the latest iteration of ChatGPT). I asked — and this was a genuine question, hoping to save time on a literature search — where in the literature I could find a proof of a certain simple result about pseudo-complements (and I wasn’t trying to trick the system; I already knew one messy proof and wanted to know where else a proof could be found, hopefully a nicer one). And this came back:
[ChatGPT's reply, listing chapter and section references in several books, is omitted here.]
So I take a look. Every single reference is a total fantasy. None of the chapters/sections have those titles or are about anything even in the right vicinity. They are complete fabrications.
I complained to ChatGPT that it was wrong about Mac Lane and Moerdijk. It replied “I apologize for the confusion earlier. Here are more accurate references to works that cover the concept of complements and pseudo-complements in a topos, along with their proofs.” And then it served up a complete new set of fantasies, including quite different suggestions for the other two books.
[Following this example are two more paragraphs by Peter Smith and a few notes by other readers who had similar experiences.]
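[As background for the term in Smith's question — this is the standard lattice-theoretic definition, not taken from his post, and his specific result is not stated in the excerpt: in a Heyting algebra, such as the lattice of subobjects of an object in a topos, the pseudo-complement of an element $a$ is

    \neg a \;=\; a \to \bot, \qquad \text{characterized by} \qquad x \le \neg a \iff a \wedge x = \bot,

i.e. the largest element whose meet with $a$ is bottom. Unlike a Boolean complement, $a \vee \neg a$ need not equal $\top$.]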