There’s a good article in today’s Financial Times showing that, while ChatGPT can solve well-known puzzles (Monty Hall, etc.), it does so only because it has seen the solutions; it cannot even solve trivially reworded (alpha-converted) variants. The conclusion is good:
A computer that is capable of seeming so right yet being so wrong is a risky tool to use. It’s as though we were relying on a spreadsheet for our analysis (hazardous enough already) and the spreadsheet would occasionally and sporadically forget how multiplication worked.
Not for the first time, we learn that large language models can be phenomenal bullshit engines. The difficulty here is that the bullshit is so terribly plausible. We have seen falsehoods before, and errors, and goodness knows we have seen fluent bluffers. But this? This is something new.
From: John F Sowa
I received the following reply in an offline note:
Anonymous: ChatGPT is BS. It says what is most likely to come next in our use of language without regard to its truth or falsity. That seems to me to be its primary threat to us. It can BS so much better than we can, more precisely and more effectively using statistics with a massive amount of "test data," than we can ever do with our intuition regarding a relatively meager amount of learning.
That is partly true. LLMs generate text using probabilities derived from a massive amount of miscellaneous texts of every kind: books, articles, notes, messages, etc. They have access to a massive amount of true information -- more than any human could learn in a thousand years. But they also contain a massive amount of false, misleading, or simply irrelevant data.
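To make that point concrete, here is a minimal sketch of generation driven purely by corpus statistics. It is my own illustration, not Sowa's method or any real LLM's code: real models use neural networks over subword tokens rather than bigram counts, but the principle is the same -- the continuation is chosen by probability, with no notion of whether it is true.

```python
# Toy next-word model built purely from corpus statistics (illustrative only).
from collections import Counter, defaultdict
import random

# A tiny "training corpus" containing both a true and a false statement.
corpus = "the moon is made of rock . the moon is made of cheese .".split()

# Count how often each word follows each preceding word (bigram counts).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def next_word(prev):
    """Sample the next word in proportion to how often it followed `prev`."""
    words, counts = zip(*following[prev].items())
    return random.choices(words, weights=counts)[0]

# "rock" and "cheese" are equally probable continuations of "of" here;
# the model has no basis for preferring the true one.
print(Counter(next_word("of") for _ in range(1000)))
```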
Even worse, they have no methods for determining what is true, false, or irrelevant. Furthermore, they don't keep track of where the data comes from. That means they can't use information about the source(s) as a basis for determining reliability.
As I have said repeatedly, whatever LLMs generate is a hypothesis -- I would call it a guess, but the term BS is just as good. Hypotheses (guesses or BS) can be valuable as starting points for new ways of thinking, but they need to be tested and evaluated before they can be trusted.
The idea that LLM-based methods can become more intelligent simply by using massive amounts of computation is false. They can generate more kinds of BS, but at an enormous cost in hardware and in the electricity to run it. Without methods of evaluation, the probability that random mixtures of data are true, useful, or worth the cost of generating them becomes smaller and smaller.
Conclusion: Without testing and evaluation, the massive amounts of computer hardware and the electricity required to run them are a waste of money and resources.
John