Michael,
The examples you cite illustrate the strengths and weaknesses of LLMs. They show why
multiple methods of evaluation are necessary.
1. The failures mentioned in paragraph 1 show that writing a program requires somebody or
something that can understand a problem statement and generate a sequence of commands (in
some detailed notation) to specify a method for solving that problem. LLMs can't do
that.
2. The second paragraph shows that ChatGPT had a better selection of answers available in
May, or perhaps an improvement in its ability to find answers. It's possible that very
few dental clinicians had ever used ChatGPT for that purpose. Your experiment and the work
by the dental clinicians in India may have added enough new patterns that dental
clinicians worldwide would have benefited.
3. The third paragraph shows how ChatGPT learns how to do what it does best: translate
from one notation to another. Since you did all the problem analysis to generate Python
with miscellaneous errors, it learned how to translate your personal dialect of Python to
the official Python syntax. That is an excellent example of LLMs at their best. It was
learning how to translate, not learning how to understand.
4. I would say that there is a major difference. Wikipedia is not improved by any method
of learning (by humans or machines). Instead, some articles are excellent products of
collaboration by experts on the subject matter. But other articles were written hastily
by people who don't have the expertise or the patience to do thorough research on
the topic. The Wikipedia editors usually mark those articles that require further
attention. But many articles fall through the cracks -- nobody knows
whether they are accurate or not.
John
----------------------------------------
From: "Michael DeBellis" <mdebellissf(a)gmail.com>
[Paragraph 1] I agree. I've asked ChatGPT and Copilot for SPARQL queries, none of them
extremely complicated: either things I thought I would attempt rather than going back to
the documentation, or in some cases queries to get DBpedia or Wikidata info, because I
find the way those sites structure their data unintuitive, and it takes me forever to
figure out how to find things like all the major cities in India. (If anyone knows good
documentation on the DBpedia or Wikidata models, please drop a note.) I think part of the
problem is that people see what looks like well-formatted code and assume it actually
works. None of the SPARQL queries I've ever gotten back worked.
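For the record, a Wikidata query for major cities in India looks roughly like the sketch
below. This is recalled from memory, not from the original message: the P/Q identifiers
(P31 "instance of", P279 "subclass of", P17 "country", P1082 "population", Q515 "city",
Q668 "India") should be verified in the Wikidata UI at https://query.wikidata.org before
relying on them.

```sparql
# Cities in India with population over one million (Wikidata endpoint).
# All P/Q identifiers below are recollections -- verify before use.
SELECT ?city ?cityLabel ?population WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515 ;   # instance of (a subclass of) city
        wdt:P17 wd:Q668 ;             # country: India
        wdt:P1082 ?population .       # has a recorded population
  FILTER(?population > 1000000)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?population)
```

The property-path `wdt:P31/wdt:P279*` is the usual workaround for exactly the modeling
problem mentioned above: many cities are tagged as some subclass of "city" rather than
as "city" itself, so a plain `wdt:P31 wd:Q515` silently misses them.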
[2] We ran an experiment in February of this year with dental clinicians in India: we
gave them a set of questions and had them use ChatGPT to get answers. They rated the
answers very highly, even though almost all were incomplete, out of date, or contained
minor or major errors. On the other hand, when I ran the same questions through ChatGPT
in May (in both cases using 3.5), the results were radically different: almost all the
answers were spot on.
[3] As for coding, I have to say I find the AI support in PyCharm (my Python IDE) to be
a great time saver. Most of the time now I never finish typing: the AI figures out what
I'm doing by recognizing patterns, puts the suggested completion in grey, and all I do
is hit tab. It's also interesting how it learned. My code is fairly atypical Python,
because it involves manipulating knowledge graphs, and at first I was getting mostly
worthless suggestions. But after a few days it figured out the patterns for reading from
and writing to the graph, and it has been an incredible benefit. I like it for the same
reason I always copy and paste names rather than typing them whenever I can: it
drastically cuts down on typing errors.
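The kind of repetitive read/write pattern a completion engine can pick up is easy to
illustrate with a toy triple store in plain Python. This is a minimal sketch, not
Michael's actual code; real knowledge-graph work would use a library such as rdflib or
owlready2.

```python
# Toy triple store: a set of (subject, predicate, object) tuples.
# The point is the repetitiveness of the access patterns -- exactly
# what an IDE's pattern-based completion learns to predict.
graph = set()

def add_triple(s, p, o):
    """Write one fact to the graph."""
    graph.add((s, p, o))

def find(s=None, p=None, o=None):
    """Read from the graph: return triples matching a pattern.
    None acts as a wildcard, like a SPARQL variable."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

add_triple("Mumbai", "instanceOf", "City")
add_triple("Mumbai", "country", "India")
add_triple("Delhi", "instanceOf", "City")
add_triple("Delhi", "country", "India")

# All subjects whose country is India
print(sorted(t[0] for t in find(p="country", o="India")))  # → ['Delhi', 'Mumbai']
```

After a few dozen calls shaped like `add_triple(...)` and `find(...)`, even the
stereotyped surrounding code (loops over match results, unpacking triples) becomes
predictable, which is why the grey-text suggestions improve with use.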
[4] All this reminds me of the debates people had about Wikipedia. Some people thought it
was worthless because you can always find some example of vandalism where there is garbage
in an article; other people think it is the greatest thing on the Internet. The answer is
somewhere in the middle. Wikipedia is incredibly useful, and an amazing example of how
people can collaborate just to contribute their knowledge; the way people work together
on that site is so different from most of the Internet. But you should never use it as a
primary source: always check the references. That's the way I feel about Generative AI.
Like Wikipedia, I think it is a great resource, in spite of the fact that some people
claim it can do much more than it really can, and that it can still be wrong. It's just
another tool and, if used properly, an incredibly useful one.
Michael
https://www.michaeldebellis.com/blog
On Saturday, July 27, 2024 at 4:41:24 PM UTC-7 John F Sowa wrote:
Another of the many reasons why Generative AI requires other methods -- such as the 70
years of AI and computer science -- to test, evaluate, and correct anything and everything
that it "generates".
As the explanation below says, it does not "UNDERSTAND" what it is doing. It
just finds and reproduces patterns that it finds in its huge volume of data. Giving it
more data gives it more patterns to choose from. But it does nothing to help it
understand any of them.
This method enables it to surpass human abilities on IQ tests, law exams, medical exams,
etc. -- for the simple reason that the answers to those exams can be found somewhere on
the WWW. In other words, Generative AI does a superb job of CHEATING on exams. But it is
hopelessly clueless in solving problems whose solution depends on understanding the
structure and the goal of the problem.
For similar reasons, the article mentions that self-driving cars fail in complex
environments, such as busy streets in city traffic. The number and kinds of situations
are far more varied and complex than anything they have been trained on. Carnegie Mellon
University is involved in more testing of self-driving cars because Pittsburgh has the most
complex and varied patterns. It has more bridges than any other city in the world. It
also has three major rivers, many hills and valleys, steep winding roads, complex
intersections, tunnels, foot traffic, and combinations of any or all of the above.
Drivers who test self-driving cars in Pittsburgh say that they can't go for twenty
minutes without having to grab the steering wheel to prevent an accident. (By the way, I
learned to drive in Pittsburgh. Then I went to MIT and Harvard, where the Boston
patterns are based on 300-year-old cow paths.)
John