Amit and anybody who did or did not attend today's talk at the Ontology Summit session,
All three of those questions below involve metalevel issues about LLMs and about reasoning with and about generative AI. The first and most important question applies to anything generated by LLMs: Is it true, false, or possible? After that come How? Why? and How likely?
The biggest limitation of LLMs is that they cannot do any reasoning by themselves. But they can often find some reasoning by some human in some document from somewhere. If they find something similar, they can apply it to solve the current problem. But the word 'similar' raises critical questions: How similar? In what way is it similar? Is that kind of similarity relevant to the current question or problem?
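To make that concrete, here is a toy sketch of what 'similar' usually amounts to in practice: an overlap score between word vectors. Real systems use learned embeddings rather than raw word counts, but the limitation is the same; everything below is an illustration, not a description of any particular LLM.

    # Toy illustration of "similarity": a vector-overlap score over surface
    # features. The score measures shared vocabulary, not shared reasoning.
    from collections import Counter
    import math

    STOP = {"the", "a", "an", "of", "to", "in", "on", "is", "and", "for", "from"}

    def bag(text):
        return Counter(w for w in text.lower().split() if w not in STOP)

    def cosine(a, b):
        dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
        norm = math.sqrt(sum(v * v for v in a.values())) * \
               math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    problem = bag("how much fencing runs in a straight line from one corner "
                  "of the rectangular field to the opposite corner")
    euclid  = bag("in a right angled triangle the square on the hypotenuse is equal "
                  "to the sum of the squares on the two sides containing the right angle")
    farming = bag("order fencing and posts for the corner sections of the field before plowing")

    print(cosine(problem, euclid))    # ~0.0: the relevant geometry shares no vocabulary
    print(cosine(problem, farming))   # ~0.4: the irrelevant text looks more "similar"

The passage that actually contains the needed reasoning scores as the least similar one.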
For example, the LLMs trained on the WWW must have found textbooks on Euclidean geometry. If some problem is stated in the same terminology as the books on geometry, the LLMs might find an answer and apply it.
But more likely, the problem will be stated in the terminology of the subject matter, such as building a house, plowing a field, flying an airplane, or surveying the land rights in a contract dispute. In those cases, the statement of the same geometrical problem may have few or no words in common with Euclid's description of the geometry, since each application has its own terminology.
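To take a deliberately simple, made-up case: a carpenter who asks how long a rafter must be for a roof with a 12-foot run and a 5-foot rise is asking for the hypotenuse of a right triangle, even though neither 'hypotenuse' nor 'triangle' appears anywhere in the question.

    # The carpenter's question, answered by Euclid's right-triangle result.
    # The numbers are invented for illustration.
    import math

    run, rise = 12.0, 5.0            # horizontal run and vertical rise of the roof, in feet
    rafter = math.hypot(run, rise)   # length of the hypotenuse
    print(rafter)                    # 13.0 feet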
For these reasons, a generative AI system, by itself, is unreliable for any mission-critical application. It is best used under the control and supervision of some system that uses trusted methods of AI and computer science to check, evaluate, and supplement whatever the generative AI happens to generate.
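As a minimal sketch of that arrangement (assuming a hypothetical ask_llm() call as a stand-in for whatever generative component is used; it is not a real API), the generated answer is treated only as a candidate, and a trusted, deterministic computation decides whether it is accepted.

    # Sketch of generative AI under supervision: the LLM proposes, a trusted
    # check disposes. ask_llm() is a hypothetical stand-in, not a real API;
    # here it returns a canned wrong answer so the check has something to catch.
    import math

    def ask_llm(prompt: str) -> str:
        return "14.2"                              # pretend hallucinated answer

    def rafter_length(run: float, rise: float) -> float:
        candidate = ask_llm(f"A roof has a run of {run} ft and a rise of {rise} ft. "
                            "What rafter length is needed? Answer with a number only.")
        try:
            proposed = float(candidate)
        except ValueError:
            proposed = float("nan")
        exact = math.hypot(run, rise)              # trusted computation, not generated text
        # Accept the generated answer only if it agrees with the checked result.
        return proposed if math.isclose(proposed, exact, rel_tol=1e-3) else exact

    print(rafter_length(12.0, 5.0))                # 13.0: the hallucinated 14.2 is rejected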
As an example of the kinds of systems that my colleagues and I have been developing, see https://jfsowa.com/talks/cogmem.pdf, Cognitive Memory For Language, Learning, and Reasoning, by Arun K. Majumdar and John F. Sowa.
See especially slides 44 to 64. They show three applications for which precision is essential. There are no LLM systems today that can do anything useful with those applications or anything similar. Today, we have a new company, Permion.ai LLC, which has developed new technology that takes advantage of BOTH LLMs and the 60+ years of earlier AI research.
The often flaky and hallucinogenic LLMs are under the control of technology that is guaranteed to produce precisely controlled reasoning and evaluations. Metalevel reasoning is its forte. It evaluates and filters out whatever may be flaky, hallucinogenic, or inconsistent with the given facts.
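As a crude illustration of that kind of metalevel filtering (a toy sketch under invented names, not a description of Permion's technology), candidate statements extracted from generated text can be checked against the given facts and discarded when they conflict.

    # Toy metalevel filter: keep generated statements only when they do not
    # contradict the given facts. Names and values are invented for illustration.

    # The given facts, as (subject, attribute) -> value.
    facts = {
        ("contract_17", "parcel_area_acres"): "40",
        ("contract_17", "governing_state"): "Ohio",
    }

    # Statements supposedly extracted from generated text.
    candidates = [
        ("contract_17", "parcel_area_acres", "40"),   # agrees with a given fact
        ("contract_17", "governing_state", "Texas"),  # contradicts a given fact
        ("contract_17", "survey_year", "1998"),       # new claim, not contradicted
    ]

    def consistent(subject, attribute, value):
        known = facts.get((subject, attribute))
        return known is None or known == value        # conflict only if the same slot differs

    accepted = [c for c in candidates if consistent(*c)]
    print(accepted)   # the "Texas" claim is filtered out; the other two survive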
John
There has been a lot of discussion on LLMs and GenAI on this forum.
I would like to share papers related to three major challenges:
1. Is it Human or AI?
Counter Turing Test CT^2: AI-Generated Text Detection is Not as Easy as You May Think —
Introducing AI Detectability Index
2. Measuring, characterizing and countering Hallucination (Hallucination Vulnerability Index)
The Troubling Emergence of Hallucination in Large Language Models – An Extensive Definition, Quantification, and Prescriptive Remediations
3. Fake News/misinformation
FACTIFY3M: A Benchmark for Multimodal Fact Verification with Explainability through 5W Question-Answering
Introduction/details/links to papers (EMNLP 2023):
I think this community won’t find this perspective alien:
Data-driven-only approaches can't/won't address these challenges well. Knowledge
(including KGs/ontologies/world models/structured semantics, both general and
domain-specific, etc.) will play a critical role in
addressing these. The same goes for three of the most important requirements
(knowledge will play a critical role in making progress on these):
grounding, instructability, and alignment.
More to come on this from #AIISC.
Cheers,
Amit