Following is a slightly edited copy of a note with that title. As I have been saying,
hybrids that use symbolic methods are essential for evaluating and correcting the results
generated by LLMs.
One way to limit the number of errors and hallucinations is to limit the amount of text
that the LLMs process. But that contradicts the approach of people like Elon M., who has
been using huge numbers of Nvidia chips, time, electricity, cooling, and $$$.
Elon is processing larger and larger amounts of input data, but he is also getting larger
and larger amounts of irrelevant data, as well as garbage and propaganda generated by
people who dump the worst kind of stuff into the WWW.
Evaluation by symbolic methods is essential. There are people in Kenya who are being
traumatized by companies that hire them to find and flag the garbage. They need the
money, but it would be better to use symbolic AI methods that can detect the garbage
without turning poor people into lunatics. Those workers have nightmares, and some
have committed suicide after spending 8 hours a day, every day, reading that stuff.
And the fact that they need to hire poor people shows that LLMs, by themselves, cannot do
the evaluation. That is a task that our company, Permion Inc., performs by using symbolic
AI methods. For examples, see what our purely symbolic VivoMind system did from 2000 to
2010:
https://jfsowa.com/talks/cogmem.pdf
You can skip to the final section for three examples of applications by the old VivoMind
methods. Those are applications that customers specified and paid for. None of them can
be done today by LLMs. But the new Permion hybrid methods can do all three kinds of
processing: symbolic, neural, and neurosymbolic.
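As a toy illustration of what symbolic evaluation of LLM output can mean (a minimal sketch; the checker, its rule, and the sample text are my own assumptions, not the VivoMind or Permion methods): a rule-based pass that extracts arithmetic claims from generated text and flags the ones that do not hold.

```python
import re

# Hypothetical sketch of a symbolic post-check on LLM output:
# find claims of the form "A + B = C" and flag the false ones.
CLAIM = re.compile(r"(\d+)\s*\+\s*(\d+)\s*=\s*(\d+)")

def check_arithmetic(text: str) -> list[str]:
    """Return the arithmetic claims in `text` that are false."""
    return [f"{a} + {b} = {c}"
            for a, b, c in CLAIM.findall(text)
            if int(a) + int(b) != int(c)]

# An LLM can emit fluent text containing a false claim; the symbolic
# checker catches it deterministically, with no training data.
sample = "The subtotals are 12 + 30 = 42, so the total is 7 + 5 = 13."
print(check_arithmetic(sample))  # ['7 + 5 = 13']
```

A real system would apply far richer rules (logic, ontologies, domain notations), but the principle is the same: deterministic checks that either pass or flag, and that cannot hallucinate on their own.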
The Permion and VivoMind methods can also handle multiple languages, including language
switching in the middle of a sentence. That is important for processing technical writing
that mixes human languages with texts in all kinds of notations for science, engineering,
and all kinds of products and systems.
One of the VivoMind applications dealt with texts about organic chemistry, which mix
symbols for millions of molecules with English text. Another application mixed computer
jargon and programming languages with English text, and the system had to relate
information in multiple computer languages to software specifications and application
data. All that was done from 2000 to 2010.
John
----------------------------------------
From: "Anand Sanwal" <anand.sanwal(a)cbinsights.com>
AI agent problems
March 18, 2025
Talk is cheap
Hi there.
As AI agents dominate the conversation, customers are growing skeptical about whether they
can live up to the hype.
In March, we’ve interviewed 40+ customers of AI agent products, and we’re hearing about 3
primary pain points right now:
- Reliability
- Integration headaches
- Lack of differentiation
1. Reliability
This is the #1 concern raised by organizations adopting AI agents, with nearly half of
respondents citing reliability & security as a key issue in a survey we conducted in
December.
According to CBI’s latest buyer interviews, AI agent reliability varies dramatically
across providers. Many customers report a gap between marketing and reality.
"Whatever was promised didn't work as great as said," one LangChain user
told us about the company’s APIs. "We encountered cases where we were getting
partially processed information, and the data we were trying to scrape was not exactly
clean or was hallucinating."
For many customers, reliability is largely a function of how complex the data and use
cases are. For instance, the LangChain customer saw ~80% accuracy for simpler tasks, but
“for complex tasks, the accuracy dropped to around 50%.”
Organizations are tackling the reliability issue with 1) human oversight; and 2) more
extensive model training.
An Ema customer, for instance, first has a subject-matter expert review outputs, and once
“more than 90% of the responses that we have tested are now accurate, we let it fly.”
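The review-then-deploy gate described above can be sketched in a few lines (a hypothetical illustration of the general pattern, not Ema's actual process; the function name and default threshold are my assumptions):

```python
def ready_to_deploy(expert_verdicts: list[bool],
                    threshold: float = 0.90) -> bool:
    """Deploy gate: expert_verdicts[i] is True if a subject-matter
    expert judged agent output i accurate. The agent is only "let fly"
    once accuracy on the reviewed sample meets the threshold."""
    if not expert_verdicts:
        return False  # no evidence yet; keep the human in the loop
    accuracy = sum(expert_verdicts) / len(expert_verdicts)
    return accuracy >= threshold

# 9 accurate outputs out of 10 reviewed -> 90%, meets the bar
print(ready_to_deploy([True] * 9 + [False]))  # True
```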
A customer of CrewAI, which orchestrates teams of AI agents into “crews,” takes an even
more involved approach:
The customer still needs to intervene with their own ML algorithms when CrewAI is unable
to handle outliers or unconventional data structures. If CrewAI is able to tackle these
cases in the future, “that would be a huge leap forward.”
2. Integration headaches
Integration limitations rank as another top customer pain point.
For one, lack of interoperability poses long-term challenges, as one Cognigy customer
notes.
An Artisan AI customer echoes this: “It was a bit of a gamble that we were signing up for
a product where they didn't have quite all the integrations that we wanted.”
Where customers see real value from these tools is when they can support seamless data
flow, especially through customers' existing tech stacks. One buyer went with Decagon
because of its integrations.
3. Lack of differentiation
More than half of private capital flowing into the AI agent space has gone to horizontal
applications — but these markets, like customer support and coding, are becoming highly
saturated.
"There's so many short-term moats, but in the long term there is no moat,"
one customer observed. "Whatever you build will be rapidly reproduced."
In a crowded market, specialization will determine success.
Hebbia, for instance, has tailored its solution to financial players. An exec at a PE firm
framed this as a selling point when getting internal buy-in: “When I bring tools to the
deal team that live and breathe diligence and deal execution, ensuring that it's
aligned to what they know and understand and [that it] speaks their language is incredibly
important.”
While many horizontal AI agents are actively deploying or even scaling their solutions,
vertical AI agents remain nascent, with half still in the first 2 levels of Commercial
Maturity.
They’ll gain more momentum this year as enterprises prioritize solutions that are highly
tailored to the needs of individual industries.
TLDR — Tech loves drama, right?
Here's a roundup of recent tech drama:
- Let’s make a Deel: HR and payroll platform Rippling is suing its rival Deel for
allegedly planting a spy to access Rippling’s internal sales pipeline data and customer
interactions. Rippling CEO Parker Conrad confirmed they set up a “honeypot” to prove that
Deel’s senior leadership was orchestrating the illegal activity — and the double agent
fell for it. When served with a court order, the spy locked himself in a bathroom and
allegedly tried to flush his phone down the toilet. Rippling is seeking damages.
- Huawei in hot water: Belgian police arrested multiple individuals in a corruption probe
involving forgery and falsified documents linked to Chinese tech giant Huawei. Authorities
believe lobbyists paid off European Parliament members with cash, expensive gifts, and
luxury trips to promote Huawei’s business interests in the region. Huawei denies
wrongdoing but insists it is taking the allegations “seriously.”
- Dirty laundry: Indian authorities arrested Lithuanian national Aleksej Besciokov,
co-founder of Russian crypto exchange Garantex, at the request of the US. Besciokov is
accused of facilitating money laundering linked to North Korean hackers and other
cybercriminals. The arrest follows the US government’s seizure of Garantex’s website and
$26M in frozen assets. Garantex, now under fire, claims it has plans to compensate
customers for blocked funds.
- Lilac claps back: Former employees of lithium tech startup Lilac Solutions sued the
company, claiming exposure to toxic chemicals left them with severe health issues. The
Breakthrough Energy Ventures-backed firm fired back with its own lawsuit, accusing the
workers of leaking trade secrets. OSHA has already hit Lilac with multiple citations, and
a legal battle is now brewing over safety, whistleblower retaliation, and IP theft. Oh
my.
- Apple vs. UK: Apple is fighting a UK order demanding that it build a “back door” into
its security systems. Privacy activists are suing the government, calling the demand a
major privacy violation. Apple already pulled its iCloud Advanced Data Protection from the
UK after getting a secret government order. Now, US lawmakers are jumping in, pushing the
UK to be more transparent about its legal process.
- Scalpel scandal: AI imaging firm ChemImage is suing Johnson & Johnson, claiming the
healthcare giant stole its tech after a multibillion-dollar partnership went south.
J&J argues the contract was scrapped because ChemImage failed to meet key milestones,
while ChemImage says J&J bailed to cut losses on its struggling surgical robotics
venture. The Manhattan federal court trial will decide whether ChemImage gets its patents
and IP back, as well as $180M in contract termination penalties.