Following is a slightly edited copy of a note with that title.   As I have been saying, hybrids that use symbolic methods are essential for evaluating and correcting the results generated by LLMs. 

One way to limit the number of errors and hallucinations is to limit the amount of text that the LLMs process.  But that contradicts the approach of people like Elon Musk, who has been consuming huge quantities of Nvidia chips, time, electricity, cooling, and $$$.

Elon is processing larger and larger amounts of input data, but he is also getting more and more irrelevant data, along with garbage and propaganda generated by people who dump the worst kind of material onto the WWW.

Evaluation by symbolic methods is essential.  There are people in Kenya who are being traumatized by companies that hire them to find and flag the garbage.  They need the money, but it would be better to use symbolic AI methods that can detect the garbage without destroying the mental health of poor workers.  Those workers have nightmares, and some have committed suicide after spending eight hours a day, every day, reading that material.

And the fact that companies need to hire poor people for this work shows that LLMs, by themselves, cannot do the evaluation.  That is a task that our company, Permion Inc., performs by using symbolic AI methods for evaluation.  For examples, see the methods that our purely symbolic VivoMind system used from 2000 to 2010:   https://jfsowa.com/talks/cogmem.pdf

You can skip to the final section for three examples of applications of the old VivoMind methods.   Those are applications that customers specified and paid for.  None of them can be done today by LLMs.  But the new Permion hybrid methods can do all three kinds:  symbolic, neural, and neurosymbolic.

The Permion and VivoMind methods can also handle multiple languages, including language switching in the middle of a sentence.  That is important for processing technical writing that mixes human languages with texts in all kinds of notations for science, engineering, and all kinds of products and systems.
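Mid-sentence language switching can be flagged symbolically, with no statistical model at all. The following is only a toy sketch of the idea (not the VivoMind or Permion method): the Unicode character database alone is enough to mark the points where the writing script changes within a text.

```python
import unicodedata

def script_of(ch):
    """Crude script tag taken from the Unicode character name,
    e.g. 'LATIN', 'GREEK', 'CYRILLIC'.  None for non-letters."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    return name.split()[0] if name else None

def script_switches(text):
    """Return (position, from_script, to_script) triples wherever the
    alphabetic script changes within the text."""
    switches, prev = [], None
    for i, ch in enumerate(text):
        s = script_of(ch)
        if s and prev and s != prev:
            switches.append((i, prev, s))
        if s:
            prev = s
    return switches
```

A real system would go much further, mapping scripts to candidate languages and consulting lexicons and grammars, but even this crude signal illustrates how a symbolic check can segment mixed-language input before any neural processing.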

One of the VivoMind applications dealt with texts about organic chemistry, which mix symbols for millions of molecules with English text.  Another application mixed computer jargon and programming languages with English text.  And the system had to relate information in multiple computer languages to software specifications and application data.  All that was done from 2000 to 2010.
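A small example of the kind of symbolic check such a system can apply: a molecular formula embedded in English text either parses against the periodic table or it does not, so a symbolic validator can reject a malformed or hallucinated formula outright. This sketch is my own illustration, not the VivoMind code, and it uses only a small subset of element symbols.

```python
import re

# Hypothetical subset of element symbols; a real checker would use
# the full periodic table.
ELEMENTS = {"H", "C", "N", "O", "P", "S", "F", "Cl", "Br", "Na", "K"}

TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")

def parse_formula(formula):
    """Symbolically validate a molecular formula such as 'C6H12O6'.
    Returns a dict of element counts, or None if the string is not
    a well-formed formula over the known element symbols."""
    counts, pos = {}, 0
    for m in TOKEN.finditer(formula):
        if m.start() != pos:      # a gap means an unparsable character
            return None
        sym, digits = m.group(1), m.group(2)
        if sym not in ELEMENTS:
            return None
        counts[sym] = counts.get(sym, 0) + (int(digits) if digits else 1)
        pos = m.end()
    return counts if pos == len(formula) and counts else None
```

The point is that the check is exact: unlike a statistical model, it never assigns partial credit to a formula that cannot exist.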

John
 


From: "Anand Sanwal" <anand.sanwal@cbinsights.com>

AI agent problems

March 18, 2025

Talk is cheap

Hi there. 

As AI agents dominate the conversation, customers are growing skeptical about whether they can live up to the hype. 

In March, we’ve interviewed 40+ customers of AI agent products and are hearing of 3 primary pain points right now:

  1. Reliability
  2. Integration headaches
  3. Lack of differentiation

1. Reliability 

This is the #1 concern raised by organizations adopting AI agents, with nearly half of respondents citing reliability & security as a key issue in a survey we conducted in December.  

According to CBI’s latest buyer interviews, AI agent reliability varies dramatically across providers. Many customers report a gap between marketing and reality. 

"Whatever was promised didn't work as great as said," one LangChain user told us about the company’s APIs. "We encountered cases where we were getting partially processed information, and the data we were trying to scrape was not exactly clean or was hallucinating." 

For many customers, reliability is largely a function of how complex the data and use cases are. For instance, the LangChain customer saw ~80% accuracy for simpler tasks, but “for complex tasks, the accuracy dropped to around 50%.”  

Organizations are tackling the reliability issue with 1) human oversight; and 2) more extensive model training. 

An Ema customer, for instance, first has a subject-matter expert review outputs, and once “more than 90% of the responses that we have tested are now accurate, we let it fly.” 
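The review gate that the Ema customer describes reduces to a simple check. The function below is a hypothetical sketch of such a gate, with `reviews` standing in for the subject-matter expert's accept/reject judgments on tested responses:

```python
def ready_to_deploy(reviews, threshold=0.90):
    """reviews: list of booleans, one per tested response, True when a
    subject-matter expert judged the output accurate.  Returns True once
    the reviewed accuracy clears the threshold (90% by default)."""
    if not reviews:
        return False          # nothing reviewed yet: do not deploy
    return sum(reviews) / len(reviews) >= threshold
```

In practice the gate would also track sample size and per-task-type accuracy, since (as the LangChain example above shows) accuracy on complex tasks can be far below the average.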

A customer for CrewAI, which orchestrates teams of AI agents into “crews,” takes an even more involved approach:


The customer still needs to intervene with their own ML algorithms when CrewAI is unable to handle outliers or unconventional data structures. If CrewAI is able to tackle these cases in the future, “that would be a huge leap forward.” 



2. Integration headaches 

Integration limitations rank as another top customer pain point. 

For one, lack of interoperability poses long-term challenges, as this Cognigy customer notes:


An Artisan AI customer echoes this: “It was a bit of a gamble that we were signing up for a product where they didn't have quite all the integrations that we wanted.” 

Customers see real value from these tools when they support seamless data flow, especially through the customers' existing tech stacks. This buyer went with Decagon because of its integrations:


3. Lack of differentiation 

More than half of private capital flowing into the AI agent space has gone to horizontal applications — but these markets, like customer support and coding, are becoming highly saturated. 

"There's so many short-term moats, but in the long term there is no moat," one customer observed. "Whatever you build will be rapidly reproduced."


In a crowded market, specialization will determine success. 

Hebbia, for instance, has tailored its solution to financial players. An exec at a PE firm framed this as a selling point when getting internal buy-in: “When I bring tools to the deal team that live and breathe diligence and deal execution, ensuring that it's aligned to what they know and understand and [that it] speaks their language is incredibly important.” 

While many horizontal AI agents are actively deploying or even scaling their solutions, vertical AI agents remain nascent, with half still in the first 2 levels of Commercial Maturity.  

They’ll gain more momentum this year as enterprises prioritize solutions that are highly tailored to the needs of individual industries.



TLDR — Tech loves drama, right?

Here's a roundup of recent tech drama:

  • Let’s make a Deel: HR and payroll platform Rippling is suing its rival Deel for allegedly planting a spy to access Rippling’s internal sales pipeline data and customer interactions. Rippling CEO Parker Conrad confirmed they set up a “honeypot” to prove that Deel’s senior leadership was orchestrating the illegal activity — and the double agent fell for it. When served with a court order, the spy locked himself in a bathroom and allegedly tried to flush his phone down the toilet. Rippling is seeking damages.
  • Huawei in hot water: Belgian police arrested multiple individuals in a corruption probe involving forgery and falsified documents linked to Chinese tech giant Huawei. Authorities believe lobbyists paid off European Parliament members with cash, expensive gifts, and luxury trips to promote Huawei’s business interests in the region. Huawei denies wrongdoing but insists it is taking the allegations “seriously.”
  • Dirty laundry: Indian authorities arrested Lithuanian national Aleksej Besciokov, co-founder of Russian crypto exchange Garantex, at the request of the US. Besciokov is accused of facilitating money laundering linked to North Korean hackers and other cybercriminals. The arrest follows the US government’s seizure of Garantex’s website and $26M in frozen assets. Garantex, now under fire, claims it has plans to compensate customers for blocked funds.
  • Lilac claps back: Former employees of lithium tech startup Lilac Solutions sued the company, claiming exposure to toxic chemicals left them with severe health issues. The Breakthrough Energy Ventures-backed firm fired back with its own lawsuit, accusing the workers of leaking trade secrets. OSHA has already hit Lilac with multiple citations, and a legal battle is now brewing over safety, whistleblower retaliation, and IP theft. Oh my.
  • Apple vs. UK: Apple is fighting a UK order demanding that it build a “back door” into its security systems. Privacy activists are suing the government, calling the demand a major privacy violation. Apple already pulled its iCloud Advanced Data Protection from the UK after getting a secret government order. Now, US lawmakers are jumping in, pushing the UK to be more transparent about its legal process.
  • Scalpel scandal: AI imaging firm ChemImage is suing Johnson & Johnson, claiming the healthcare giant stole its tech after a multibillion-dollar partnership went south. J&J argues the contract was scrapped because ChemImage failed to meet key milestones, while ChemImage says J&J bailed to cut losses on its struggling surgical robotics venture. The Manhattan federal court trial will decide whether ChemImage gets its patents and IP back, as well as $180M in contract termination penalties.