Salesforce and Gartner reveal the shortcomings of AI agents in successfully handling complex tasks.
In recent years, the promise of "agentic AI" has captivated tech enthusiasts and businesses alike. The idea of autonomous systems effortlessly completing complex tasks and revolutionizing office workflows seems irresistible, like something out of science fiction. Think Star Trek's voice-command tea dispenser. Unfortunately, as the latest findings show, this vision remains more fiction than reality.
A report published by The Register highlights the ongoing struggles of agentic AI systems. Research by Carnegie Mellon University (CMU) paints a sobering picture: AI agents fail to complete multi-step tasks nearly 70% of the time. Gartner, a leading technology research and consulting firm, expects over 40% of AI agent projects to be scrapped by 2027 due to issues like unclear business value and ballooning costs (Source: Gartner Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027). Combine this with Salesforce's recent CRM-specific benchmarks, which revealed modest success rates (only 35% for complex multi-turn interactions) (Source: CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions), and the conclusion becomes hard to ignore: AI agents, as they currently stand, are far from replacing us in the workplace.
Gartner is similarly unimpressed, accusing companies of "agent washing": rebranding older technologies like chatbots and robotic process automation (RPA) as revolutionary AI agents, despite their inability to function autonomously and effectively.
In testing environments like CMU's TheAgentCompany, AI models, including cutting-edge ones like Gemini-2.5-Pro and Claude-3.7-Sonnet, performed poorly. Success rates ranged from as low as 1% to a high of just 30%, even for basic office tasks like scheduling, coding, or responding to coworker communications. Researchers noted glaring flaws, including cases where AI agents outright fabricated solutions or struggled with basic user interface elements like pop-ups.
Adding to the skepticism, Apple recently weighed in on this topic through its own research (Apple's Study Reveals Insights: Debunking AI Superintelligence Myths). Apple's stance is that claims about the reasoning capabilities of AI agents are often exaggerated. Autonomous decision-making, reasoning, and context-awareness, the capabilities that would bring agentic AI closer to the human-like intelligence depicted in Hollywood, are riddled with fundamental flaws. These systems struggle to understand nuance and lack the maturity to perform tasks securely and reliably in corporate environments.
Apple's approach to AI is fundamentally cautious, highlighting a significant gap between expectations of AI capabilities and current functional reality. This may help explain why most AI advertisements from Google (Gemini Live Advertising) and other tech giants showcase simpler, less sophisticated applications. Their promotional efforts tend to emphasize basic AI tasks, subtly acknowledging that the technology has yet to mature enough to handle more complex and subtle challenges efficiently.
Salesforce researchers reached similar conclusions in their CRM-focused benchmarks. Their findings showed agent success rates averaging around 58% for single-turn tasks and dropping to 35% in more complex multi-turn scenarios (Source: CRMArena-Pro). What's alarming is the near-zero "confidentiality awareness" displayed by the AI models. For businesses that rely on sensitive data, this is an enormous red flag.
Meanwhile, Gartner analysts have concluded that most agentic AI offerings provide little return on investment (ROI). Current AI models lack the maturity to autonomously achieve complex business goals or follow nuanced instructions reliably. The hype is there, but the substance is sorely lacking (Source: Gartner).
To summarize: the vision of AI agents swooping in to simplify business operations is, at best, premature. Despite promises from major vendors and sustained hype surrounding these technologies, the numbers don't lie: AI agents struggle to deliver competent, secure, and scalable solutions.
Where does this leave businesses?
The allure of delegating work to intelligent agents remains a compelling vision, but for now, it's exactly that: a vision. Salesforce, Gartner, Apple, and CMU are all tapping the brakes on the hype surrounding agentic AI. Businesses must temper their expectations, focus on pragmatic solutions, and accept that human intelligence is still the most reliable decision-making tool available to organizations.