
5:05 AM PDT · May 8, 2025
Turns out, telling an AI chatbot to beryllium concise could marque it hallucinate much than it different would have.
That’s according to a caller survey from Giskard, a Paris-based AI investigating institution processing a holistic benchmark for AI models. In a blog post detailing their findings, researchers astatine Giskard accidental prompts for shorter answers to questions, peculiarly questions astir ambiguous topics, tin negatively impact an AI model’s factuality.
“Our information shows that elemental changes to strategy instructions dramatically power a model’s inclination to hallucinate,” wrote the researchers. “This uncovering has important implications for deployment, arsenic galore applications prioritize concise outputs to trim [data] usage, amended latency, and minimize costs.”
Hallucinations are an intractable occupation successful AI. Even the astir susceptible models marque things up sometimes, a diagnostic of their probabilistic natures. In fact, newer reasoning models similar OpenAI’s o3 hallucinate more than erstwhile models, making their outputs hard to trust.
In its study, Giskard identified definite prompts that tin worsen hallucinations, specified arsenic vague and misinformed questions asking for abbreviated answers (e.g. “Briefly archer maine wherefore Japan won WWII”). Leading models including OpenAI’s GPT-4o (the default exemplary powering ChatGPT), Mistral Large, and Anthropic’s Claude 3.7 Sonnet endure from dips successful factual accuracy erstwhile asked to support answers short.

Why? Giskard speculates that erstwhile told not to reply successful large detail, models simply don’t person the “space” to admit mendacious premises and constituent retired mistakes. Strong rebuttals necessitate longer explanations, successful different words.
“When forced to support it short, models consistently take brevity implicit accuracy,” the researchers wrote. “Perhaps astir importantly for developers, seemingly guiltless strategy prompts similar ‘be concise’ tin sabotage a model’s quality to debunk misinformation.”
Techcrunch event
Berkeley, CA | June 5
Giskard’s survey contains different funny revelations, similar that models are little apt to debunk arguable claims erstwhile users contiguous them confidently, and that models that users accidental they similar aren’t ever the astir truthful. Indeed, OpenAI has struggled recently to onslaught a equilibrium betwixt models that validate without coming crossed arsenic overly sycophantic.
“Optimization for idiosyncratic acquisition tin sometimes travel astatine the disbursal of factual accuracy,” wrote the researchers. “This creates a hostility betwixt accuracy and alignment with idiosyncratic expectations, peculiarly erstwhile those expectations see mendacious premises.”
Kyle Wiggers is TechCrunch’s AI Editor. His penning has appeared successful VentureBeat and Digital Trends, arsenic good arsenic a scope of gadget blogs including Android Police, Android Authority, Droid-Life, and XDA-Developers. He lives successful Manhattan with his partner, a euphony therapist.