OpenAI pledges to publish AI safety test results more often

OpenAI Chief Executive Officer Sam Altman speaks during the Kakao media day in Seoul. Image Credits: Kim Jae-Hwan/SOPA Images/LightRocket / Getty Images

9:38 AM PDT · May 14, 2025

OpenAI is moving to publish the results of its internal AI model safety evaluations more regularly in what the outfit is pitching as an effort to increase transparency.

On Wednesday, OpenAI launched the Safety Evaluations Hub, a webpage showing how the company’s models score on various tests for harmful content generation, jailbreaks, and hallucinations. OpenAI says that it’ll use the hub to share metrics on an “ongoing basis,” and that it intends to update the hub with “major model updates” going forward.

Introducing the Safety Evaluations Hub—a resource to explore safety results for our models.

While system cards share safety metrics at launch, the Hub will be updated periodically as part of our efforts to communicate proactively about safety.https://t.co/c8NgmXlC2Y

— OpenAI (@OpenAI) May 14, 2025

“As the science of AI evaluation evolves, we aim to share our progress on developing more scalable ways to measure model capability and safety,” wrote OpenAI in a blog post. “By sharing a subset of our safety evaluation results here, we hope this will not only make it easier to understand the safety performance of OpenAI systems over time, but also support community efforts⁠ to increase transparency across the field.”

OpenAI says that it may add additional evaluations to the hub over time.

In recent months, OpenAI has raised the ire of some ethicists for reportedly rushing the safety testing of certain flagship models and failing to release technical reports for others. The company’s CEO, Sam Altman, also stands accused of misleading OpenAI executives about model safety reviews prior to his brief ouster in November 2023.

Late last month, OpenAI was forced to roll back an update to the default model powering ChatGPT, GPT-4o, after users began reporting that it responded in an overly validating and agreeable way. X became flooded with screenshots of ChatGPT applauding all sorts of problematic, dangerous decisions and ideas.

OpenAI said that it would implement several fixes and changes to prevent future such incidents, including introducing an opt-in “alpha phase” for some models that would let certain ChatGPT users test the models and give feedback before launch.

Kyle Wiggers is TechCrunch’s AI Editor. His writing has appeared in VentureBeat and Digital Trends, as well as a range of gadget blogs including Android Police, Android Authority, Droid-Life, and XDA-Developers. He lives in Manhattan with his partner, a music therapist.
