News
Google has released Magenta RealTime (Magenta RT), an open-source AI model for live music creation and control. The model responds to text prompts, audio samples, or both. Magenta RT is built on an ...
The Othello world model hypothesis suggests that language models trained only on move sequences can form an internal model of the game - including the board layout and game mechanics - without ever ...
Elon Musk has announced plans to retrain Grok with statements he calls "politically incorrect, but nonetheless factually true," claiming this will correct and expand all human knowledge. Previously, ...
This wasn't a one-off. In a text-only version of the same test, Claude Opus 4 chose blackmail 96 percent of the time. Google's Gemini 2.5 Flash nearly matched that rate. OpenAI's GPT-4.1 and xAI's ...
OpenAI has significantly updated ChatGPT's search feature: it now handles longer contexts, better follows instructions, answers complex questions with several parallel searches, and allows users to ...
Sakana AI's ALE-Agent achieved 21st place out of 1,000 human participants in a live AtCoder Heuristic Competition, with the agent based on Google's Gemini 2.5 Pro. The AI solved complex optimization ...
Apple executives have been talking internally about potentially buying AI startup Perplexity AI, according to a Bloomberg report. The idea is to grab both the technology and talent for Apple's own ...
Snake training outperforms math datasets in some areas. Training on Snake and rotation problems nudged the base model slightly ahead of MM-Eureka-Qwen-7B, a model specifically trained on math data, ...
Setting new benchmark records: OpenAI o3, first introduced in December 2024 and refined since then, is reportedly the company's most powerful reasoning model. OpenAI says it demonstrates ...
To reach the performance of FineWeb-Edu, other datasets like C4 or Dolma need up to 10 times more training data. This again shows the effectiveness of focusing on high quality educational data, ...
OpenAI has introduced a new family of language models—GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano—exclusively for use via its API. According to the company, these models are targeted at professional ...
To address this challenge, DeepMind is exploring methods that allow AI systems to evaluate their own outputs. One approach is AI debate, in which models provide feedback on each other’s answers, ...