Anthropic used Pokémon to benchmark its newest AI model


Anthropic used Pokémon to benchmark its new AI model. Yes, really.

In a blog post published on Monday, Anthropic said it had tested its latest model, Claude 3.7 Sonnet, on the Game Boy classic Pokémon Red. The company equipped the model with basic memory, screen pixel input, and function calls to press buttons and navigate around the screen, allowing it to play Pokémon continuously.

A unique feature of Claude 3.7 Sonnet is its ability to engage in "extended thinking." Like OpenAI's o3-mini and DeepSeek's R1, Claude 3.7 Sonnet can "reason" through challenging problems by applying more computing power and taking more time.

That capability apparently came in handy in Pokémon Red.

A previous version of Claude, Claude 3.0 Sonnet, failed to leave the house in Pallet Town where the story begins. Claude 3.7 Sonnet, by contrast, successfully battled three Pokémon gym leaders and won their badges.

Anthropic Pokémon Red
Image Credits: Anthropic

Now, it is not clear how much computing Claude 3.7 Sonnet needed to reach those milestones, or how long each one took. Anthropic only said that the model performed 35,000 actions to reach the last gym leader, Surge.

It surely won't be long before some enterprising developer finds out.

Pokémon Red is more of a toy benchmark than anything. However, there is a long history of games being used for AI benchmarking purposes. In the past few months alone, a number of new apps and platforms have cropped up to test models' game-playing abilities on titles ranging from Street Fighter to Pictionary.
