Meta exec denies the company artificially boosted Llama 4’s benchmark scores


A Meta executive on Monday denied a rumor that the company trained its new AI models to perform well on specific benchmarks while concealing the models' weaknesses.

The executive, Ahmad Al-Dahle, Meta's VP of generative AI, said in a post on X that it is "simply not true" that Meta trained its Llama 4 Maverick and Llama 4 Scout models on "test sets." In AI benchmarking, a test set is a collection of data used to evaluate a model's performance after training. Training on a test set can misleadingly inflate a model's benchmark scores, making the model appear more capable than it actually is.

Over the weekend, an unsubstantiated rumor that Meta had artificially boosted its new models' benchmark results began circulating on X and Reddit. The rumor appears to have originated from a post on a Chinese social media site by a user claiming to have resigned from Meta in protest of the company's benchmarking practices.

Reports that Maverick and Scout perform poorly on certain tasks fueled the rumor, as did Meta's decision to use an experimental, unreleased version of Maverick to achieve better scores on the benchmark LM Arena. Researchers on X have observed stark differences in behavior between the publicly downloadable Maverick and the model hosted on LM Arena.

Al-Dahle acknowledged that some users are seeing "mixed quality" from Maverick and Scout across the various cloud providers hosting the models.

"Since we dropped the models as soon as they were ready, we expect it'll take several days for all the public implementations to get dialed in," Al-Dahle said. "We'll keep working through our bug fixes and onboarding partners."
