
A recently released Google AI model scores worse on certain safety tests than its predecessor, according to the company’s internal benchmarking.
In a technical report published this week, Google revealed that its Gemini 2.5 Flash model is more likely to generate text that violates its safety guidelines than Gemini 2.0 Flash. On two metrics, “text-to-text safety” and “image-to-text safety,” Gemini 2.5 Flash regresses 4.1% and 9.6%, respectively.
Text-to-text safety measures how frequently a model violates Google’s guidelines given a prompt, while image-to-text safety evaluates how closely the model adheres to those boundaries when prompted with an image. Both tests are automated, not human-supervised.
In an emailed statement, a Google spokesperson confirmed that Gemini 2.5 Flash “performs worse on text-to-text and image-to-text safety.”
These surprising benchmark results come as AI companies move to make their models more permissive, that is, less likely to refuse to respond to controversial or sensitive subjects. For its latest crop of Llama models, Meta said it tuned the models not to endorse “some views over others” and to reply to more “debated” political prompts. OpenAI said earlier this year that it would tweak future models to not take an editorial stance and to offer multiple perspectives on controversial topics.
Sometimes, those permissiveness efforts backfire. TechCrunch reported Monday that the default model powering OpenAI’s ChatGPT allowed minors to generate erotic conversations. OpenAI blamed the behavior on a “bug.”
According to Google’s technical report, Gemini 2.5 Flash, which is still in preview, follows instructions more faithfully than Gemini 2.0 Flash, including instructions that cross problematic lines. The company claims that the regressions can be attributed partly to false positives, but it also admits that Gemini 2.5 Flash sometimes generates “violative content” when explicitly asked.
“Naturally, there is tension between [instruction following] on sensitive topics and safety policy violations, which is reflected across our evaluations,” the report reads.
Scores on SpeechMap, a benchmark that probes how models respond to sensitive and controversial prompts, also suggest that Gemini 2.5 Flash is far less likely than Gemini 2.0 Flash to refuse to answer contentious questions. TechCrunch’s testing of the model via the AI platform OpenRouter found that it will write essays in support of replacing human judges with AI, weakening due process protections in the United States, and implementing widespread warrantless government surveillance programs.
Thomas Woodside, co-founder of the Secure AI Project, said the limited details Google gave in its technical report demonstrate the need for more transparency in model testing.
“There’s a trade-off between instruction-following and policy-following, because some users may ask for content that would violate policies,” Woodside told TechCrunch. “In this case, Google’s latest Flash model complies with instructions more while also violating policies more. Google doesn’t provide much detail on the specific cases where policies were violated, although it says they are not severe. Without knowing more, it’s hard for independent analysts to know whether there’s a problem.”
Google has previously come under fire for its model safety reporting practices.
It took the company weeks to publish a technical report for Gemini 2.5 Pro, its most capable model. When the report eventually was published, it initially omitted key safety testing details.
On Monday, Google released a more detailed report with additional safety information.