Researchers Propose a Better Way to Report Dangerous AI Flaws

In late 2023, a team of third-party researchers discovered a troubling glitch in OpenAI’s widely used artificial intelligence model GPT-3.5.

When asked to repeat certain words a thousand times, the model began repeating the word over and over, then suddenly switched to spitting out incoherent text and snippets of personal information drawn from its training data, including names, phone numbers, and parts of email addresses. The team that discovered the problem worked with OpenAI to ensure the flaw was fixed before revealing it publicly. It is just one of scores of problems found in major AI models in recent years.

In a proposal released today, more than 5 AI researchers, including some who found the GPT-3.5 flaw, say that many other vulnerabilities affecting popular models are reported in problematic ways. They suggest a new scheme, supported by AI companies, that gives outsiders permission to probe their models and a way to disclose flaws publicly.

“Right now it’s a little bit of the Wild West,” says Shayne Longpre, a PhD candidate at MIT and the lead author of the proposal. Longpre says that some so-called jailbreakers share their methods of breaking AI safeguards on the social media platform X, leaving models and users at risk. Other jailbreaks are shared with only one company even though they might affect many. And some flaws, he says, are kept secret because the people who find them fear being banned or facing prosecution for breaking terms of use. “It is clear that there are chilling effects and uncertainty,” he says.

The security and safety of AI models is hugely important given how widely the technology is now being used, and how it may seep into countless applications and services. Powerful models need to be stress-tested, or red-teamed, because they can harbor harmful biases, and because certain inputs can cause them to break free of guardrails and produce unpleasant or dangerous responses. These include encouraging vulnerable users to engage in harmful behavior or helping a bad actor to develop cyber, chemical, or biological weapons. Some experts fear that models could assist cyber criminals or terrorists, and may even turn on humans as they advance.

The authors suggest three main measures to improve the third-party disclosure process: adopting standardized AI flaw reports to streamline the reporting process; having big AI firms provide infrastructure to third-party researchers disclosing flaws; and developing a system that allows flaws to be shared between different providers.

The approach is borrowed from the cybersecurity world, where there are legal protections and established norms for outside researchers to disclose bugs.

“AI researchers don’t always know how to disclose a flaw and can’t be certain that their good faith flaw disclosure won’t expose them to legal risk,” says Ilona Cohen, chief legal and policy officer at HackerOne, a company that organizes bug bounties, and a coauthor of the report.

Large AI companies currently conduct extensive safety testing on AI models prior to their release. Some also contract with outside firms to do further probing. “Are there enough people in those [companies] to address all of the issues with general-purpose AI systems, used by hundreds of millions of people in applications we’ve never dreamt?” Longpre asks. Some AI companies have started organizing AI bug bounties. However, Longpre says that independent researchers risk breaking the terms of use if they take it upon themselves to probe powerful AI models.
