AI benchmarking organization criticized for waiting to disclose funding from OpenAI


An organization developing math benchmarks for AI did not disclose that it had received funding from OpenAI until relatively recently, prompting accusations of impropriety from some in the AI community.

Epoch AI, a nonprofit primarily funded by Open Philanthropy, a research and grant-making foundation, disclosed on December 20 that OpenAI supported the creation of FrontierMath. FrontierMath, a test of expert-level math problems designed to measure an AI’s mathematical prowess, was one of the benchmarks used to demo OpenAI’s upcoming flagship AI, o3.

In a post on the forum LessWrong, a contractor for Epoch AI with the username “Meemi” said that many contributors to the FrontierMath benchmark were not informed of OpenAI’s involvement until it was made public.

“Communication about this has been opaque,” Meemi wrote. “In my view Epoch AI should have disclosed its OpenAI funding, and contractors should have had transparent information about the potential use of their work for capabilities when choosing whether to work on the benchmark.”

On social media, some users raised concerns that the secrecy could tarnish FrontierMath’s reputation as an objective benchmark. Beyond funding FrontierMath, OpenAI had access to many of the benchmark’s problems and solutions — a fact Epoch AI hadn’t disclosed before December 20, when o3 was announced.

In response to Meemi’s post, Tamay Besiroglu, associate director of Epoch AI and one of the organization’s co-founders, insisted that FrontierMath’s integrity had not been compromised, but acknowledged that Epoch AI “made a mistake” by not being more transparent.

“We were restricted from disclosing the partnership until around the time o3 launched, and we should have negotiated harder for the ability to be transparent with benchmark contributors as soon as possible,” Besiroglu wrote. “Our mathematicians deserved to know who might have access to their work. Even though we were contractually limited in what we could say, we should have made transparency with contributors a non-negotiable part of our agreement with OpenAI.”

Besiroglu added that despite OpenAI’s access to FrontierMath, the company has a “verbal agreement” with Epoch AI not to use FrontierMath’s problem set to train its AI. (Training an AI on FrontierMath would be akin to teaching to the test.) Epoch AI also maintains a “separate holdout set” that serves as an additional safeguard for independently verifying FrontierMath benchmark results, Besiroglu said.

“OpenAI … fully supported our decision to maintain a separate, unseen holdout set,” Besiroglu wrote.

Muddying the waters, however, Epoch AI lead mathematician Elliot Glazer noted in a post on Reddit that Epoch AI has not yet been able to independently verify OpenAI’s FrontierMath o3 results.

“My personal opinion is that [OpenAI’s] score is valid (i.e., they didn’t train on the dataset), and they have no incentive to lie about internal benchmarking performance,” Glazer said. “However, we can’t vouch for them until our independent evaluation is complete.”

The saga is yet another example of the challenge of developing empirical benchmarks for evaluating AI – and of securing the resources needed to develop those benchmarks without creating the perception of a conflict of interest.
