DeepSeek may have used Google’s Gemini to train its latest model


Last week, the Chinese lab DeepSeek released an updated version of its R1 reasoning AI model, which performs well on a number of math and coding benchmarks. The company did not disclose the source of the data used to train the model, but some AI researchers speculate that at least a portion came from Google's Gemini family of models.

Sam Paech, a Melbourne-based developer who creates "emotional intelligence" evaluations for AI, claims that DeepSeek's latest model was trained on Gemini outputs. DeepSeek's model, known as R1-0528, prefers words and expressions similar to those favored by Google's Gemini 2.5 Pro, Paech said in an X post.

That is not a smoking gun. But another developer, the pseudonymous creator of a "free speech eval" for AI called SpeechMap, noted that the DeepSeek model's traces, the "thoughts" the model produces as it works toward a conclusion, "read like Gemini traces."

DeepSeek has been accused of training on data from rival AI models before. In December, developers observed that DeepSeek's V3 model often identified itself as ChatGPT, OpenAI's AI-powered chatbot platform, suggesting it may have been trained on ChatGPT chat logs.

Earlier this year, OpenAI told the Financial Times it had found evidence linking DeepSeek to the use of distillation, a technique for training AI models by extracting data from bigger, more capable ones. According to Bloomberg, Microsoft, a close OpenAI collaborator and investor, detected that large amounts of data were being exfiltrated through OpenAI developer accounts in late 2024, accounts OpenAI believes are affiliated with DeepSeek.

Distillation is not an unusual practice, but OpenAI's terms of service prohibit customers from using the company's model outputs to build competing AI.
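For readers unfamiliar with the mechanics, here is a minimal sketch of the idea behind distillation: a smaller "student" model is trained to match the softened output distribution of a larger "teacher" model, typically by minimizing a KL-divergence loss. The function names and numbers below are illustrative, not drawn from any lab's actual pipeline.

```python
import math

def softmax(logits, temperature=1.0):
    # Convert raw logits to probabilities; a higher temperature
    # "softens" the distribution, which is standard in distillation.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence from the teacher's softened distribution (p)
    # to the student's (q): sum_i p_i * log(p_i / q_i).
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# A student that matches the teacher exactly incurs (near-)zero loss;
# a student that disagrees is penalized, pushing it toward the teacher.
identical = distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])
mismatch = distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0])
```

The allegation against DeepSeek, in effect, is that rival models' API outputs served as the "teacher" signal in a process like this one.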

To be clear, many models misidentify themselves and converge on the same words and turns of phrase. That is because the open web, from which AI companies source the bulk of their training data, is becoming littered with AI slop: content farms are using AI to churn out clickbait, and bots are flooding Reddit and X.

This "contamination," if you will, has made it quite difficult to thoroughly filter AI outputs out of training datasets.

Still, AI experts like Nathan Lambert, a researcher at the nonprofit AI research institute AI2, do not think it is out of the question that DeepSeek trained on data from Google's Gemini.

"If I was DeepSeek, I would definitely create a ton of synthetic data from the best API model out there," Lambert wrote in a post on X. "[DeepSeek is] short on GPUs and flush with cash. It's literally effectively more compute for them."

In part to prevent distillation, AI companies have been ramping up security measures.

In April, OpenAI began requiring organizations to complete an ID verification process in order to access certain advanced models. The process requires a government-issued ID from one of the countries supported by OpenAI's API; China is not on the list.

Elsewhere, Google recently began "summarizing" the traces generated by models available through its AI Studio developer platform, a step that makes it more challenging to train performant rival models on Gemini traces. In May, Anthropic said it would start summarizing its own models' traces, citing a need to protect its "competitive advantages."

We have reached out to Google for comment and will update this piece if we hear back.
