Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124

For years, Big Tech leaders have touted AI agents that can autonomously use software applications to complete tasks for people. But take today’s consumer AI agents out for a spin, whether it’s OpenAI’s ChatGPT Agent or Perplexity’s Comet, and you’ll quickly realize how limited the technology still is. Making AI agents more capable may take a new set of techniques the industry is still discovering.
One of those techniques is carefully simulating workspaces where agents can be trained on multi-step tasks — known as reinforcement learning (RL) environments. Much as labeled datasets powered the last wave of AI, RL environments are starting to look like a critical ingredient in the development of agents.
AI researchers, founders, and investors tell TechCrunch that leading AI labs are now demanding more RL environments, and there’s no shortage of startups hoping to supply them.
“All the big AI labs are building RL environments in-house,” Andreessen Horowitz general partner Jennifer Li said in an interview with TechCrunch.
The push for RL environments has created a new class of well-funded startups, such as Mechanize and Prime Intellect, that aim to lead the space. Meanwhile, large data-labeling companies like Mercor and Surge say they’re investing more in RL environments to keep pace with the industry’s shift from static datasets to interactive simulations. The major labs are also considering heavy investment: Anthropic leaders have reportedly discussed spending more than $1 billion on RL environments over the next year.
The hope among investors and founders is that one of these startups will emerge as the “Scale AI for environments,” a reference to the $29 billion data-labeling powerhouse that powered the chatbot era.
The question is whether RL environments will truly push the frontier of AI progress.
At their core, RL environments are training grounds that simulate what an AI agent would do in a real software application. One founder, in a recent interview, described building them as “like creating a very boring video game.”
For example, an environment could simulate a Chrome browser and task an AI agent with purchasing a pair of socks on Amazon. The agent is graded on its performance and sent a reward signal when it succeeds (in this case, buying a suitable pair of socks).
While such a task sounds relatively simple, there are many places where an AI agent could get tripped up. It might get lost navigating a web page’s drop-down menus, or buy too many socks. And because developers can’t predict exactly which wrong turn an agent will take, the environment itself has to be robust enough to capture unexpected behavior and still deliver useful feedback. That makes building environments far more complex than building a static dataset.
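The grade-and-reward loop described above is usually expressed through a Gym-style reset/step interface. The sketch below is an illustrative toy — the state names, transition table, and reward rule are all invented for this example, not any vendor’s actual product:

```python
class SockShopEnv:
    """Toy stand-in for a simulated browser session.

    States and actions are plain strings; a real environment would
    expose DOM trees, screenshots, or accessibility trees instead.
    """

    # Transition table for the "buy a pair of socks" task.
    TRANSITIONS = {
        ("home_page", "search_socks"): "results_page",
        ("results_page", "add_to_cart"): "cart_page",
        ("cart_page", "checkout"): "order_placed",
    }

    def reset(self):
        self.state = "home_page"
        return self.state

    def step(self, action):
        # Any action the simulation didn't anticipate sends the agent
        # to a "lost" state rather than crashing the episode -- this is
        # the "capture unexpected behavior" requirement from the text.
        self.state = self.TRANSITIONS.get((self.state, action), "lost")
        done = self.state in ("order_placed", "lost")
        # Sparse reward: 1.0 only when the purchase completes.
        reward = 1.0 if self.state == "order_placed" else 0.0
        return self.state, reward, done

env = SockShopEnv()
env.reset()
total = 0.0
for action in ["search_socks", "add_to_cart", "checkout"]:
    state, reward, done = env.step(action)
    total += reward
# A successful episode earns exactly the one terminal reward.
```

Note how most of the engineering effort lives in the transition table and the “lost” fallback, not the reward itself — which is why environments are harder to build than labeled datasets.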
Some environments are quite elaborate, allowing AI agents to use tools, access the internet, or operate various software applications to complete a given task. Others are narrower, aimed at helping an agent learn specific tasks in enterprise software applications.
While RL environments are the hot thing in Silicon Valley right now, there’s plenty of precedent for the technique. One of OpenAI’s first projects, back in 2016, was “RL Gyms,” which were quite similar to the modern conception of environments. The same year, Google DeepMind’s AlphaGo AI system beat a world champion at the board game Go. It, too, used RL techniques in a simulated environment.
What’s unique about today’s environments is that researchers are trying to build computer-using AI agents on top of large transformer models. Unlike AlphaGo, which was a specialized AI system operating in a closed environment, today’s AI agents are trained to have more general capabilities. AI researchers today have a stronger starting point, but also a more complicated goal, with more ways for things to go wrong.
AI data-labeling companies like Scale AI, Surge, and Mercor are trying to meet the moment and build RL environments. These companies have more resources than many startups in the space, as well as deep relationships with AI labs.
Surge CEO Edwin Chen told TechCrunch that he’s recently seen “significant growth” in demand for RL environments within AI labs. Surge — which reportedly generated $1.2 billion in revenue last year from working with AI labs such as OpenAI, Google, Anthropic, and Meta — recently spun up a new internal organization tasked specifically with building RL environments, he said.
Close behind Surge is Mercor, a $10 billion startup that has worked with OpenAI, Meta, and Anthropic. Mercor is pitching investors on its business of building RL environments for domain-specific tasks such as coding, healthcare, and law, according to marketing materials seen by TechCrunch.
Mercor CEO Brendan Foody told TechCrunch in an interview that “few understand how large the opportunity around RL environments actually is.”
Scale AI used to dominate the data-labeling space, but it has lost ground since Meta invested $14 billion and hired away its CEO. Google and OpenAI have since dropped Scale AI as a data provider, and the startup even faces competition for data-labeling work inside Meta. Still, Scale is trying to meet the moment and build environments.
“This is just the nature of the business [Scale AI] is in,” said Chetan Rane, Scale AI’s head of product for agents and RL environments. “Scale has proven its ability to adapt quickly. We did this in the early days of autonomous vehicles, our first business unit. When ChatGPT came out, Scale AI adapted to that. And now, once again, we’re adapting to new frontier spaces like agents and environments.”
Some newer players are focusing exclusively on environments from the outset. Among them is Mechanize, a startup founded roughly six months ago with the audacious goal of “automating all jobs.” However, co-founder Matthew Barnett told TechCrunch that his firm is starting with RL environments for AI coding agents.
Mechanize aims to supply AI labs with a small number of robust RL environments, Barnett says, rather than the larger data firms’ approach of producing a wide range of simpler ones. To that end, the startup is offering software engineers $500,000 salaries to build RL environments — far higher than what an hourly contractor could earn at Scale AI or Surge.
Mechanize has already been working with Anthropic on RL environments, two sources familiar with the matter told TechCrunch. Mechanize and Anthropic declined to comment on the partnership.
Other startups are betting that RL environments will be influential outside of AI labs. Prime Intellect — a startup backed by AI researcher Andrej Karpathy, Founders Fund, and Menlo Ventures — is targeting smaller developers with its RL environments.
Last month, Prime Intellect launched an RL environments hub, which aims to be a “Hugging Face for RL environments.” The idea is to give open-source developers access to the same resources that large AI labs have, and to sell those developers access to computational resources in the process.
Training generally capable agents in RL environments can be more computationally expensive than previous AI training techniques, according to Prime Intellect researcher Will Brown. Alongside the startups building RL environments, there’s another opportunity for GPU providers that can power the process.
“RL environments are going to be too big for any one company to dominate,” Brown said in an interview. “Part of what we’re doing is just trying to build good open-source infrastructure around it. The service we sell is compute, so it’s a convenient onramp to using GPUs, but we’re thinking about this more long-term.”
An open question around RL environments is whether the technique will scale like previous AI training methods.
Reinforcement learning has powered some of AI’s biggest leaps over the past year, including models like OpenAI’s o1 and Anthropic’s Claude Opus 4. Those are particularly important breakthroughs because the methods previously used to improve AI models are now showing diminishing returns.
Environments are part of AI labs’ larger bet on RL, which many believe will continue to drive progress as labs add more data and computational resources to the process. Some of the OpenAI researchers behind o1 told TechCrunch that the company originally invested in AI reasoning models — created through investments in RL and test-time compute — because they believed the approach would scale nicely.
The best way to scale RL remains unclear, but environments seem like a promising contender. Rather than simply rewarding chatbots for text responses, they let agents operate with tools and computers at their disposal in simulations. That’s far more resource-intensive, but potentially more rewarding.
Some are skeptical that all these RL environments will pan out. Ross Taylor, a former AI research lead at Meta who co-founded General Reasoning, told TechCrunch that RL environments are prone to reward hacking — a process in which AI models cheat to obtain a reward without actually doing the task.
“I think people are underestimating how difficult it is to scale environments,” said Taylor. “Even the best publicly available [RL environments] typically don’t work without serious modification.”
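Reward hacking of the kind Taylor describes is easiest to see with a toy grader. Everything below — the event names and both reward functions — is hypothetical, made up for this sketch; the point is that a grader checking a proxy signal instead of the real goal can be gamed:

```python
def proxy_reward(event_log):
    # Naive grader: reward any episode that reaches the cart page,
    # regardless of what (if anything) was actually bought.
    return 1.0 if "cart_page" in event_log else 0.0

def strict_reward(event_log):
    # Stricter grader: require the right item AND a completed checkout.
    return 1.0 if {"socks_in_cart", "order_placed"} <= set(event_log) else 0.0

honest_run = ["search_socks", "socks_in_cart", "cart_page", "order_placed"]
hacked_run = ["cart_page"]  # jumps straight to the cart, buys nothing

# The proxy grader cannot tell the two runs apart; the strict one can.
proxy_scores = (proxy_reward(honest_run), proxy_reward(hacked_run))
strict_scores = (strict_reward(honest_run), strict_reward(hacked_run))
```

An RL-trained agent will reliably find the cheapest path to the reward, so a proxy grader like the first one quietly trains the degenerate behavior — which is why graders usually need the kind of “serious modification” Taylor mentions.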
Sherwin Wu, OpenAI’s head of engineering for its API business, said on a recent podcast that he was “short” on RL environment startups. Wu noted that the space is extremely competitive, and that AI research is evolving so quickly that it’s hard to serve AI labs well.
Karpathy, an investor in Prime Intellect who has called RL environments a potential breakthrough, has also expressed caution about the RL space more broadly. In a post on X, he raised concerns about how much more AI progress can be squeezed out of RL.
“I am bullish on environments and agentic interactions but I am bearish on reinforcement learning specifically,” said Karpathy.
Update: A previous version of this article referred to Mechanize as Mechanize Work. It has been updated to reflect the company’s official name.