
The solution is to build reinforcement learning (RL) environments: simulated workspaces where AI agents can practice multi-step tasks. Just as labeled data powered the chatbot era, RL environments are becoming a key ingredient for the next generation of AI.
Venture capital funds, startups, and AI labs are all in this race. Andreessen Horowitz noted that all the major labs are building RL environments in-house while also looking for external partners.
New companies such as Mechanize and Prime Intellect have raised large investments to develop environment platforms, while big names in data labeling such as Scale AI, Surge, and Mercor have also redirected their investment toward environments to avoid being left behind.
Several figures show how hot the trend is: Anthropic is said to be considering spending more than 1 billion USD on RL environments; Surge earned revenue of 1.2 billion USD last year through its work with OpenAI, Google, and Meta; and Mercor is valued at 10 billion USD.
At its core, an RL environment simulates how an AI agent operates software. For example, an agent is asked to make a purchase on Amazon and is scored on the result. The task sounds simple, but the environment must be sophisticated enough to capture unexpected agent behavior. This makes RL environments far more complex and expensive to build than static datasets.
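To make the idea concrete, here is a minimal sketch of such an environment in Python. It is purely illustrative and not how any of the companies mentioned build theirs: the class `ToyShoppingEnv`, the helper `run_episode`, the text actions (`add <item>`, `checkout`), and the 0-or-1 reward are all assumptions made for this example. It shows the three parts described above: a task, a score based on the result, and a log of actions the designer did not anticipate.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShoppingEnv:
    """Hypothetical toy environment: the agent must buy one target item."""
    target: str = "socks"
    max_steps: int = 5
    cart: list = field(default_factory=list)
    log: list = field(default_factory=list)  # unexpected actions end up here
    steps: int = 0
    done: bool = False

    def reset(self) -> str:
        """Start a new episode and return the task description."""
        self.cart, self.log, self.steps, self.done = [], [], 0, False
        return f"Task: buy {self.target}"

    def step(self, action: str):
        """Apply one text action; return (observation, reward, done)."""
        if self.done:
            raise RuntimeError("episode finished; call reset()")
        self.steps += 1
        verb, _, arg = action.partition(" ")
        reward = 0.0
        if verb == "add" and arg:
            self.cart.append(arg)
            obs = f"cart: {self.cart}"
        elif verb == "checkout":
            self.done = True
            # The agent is scored on the result: exactly the target item.
            reward = 1.0 if self.cart == [self.target] else 0.0
            obs = "order placed"
        else:
            # Record unexpected behavior instead of crashing.
            self.log.append(action)
            obs = "unrecognized action"
        if self.steps >= self.max_steps:
            self.done = True
        return obs, reward, self.done


def run_episode(env: ToyShoppingEnv, actions: list) -> float:
    """Feed a fixed list of actions to the environment; return total reward."""
    env.reset()
    total = 0.0
    for action in actions:
        _, reward, done = env.step(action)
        total += reward
        if done:
            break
    return total
```

Even this toy has to decide what happens on an unknown action, a wrong item, or an agent that never checks out. A realistic environment faces the same questions across an entire website or application, which is one reason the work is costly.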
Although it is still debated whether RL will scale, Silicon Valley sees it as one of the most important directions for advancing AI, and hopes to recreate the data-labeling wave that led to ChatGPT.