On September 29 (US time), DeepSeek's research team announced a new experimental model called V3.2-exp, designed to significantly reduce inference costs in long-context tasks. The model was released on the Hugging Face platform, alongside a technical paper shared publicly on GitHub.
The highlight of V3.2-exp is the DeepSeek Sparse Attention mechanism. Instead of processing the entire input, the system uses a module called a "lightning indexer" to prioritize the most important excerpts within the context window.
A second module, a fine-grained token selection system, then picks specific tokens from those excerpts to load into the attention module's limited token budget. This approach reduces server load while preserving the ability to handle long stretches of context.
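To make the two-stage idea concrete, the sketch below shows one way a lightweight indexer combined with top-k token selection could work. This is a toy illustration only, not DeepSeek's implementation: the function names, the small indexer dimension, and the top-k budget of 64 tokens are assumptions chosen for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(q, K, V, idx_q, idx_K, k_top=64):
    """Toy sparse attention for a single query vector.

    idx_q and idx_K are low-dimensional projections used only for cheap
    scoring (standing in for the lightweight indexer); q, K, V are the
    full query/key/value tensors used for the actual attention step.
    """
    # 1) Lightweight indexer: score every context position cheaply.
    index_scores = idx_K @ idx_q                      # shape (seq_len,)

    # 2) Fine-grained token selection: keep only the top-k positions.
    keep = np.argsort(index_scores)[-k_top:]          # indices of selected tokens

    # 3) Full attention, but only over the selected subset of tokens.
    attn = softmax(K[keep] @ q / np.sqrt(q.shape[-1]))
    return attn @ V[keep]

# Usage: a 4096-token context, but full attention runs over only 64 tokens.
rng = np.random.default_rng(0)
seq_len, d_model, d_index = 4096, 128, 16
q = rng.normal(size=(d_model,))
K = rng.normal(size=(seq_len, d_model))
V = rng.normal(size=(seq_len, d_model))
idx_q = rng.normal(size=(d_index,))
idx_K = rng.normal(size=(seq_len, d_index))
out = sparse_attention(q, K, V, idx_q, idx_K, k_top=64)
print(out.shape)  # (128,)
```

The intuition behind this split is that the cheap, low-dimensional index scores are computed for every position, while the expensive full-dimension attention is only computed over the small set of positions that survive the selection step.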
According to initial testing, DeepSeek says the cost of an API call can be cut by up to 50% in situations that require long context.
Although independent evaluation is still needed to verify these figures, the model's availability on Hugging Face means third parties will soon be able to test the claims.
The launch of V3.2-exp follows a series of efforts to tackle inference costs, one of the biggest challenges in operating AI models.
Unlike one-time training costs, inference costs are tied directly to the server infrastructure needed to serve users, and they remain a heavy ongoing burden for businesses deploying AI.
DeepSeek, a China-based company, drew attention at the start of the year with its R1 model, which was trained primarily through low-cost reinforcement learning. However, R1 did not spark the revolution many expected, and interest in DeepSeek has waned in recent months.
Still, with this new mechanism, DeepSeek is demonstrating another approach to optimizing the transformer architecture. The technique may not generate the same buzz as R1, but it could offer practical lessons, especially for AI service providers in the US, as the need to cut inference costs becomes increasingly urgent.