Chinese AI startup DeepSeek has announced DeepSeek-OCR, a new multimodal AI model capable of processing large volumes of documents at a significantly lower compute cost.
The model can generate more than 200,000 pages of training data per day on a single Nvidia A100 GPU, a notable step forward in performance and resource efficiency in AI research.
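To put that throughput figure in perspective, a quick back-of-the-envelope calculation (using only the 200,000 pages/day number reported above) converts it to a per-second rate:

```python
# Illustrative arithmetic only, based on the reported throughput figure.
PAGES_PER_DAY = 200_000          # reported for a single Nvidia A100 GPU
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

pages_per_second = PAGES_PER_DAY / SECONDS_PER_DAY
print(f"{pages_per_second:.2f} pages/second")  # ≈ 2.31
```

In other words, the claim amounts to the GPU sustaining roughly two to three pages of document conversion every second, around the clock.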
According to DeepSeek, DeepSeek-OCR uses visual perception to compress text, helping large language models (LLMs) handle longer context without hitting memory limits.
Instead of reading text in the usual way, the model converts text into images, then uses a visual encoder to compress the data while retaining up to 97% of the original information.
As a result, the number of tokens that must be processed drops by a factor of 7 to 20 compared with conventional text tokenization.
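The token savings described above can be sketched with simple arithmetic. The helper below is purely illustrative (the function name and the example page size are assumptions, not part of DeepSeek's API); the 7x-20x ratios come from the article:

```python
def vision_token_budget(text_tokens: int, compression_ratio: float) -> int:
    """Estimate how many vision tokens would represent a passage that
    costs `text_tokens` as plain text, at a given optical compression
    ratio (DeepSeek reports roughly 7x to 20x)."""
    if compression_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return max(1, round(text_tokens / compression_ratio))

# A hypothetical 2,000-text-token page at different reported ratios:
print(vision_token_budget(2000, 10))  # 200 vision tokens
print(vision_token_budget(2000, 20))  # 100 vision tokens
```

The practical upshot is that the same context window can hold many more pages when each page enters the model as a compact image rather than as raw text tokens.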
The model consists of two parts: DeepEncoder, with 380 million parameters, which analyzes and compresses page images, and a text decoder with 570 million active parameters, built on a 3-billion-parameter mixture-of-experts (MoE) language model.
According to the technical report, DeepSeek-OCR was trained on more than 30 million PDF pages spanning roughly 100 languages, including Chinese and English, along with millions of complex chemical and pharmaceutical diagrams and formulas.
Test results show that DeepSeek-OCR outperforms existing OCR models. On the OmniDocBench benchmark, the model requires only about 100 vision tokens per page, far fewer than GOT-OCR2.0 (256 tokens/page) and MinerU2.0 (more than 6,000 tokens/page).
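Using only the per-page token counts quoted above, the relative cost gap between the three models can be made explicit:

```python
# Per-page vision-token counts as reported on OmniDocBench (illustrative
# comparison only; figures are taken from the benchmark results above).
tokens_per_page = {
    "DeepSeek-OCR": 100,
    "GOT-OCR2.0": 256,
    "MinerU2.0": 6000,
}

baseline = tokens_per_page["DeepSeek-OCR"]
for model, n in tokens_per_page.items():
    print(f"{model}: {n} tokens/page ({n / baseline:.1f}x the baseline)")
```

By this measure, MinerU2.0 spends about 60 times as many tokens per page as DeepSeek-OCR, which is where most of the compute savings come from.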
On the Fox benchmark, DeepSeek-OCR likewise demonstrates strong compression and parsing of dense PDF documents.
With DeepSeek-OCR, the company aims to address one of the biggest challenges facing LLMs: maintaining long-context understanding without excessive resource consumption.
The release of the source code and model weights on open platforms such as Hugging Face and GitHub also demonstrates DeepSeek's commitment to transparency and collaboration in the global AI community.
This is not the first time DeepSeek has attracted attention. Previously, its DeepSeek-V3 and R1 models performed on par with advanced systems such as OpenAI's o1, but at a fraction of the cost.
However, some experts in the US still question the company's low-cost claims and its development process.
Despite the controversy, DeepSeek-OCR marks an important step in the industry's efforts to cut costs and improve efficiency, opening a new direction for combining computer vision with natural language processing.