
Researchers at DeepSeek on Monday released a new experimental model called V3.2-exp, designed to dramatically lower inference costs when used in long-context operations. DeepSeek announced the model in a post on Hugging Face, also posting a linked academic paper on GitHub.
The most important feature of the new model is a system called DeepSeek Sparse Attention, an intricate mechanism described in detail in the diagram below. In short, the system uses a module called a "lightning indexer" to prioritize specific excerpts from the context window. A separate system called a "fine-grained token selection system" then chooses specific tokens from within those excerpts to load into the module's limited attention window. Taken together, these allow Sparse Attention models to operate over long stretches of context with comparatively small server loads.
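To make the two-stage idea concrete, here is a minimal NumPy sketch of block-then-token selection. The function names, the mean-pooled block scoring, and the candidate budgets are illustrative assumptions made for this article, not DeepSeek's actual implementation, which is specified in the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(query, keys, values, block_size=64, top_blocks=4, top_tokens=128):
    """Two-stage sparse attention sketch (names are hypothetical).

    Stage 1 (the "lightning indexer" role): cheaply score each
    contiguous block of the context and keep only the top blocks.
    Stage 2 (the "fine-grained token selection" role): within the
    surviving blocks, keep the individual tokens most relevant to
    the query, then run ordinary dense attention over just those.
    """
    n, d = keys.shape

    # Stage 1: coarse block-level scores (query vs. mean key per block).
    n_blocks = (n + block_size - 1) // block_size
    block_scores = np.empty(n_blocks)
    for b in range(n_blocks):
        blk = keys[b * block_size:(b + 1) * block_size]
        block_scores[b] = query @ blk.mean(axis=0)
    keep_blocks = np.argsort(block_scores)[-top_blocks:]

    # Gather candidate token indices from the selected blocks only.
    candidates = np.concatenate([
        np.arange(b * block_size, min((b + 1) * block_size, n))
        for b in keep_blocks
    ])

    # Stage 2: exact per-token scores, keep only the strongest tokens.
    token_scores = keys[candidates] @ query
    keep = candidates[np.argsort(token_scores)[-top_tokens:]]

    # Dense attention over the small selected set, not the full context.
    attn = softmax(keys[keep] @ query / np.sqrt(d))
    return attn @ values[keep]

# Toy usage: a 4096-token context, but attention only ever touches
# top_blocks * block_size = 256 candidate tokens instead of all 4096.
rng = np.random.default_rng(0)
d = 64
keys = rng.standard_normal((4096, d))
values = rng.standard_normal((4096, d))
query = rng.standard_normal(d)
out = sparse_attention(query, keys, values)
print(out.shape)  # (64,)
```

The point of the two stages is that the expensive exact scoring is only ever done on a small, pre-filtered candidate set, which is why server load grows slowly even as the context window gets long.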

For long-context operations, the benefits of the system are significant. DeepSeek's preliminary testing found that the price of a simple API call could be cut by as much as half in long-context situations. Further testing will be needed to build a more robust assessment, but since the model is open-weight and freely available on Hugging Face, it shouldn't be long before third-party tests can evaluate the claims made in the paper.
DeepSeek's new model is one of a string of recent breakthroughs tackling the problem of inference costs: in essence, the server cost of operating a pre-trained AI model, as distinct from the cost of training it. In DeepSeek's case, the researchers were looking for ways to make the fundamental transformer architecture operate more efficiently, and they found that significant improvements can still be made.
Based in China, DeepSeek has been an unusual figure in the AI boom, particularly for those who view AI research as a nationalist struggle between the United States and China. The company made waves at the beginning of the year with its R1 model, trained primarily with reinforcement learning at far lower cost than its American competitors. But the model did not spark the wholesale revolution in AI training that some predicted, and the company receded from the spotlight in the months that followed.
The new "sparse attention" approach is unlikely to cause the same stir as R1, but it could still teach U.S. providers some much-needed tricks to help keep inference costs low.