Tensormesh raises $4.5M to squeeze more inference out of AI server loads


With the AI infrastructure buildout reaching staggering proportions, there’s more pressure than ever on companies to squeeze as much inference as possible out of the GPUs they have. And for researchers with expertise in a relevant technique, this is a great time to raise funding.

Part of that story is Tensormesh, which launched from stealth this week with $4.5 million in seed funding. The round was led by Laude Ventures, with additional angel funding from database pioneer Michael Franklin.

Tensormesh is using the money to build a commercial version of LMCache, the open-source utility launched and maintained by Tensormesh co-founder Yihua Cheng. Used well, LMCache can cut inference costs by as much as tenfold, a strength that has made it a staple of open-source deployments and drawn integrations from heavy hitters including Google and Nvidia. Now, Tensormesh plans to turn that academic reputation into a viable business.

At the heart of the product is the key-value cache (or KV cache), a memory system used to process complex inputs more efficiently by condensing them down to their key values. In traditional architectures, the KV cache is discarded at the end of each query, but Tensormesh CEO Junchen Jiang argues that this is a huge source of inefficiency.

“It’s like a very smart analyst reading all the data, but after every query they forget what they learned,” Jiang said.

Instead of discarding that cache, Tensormesh’s systems retain it, allowing it to be reused when the model runs a similar process on a separate query. Because GPU memory is so scarce, this can mean spreading the data across several storage tiers, but the reward is significantly more inference power for the same server load.
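The tiered-retention idea can be pictured with a toy cache that keeps recent KV entries in a fast tier and evicts older ones to a slower tier instead of throwing them away. This is a minimal sketch for illustration only; the class name, API, and two-tier layout are assumptions, not Tensormesh’s or LMCache’s actual implementation.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy sketch of cross-query KV-cache reuse (hypothetical API).

    Keys are token-prefix tuples; values stand in for the KV tensors
    a model would otherwise have to recompute from scratch.
    """

    def __init__(self, gpu_capacity=2):
        self.gpu = OrderedDict()  # fast tier (stand-in for GPU memory), LRU order
        self.cpu = {}             # slow tier (stand-in for host RAM or disk)
        self.gpu_capacity = gpu_capacity

    def put(self, prefix, kv):
        key = tuple(prefix)
        self.gpu[key] = kv
        self.gpu.move_to_end(key)
        while len(self.gpu) > self.gpu_capacity:
            # Evict the least recently used entry to the slower tier
            # rather than discarding it, so later queries can reuse it.
            old_key, old_kv = self.gpu.popitem(last=False)
            self.cpu[old_key] = old_kv

    def get(self, prefix):
        key = tuple(prefix)
        if key in self.gpu:
            self.gpu.move_to_end(key)
            return self.gpu[key]
        if key in self.cpu:
            # Promote a slow-tier hit back into the fast tier on reuse.
            self.put(key, self.cpu.pop(key))
            return self.gpu[key]
        return None  # cache miss: the model must recompute this prefix
```

A query whose prompt shares a prefix with an earlier one would call `get` first and only recompute on a miss; the eviction path is where the real engineering difficulty lives, since moving tensors between tiers must not stall the serving system.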

The change is particularly powerful for chat interfaces, since models must refer back to an ever-growing chat log as the conversation progresses. Agentic systems have a similar problem, with a growing log of actions and goals.

In theory, AI companies could make these changes on their own, but the technical complexity makes it a daunting task. Given the Tensormesh team’s research expertise in the details of the process, the company is betting its out-of-the-box product will be in high demand.

“Keeping the KV cache on a secondary storage system and reusing it efficiently without slowing down the entire system is a very challenging problem,” Jiang said. “We’ve seen people hire 20 engineers and spend three or four months building a system like this. Or they can use our product and do it very efficiently.”
