AI coding tools are shifting to a surprising place: The terminal

For years, code-editing tools like Cursor, Windsurf and GitHub Copilot have been the standard for AI-powered software development. But as agentic AI grows more powerful and vibe coding takes off, a subtle shift has changed how AI systems interact with software.

Instead of working on code, they are increasingly interacting directly with the shell of whatever system they are installed on.

The terminal is best known as the black-and-white screen from '90s hacker movies, a very old-school way of running programs and manipulating data. It is not as visually impressive as contemporary code editors, but it is an extremely powerful interface if you know how to use it. And while code-based agents can write and debug code, terminal tools are often needed to get software from written code to something that can actually be used.

The clearest sign of the shift to the terminal has come from the major labs. Since February, Anthropic, DeepMind and OpenAI have all released command-line coding tools (Claude Code, Gemini CLI and CLI Codex, respectively), and they are already among the companies' most popular products.

The shift has been easy to miss, since these tools largely operate under the same branding as previous coding tools. But under the hood, there have been real changes in how agents interact with other computers, both online and offline. Some believe those changes are just getting started.

“We have a big bet that there is a future where 95% of LLM-computer interaction is through a terminal,” says Mike Merrill, co-creator of the leading terminal-focused benchmark Terminal-Bench.

Terminal-based tools are also coming into their own just as prominent code-based tools are starting to look shaky. The AI code editor Windsurf has been torn apart by dueling acquisitions, with senior executives hired by Google and the remaining company acquired by Cognition, leaving the product's long-term future uncertain.

At the same time, new research suggests that programmers may be overestimating the productivity gains from conventional tools. A METR study of Cursor Pro, Windsurf's main competitor, found that while developers estimated they could complete tasks 20% to 30% faster, the observed process was about 20% slower. In short, the code assistant was actually costing programmers time.

That has left an opening for companies like Warp, which currently holds the top spot on Terminal-Bench. Warp bills itself as an “agentic development environment,” a middle ground between IDE programs and command-line tools like Claude Code.

However, Warp founder Zach Lloyd is still bullish on the terminal, seeing it as a way to tackle problems that would be out of scope for a code editor like Cursor.

“The terminal occupies a very low level in the developer stack, so it is the most versatile place to be running agents,” Lloyd said.

To understand how the new approach is different, it can be helpful to look at the benchmarks used to measure it. The code-based generation of tools focused on solving GitHub issues, the basis of the SWE-Bench test. Each problem on SWE-Bench is an open issue from GitHub: essentially, a piece of code that does not work.

Models iterate on the code until they find something that works, solving the problem. Integrated products like Cursor have built more sophisticated approaches, but the GitHub/SWE-Bench model is still the core of how these tools approach the problem: starting with broken code and turning it into code that works.

Terminal-based tools take a wider view, looking beyond the code to the whole environment a program is running in. That includes coding, but also more DevOps-oriented tasks like configuring a Git server or troubleshooting why a script won't run.
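To make that difference concrete, here is a minimal sketch of the kind of step a terminal-based agent takes when troubleshooting a script: run a command, then read the exit code and error output before deciding what to do next. The helper names `run_step` and `diagnose` and the string-matching heuristic are illustrative assumptions, not taken from Warp, Terminal-Bench or any other tool mentioned here.

```python
import subprocess
import sys


def run_step(command):
    """Run one command and return what an agent would observe from the shell."""
    result = subprocess.run(command, capture_output=True, text=True)
    return {
        "command": command,
        "exit_code": result.returncode,
        "stdout": result.stdout.strip(),
        "stderr": result.stderr.strip(),
    }


def diagnose(observation):
    """Hypothetical heuristic: classify a failed step from its error output."""
    if observation["exit_code"] == 0:
        return "ok"
    if "ModuleNotFoundError" in observation["stderr"]:
        return "missing dependency"
    return "unknown failure"


# A "script" that fails because it imports a module that does not exist,
# and one that runs cleanly. Both use the current Python interpreter.
broken = [sys.executable, "-c", "import module_that_does_not_exist_xyz"]
working = [sys.executable, "-c", "print('hello')"]

for command in (broken, working):
    observation = run_step(command)
    print(diagnose(observation), "| exit code", observation["exit_code"])
```

A real agent would feed the full observation back to a model rather than match on strings, but the loop is the same: act in the shell, observe the environment, decide on the next command.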

In one Terminal-Bench problem, the instructions give a decompression program and a target text file, challenging the agent to reverse-engineer a matching compression algorithm. Another asks the agent to build the Linux kernel from source, failing to mention that the agent will have to download the source code itself. Solving these problems requires the kind of bull-headed problem-solving ability that programmers need.
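The success condition for the first task can be sketched as a round-trip check: whatever the agent produces must decompress, using the supplied program, back into the target text. The sketch below uses Python's `zlib` as a stand-in for both the supplied decompressor and the agent's compressor. That choice is an assumption for illustration only, since the article does not say which algorithm the benchmark uses or exactly how it scores a solution.

```python
import zlib


def decompress(data: bytes) -> bytes:
    # Stand-in for the decompression program the task supplies (assumption: zlib).
    return zlib.decompress(data)


def candidate_compress(text: bytes) -> bytes:
    # Stand-in for the compressor the agent would have to write itself.
    return zlib.compress(text)


def solves_task(compressed: bytes, target: bytes) -> bool:
    """Return True when the decompressor turns `compressed` back into `target`."""
    try:
        return decompress(compressed) == target
    except zlib.error:
        # Input the decompressor cannot parse counts as a failed attempt.
        return False


target = b"the quick brown fox jumps over the lazy dog\n" * 20

print(solves_task(candidate_compress(target), target))
print(solves_task(b"not valid compressed data", target))
```

The hard part of the real task is that the agent gets only the decompressor, so it has to work out the matching format by reading and experimenting rather than calling a ready-made library.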

“What makes Terminal-Bench hard is not just the questions that we're giving the agents,” said Alex Shaw, co-creator of Terminal-Bench. “It's the environments that we're placing them in.”

Crucially, this new approach means tackling a problem step by step, the same skill that makes agentic AI so powerful. But even state-of-the-art agentic models cannot handle all of those environments. Warp earned its high score on Terminal-Bench by solving more than half of the problems, a sign of how challenging the benchmark is and how much work still needs to be done to unlock the terminal's full potential.

Still, Lloyd believes we are already at a point where terminal-based tools can reliably handle a developer's non-coding work, a value proposition that is hard to ignore.

“If you think of the daily work of setting up a new project, figuring out the dependencies and getting it running, Warp can do it autonomously,” Lloyd said. “And if it can't do it, it will tell you why.”
