The year 2025 has truly become the year of AI agents, and the trend keeps gaining momentum. Not long ago, many were only discussing the concept; today, AI agents are actively being integrated into real development processes. Of particular interest are open-source AI agents, which let teams not only use the technology but also adapt it to their own needs.
In this article, we will look at how these AI tools have evolved and how they help solve complex software engineering tasks. We'll start with an overview of the early but important players, such as Devstral, and move on to the more capable agents available today.
The first notable steps toward open agents for development came with models such as Devstral, a breakthrough developed in collaboration between Mistral AI and All Hands AI.
Thanks to its lightweight architecture (only 24 billion parameters), it could run on a single NVIDIA RTX 4090, making it accessible for local use. With a large 128k-token context window and an advanced tokenizer, it handled multi-step tasks in large codebases very well.
However, the AI world doesn't stand still. Today, many newer, more capable agents have appeared. Among them, the following stand out:
OpenHands: One of the most popular open-source frameworks today. It provides a flexible platform for creating and deploying agents, allowing developers to easily integrate different LLMs for task execution.
Moatless Tools: A set of tools that expand agent capabilities, allowing them to interact with various services and APIs, making them especially effective for automating complex workflows.
Refact.ai: A full-fledged open-source AI assistant focusing on refactoring, code analysis, and test writing. It offers a wide range of functions to boost developer productivity.
SWE-agent and mini-SWE-agent: Tools developed by researchers from Princeton and Stanford. SWE-agent lets LLMs such as GPT-4o autonomously solve issues in real GitHub repositories, demonstrating high efficiency. Its compact counterpart, mini-SWE-agent, fits in roughly 100 lines of code yet solves about 65% of tasks on the SWE-bench benchmark, making it a great choice for researchers and developers who need a simple but capable agent-building tool (a quickstart sketch follows this list).
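To get a sense of how low the entry barrier is, here is a hedged quickstart for mini-SWE-agent; the package and command names are taken from the project's README at the time of writing and may change between releases:

# install the package and launch an interactive agent session
pip install mini-swe-agent
mini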
Each of these projects contributes to the development of agent-based coding, providing developers with powerful and flexible tools.
To understand how effectively these agents work, a reliable evaluation system is necessary. This role is played by SWE-Bench, which has become the de facto standard for measuring LLM capabilities in software engineering.
The benchmark consists of 2,294 real GitHub issues taken from 12 popular Python repositories.
To improve evaluation accuracy, SWE-Bench Verified was created: a carefully curated subset of 500 tasks. Professional developers reviewed each task and graded it by difficulty; among them, 196 were rated "easy" (under 15 minutes to fix) and 45 "hard" (over an hour). A task counts as solved if the changes proposed by the model pass all unit tests.
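For reference, this is roughly how a file of model-generated patches is scored against SWE-Bench Verified with the official harness; the flag names follow the SWE-bench repository's README at the time of writing and may change, and preds.json is a placeholder for your model's predictions:

# install the harness and evaluate a predictions file
pip install swebench
python -m swebench.harness.run_evaluation \
    --dataset_name princeton-nlp/SWE-bench_Verified \
    --predictions_path preds.json \
    --max_workers 4 \
    --run_id my-eval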
Originally, Devstral was among the leaders on SWE-Bench Verified among open-source models. For example, in May 2025, the OpenHands + Devstral Small 2505 combo successfully solved 46.8% of tasks.
But the AI-agent world is evolving incredibly fast. Just three months later, in August 2025, these results don’t even make the top ten anymore. The current leader, Trae.ai, shows an impressive 75.20% of solved tasks—a clear demonstration of how quickly these technologies are progressing.
At first glance, it might seem that the only important metric for an AI agent is its performance on benchmarks like SWE-Bench. And of course, impressive numbers like those of Trae.ai speak volumes.
But in practice, when solving real tasks, functionality and workflow integration matter much more than raw percentages.
Modern AI agents are not just code-generating models. They've become true multi-tool assistants, capable of editing code, executing commands, running tests, and interacting with external services and APIs.
Still, they differ, and each has its strengths: OpenHands offers a flexible platform for building your own agents, Refact.ai focuses on refactoring and IDE integration, and SWE-agent excels at autonomously resolving GitHub issues.
And let’s not forget SaaS solutions that have been breaking revenue records since the start of the year: Replit, Bolt, Lovable, and others.
Ultimately, the choice of an agent depends on the specific task: do you need a tool for complex multi-step changes, a flexible platform to build your own solution, or an assistant to help with refactoring?
In the end, their main advantage is not just the ability to write code but their seamless integration into workflows, taking over routine and complex tasks.
Let's look at how to deploy one of the modern agents, OpenHands. We'll use the Devstral model, since it remains one of the strongest open-source coding models that can run on your own hardware.
Preparing the GPU Server
First, you will need a server. Choose a suitable configuration with a GPU (for example, NVIDIA A100) to ensure the necessary performance. After creating the server, connect to it via SSH.
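Once connected, it's worth confirming that the GPU is visible and the driver is working before going further:

# should list the installed GPU(s) and driver version
nvidia-smi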
Installing Dependencies
Update packages and install Docker, which will be used to run OpenHands. Example for Ubuntu:
sudo apt update && sudo apt install docker.io -y
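Optionally, add your user to the docker group so Docker can be run without sudo (log out and back in for the change to take effect), and verify the installation:

# allow the current user to run Docker without sudo
sudo usermod -aG docker $USER
# confirm Docker is installed
docker --version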
Setting Up and Running OpenHands
We’ll use a prebuilt Docker image of OpenHands to simplify deployment:
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.51-nikolaik
docker run -it --rm --pull=always \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.51-nikolaik \
-e LOG_ALL_EVENTS=true \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ~/.openhands:/.openhands \
-p 0.0.0.0:3000:3000 \
--add-host host.docker.internal:host-gateway \
--name openhands-app \
docker.all-hands.dev/all-hands-ai/openhands:0.51
This command will launch OpenHands in a Docker container, accessible via your server’s address at port 3000. During startup, you’ll get a URL for the OpenHands web interface.
The -p 0.0.0.0:3000:3000 option means OpenHands will be accessible externally. By default, the web interface does not require a login or password, so use caution (see the tunnel example below).
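If external access isn't needed, a safer pattern is to bind the port to the loopback interface instead (replace -p 0.0.0.0:3000:3000 with -p 127.0.0.1:3000:3000 in the command above) and reach the UI through an SSH tunnel from your machine:

# forward local port 3000 to the server's loopback, then open http://localhost:3000
ssh -L 3000:127.0.0.1:3000 user@SERVER-IP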
Connecting to the Agent
Open in your browser:
http://SERVER-IP:3000
You'll see the OpenHands start screen.
Installing the Language Model (LLM)
To function, the agent needs an LLM. OpenHands supports APIs from various providers such as OpenAI (GPT family), Anthropic (Claude family), Google Gemini, and others.
But since we're using a GPU server, the model can run locally. The OpenHands + Devstral Small combination remains one of the strongest fully open-source pairings on SWE-Bench, so we'll use that model.
First, download the model weights. The method depends on the service you'll use to run the model; the simplest option is the Hugging Face CLI:
huggingface-cli download mistralai/Devstral-Small-2505 --local-dir mistralai/Devstral-Small-2505
You can run the model with Ollama, vLLM, or other popular solutions. In our case, we used vLLM (note that --tensor-parallel-size 2 below splits the model across two GPUs; adjust it to match your GPU count):
vllm serve mistralai/Devstral-Small-2505 \
--host 127.0.0.1 --port 8000 \
--api-key local-llm \
--tensor-parallel-size 2 \
--served-model-name Devstral-Small-2505 \
--enable-prefix-caching
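Once vLLM is up, you can sanity-check its OpenAI-compatible endpoint; the API key and model name below match the flags passed to vllm serve above:

# send a minimal chat completion request to the local server
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer local-llm" \
  -d '{"model": "Devstral-Small-2505", "messages": [{"role": "user", "content": "Hello"}]}'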
Adding the Model to OpenHands
In the LLM settings of OpenHands, open "see advanced settings" and fill in:

Base URL: http://127.0.0.1:8000/v1 (depends on your service setup)
API Key: local-llm (may vary by setup)

Note that OpenHands itself runs inside a Docker container, so 127.0.0.1 may not reach a vLLM server running on the host; in that case, use http://host.docker.internal:8000/v1 instead, which works thanks to the --add-host flag in the docker run command above.

The evolution from Devstral to platforms like OpenHands shows that we are moving from simple models to full-fledged autonomous tools.
LLM agents are no longer just "improved autocompletes"; they are real development assistants, capable of taking on both routine and complex tasks.
Agents like Refact.ai are already integrating into IDEs, while OpenHands enables building a full AI-driven CI/CD pipeline.
The future points to a world where developers act more as architects and overseers, while routine work is automated with AI agents.