Evolution of Open-Source AI Agents

Technical writer

Infrastructure

08.09.2025

8 min read

2025 has become the year AI agents took off, and the trend is still gaining momentum. Not long ago, most of the discussion was about the concept; today, AI agents are actively being integrated into development processes. Open-source AI agents are of particular interest because teams can not only use the technology but also adapt it to their own needs.

In this article, we will look at how these AI tools have evolved and how they help solve complex software engineering tasks. We’ll start with an overview of the early but important players, such as Devstral, and move on to more up-to-date AI agent applications available now.

Overview of the Open-Source AI Agent Landscape for Coding

The first noticeable steps toward open agents for development were made with models such as Devstral. Developed in collaboration between Mistral AI and All Hands AI, Devstral became a breakthrough solution.

With only 24 billion parameters, Devstral is light enough to run on a single NVIDIA RTX 4090, which makes it practical for local use. Its 128k-token context window and advanced tokenizer let it handle multi-step tasks in large codebases well.
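
A back-of-the-envelope calculation shows why the parameter count matters for local use. The sketch below estimates weight memory only, as parameters times bits per weight divided by 8. The helper name weights_gb is made up for illustration, and the 4-bit case assumes the model is run quantized, which the figures above do not state. The KV cache and activations come on top, so treat the results as lower bounds.

```shell
#!/bin/sh
# Estimate the memory taken by model weights alone, in gigabytes.
# Args: parameter count in billions, bits per weight.
weights_gb() {
    echo $(( $1 * $2 / 8 ))
}

weights_gb 24 16   # 16-bit weights: prints 48
weights_gb 24 4    # 4-bit quantized weights (assumed): prints 12
```

An RTX 4090 has 24 GB of memory, so by this estimate the 16-bit weights would not fit on one card, while a 4-bit quantized copy leaves room for context.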

However, the AI world doesn’t stand still. Many newer, more capable agents have since appeared. Among them, the following stand out:

OpenHands: One of the most popular open-source frameworks today. It provides a flexible platform for creating and deploying agents, allowing developers to easily integrate different LLMs for task execution.
Moatless Tools: A set of tools that expand agent capabilities, allowing them to interact with various services and APIs, making them especially effective for automating complex workflows.
Refact.ai: A full-fledged open-source AI assistant focusing on refactoring, code analysis, and test writing. It offers a wide range of functions to boost developer productivity.
SWE-agent and its compact version, mini-SWE-agent: Tools developed by researchers from Princeton and Stanford. SWE-agent lets LLMs such as GPT-4o autonomously solve tasks in real GitHub repositories. The compact mini-SWE-agent (just 100 lines of code) can solve 65% of tasks from the SWE-bench benchmark, making it a good choice for researchers and developers who need a simple yet powerful tool for building agents.

Each of these projects contributes to the development of agent-based coding, providing developers with powerful and flexible tools.

SWE-Bench: The Standard for Evaluating Agent Coding

To understand how effectively these agents work, a reliable evaluation system is necessary. This role is played by SWE-Bench, which has become the de facto standard for measuring LLM capabilities in software engineering.

The benchmark consists of 2,294 real GitHub issues taken from 12 popular Python repositories.

To improve evaluation accuracy, SWE-Bench Verified was created: a carefully curated subset of 500 tasks. Professional developers reviewed these tasks and rated their difficulty; 196 are “easy” (less than 15 minutes to fix) and 45 are “hard” (over an hour). A task is considered solved if the changes proposed by the model pass all unit tests successfully.

Devstral was initially among the leading open-source models on SWE-Bench Verified. For example, in May 2025, the OpenHands + Devstral Small 2505 combo solved 46.8% of tasks.

But the AI-agent world is evolving incredibly fast. Just three months later, in August 2025, that result no longer makes the top ten. The current leader, Trae.ai, solves 75.20% of tasks, a clear demonstration of how quickly these technologies are progressing.
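
To put the percentages in absolute terms, the sketch below converts a resolve rate into a number of tasks, assuming both scores refer to the full 500-task Verified set. The function name solved_tasks is illustrative.

```shell
#!/bin/sh
# Convert a resolve rate (percent) into a task count, rounded to the nearest task.
# Args: percent solved, total number of tasks.
solved_tasks() {
    awk -v pct="$1" -v total="$2" 'BEGIN { printf "%.0f\n", pct * total / 100 }'
}

solved_tasks 46.8 500    # OpenHands + Devstral Small 2505: prints 234
solved_tasks 75.20 500   # Trae.ai: prints 376
```

By this estimate the gap is roughly 142 tasks out of 500.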

Not Just Benchmarks, But Real Work

At first glance, it might seem that the only important metric for an AI agent is its performance on benchmarks like SWE-Bench. And of course, impressive numbers like those of Trae.ai speak volumes.

But in practice, when solving real tasks, functionality and workflow integration matter much more than raw percentages.

Modern AI agents are not just code-generating models. They’ve become true multi-tool assistants, capable of:

interacting with Git,
running tests,
analyzing logs,
even creating pull requests.

Still, they differ, and each has its strengths:

Devstral is great for multi-step tasks in large codebases. Its lightweight design and large context window make it valuable for local workflows.
OpenHands is less of an agent itself and more of a flexible platform for building and deploying agents tailored to specific needs, easily integrating different language models.
Refact.ai is a full-fledged assistant focusing on analysis, refactoring, and test writing, helping developers maintain high code quality.

And let’s not forget SaaS solutions that have been breaking revenue records since the start of the year: Replit, Bolt, Lovable, and others.

Ultimately, the choice of an agent depends on the specific task: do you need a tool for complex multi-step changes, a flexible platform to build your own solution, or an assistant to help with refactoring?

Whichever you choose, the main advantage of these agents is not just the ability to write code but their integration into existing workflows, where they take over both routine and complex tasks.

Launching Your Own Agent

Let’s look at how to deploy one of the modern agents, OpenHands. We’ll use the Devstral model, since it remains one of the open-source models that can run on your own hardware.

Preparing the GPU Server

First, you will need a server. Choose a suitable configuration with a GPU (for example, NVIDIA A100) to ensure the necessary performance. After creating the server, connect to it via SSH.

Installing Dependencies

Update packages and install Docker, which will be used to run OpenHands. Example for Ubuntu:

sudo apt update && sudo apt install docker.io -y
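
Before moving on, it can help to confirm that the tools the next steps rely on are actually on the PATH. The helper below is a small sketch; the name missing_tools is hypothetical, and checking for nvidia-smi assumes the NVIDIA driver is expected on the server.

```shell
#!/bin/sh
# Print each named tool that cannot be found on PATH; print nothing if all are present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example: check for Docker and the NVIDIA driver utility.
missing_tools docker nvidia-smi
```

An empty result means both are available; any name that is printed still needs to be installed.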

Setting Up and Running OpenHands

We’ll use a prebuilt Docker image of OpenHands to simplify deployment:

docker pull docker.all-hands.dev/all-hands-ai/runtime:0.51-nikolaik
docker run -it --rm --pull=always \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.51-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands:/.openhands \
    -p 0.0.0.0:3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
    docker.all-hands.dev/all-hands-ai/openhands:0.51

These commands pull the runtime image and launch OpenHands in a Docker container, reachable at your server’s address on port 3000. During startup, the URL of the OpenHands web interface is printed.

The option -p 0.0.0.0:3000:3000 means OpenHands will be accessible externally. By default, the web interface does not require login or password, so use caution.
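
One way to reduce that exposure, sketched here as an assumption rather than part of the original setup, is to publish the port on the loopback interface only (replace -p 0.0.0.0:3000:3000 with -p 127.0.0.1:3000:3000 in the command above) and reach the interface through an SSH tunnel. The helper below only prints the tunnel command for review; tunnel_command and USER are placeholders.

```shell
#!/bin/sh
# Print an SSH command that forwards a local port to the same port on the
# server's loopback interface. Args: port, SSH destination (user@host).
tunnel_command() {
    printf 'ssh -N -L %s:127.0.0.1:%s %s\n' "$1" "$1" "$2"
}

tunnel_command 3000 USER@SERVER-IP
# prints: ssh -N -L 3000:127.0.0.1:3000 USER@SERVER-IP
```

Run the printed command on your workstation; the interface is then available at http://localhost:3000 without being exposed publicly.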

Connecting to the Agent

Open in your browser:

http://SERVER-IP:3000

The OpenHands start screen will open.

Installing the Language Model (LLM)

To function, the agent needs an LLM. OpenHands supports APIs from various providers such as OpenAI (GPT family), Anthropic (Claude family), Google Gemini, and others.

But since we’re using a GPU server, the model can be run locally. The OpenHands + Devstral Small combo is still a top open-source performer on SWE-Bench, so we’ll use that model.

First, download the model weights. The method depends on the service you’ll use to run the model. The simplest option is the Hugging Face CLI:

huggingface-cli download mistralai/Devstral-Small-2505 --local-dir mistralai/Devstral-Small-2505

You can run the model with Ollama, vLLM, or other popular solutions. In our case, we used vLLM:

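# Set --tensor-parallel-size to the number of GPUs in the server (1 for a single GPU).
# --host 0.0.0.0 makes vLLM reachable from the OpenHands container; keep port 8000 closed to the outside.
vllm serve mistralai/Devstral-Small-2505 \
    --host 0.0.0.0 --port 8000 \
    --api-key local-llm \
    --tensor-parallel-size 2 \
    --served-model-name Devstral-Small-2505 \
    --enable-prefix-caching

Adding the Model to OpenHands

In the LLM settings of OpenHands, go to “see advanced settings” and fill in:

Custom model: openai/Devstral-Small-2505 (the openai/ prefix marks an OpenAI-compatible endpoint; the rest must match --served-model-name)
Base URL: http://host.docker.internal:8000/v1 (inside the OpenHands container, 127.0.0.1 refers to the container itself, not the server; adjust to your service setup)
API Key: local-llm (must match the --api-key value passed to vLLM)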
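
Before saving the settings, it is worth checking that the model server answers. vLLM exposes an OpenAI-compatible API, so listing the models at /v1/models with the API key as a bearer token is a quick test. The function names below are illustrative, and the URL in the usage comment assumes you run the check on the server itself.

```shell
#!/bin/sh
# Build the HTTP header that carries the API key as a bearer token.
auth_header() {
    printf 'Authorization: Bearer %s' "$1"
}

# List the models served at an OpenAI-compatible base URL.
# Args: base URL (ending in /v1), API key.
list_models() {
    curl -fsS -H "$(auth_header "$2")" "$1/models"
}

# Usage, run on the server once vLLM is up:
#   list_models http://127.0.0.1:8000/v1 local-llm
```

The JSON response should include the name passed to --served-model-name (Devstral-Small-2505 here); an authorization error points to a mismatched API key.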

The Future of Agent-Based Coding: More Than Just Autocompletion

The evolution from Devstral to platforms like OpenHands shows that we are moving from simple models to full-fledged autonomous tools.

LLM agents are no longer just “improved autocompletes”; they are real development assistants, capable of taking on routine and complex tasks. They can:

Implement features requiring changes across dozens of files.
Automatically create and run tests for new or existing code.
Perform refactoring and optimization at the project-wide level.
Interact with Git, automatically creating branches and pull requests.

Agents like Refact.ai are already integrating into IDEs, while OpenHands enables building a full AI-driven CI/CD pipeline.

The future points to a world where developers act more as architects and overseers, while routine work is automated with AI agents.
