Evolution of Open-Source AI Agents

Hostman Team
Technical writer
Infrastructure

2025 has become the year AI agents took off, and the trend is still gaining momentum. Not long ago the idea was mostly discussed as a concept; today AI agents are being integrated into real development processes. Open-source AI agents are of particular interest because teams can not only use them but also adapt them to their own needs.

In this article, we will look at how these AI tools have evolved and how they help solve complex software engineering tasks. We’ll start with an overview of the early but important players, such as Devstral, and move on to more up-to-date AI agent applications available now.

Overview of the Open-Source AI Agent Landscape for Coding

The first noticeable steps toward open agents for development came with models such as Devstral. Developed by Mistral AI in collaboration with All Hands AI, Devstral was built specifically for agentic software engineering tasks rather than plain code completion.

With only 24 billion parameters, it is light enough to run on a single NVIDIA RTX 4090, which makes it accessible for local use. A 128k-token context window and an advanced tokenizer help it handle multi-step tasks in large codebases.

However, the AI world doesn’t stand still. Today, many new, more productive and functional agents have appeared. Among them, the following stand out:

  • OpenHands: One of the most popular open-source frameworks today. It provides a flexible platform for creating and deploying agents, allowing developers to easily integrate different LLMs for task execution.

  • Moatless Tools: A set of tools that expand agent capabilities, allowing them to interact with various services and APIs, making them especially effective for automating complex workflows.

  • Refact.ai: A full-fledged open-source AI assistant focusing on refactoring, code analysis, and test writing. It offers a wide range of functions to boost developer productivity.

  • SWE-agent and its compact version, mini-SWE-agent: Tools developed by researchers from Princeton and Stanford. SWE-agent lets LLMs such as GPT-4o autonomously solve tasks in real GitHub repositories. The compact mini-SWE-agent (roughly 100 lines of code) is reported to solve 65% of tasks from SWE-bench Verified, making it a good choice for researchers and developers who need a simple yet capable agent-building tool.

Each of these projects contributes to the development of agent-based coding, providing developers with powerful and flexible tools.

SWE-Bench: The Standard for Evaluating Agent Coding

To understand how effectively these agents work, a reliable evaluation system is necessary. This role is played by SWE-Bench, which has become the de facto standard for measuring LLM capabilities in software engineering.

The benchmark consists of 2,294 real GitHub issues taken from 12 popular Python repositories.

To improve evaluation accuracy, SWE-Bench Verified was created: a carefully curated subset of 500 tasks. Professional developers reviewed these tasks and rated their difficulty: 196 are “easy” (less than 15 minutes to fix) and 45 are “hard” (over an hour). A task counts as solved only if the model's patch makes the tests tied to the issue pass without breaking the tests that passed before.
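
The split quoted above can be turned into shares of the whole set with a few lines of arithmetic. This is a minimal sketch, not part of the benchmark tooling: the name `is_resolved` and the shape of its input (a mapping from test name to pass/fail) are assumptions made for illustration, and the "in between" count is simply what remains after subtracting the two quoted groups.

```python
# Difficulty split of SWE-Bench Verified as quoted in the text.
TOTAL = 500
EASY = 196   # less than 15 minutes to fix
HARD = 45    # over an hour to fix
IN_BETWEEN = TOTAL - EASY - HARD  # tasks that are neither "easy" nor "hard"


def share(count, total=TOTAL):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


def is_resolved(test_results):
    """Hypothetical helper: a task counts as solved only if every test passes.

    test_results maps a test name to True (passed) or False (failed).
    """
    return len(test_results) > 0 and all(test_results.values())


print(IN_BETWEEN, share(EASY), share(HARD))
print(is_resolved({"test_fix": True, "test_existing": False}))
```

A single failing test is enough for the task to count as unsolved, which is why partial fixes earn no credit on the leaderboard.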


Originally, Devstral was among the leaders on SWE-Bench Verified among open-source models. For example, in May 2025, the OpenHands + Devstral Small 2505 combo successfully solved 46.8% of tasks.

But the AI-agent world is evolving incredibly fast. Just three months later, in August 2025, that result no longer makes the top ten. The current leader, Trae.ai, solves 75.20% of tasks, a clear demonstration of how quickly these technologies are progressing.
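
To make the gap concrete, the percentages can be converted into task counts. The sketch below assumes both scores were measured on the full 500-task SWE-Bench Verified set; `solved_tasks` is a hypothetical helper name.

```python
TOTAL_TASKS = 500  # size of SWE-Bench Verified


def solved_tasks(percent, total=TOTAL_TASKS):
    """Convert a reported solve rate in percent to a whole number of tasks."""
    return round(total * percent / 100)


devstral = solved_tasks(46.8)   # OpenHands + Devstral Small 2505, May 2025
leader = solved_tasks(75.20)    # leaderboard leader, August 2025
print(devstral, leader, leader - devstral)
```

Under that assumption, the difference between the two results is well over a hundred additional issues resolved.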

Not Just Benchmarks, But Real Work

At first glance, it might seem that the only important metric for an AI agent is its performance on benchmarks like SWE-Bench. And of course, impressive numbers like those of Trae.ai speak volumes.

But in practice, when solving real tasks, functionality and workflow integration matter much more than raw percentages.

Modern AI agents are not just code-generating models. They’ve become true multi-tool assistants, capable of:

  • interacting with Git,
  • running tests,
  • analyzing logs,
  • even creating pull requests.
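
As a rough illustration of what "multi-tool" means in practice, the sketch below shows a toy agent loop: a policy proposes a shell command, the loop runs it, and the result is fed back before the next step. It is not how OpenHands or SWE-agent are implemented. `scripted_policy` is a hypothetical stand-in for the LLM, and the `echo` commands stand in for real Git or test commands.

```python
import subprocess


def run_shell(command):
    """Run a shell command and return its exit code and combined output."""
    proc = subprocess.run(command, shell=True, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def agent_loop(next_action, max_steps=10):
    """Ask the policy for a command, run it, record the result, repeat."""
    history = []
    for _ in range(max_steps):
        command = next_action(history)
        if command is None:  # the policy decides the task is done
            break
        code, output = run_shell(command)
        history.append({"command": command, "exit_code": code, "output": output})
    return history


def scripted_policy(history):
    """Stand-in for an LLM: replays a fixed list of commands, then stops."""
    script = ["echo running tests", "echo creating pull request"]
    return script[len(history)] if len(history) < len(script) else None


for step in agent_loop(scripted_policy):
    print(step["exit_code"], step["output"].strip())
```

In a real agent the policy would be a model call that reads the history (test failures, logs, diffs) and chooses the next command, and the commands would run inside a sandbox rather than directly on the host.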

Still, they differ, and each has its strengths:

  • Devstral is great for multi-step tasks in large codebases. Its lightweight design and large context window make it valuable for local workflows.
  • OpenHands is less of an agent itself and more of a flexible platform for building and deploying agents tailored to specific needs, easily integrating different language models.
  • Refact.ai is a full-fledged assistant focusing on analysis, refactoring, and test writing, helping developers maintain high code quality.

And let’s not forget SaaS solutions that have been breaking revenue records since the start of the year: Replit, Bolt, Lovable, and others.

Ultimately, the choice of an agent depends on the specific task: do you need a tool for complex multi-step changes, a flexible platform to build your own solution, or an assistant to help with refactoring?

Their main advantage is not just the ability to write code but how well they fit into existing workflows, taking over both routine and complex tasks.

Launching Your Own Agent

Let’s look at how to deploy one of the modern agents, OpenHands. We’ll use the Devstral model, since it remains one of the open-source models that can run on your own hardware.

  1. Preparing the GPU Server

First, you will need a server. Choose a suitable configuration with a GPU (for example, NVIDIA A100) to ensure the necessary performance. After creating the server, connect to it via SSH.
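
When choosing a configuration, a back-of-the-envelope estimate of the memory needed for the model weights is a useful starting point. The sketch below uses the 24 billion parameters mentioned earlier and counts weights only; the KV cache, activations, and framework overhead come on top, so treat the numbers as a lower bound. `weight_memory_gb` is a hypothetical helper name.

```python
def weight_memory_gb(params_billion, bits_per_param):
    """Approximate memory for model weights only, in GB (10^9 bytes)."""
    return params_billion * bits_per_param / 8


# 24 billion parameters at common weight precisions.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(24, bits):.0f} GB")
```

By this estimate, unquantized 16-bit weights alone need about 48 GB, which is why a data-center GPU is a comfortable choice, while a 24 GB card such as the RTX 4090 would need a quantized variant of the model.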

  2. Installing Dependencies

Update packages and install Docker, which will be used to run OpenHands. Example for Ubuntu:

sudo apt update && sudo apt install docker.io -y
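
Before moving on, it can help to confirm that the required command-line tools are actually on the PATH. This is a small optional check, assuming that `docker` comes from the package installed above and that `nvidia-smi` was installed together with the NVIDIA driver; `missing_tools` is a hypothetical helper name.

```python
import shutil


def missing_tools(tools):
    """Return the tools from the list that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools(["docker", "nvidia-smi"])
print("Missing:", ", ".join(missing) if missing else "nothing")
```
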

  3. Setting Up and Running OpenHands

We’ll use a prebuilt Docker image of OpenHands to simplify deployment:

docker pull docker.all-hands.dev/all-hands-ai/runtime:0.51-nikolaik
docker run -it --rm --pull=always \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.51-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands:/.openhands \
    -p 0.0.0.0:3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
    docker.all-hands.dev/all-hands-ai/openhands:0.51

This command will launch OpenHands in a Docker container, accessible via your server’s address at port 3000. During startup, you’ll get a URL for the OpenHands web interface.

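The option -p 0.0.0.0:3000:3000 makes OpenHands reachable from outside the server. By default, the web interface requires no login or password, so restrict access with a firewall, or publish the port as -p 127.0.0.1:3000:3000 and connect through an SSH tunnel.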
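
To see whether the interface really is exposed, you can probe the port from a machine outside the server. The sketch below only checks that a TCP connection can be opened; it says nothing about authentication. `is_port_open` is a hypothetical helper name, and SERVER-IP is the same placeholder used in this guide.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Usage (run from a machine OUTSIDE the server; SERVER-IP is a placeholder):
#   is_port_open("SERVER-IP", 3000)
```

If the call returns True from an untrusted network, anyone who knows the address can open the interface.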

  4. Connecting to the Agent

Open in your browser:

http://SERVER-IP:3000

You’ll see this screen:

[Screenshot: OpenHands web interface]

  5. Installing the Language Model (LLM)

To function, the agent needs an LLM. OpenHands supports APIs from various providers such as OpenAI (GPT family), Anthropic (Claude family), Google Gemini, and others.

But since we’re using a GPU server, the model can be run locally. The OpenHands + Devstral Small combo is still a top open-source performer on SWE-Bench, so we’ll use that model.

First, install the model locally. The method depends on the service you’ll use to run it. The simplest option is via Hugging Face:

huggingface-cli download mistralai/Devstral-Small-2505 --local-dir mistralai/Devstral-Small-2505

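You can run the model with Ollama, vLLM, or other popular solutions. In our case, we used vLLM. The --tensor-parallel-size 2 option assumes two GPUs; on a single-GPU server, omit it. Because OpenHands runs inside a Docker container, vLLM must listen on an address the container can reach, so bind it to 0.0.0.0 rather than 127.0.0.1 and keep port 8000 closed to the outside with a firewall:

vllm serve mistralai/Devstral-Small-2505 \
    --host 0.0.0.0 --port 8000 \
    --api-key local-llm \
    --tensor-parallel-size 2 \
    --served-model-name Devstral-Small-2505 \
    --enable-prefix-caching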
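
Once vLLM has finished loading, you can confirm that it answers on its OpenAI-compatible API before touching the OpenHands settings. The sketch below queries the /models endpoint with the API key from the command above and checks that the served model name is listed. The function names are hypothetical, and the base URL assumes the check runs on the server itself.

```python
import json
import urllib.request


def build_models_request(base_url, api_key):
    """Build a GET request for the OpenAI-compatible /models endpoint."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def served_model_ids(models_response):
    """Extract model ids from a parsed /models response."""
    return [entry["id"] for entry in models_response.get("data", [])]


def check_server(base_url, api_key, expected_model, timeout=10):
    """Return True if the server lists expected_model. Performs a network call."""
    request = build_models_request(base_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return expected_model in served_model_ids(payload)


# Usage on the server itself, once vLLM is up:
#   check_server("http://127.0.0.1:8000/v1", "local-llm", "Devstral-Small-2505")
```

A False result usually means the name passed to --served-model-name differs from the one being checked; an exception points to a wrong address, port, or API key.
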

  6. Adding the Model to OpenHands

In the LLM settings of OpenHands, go to “see advanced settings” and fill in:

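  • Custom model: openai/Devstral-Small-2505 (the openai/ prefix marks the endpoint as OpenAI-compatible; the rest must match the name given to --served-model-name)
  • Base URL: http://host.docker.internal:8000/v1 (inside the OpenHands container, 127.0.0.1 points to the container itself, not to the host running vLLM, so vLLM must listen on an address the container can reach, for example --host 0.0.0.0; the exact URL depends on your service setup)
  • API Key: local-llm (the value passed to --api-key; may vary by setup)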

The Future of Agent-Based Coding: More Than Just Autocompletion

The evolution from Devstral to platforms like OpenHands shows that we are moving from simple models to full-fledged autonomous tools.

LLM agents are no longer just “improved autocompletes”; they are real development assistants, capable of taking on routine and complex tasks. They can:

  • Implement features requiring changes across dozens of files.
  • Automatically create and run tests for new or existing code.
  • Perform refactoring and optimization at the project-wide level.
  • Interact with Git, automatically creating branches and pull requests.

Agents like Refact.ai are already integrating into IDEs, while OpenHands enables building a full AI-driven CI/CD pipeline.

The future points to a world where developers act more as architects and overseers, while routine work is automated with AI agents.
