
K3s vs K8s: Key Differences and Use Cases
Hostman Team
Technical writer
Infrastructure

As the leading container orchestration tool, Kubernetes (K8s) powers over 90% of containerized workloads worldwide, making it a cornerstone of modern infrastructure. However, its complexity and resource consumption aren't always a good fit—especially for lightweight environments. That’s where K3s comes in, offering a streamlined alternative.

In this guide, we compare K3s and K8s to help you choose the right fit for your use case—whether that's enterprise-scale deployments or low-footprint edge clusters.

Key Takeaways

  • K3s keeps it light. If you're working with limited resources—a Raspberry Pi, an older device, or just a quick test environment—K3s is your friend. It's easy to install and doesn't eat up much memory.

  • K8s brings the muscle. When you’re dealing with more complex systems that need to scale reliably across multiple nodes and stay highly available, standard Kubernetes (K8s) is built for that.

  • What’s the real difference? K3s is all about simplicity and speed. K8s is all about power and control. They both run Kubernetes under the hood—it’s just a matter of how much you need to customize and scale.

What is K8s?

K8s is short for Kubernetes, the de facto standard for managing microservices-based and large-scale containerized workloads in professional production environments. Originally developed by Google, it is now maintained by the Cloud Native Computing Foundation (CNCF).

Standard Kubernetes is modular: before you can use a cluster, you deploy and configure its individual components, such as the API server, scheduler, and controller manager. This architecture provides powerful control and flexibility, but it also adds significant operational complexity.
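To get a feel for what that manual setup involves, here is a minimal sketch of bootstrapping a small cluster with kubeadm—one common (but not the only) approach. The pod-network CIDR and the Flannel manifest URL are illustrative choices, not requirements:

```bash
# 1. Initialize the control plane on the first node
sudo kubeadm init --pod-network-cidr=10.244.0.0/16

# 2. Configure kubectl for your user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# 3. Install a CNI plugin (Flannel is one common choice)
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

# 4. Join each worker node using the command printed by `kubeadm init`
# sudo kubeadm join <control-plane-ip>:6443 --token <token> \
#   --discovery-token-ca-cert-hash sha256:<hash>
```

Even this happy-path setup spans several steps per node, which is exactly the overhead K3s tries to remove.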

K8s is an ideal fit for large teams with DevOps pipelines and the resources to manage and maintain substantial infrastructure. It also integrates well with managed cloud offerings such as AWS (EKS), Google Cloud (GKE), and Azure (AKS), and supports a broad ecosystem of extensions and third-party tools for monitoring, logging, service meshes, and more.


Kubernetes (K8s) working scheme

Advantages and Disadvantages of K8s

Advantages

  • Full documentation and community support: From Helm charts to service meshes, there’s a huge toolbox and community behind it.

  • Advanced features: Load balancing, autoscaling, RBAC, pod disruption budgets—you name it.

  • Cloud-native integrations: Easily integrates with major cloud providers and lets you build hybrid or multi-cloud setups.

  • Flexibility: You have control over every layer of your infrastructure.

Disadvantages

  • Complex setup and maintenance: Requires expertise to install, configure, and operate effectively.

  • Resource-hungry: Demands a minimum of 2 GB RAM per node for smooth operation.

  • Operational overhead: Requires active monitoring, frequent updates, and manual scaling strategies if not managed via cloud services.

  • Steep learning curve: Can be overwhelming for teams without prior Kubernetes experience.

What is K3s?

Now enter K3s—Kubernetes’ leaner cousin. Built by Rancher Labs and now maintained by SUSE, K3s packs everything you love about Kubernetes into a smaller, easier-to-manage package. It’s designed to deliver a fully functional Kubernetes experience while cutting resource consumption, so even modest hardware can handle the workload.

K3s ships as a single binary—usually under 100 MB—that bundles the Kubernetes control plane together with sensible defaults for the container runtime, networking, ingress, and package management.

By default, K3s uses SQLite as its datastore instead of etcd, although it can be configured to use MySQL, PostgreSQL, or external etcd clusters for high availability. It also includes:

  • containerd as the default container runtime

  • Flannel as the Container Network Interface (CNI)

  • Traefik as the default ingress controller

  • A built-in Helm controller for deploying charts
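Because all of this ships in one binary, a single-node cluster installs with one command. Below is a minimal sketch of the official install plus an optional agent node; the server address and token are placeholders:

```bash
# Install the K3s server (single-node cluster) with the official script
curl -sfL https://get.k3s.io | sh -

# Verify the node and the bundled components (Traefik, CoreDNS, etc.)
sudo k3s kubectl get nodes
sudo k3s kubectl get pods -n kube-system

# Optionally join an extra agent node (run on the second machine);
# the join token lives on the server in /var/lib/rancher/k3s/server/node-token
curl -sfL https://get.k3s.io | K3S_URL=https://<server-ip>:6443 K3S_TOKEN=<node-token> sh -
```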


K3s working scheme explained

Advantages and Disadvantages of K3s

Advantages

  • Lightweight and efficient: Minimal resource consumption allows it to run on devices with as little as 512 MB RAM.

  • Quick setup: You can deploy a single-node cluster with a one-line install script.

  • Ideal for development and edge scenarios: Works well in places where full-scale Kubernetes would be excessive or impractical.

  • Lower operational burden: Fewer moving parts make it easier to maintain.

  • Full Kubernetes compatibility: Supports standard manifests, kubectl, Helm charts, and the Kubernetes API.

Disadvantages

  • Limited out-of-the-box HA: A highly available setup requires an external datastore and additional configuration (see the sketch after this list).

  • Security trade-offs: Some enterprise-grade features are disabled by default to conserve resources.

  • Smaller ecosystem: Though growing, K3s has fewer prebuilt integrations and community add-ons compared to standard Kubernetes.

  • Not intended for large, multi-tenant environments: Better suited for simpler or single-purpose deployments.
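As a rough sketch of what that extra HA configuration looks like, K3s servers can be pointed at a shared external datastore with the `--datastore-endpoint` flag; the MySQL connection string below is purely illustrative:

```bash
# Start a K3s server backed by an external MySQL datastore (credentials/host are placeholders)
curl -sfL https://get.k3s.io | sh -s - server \
  --datastore-endpoint="mysql://user:password@tcp(db-host:3306)/k3s"

# Repeat on the other server nodes with the same endpoint, then put a load balancer
# in front of them so agents and kubectl reach a highly available control plane.
```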

K8s vs K3s

Both K8s and K3s are CNCF-certified Kubernetes distributions. However, despite their shared foundation and similar purpose, the two tools target very different environments. K8s (Kubernetes) is a full-featured container orchestration platform designed for large ecosystems with plenty of room for extension, which makes it well suited to big teams. It provides full control over networking, security, and infrastructure integrations. Sounds like an ideal solution for big companies, doesn't it?

K3s, on the other hand, is engineered to run on less powerful machines and to be easier to operate overall. It sheds much of Kubernetes’ operational complexity by consolidating components into a single binary and shipping sensible defaults such as containerd, Flannel, and Traefik.

Keep in mind that the main difference between K3s and K8s is not capability—K3s supports the full Kubernetes API—but setup, performance, and operating environment. If your team needs fast deployment with minimal overhead, K3s is a strong candidate. For large-scale, mission-critical systems with complex architectures, K8s is a better fit.

Here’s a quick comparison to summarize the key differences:

| Feature | K8s (Kubernetes) | K3s |
|---|---|---|
| Purpose | Enterprise workloads, production-scale | Edge, IoT, dev/test, low-resource |
| Installation | Manual, multi-component setup | One-line install, single binary |
| Resource requirements | 2 GB+ RAM per node | 512 MB RAM minimum |
| Datastore | etcd (required) | SQLite by default (etcd optional) |
| Control plane architecture | Multiple processes and services | Combined into a single binary |
| Add-ons | User-installed | Includes containerd, Flannel, Traefik |
| Multi-tenancy | Yes | Limited; designed for single-tenant |
| HA support | Built-in | Requires external database |
| Best for | Complex, large-scale deployments | Light, fast, and simple setups |

How to Choose the Right Option for Your Needs

Before deciding between K3s and Kubernetes, take stock of your goals and the resources you have to achieve them.

If you are building a scalable application that needs advanced networking, multi-tenancy, and enterprise integrations, K8s is the better option. It excels in large production clusters with diverse workloads and complex operational requirements.

However, if you are:

  • Working with resource-constrained edge devices
  • Running CI/CD pipelines where clusters are created and destroyed quickly
  • Developing proof-of-concept projects
  • Managing a single-purpose application or internal tool

Then K3s will likely serve your needs better.

Both solutions are good and can even be used together: for example, run K8s in production and K3s in development and testing environments. Pretty cool, right?
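Because both distributions expose the same API, the same manifest works on either cluster; only the kubectl context changes. A minimal sketch, with hypothetical context names:

```bash
# Point kubectl at the K3s dev cluster (swap for the production context when promoting)
kubectl config use-context k3s-dev

# Apply a plain Deployment manifest; the identical file works on full K8s
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo-web
  template:
    metadata:
      labels:
        app: demo-web
    spec:
      containers:
        - name: web
          image: nginx:1.27
          ports:
            - containerPort: 80
EOF
```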

Conclusion

Kubernetes is a great foundation for modern infrastructure, but you need to choose the right flavor for your situation. Depending on your needs, K3s or K8s can offer distinct advantages: K3s provides a simpler solution ideal for fast-moving teams or environments without powerful infrastructure, while K8s remains the go-to choice for enterprises that need more advanced capabilities.

If you’re interested in using Kubernetes, check out Hostman’s Kubernetes Service—it’s affordable and powerful.

FAQ

What is the main difference between K3s and Kubernetes (K8s)?

K3s is a lightweight version of Kubernetes, designed for resource-constrained devices, packaging all core components into a single binary. Kubernetes is a full-featured platform suited for large-scale enterprise deployments.

Is K3s a full Kubernetes distribution?  

Yes, K3s is certified by the Cloud Native Computing Foundation as a conformant Kubernetes distribution, fully supporting Kubernetes APIs and tools.

Can I use K3s in production?  

Definitely! K3s is used in production across various sectors for edge computing, IoT, local development clusters, and continuous integration.

What are the system requirements for K3s vs. K8s?  

K3s requires only 512 MB of RAM and one virtual CPU, ideal for low-power environments. In contrast, standard Kubernetes requires at least 2 GB of RAM and more resources for its control plane.

Can I migrate from K3s to full Kubernetes later? 

Yes, migration is straightforward thanks to API compatibility, though you should account for differences in configuration.

When should I choose K3s over Kubernetes?  

Choose K3s for low-resource settings, quick deployments, or edge scenarios. Opt for Kubernetes for high availability, scalability, and enterprise-grade features.
