Installing and Configuring a FastAPI Website on a VDS/VPS
Hostman Team
Technical writer
Python
16.07.2024
Reading time: 11 min

FastAPI, first released in 2018, is one of the most popular Python frameworks for building compact and fast HTTP servers. It is built on two lower-level libraries:

  • Pydantic: A data validation library for Python.

  • Starlette: An ASGI (Asynchronous Server Gateway Interface) toolkit designed to support asynchronous functions in Python.

In this tutorial, we will explore how to manually deploy a web application created with FastAPI on a local or remote Unix machine. For this, we need several basic components:

  • Python: The programming language interpreter.

  • FastAPI: A Python package.

  • Nginx: A web server with the appropriate configuration file.

  • Uvicorn: An ASGI server for Python.

  • Systemd: A system utility for managing running services.

Our web application's architecture will be as follows:

The Python code using the FastAPI package runs inside Uvicorn, an ASGI server. In front of it, Nginx runs as a reverse proxy, forwarding all incoming requests to the already running Uvicorn server. Both Uvicorn and Nginx are managed by the system utility Systemd. Nginx accepts user requests on port 80 (the standard HTTP port) and forwards them to port 8000 (a conventional unprivileged port) on the Uvicorn server running the FastAPI application.
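The request flow described above can be pictured as:

```
Client --HTTP :80--> Nginx (reverse proxy) --HTTP :8000--> Uvicorn (ASGI server) --> FastAPI app
```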

To deploy this technology stack, we will need a cloud virtual server with the Ubuntu operating system.

Installing Python

First, check if Python is already installed on the system:

python3 --version

Next, update the list of available packages:

sudo apt update

Then install the latest version of Python and a few related packages: the pip package manager, the development headers needed to build C extensions (python3-dev), and the module for creating virtual environments (python3-venv).

sudo apt install python3 python3-pip python3-dev python3-venv

Now, if you run Python, it should start the interpreter:

python3

To verify, enter some simple code and execute it:

print("Hello, Hostman")

The output in the console should be:

Hello, Hostman

Installing and Configuring the Nginx Server

In our example, Nginx will act as a reverse proxy server, receiving user requests and passing them to the Uvicorn ASGI server for the FastAPI application.

Installation

We have a detailed guide on how to install the Nginx web server on the Ubuntu operating system. 

First, update the list of repositories:

sudo apt update

Then, download and install Nginx:

sudo apt install nginx

Next, adjust the system firewall UFW (Uncomplicated Firewall) to allow HTTP connections on port 80:

sudo ufw allow 'Nginx HTTP'

Configuration

The Nginx configuration file, nginx.conf, is located in the /etc/nginx/ directory. We will completely overwrite its contents with minimal settings required to forward requests to FastAPI:

daemon on; # Nginx will run as a background service
worker_processes 2;
user www-data;

events {
	use epoll;
	worker_connections 1024;
}

error_log /var/log/nginx/error.log;

http {
	server_tokens off;
	include mime.types;
	charset utf-8;


	access_log /var/log/nginx/access.log combined;

	server {
		listen 80;
		server_name www.DOMAIN.COM DOMAIN.COM;

		# Replace DOMAIN.COM with your server's address
		# Or you can use localhost

		location / {
			proxy_pass http://127.0.0.1:8000; # The port should match the Uvicorn server port
			proxy_set_header Host $host; # Pass the Host header with the target IP and port of the server
			proxy_set_header X-Real-IP $remote_addr; # Pass the header with the user's IP address
			proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; # Pass the entire chain of addresses the request has passed through
		}
	}
}

Note that we have simplified the configuration by not using the /etc/nginx/sites-available/ and /etc/nginx/sites-enabled/ directories or additional files from /etc/nginx/conf.d/. This minimal configuration is sufficient for our example, and stripping out unused elements makes the server both easier to understand and easier to secure.

To check the syntax in the configuration file, run the following command:

sudo nginx -t

To apply the new configuration, you need to restart the web server manually:

sudo systemctl restart nginx

For reference, there is a gentler alternative: the reload command re-reads the configuration and gracefully replaces Nginx worker processes without dropping active connections, so no full restart is needed:

sudo systemctl reload nginx

Creating a Simple FastAPI Application

Next, let's create a FastAPI app to use in this article.

Project directory

To start, we will create a directory for our FastAPI application under the system directory /var, which is recommended for hosting web server files:

sudo mkdir -p /var/www/fastapp

Next, navigate into the newly created directory:

cd /var/www/fastapp

Virtual environment

We will now set up a local isolated Python virtual environment, which is why we installed the python3-venv package earlier:

python3 -m venv venv

To activate the environment, run the activation script that was created along with the other folders when you set up the virtual environment:

source venv/bin/activate

Installing FastAPI

With the virtual environment activated, we can install the FastAPI library and the Uvicorn ASGI server using the pip package manager:

pip install fastapi uvicorn

Now we can run a test of our application using Uvicorn. Note that this requires a main.py file defining an application instance named app in the current directory. The host will be set to localhost on the standard port:

uvicorn main:app --host 127.0.0.1 --port 8000 --reload

Let’s break down this command:

  • --host 127.0.0.1: Specifies the local host IP address.

  • --port 8000: Binds Uvicorn to port 8000, a conventional unprivileged port, keeping the standard HTTP port 80 free for Nginx.

  • main: The name of the module to import, i.e., the main.py file without its extension.

  • app: Refers to the instance of the application created in the code.

  • --reload: Tells Uvicorn to automatically detect changes in the source files and restart the server. This flag should be used only during development.

A typical starter application returns a JSON object with the message "Hello World". To verify, you can make a curl request:

curl -X "GET" "http://localhost:8000"

Here, the -X flag is equivalent to the longer --request form and specifies the HTTP request method as GET.

Application Code

Create or open the main.py file and fill it with the code for our simple application:

from fastapi import FastAPI
from fastapi.responses import HTMLResponse
from fastapi.responses import JSONResponse

app = FastAPI()  # Create an instance of the application

# Root GET request handler with the app.get decorator

@app.get("/")
async def get_root():
	page = "<h1>Hello World!</h1>"  # Server response text
	return HTMLResponse(content=page)

# GET request handler for a simple page request with the app.get decorator

@app.get("/pages/{page_number}")
async def get_page(page_number):
	return JSONResponse({"page_number": page_number})

# GET request handler for a simple user request with the app.get decorator

@app.get("/members/{member_number}")
async def get_member(member_number):
	return JSONResponse({"member_number": member_number})

# POST request handler for a simple user logout request with the app.post decorator

@app.post("/logout/")
async def post_logout(member_number: int):
	return JSONResponse({"member_number": member_number, "status": "OK"})

Note that if you name your application instance differently, the Uvicorn command changes accordingly. For example, suppose the instance is named perfect_router:

from fastapi import FastAPI
from fastapi.responses import HTMLResponse
from fastapi.responses import JSONResponse

perfect_router = FastAPI()

@perfect_router.get("/")
def path_root():
	page = "<h1>Hello World!</h1>"
	return HTMLResponse(content=page)

In this case, the server start command would be:

uvicorn main:perfect_router --host 127.0.0.1 --port 8000 --reload

Managing the Application with Systemd

Your FastAPI application should run continuously, handling incoming requests even after a system reboot. To achieve this, we will use the systemd process manager, which ships with virtually all modern Linux distributions. This will turn our FastAPI application into a background service.

Create a systemd service configuration file:

sudo nano /etc/systemd/system/fastapp.service

The content of this file will be:

[Unit]
Description=WebServer on FastAPI
After=network.target

[Service]
User=USERNAME
Group=USERGROUP
WorkingDirectory=/var/www/fastapp
ExecStart=/var/www/fastapp/venv/bin/uvicorn main:app --host 127.0.0.1 --port 8000
Restart=always

[Install]
WantedBy=multi-user.target

Replace the following placeholders:

  • USERNAME: Your system’s username.

  • USERGROUP: The main user group name. If you do not have a specific group, you can omit the Group option.

  • /var/www/fastapp: The path to your FastAPI application.

  • /var/www/fastapp/venv: The path to your virtual environment.

To activate the new configuration file, reload systemd:

sudo systemctl daemon-reload

After this command, systemd will reload all configuration files from the /etc/systemd/system/ directory, making them available for starting and monitoring.

Start the new service using the name specified in the file:

sudo systemctl start fastapp

Note that the service name in the command corresponds to the filename in the systemd directory: fastapp.service.

To check the status of the running application, use:

sudo systemctl status fastapp

To enable the application to start automatically at system boot, run:

sudo systemctl enable fastapp

This command will configure systemd to start the FastAPI service on system startup.

(Optional) Using Supervisor Instead of Systemd

Supervisor is a process management system for Unix-like operating systems, including Linux. It is designed to monitor and manage running applications. Essentially, Supervisor is a more feature-rich alternative to Systemd for managing application processes, though it is not installed by default.

Advantages of Systemd:

  • Built-in: Comes pre-installed with the OS. No additional dependencies are needed.

  • User-Friendly: Easy to use as it can be managed like a system service.

Advantages of Supervisor:

  • User Management: Processes can be managed by any user, not just the root user.

  • Web Interface: Comes with a web-based interface for managing processes.

  • Distribution Compatibility: Works on any Linux distribution.

  • Process Flexibility: Offers more features for process management, such as grouping processes and setting priorities.

Installing Supervisor

To install Supervisor on your system, run the following command:

sudo apt install supervisor

After installation, Supervisor will run in the background and start automatically with the system. However, it is a good practice to ensure that the auto-start feature is enabled. We will use Systemd to enable Supervisor:

sudo systemctl enable supervisor

Then manually start Supervisor:

sudo systemctl start supervisor

Configuring the Application Service

Like with Systemd, we need to create a short configuration file for Supervisor to manage our Uvicorn server. This file will be placed in Supervisor’s configuration directory for service files. As with Systemd, we will name it fastapp:

sudo nano /etc/supervisor/conf.d/fastapp.conf

Here’s what the file should contain:

[program:fastapp]
command=/var/www/fastapp/venv/bin/uvicorn main:app --host 127.0.0.1 --port 8000
directory=/var/www/fastapp
user=USERNAME
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/www/fastapp/logs/fasterror.log

Let’s break down this configuration:

  • command: The command to run the Uvicorn application with the necessary flags and parameters.

  • user: The system user under which the application will be managed.

  • autostart: Automatically start the process.

  • autorestart: Automatically restart the process if it fails.

  • redirect_stderr: Merges the process's standard error stream into the stdout log.

  • stdout_logfile: Path to the log file for the output (including errors) of the running process. It points to a logs folder inside the working directory, which we create in the next step.

Since we have specified a directory for logs in the configuration file, we need to create it manually:

sudo mkdir /var/www/fastapp/logs/

Running the Application with Supervisor

After adding the new configuration file, Supervisor needs to read the configuration settings, just as Systemd does. Use the following command:

sudo supervisorctl reread

After re-reading, apply the changes. The update command starts, stops, or restarts only the programs whose configuration changed; it does not restart Supervisor itself:

sudo supervisorctl update

To check the status of the application managed by Supervisor, use the command with the service name specified in the configuration file:

sudo supervisorctl status fastapp

Conclusion

In this brief guide, we demonstrated how to deploy a FastAPI-based website on a remote Unix machine using NGINX and Uvicorn servers along with Systemd for process management.

Optionally, you can use the more advanced tool Supervisor for managing FastAPI web applications.

By following this tutorial, you have learned:

  • How to install Python and its core dependencies.

  • How to install and configure NGINX to forward user requests to the Uvicorn FastAPI handler.

  • How to install FastAPI.

  • How to create a simple Python application using FastAPI routers.

  • How to ensure the continuous operation of a FastAPI application as a background service using Systemd.

  • How to manage your application with a separate Supervisor service.

The application described in this tutorial is a basic example to explain the process of deploying a FastAPI application. In a real-world project, the toolkit might differ slightly, and tools like Kubernetes are often used for automated deployment and continuous integration/continuous deployment (CI/CD) processes.

Python
16.07.2024
Reading time: 11 min

Similar

Python

How to Create and Set Up a Telegram Chatbot

Chatbots are software programs that simulate communication with users. Today, we use them for a wide range of purposes, from simple directories to complex services integrated with CRM systems and payment platforms. People create bots for Telegram, Viber, Facebook Messenger, and other messaging platforms. Each platform has its own rules and capabilities—some lack payment integration, while others don't support flexible keyboards. This article focuses on user-friendly Telegram, which has a simple API and an active audience. In this article, we will cover: How to create a Telegram bot on your own When it's convenient to use chatbot builders for development How to integrate a chatbot with external services and APIs What is needed for the bot to function smoothly The key features of Aiogram, a popular Python library for chatbot development Creating a Telegram Chatbot Without Programming Skills Chatbot builders are becoming increasingly popular. These services allow you to create a bot using a simple "drag-and-drop" interface. No programming knowledge is required—you just build logic blocks like in a children's game. However, there are some drawbacks to using chatbot builders: Limited functionality. Most chatbot builders provide only a portion of Telegram API's capabilities. For example, not all of them allow integration with third-party services via HTTP requests. Those that do often have expensive pricing plans. Generic scenarios. The minimal flexibility of builders leads to chatbots that look and function similarly. Dependence on the service. If the platform goes offline or its pricing increases, you may have to migrate your bot elsewhere. Builders are useful for prototyping and simple use cases—such as a welcome message, answering a few questions, or collecting contact information. However, more complex algorithms require knowledge of variables, data processing logic, and the Telegram API. 
Even when using a builder, you still need to understand how to address users by name, how inline keyboards work, and how to handle bot states. Free versions of chatbot builders often come with limitations: They may include advertising messages. Some prevent integration with essential APIs. Others impose limits on the number of users. These restrictions can reduce audience engagement, making the chatbot ineffective. In the long run, premium versions of these builders can end up costing more than developing a bot from scratch and hosting it on your own server. If you need a chatbot to handle real business tasks, automate processes, or work with databases, builders are often not sufficient. In such cases, hiring a developer is a better solution. A developer can design a flexible architecture, choose optimal technologies, and eliminate technical constraints that might hinder the project's scalability. If you already have a prototype built with a chatbot builder, you can use its logic as a starting point for technical specifications. How to Create a Telegram Chatbot Now, let's discuss how to create a Telegram chatbot using Python. You’ll need basic knowledge of variables, conditional statements, loops, and functions in Python. To create chatbots, you can use a framework which is a set of tools, libraries, and ready-made solutions that simplify software development. You can work with the raw Telegram API and implement functionality using HTTP requests, but even for simple tasks, this approach requires writing thousands of lines of code. In this guide, we’ll use Aiogram, one of the most popular frameworks for building Telegram chatbots in Python. Step 1: Create a Virtual Environment for Your Project Using a virtual environment in any Python project is considered good practice. Additionally, chatbots are often deployed on cloud servers where dependencies need to be installed. A virtual environment makes it easy to export a list of dependencies specific to your project. 
Install the Python virtual environment: sudo apt install python3-venv -y Create a virtual Python environment in the working directory: python -m venv venv Activate the environment: source ./venv/bin/activate Step 2: Install Required Libraries Install the Aiogram framework using pip: pip install aiogram Add a library for working with environment variables. We recommend this method for handling tokens in any project, even if you don’t plan to make it public. This reduces the risk of accidentally exposing confidential data. pip install python-dotenv You can also install any other dependencies as needed. Step 3: Initialize Your Chatbot via BotFather This is a simple step, but it often causes confusion. We need to interact with a Telegram bot that will generate and provide us with a token for our project. Open Telegram and start a chat with @BotFather. Click the Start button. The bot will send a welcome message. Enter the following command: /newbot BotFather will ask for a name for your bot—this is what users will see in their chat list. Then, enter a username for your bot. It must be unique and end with "bot" (e.g., mycoolbot). Once completed, BotFather will create your chatbot, assign it a username, and provide you with a token. Keep your token secret. Anyone with access to it can send messages on behalf of your chatbot. If your token is compromised, immediately generate a new one via BotFather. Next, open a chat with your newly created bot and configure the following: Click the Edit button. Update the profile picture. Set a welcome message. Add a description. Configure default commands. Step 4: Store Your Token Securely Create an environment file named .env (this file has no name, only an extension). 
Add the following line: BOT_TOKEN = your_generated_token On Linux and macOS, you can quickly save the token using the following command: echo "BOT_TOKEN = your_generated_token" > .env Step 4: Create the Script In your working directory, create a file called main.py—this will be the main script for your chatbot. Now, import the following test code, which will send a welcome message to the user when they enter the /start command: import asyncio # Library for handling asynchronous code import os # Module for working with environment variables from dotenv import load_dotenv # Function to load environment variables from the .env file from aiogram import Bot, Dispatcher, Router # Import necessary classes from aiogram from aiogram.types import Message # Import Message class for handling incoming messages from aiogram.filters import CommandStart # Import filter for handling the /start command # Create a router to store message handlers router = Router() # Load environment variables from .env load_dotenv() # Handler for the /start command @router.message(CommandStart()) # Filter to check if the message is the /start command async def cmd_start(message: Message) -> None: # Retrieve the user's first name and last name (if available) first_name = message.from_user.first_name last_name = message.from_user.last_name or "" # If no last name, use an empty string # Send a welcome message to the user await message.answer(f"Hello, {first_name} {last_name}!") # Main asynchronous function to start the bot async def main(): # Create a bot instance using the token from environment variables bot = Bot(token=os.getenv("BOT_TOKEN")) # Create a dispatcher to handle messages dp = Dispatcher() # Include the router with command handlers dp.include_router(router) # Start the bot in polling mode await dp.start_polling(bot) # If the script is run directly (not imported as a module), # execute the main() function if __name__ == "__main__": asyncio.run(main()) The script is well-commented to help 
you understand the essential parts.If you don't want to dive deep, you can simply use Dispatcher and Router as standard components in Aiogram. We will explore their functionality later in this guide. This ready-made structure can serve as a solid starting point for any chatbot project. As you continue development, you will add more handlers, keyboards, and states. Step 5: Run and Test the Chatbot Now, launch your script using the following command: python main.py Now you can open a chat with your bot in Telegram and start interacting with it. Aiogram Framework v3.x Features Overview  You only need to understand a few key components and functions of Aiogram to create a Telegram chatbot. This section covers Aiogram v3.x, which was released on September 1, 2023. Any version starting with 3.x will work. While older projects using Aiogram 2.x still exist, version 2.x is now considered outdated. Key Components of Aiogram Bot The Bot class serves as the interface to the Telegram API. It allows you to send messages, images, and other data to users. bot = Bot(token=os.getenv("TOKEN")) You can pass the token directly when initializing the Bot class, but it's recommended to use environment variables to prevent accidental exposure of your bot token. Dispatcher The Dispatcher is the core of the framework. It receives updates (incoming messages and events) and routes them to the appropriate handlers. dp = Dispatcher() In Aiogram v3, a new structure with Router is used (see below), but the Dispatcher is still required for initialization and launching the bot. Router In Aiogram v3, handlers are grouped within a Router. This is a separate entity that stores the bot's logic—command handlers, message handlers, callback handlers, and more. from aiogram import Router router = Router() After defining handlers inside the router, developers register it with the Dispatcher: dp.include_router(router) Handling Commands The most common scenario is responding to commands like /start or /help. 
from aiogram import F from aiogram.types import Message @router.message(F.text == "/start") async def cmd_start(message: Message): await message.answer("Hello! I'm a bot running on Aiogram.") F.text == "/start" is a new filtering method in Aiogram v3. message.answer(...) sends a reply to the user. Handling Regular Messages To react to any message, simply remove the filter or define a different condition: @router.message() async def echo_all(message: Message): await message.answer(f"You wrote: {message.text}") In this example, the bot echoes whatever text the user sends. Inline Buttons and Keyboards from aiogram.types import InlineKeyboardButton, InlineKeyboardMarkup inline_kb = InlineKeyboardMarkup( inline_keyboard=[ [InlineKeyboardButton(text="Click me!", callback_data="press_button")] ] ) @router.message(F.text == "/buttons") async def show_buttons(message: Message): await message.answer("Here are my buttons:", reply_markup=inline_kb) When the user clicks the button, the bot receives callback_data="press_button", which can be handled separately: from aiogram.types import CallbackQuery @router.callback_query(F.data == "press_button") async def handle_press_button(callback: CallbackQuery): await callback.message.answer("You clicked the button!") await callback.answer() # Removes the "loading" animation in the chat Regular Buttons (Reply Keyboard) Regular buttons differ from inline buttons in that they replace the keyboard. The user immediately sees a list of available response options. These buttons are tracked by the message text, not callback_data. 
from aiogram.types import ReplyKeyboardMarkup, KeyboardButton, ReplyKeyboardRemove # Creating a reply keyboard reply_kb = ReplyKeyboardMarkup( keyboard=[ [ KeyboardButton(text="View Menu"), KeyboardButton(text="Place Order") ] ], resize_keyboard=True # Automatically adjusts button size ) # Handling the /start command and showing the reply keyboard @router.message(F.text == "/start") async def start_cmd(message: Message): await message.answer( "Welcome! Choose an action:", reply_markup=reply_kb ) # Handling "View Menu" button press @router.message(F.text == "View Menu") async def show_menu(message: Message): await message.answer("We have pizza and drinks.") # Handling "Place Order" button press @router.message(F.text == "Place Order") async def make_order(message: Message): await message.answer("What would you like to order?") # Command to hide the keyboard @router.message(F.text == "/hide") async def hide_keyboard(message: Message): await message.answer("Hiding the keyboard", reply_markup=ReplyKeyboardRemove()) Filters and Middlewares Filters Filters help define which messages should be processed. You can also create custom filters. from aiogram.filters import Filter # Custom filter to check if a user is an admin class IsAdmin(Filter): def __init__(self, admin_id: int): self.admin_id = admin_id async def __call__(self, message: Message) -> bool: return message.from_user.id == self.admin_id # Using the filter to restrict a command to the admin @router.message(IsAdmin(admin_id=12345678), F.text == "/admin") async def admin_cmd(message: Message): await message.answer("Hello, Admin! You have special privileges.") Middlewares Middlewares act as intermediary layers between an incoming request and its handler. You can use them to intercept, modify, validate, or log messages before they reach their respective handlers. 
import logging from aiogram.types import CallbackQuery, Message from aiogram.dispatcher.middlewares.base import BaseMiddleware # Custom middleware to log incoming messages and callbacks class LoggingMiddleware(BaseMiddleware): async def __call__(self, handler, event, data): if isinstance(event, Message): logging.info(f"[Message] from {event.from_user.id}: {event.text}") elif isinstance(event, CallbackQuery): logging.info(f"[CallbackQuery] from {event.from_user.id}: {event.data}") # Pass the event to the next handler return await handler(event, data) async def main(): load_dotenv() logging.basicConfig(level=logging.INFO) bot = Bot(token=os.getenv("BOT_TOKEN")) dp = Dispatcher() # Attaching the middleware dp.update.middleware(LoggingMiddleware()) dp.include_router(router) await dp.start_polling(bot) Working with States (FSM) in Aiogram 3 Aiogram 3 supports Finite State Machine (FSM), which is useful for step-by-step data collection (e.g., user registration, order processing). FSM is crucial for implementing multi-step workflows where users must complete one step before moving to the next. For example, in a pizza ordering bot, we need to ask the user for pizza size and delivery address, ensuring the process is sequential. We must save each step's data until the order is complete. Step 1: Declare States from aiogram.fsm.state import State, StatesGroup class OrderPizza(StatesGroup): waiting_for_size = State() waiting_for_address = State() These states define different stages in the ordering process. 
Step 2: Switch between states from aiogram.fsm.context import FSMContext @router.message(F.text == "/order") async def cmd_order(message: Message, state: FSMContext): # Create inline buttons for selecting pizza size size_keyboard = InlineKeyboardMarkup( inline_keyboard=[ [ InlineKeyboardButton(text="Large", callback_data="size_big"), InlineKeyboardButton(text="Medium", callback_data="size_medium"), InlineKeyboardButton(text="Small", callback_data="size_small") ] ] ) await message.answer( "What size pizza would you like? Click one of the buttons:", reply_markup=size_keyboard ) # Set the state to wait for the user to choose a size await state.set_state(OrderPizza.waiting_for_size) # Step 2: Handle button click for size selection @router.callback_query(OrderPizza.waiting_for_size, F.data.startswith("size_")) async def choose_size_callback(callback: CallbackQuery, state: FSMContext): # Callback data can be size_big / size_medium / size_small size_data = callback.data.split("_")[1] # e.g., "big", "medium", or "small" # Save the selected pizza size in the temporary state storage await state.update_data(pizza_size=size_data) # Confirm the button press (removes "loading clock" in Telegram's UI) await callback.answer() await callback.message.answer("Please enter your delivery address:") await state.set_state(OrderPizza.waiting_for_address) # Step 2a: If the user sends a message instead of clicking a button (in waiting_for_size state), # we can handle it separately. For example, prompt them to use the buttons. @router.message(OrderPizza.waiting_for_size) async def handle_text_during_waiting_for_size(message: Message, state: FSMContext): await message.answer( "Please select a pizza size using the buttons above. " "We cannot proceed without this information." 
) # Step 3: User sends the delivery address @router.message(OrderPizza.waiting_for_address) async def set_address(message: Message, state: FSMContext): address = message.text user_data = await state.get_data() pizza_size = user_data["pizza_size"] size_text = { "big": "large", "medium": "medium", "small": "small" }.get(pizza_size, "undefined") await message.answer(f"You have ordered a {size_text} pizza to be delivered at: {address}") # Clear the state — the process is complete await state.clear() Notice how the temporary storage keeps track of user responses at each step. This storage is user-specific and does not require a database. The user progresses through a chain of questions, and at the end, the order details can be sent to an internal API.  Deploying the Bot: Running on a Server Let's go through two main deployment methods. Quick Method: Docker + Hostman App Platform This method does not require any system administration knowledge; the entire deployment process is automated. Additionally, it helps save costs. Follow these steps: Export all project dependencies to a requirements.txt file. Using a virtual environment is recommended to avoid pulling in libraries from the entire system. Run the following command in the project directory terminal: pip freeze > requirements.txt Add a deployment file to the project directory — Dockerfile. This file has no extension, just the name. Insert the following content: FROM python:3.11 WORKDIR /app COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt COPY . . EXPOSE 9999 CMD ["python", "main.py"] Create a Git repository and push it to GitHub. You can use a minimal set of Git commands from our guide by running these commands in sequence. Add the environment variables file (.env) to .gitignore to prevent it from being exposed publicly. Go to the Hostman control panel, select the App platform section, and click Create app. Go to the Docker tab and select Dockerfile. 
Link your GitHub account or connect your Git repository via URL. Select the repository from the list after linking your GitHub account.

Choose a configuration. Hostman Apps offers a configuration of 1 CPU x 3.3 GHz, 1 GB RAM, and NVMe storage, which is ideal for simple text-based bots, projects with small inline keyboards, basic FSM logic, low-demand API requests, and working with SQLite or lightweight JSON files. This configuration can handle 50-100 users per minute.

Add the bot token to environment variables. In the App settings, click + Add, enter BOT_TOKEN as the key, and paste the token obtained from BotFather as the value.

Start the deployment and wait for it to complete. Once finished, the bot will be up and running.

Standard Method: Ubuntu + systemd

Export all project dependencies to the requirements.txt file. Run the following command in the terminal while in the project directory:

pip freeze > requirements.txt

Create a cloud server in the Hostman panel with the desired configuration and Ubuntu OS.

Transfer the project files to a directory on the remote server. The easiest way to do this on Ubuntu/macOS is the rsync utility:

rsync -av --exclude="venv" --exclude=".idea" --exclude=".git" ./ root@176.53.160.13:/root/project

Don't forget to replace the server IP and adjust the destination directory. Windows users can use FileZilla to transfer files.

Connect to the server via SSH. Install the package for virtual environments:

sudo apt install python3.10-venv

Navigate to the project directory where you transferred the files. Create a virtual environment and install the dependencies:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Test the bot functionality by running it:

python main.py

If everything works, proceed to the next step.
Create the unit file /etc/systemd/system/telegram-bot.service:

sudo nano /etc/systemd/system/telegram-bot.service

Add the following content to the file:

[Unit]
Description=Telegram Bot Service
After=network.target

[Service]
User=root
WorkingDirectory=/root/project
ExecStart=/root/project/venv/bin/python /root/project/main.py
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

WorkingDirectory is the project directory. ExecStart is the command that starts the chatbot, in the format <interpreter> <full path to the file>. If you use a virtual environment, the path to the interpreter will be as in the example; if you work without venv, use /usr/local/bin/python3.

Reload systemd and enable the service:

sudo systemctl daemon-reload
sudo systemctl enable telegram-bot.service
sudo systemctl start telegram-bot.service

Check the status of the service and view logs if necessary:

sudo systemctl status telegram-bot.service

If the bot is running correctly, the Active field should show active (running). View bot logs:

sudo journalctl -u telegram-bot.service -f

Manage the service with the following commands.

Restart the bot:

sudo systemctl restart telegram-bot.service

Stop the bot:

sudo systemctl stop telegram-bot.service

Remove the service (if needed):

sudo systemctl disable telegram-bot.service
sudo rm /etc/systemd/system/telegram-bot.service
sudo systemctl daemon-reload

Conclusion

Creating a Telegram chatbot in Python is a task that can be accomplished even without programming experience using bot builders. However, if you need flexibility and more options, it's better to master the aiogram framework and deploy your own project. This gives you full control over the code, the ability to enhance functionality, manage integrations, and avoid the limitations of paid plans. To run the bot in production, simply choose an appropriate configuration on the Hostman App Platform and set up automatic deployment.
Pay attention to security by storing the token in an environment variable and encrypting sensitive data. In the future, you can scale the bot, add webhook support, integrate payment systems and analytics systems, and work with ML models if AI features are required.
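For the environment-variable advice above, a small startup helper makes a missing token fail loudly instead of crashing later inside the bot. The BOT_TOKEN name matches the variable configured during deployment; the helper itself is an illustrative sketch, not part of aiogram:

```python
import os

def load_token(env=os.environ):
    # Read the bot token from the environment and fail fast if absent,
    # so a misconfigured service is caught at startup, not mid-request.
    token = env.get("BOT_TOKEN")
    if not token:
        raise RuntimeError("BOT_TOKEN is not set; add it to the service environment")
    return token

# Passing an explicit mapping instead of the real environment:
print(load_token({"BOT_TOKEN": "123456:ABC-example"}))  # 123456:ABC-example
```

With systemd, the variable can be supplied via an Environment= line in the [Service] section; on the App Platform, via the panel's environment settings.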
12 March 2025 · 18 min to read
Microservices

Database Connection in Python, Go, and JavaScript

Databases are an essential part of almost any project today. Database interactions are especially familiar to system and database administrators, DevOps/SRE professionals, and software developers. While administrators typically deploy one or more database instances and configure the necessary connection parameters for applications, developers need to connect directly to the database within their code. This article explores how to connect to databases using different programming languages.

Prerequisites

We will provide examples for connecting to MySQL, PostgreSQL, Redis, MongoDB, and ClickHouse databases using Python, Go, and JavaScript. To follow this guide, you will need:

A database deployed on a server or in the cloud.
Installed environments for Python, Go, and JavaScript, depending on your application programming language.
Additionally for Python: pip installed.
Additionally for JavaScript: Node.js and npm installed.

Database Connection in Python

MySQL and Python

For connecting to MySQL databases, we can use a Python driver called MySQL Connector. Install the driver using pip:

pip install mysql-connector-python

Initialize a new connection: import the mysql.connector library and the Error class to handle specific connection errors, then create a function named create_connection that takes the database address (host), user name (user), and user password (password) and establishes the connection inside a try block.
import mysql.connector
from mysql.connector import Error

def create_connection(host_name, user_name, user_password):
    connection = None
    try:
        connection = mysql.connector.connect(
            host=host_name,
            user=user_name,
            password=user_password
        )
        print("Successfully connected to MySQL Server!")
    except Error as e:
        print(f"The error '{e}' occurred")
    return connection

def execute_query(connection, query):
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        connection.commit()
        print("Query executed successfully")
    except Error as e:
        print(f"The error '{e}' occurred")

connection = create_connection("91.206.179.29", "gen_user", "m-EE6Wm}z@wCKe")

Run the script. If everything works correctly, you will see the "Successfully connected to MySQL Server!" message. If any errors occur, the console will display the error code and description.

Create a new table:

Select the database by setting the connection.database attribute to its name. Note that the database should already exist.
To create a table, initialize a variable create_table_query containing the SQL CREATE TABLE query.
For data insertion, initialize another variable insert_data_query with the SQL INSERT INTO query.
To execute each query, use the execute_query function, which takes the database connection object and the variable containing the SQL query.

connection.database = 'test_db'

create_table_query = """
CREATE TABLE IF NOT EXISTS users (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    age INT NOT NULL
)
"""
execute_query(connection, create_table_query)

insert_data_query = """
INSERT INTO users (name, age)
VALUES
('Alice', 30),
('Bob', 25)
"""
execute_query(connection, insert_data_query)

if connection.is_connected():
    connection.close()
    print("Connection closed")

Run the script.

PostgreSQL and Python

Python offers several plugins for connecting to PostgreSQL, but the most popular one is psycopg2, which we will use here.
Psycopg2 is one of the most frequently used Python plugins for PostgreSQL connections. One of its key advantages is its support for multithreading, which allows you to maintain the database connection across multiple threads.

Install psycopg2 using pip (if not already installed):

pip install psycopg2-binary

Connect to PostgreSQL. Import the psycopg2 package and create a function create_new_conn that uses a try block. Establish the connection with the psycopg2.connect function, which takes the database name (dbname), user name (user), password (password), and database address (host) as input. To initialize the connection, call the create_new_conn() function. Here's the full code example for connecting to a database:

import psycopg2
from psycopg2 import OperationalError

def create_new_conn():
    conn_to_postgres = None
    while not conn_to_postgres:
        try:
            conn_to_postgres = psycopg2.connect(
                dbname="default_db",
                user="gen_user",
                password="PasswordForDefautUser9893#",
                host="91.206.179.128"
            )
            print("The connection to PostgreSQL has been successfully established!")
        except OperationalError as e:
            print(e)
    return conn_to_postgres

conn_to_postgres = create_new_conn()

Run the script:

python3 connect_to_postgres.py

If successful, you will see the "The connection to PostgreSQL has been successfully established!" message.

Next, create a table named books, which will have three columns. Use the cursor class for SQL expressions, such as creating database objects. If the query involves adding or modifying data, you must call the conn_to_postgres.commit() function afterward to apply the changes.
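The cursor/execute/commit sequence just described is not specific to psycopg2; it is the standard Python DB-API 2.0 pattern. As a quick, server-free way to try the same flow, here is the identical sequence against an in-memory SQLite database using the standard library's sqlite3 module (the table and data mirror the books example; note sqlite3 uses ? placeholders where psycopg2 uses %s):

```python
import sqlite3

# Same DB-API flow as with psycopg2, but no server required.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

cursor.execute("""
CREATE TABLE books (
    book_id INTEGER PRIMARY KEY NOT NULL,
    book_name VARCHAR(255) NOT NULL,
    book_author VARCHAR(255) NOT NULL
)
""")

# Data-modifying statements take effect only after commit().
cursor.execute(
    "INSERT INTO books (book_id, book_name, book_author) VALUES (?, ?, ?)",
    (1, "Long Walk to Freedom", "Nelson_Mandela"),
)
conn.commit()

cursor.execute("SELECT book_name, book_author FROM books")
rows = cursor.fetchall()
print(rows)  # [('Long Walk to Freedom', 'Nelson_Mandela')]
conn.close()
```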
import psycopg2
from psycopg2 import OperationalError

def create_new_conn():
    conn_to_postgres = None
    while not conn_to_postgres:
        try:
            conn_to_postgres = psycopg2.connect(
                dbname="default_db",
                user="gen_user",
                password="PasswordForDefautUser9893#",
                host="91.206.179.128"
            )
        except OperationalError as e:
            print(e)
    return conn_to_postgres

conn_to_postgres = create_new_conn()

cursor = conn_to_postgres.cursor()
cursor.execute("""
CREATE TABLE books (
    book_id INT PRIMARY KEY NOT NULL,
    book_name VARCHAR(255) NOT NULL,
    book_author VARCHAR(255) NOT NULL
)
""")
conn_to_postgres.commit()
print("Table created successfully")

Run the script:

python3 create_table.py

Now, let's run INSERT INTO to add a new row:

cursor.execute("""
INSERT INTO books (book_id, book_name, book_author)
VALUES (1, 'Long Walk to Freedom', 'Nelson_Mandela')
""")

The full code is below:

import psycopg2
from psycopg2 import OperationalError

def create_new_conn():
    conn_to_postgres = None
    while not conn_to_postgres:
        try:
            conn_to_postgres = psycopg2.connect(
                dbname="default_db",
                user="gen_user",
                password="PasswordForDefautUser9893#",
                host="91.206.179.128"
            )
        except OperationalError as e:
            print(e)
    return conn_to_postgres

conn_to_postgres = create_new_conn()

cursor = conn_to_postgres.cursor()
cursor.execute("""
INSERT INTO books (book_id, book_name, book_author)
VALUES (1, 'Long Walk to Freedom', 'Nelson_Mandela')
""")
conn_to_postgres.commit()
conn_to_postgres.close()
print("Data inserted successfully")

Run the script:

python3 insert-data.py

Redis and Python

Redis belongs to the class of NoSQL databases, where data is stored in memory rather than on hard drives. It uses a key-value format for data storage. Redis has a wide range of applications, from data storage and caching to serving as a message broker. We will use the redis-py (or simply redis) library for connecting to Redis.
Install the Redis library using pip:

pip install redis

Connect to a Redis instance. Use a try block for the connection and the redis.StrictRedis class, where you provide the Redis address (host), port, and user password:

import redis

try:
    connect_to_redis_server = redis.StrictRedis(
        host='91.206.179.128',
        port=6379,
        password='PasswordForRedis6379')
    connect_to_redis_server.ping()
    print('Successfully connected to Redis Server!')
except Exception as ex:
    print('Error:', ex)
    exit('Failed to connect to Redis server.')

Run the script:

python3 connect_to_redis.py

If successful, you will see a message like "Successfully connected to Redis Server!".

Unlike relational databases, Redis stores data in a key-value format. The key uniquely identifies the corresponding value.

Use the set method to create a new record. The example below creates a record with the key City and the value Berlin:

print('Create new record:', connect_to_redis_server.set("City", "Berlin"))

Use the get method to retrieve the value associated with a key:

print('Print record using record key:', connect_to_redis_server.get("City"))

Use the delete method to remove a record by its key:

print('Delete record with key:', connect_to_redis_server.delete("City"))

The complete code fragment is below.

import redis

try:
    connect_to_redis_server = redis.StrictRedis(
        host='91.206.179.128',
        port=6379,
        password='PasswordForRedis6379')
    print('New record created:', connect_to_redis_server.set("City", "Berlin"))
    print('Print created record using record key:', connect_to_redis_server.get("City"))
    print('Delete created record with key:', connect_to_redis_server.delete("City"))
except Exception as ex:
    print('Error:', ex)

MongoDB and Python

MongoDB is another widely used NoSQL database that belongs to the document-oriented category. Data is organized as JSON-like documents.
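Because MongoDB documents are JSON-like, they map directly onto Python dictionaries, and queries are themselves dictionaries of field/value conditions. The following sketch shows how a top-level equality filter selects documents; this is illustrative logic only, not PyMongo internals:

```python
def matches(document, query):
    # A document matches when every field in the filter
    # equals the corresponding field of the document.
    return all(document.get(field) == value for field, value in query.items())

users = [
    {"name": "Alex", "age": 25, "location": "London"},
    {"name": "Kate", "age": 30, "location": "Berlin"},
]

# Equivalent in spirit to collection.find({"location": "London"})
found = [doc for doc in users if matches(doc, {"location": "London"})]
print(found)  # [{'name': 'Alex', 'age': 25, 'location': 'London'}]
```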
To connect to a MongoDB database with Python, the recommended library is PyMongo, which provides a synchronous API.

Install the PyMongo plugin:

pip3 install pymongo

Connect to the MongoDB server using the following Python code. Import the pymongo module and use the MongoClient class to specify the database server address. To establish a connection to the MongoDB server, use a try block for error handling:

import pymongo

connect_to_mongo = pymongo.MongoClient("mongodb://91.206.179.29:27017/")
first_db = connect_to_mongo["mongo-db1"]

try:
    first_db.command("serverStatus")
except Exception as e:
    print(e)
else:
    print("Successfully connected to MongoDB Server!")

connect_to_mongo.close()

Run:

python3 connect_mongodb.py

If the connection is successfully established, the script will return the message: "Successfully connected to MongoDB Server!"

Add data to MongoDB. To add data, you need to create a dictionary. Let's create a dictionary named record1, containing three keys:

record1 = {
    "name": "Alex",
    "age": 25,
    "location": "London"
}

To insert the dictionary data, use the insert_one method in MongoDB:

insertrecord = collection1.insert_one(record1)

The full script:

import pymongo

connect_to_mongo = pymongo.MongoClient("mongodb://91.206.179.29:27017/")
db1 = connect_to_mongo["newdb"]
collection1 = db1["userdata"]

record1 = {
    "name": "Alex",
    "age": 25,
    "location": "London"
}

insertrecord = collection1.insert_one(record1)
print(insertrecord)

Run the script:

python3 connect_mongodb.py

ClickHouse and Python

ClickHouse is a columnar NoSQL database where data is stored in columns rather than rows. It is widely used for handling analytical queries.

Install the ClickHouse driver for Python. There is a dedicated plugin for ClickHouse called clickhouse-driver. Install the driver using the pip package manager:

pip install clickhouse-driver

Connect to ClickHouse. To initialize a connection with ClickHouse, you need to import the Client class from the clickhouse_driver library.
To execute SQL queries, use the client.execute function. You also need to specify the engine. For more details on supported engines in ClickHouse, you can refer to the official documentation. We'll use the default engine, MergeTree.

Next, create a new table called Users and insert two columns with data. To list the data to be added to the table, use the tuple data type. After executing the necessary queries, make sure to close the connection to the database using the client.disconnect() method. The final code will look like this:

from clickhouse_driver import Client

client = Client(host='91.206.179.128', user='root', password='P@$$w0rd123', port=9000)

client.execute('''
CREATE TABLE IF NOT EXISTS Users (
    id UInt32,
    name String
) ENGINE = MergeTree()
ORDER BY id
''')

data = [
    (1, 'Alice'),
    (2, 'Mary')
]
client.execute('INSERT INTO Users (id, name) VALUES', data)

result = client.execute('SELECT * FROM Users')
for row in result:
    print(row)

client.disconnect()

Database Connection in Go

Go is one of the youngest programming languages, developed in 2009 by Google. It is widely used in developing microservice architectures and network utilities. For example, services like Docker and Kubernetes are written in Go.

Go supports integrating all popular databases, including PostgreSQL, Redis, MongoDB, MySQL, ClickHouse, etc.

MySQL and Go

For working with MySQL databases in Go, use the go-sql-driver/mysql driver.

Create a new directory for storing project files and navigate into it:

mkdir mysql-connect && cd mysql-connect

Create a go.mod file to store the dependencies:

go mod init golang-connect-mysql

Download the MySQL driver using the go get command:

go get -u github.com/go-sql-driver/mysql

Create a new file named main.go.
Specify the database connection details in the dsn variable:

package main

import (
    "database/sql"
    "fmt"
    "log"

    _ "github.com/go-sql-driver/mysql"
)

func main() {
    dsn := "root:password@tcp(localhost:3306)/testdb"
    db, err := sql.Open("mysql", dsn)
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    if err := db.Ping(); err != nil {
        log.Fatal(err)
    }
    fmt.Println("Successfully connected to the database!")

    query := "INSERT INTO users (name, age) VALUES (?, ?)"
    result, err := db.Exec(query, "Alex", 25)
    if err != nil {
        log.Fatal(err)
    }

    lastInsertID, err := result.LastInsertId()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("Inserted data with ID: %d\n", lastInsertID)
}

PostgreSQL and Go

To connect to PostgreSQL, use the pq driver. Before installing the driver, let's prepare our environment.

Create a new directory for storing the project files and navigate into it:

mkdir postgres-connect && cd postgres-connect

Since we will be working with dependencies, we need to create a go.mod file to store them:

go mod init golang-connect-postgres

Download the pq driver using the go get command:

go get github.com/lib/pq

Create a new file named main.go. In addition to importing the pq library, it is necessary to add the database/sql library, as Go does not come with official database drivers by default. The database/sql library consists of general, independent interfaces for working with databases.

It is also important to note the underscore (empty identifier) when importing the pq module:

_ "github.com/lib/pq"

The empty identifier is used to avoid the "unused import" error, as in this case we only need the driver to be registered in database/sql. The fmt package is required to output data to the standard output stream, for example, to the console.

To open a connection to the database, the sql.Open function is used, which takes the connection string (connStr) and the driver name (postgres).
The connection string specifies the username, database name, password, and host address:

package main

import (
    "database/sql"
    "fmt"
    "log"

    _ "github.com/lib/pq"
)

func main() {
    connStr := "user=golang dbname=db_for_golang password=Golanguserfordb0206$ host=47.45.249.146 sslmode=disable"
    db, err := sql.Open("postgres", connStr)
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    err = db.Ping()
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println("Successfully connected to PostgreSQL!")
}

Compile and run:

go run main.go

If everything works correctly, the terminal will display the message Successfully connected to PostgreSQL!

Now, let's look at an example of how to insert data into a table. First, we need to create a table in the database. When using Hostman cloud databases, you can copy the PostgreSQL connection string displayed in the "Connections" section of the Hostman web interface. Make sure that the postgresql-client utility is installed on your device beforehand.

Enter the psql shell and connect to the previously created database:

\c db_for_golang

Create a table named Cities with three fields: city_id, city_name, and city_population:

CREATE TABLE Cities (
    city_id INT PRIMARY KEY,
    city_name VARCHAR(45) NOT NULL,
    city_population INT NOT NULL);

Grant full privileges on the created table to the user:

GRANT ALL PRIVILEGES ON TABLE cities TO golang;

The db.Prepare function is used to prepare data: it specifies the insertion query in advance. To insert data, use the stmt.Exec function. In Go, it's common to use plain SQL without the ORM (Object-Relational Mapping) approach.

stmt, err := db.Prepare("INSERT INTO Cities(city_id, city_name, city_population) VALUES($1, $2, $3)")
if err != nil {
    log.Fatal(err)
}
defer stmt.Close()

_, err = stmt.Exec(1, "Toronto", 279435)
if err != nil {
    log.Fatal(err)
}

fmt.Println("Data inserted successfully!")

If all works correctly, you will see: Data inserted successfully!
Redis and Go

To connect to Redis, you need to use the go-redis driver.

Create a new directory:

mkdir connect-to-redis && cd connect-to-redis

Prepare the dependency file:

go mod init golang-connect-redis

Then tidy the dependencies:

go mod tidy

Download the go-redis module:

go get github.com/go-redis/redis/v8

To connect to Redis, use the redis.Options struct to specify the address and port of the Redis server. Since Redis does not use authentication by default, you can leave the Password field empty and use the default database (database 0):

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/go-redis/redis/v8"
)

func main() {
    rdb := redis.NewClient(&redis.Options{
        Addr:     "91.206.179.128:6379",
        Password: "",
        DB:       0,
    })

    ctx := context.Background()

    _, err := rdb.Ping(ctx).Result()
    if err != nil {
        log.Fatalf("Couldn't connect to Redis: %v", err)
    }

    fmt.Println("Successfully connected to Redis!")
}

You should see the message "Successfully connected to Redis!"

MongoDB and Go

To work with MongoDB, we'll use the mongo driver.

Create a new directory to store the project structure:

mkdir connect-to-mongodb && cd connect-to-mongodb

Initialize the dependency file:

go mod init golang-connect-mongodb

Download the mongo library:

go get go.mongodb.org/mongo-driver/mongo

Connect to MongoDB using the options.Client().ApplyURI method. It takes a connection string such as mongodb://91.206.179.29:27017, where 91.206.179.29 is the MongoDB server address and 27017 is the port for connecting to MongoDB. The options.Client().ApplyURI string is used only for specifying connection data.
To check the connection status, you can use another function, client.Ping, which shows the success or failure of the connection:

package main

import (
    "context"
    "fmt"
    "log"
    "time"

    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
    clientOptions := options.Client().ApplyURI("mongodb://91.206.179.29:27017")
    client, err := mongo.Connect(context.TODO(), clientOptions)
    if err != nil {
        log.Fatalf("Couldn't connect to MongoDB server: %v", err)
    }
    fmt.Println("successfully connected to MongoDB!")

    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    err = client.Ping(ctx, nil)
    if err != nil {
        log.Fatalf("Could not ping MongoDB server: %v", err)
    }
    fmt.Println("Ping MongoDB server successfully!")
}

You should see the messages "successfully connected to MongoDB!" and "Ping MongoDB server successfully!"

MongoDB uses collections to store data. You can create collections using the .Collection function. Below, we will create a database called first-database and a collection called first-collection. The collection will have a new document containing three keys: user-name, user-age, and user-email.

collection := client.Database("first-database").Collection("first-collection")

document := map[string]interface{}{
    "user-name":  "Alice",
    "user-age":   25,
    "user-email": "alice@corporate.com",
}

insertResult, err := collection.InsertOne(ctx, document)
if err != nil {
    log.Fatalf("Couldn't insert new document: %v", err)
}
fmt.Printf("Inserted new document with ID: %v\n", insertResult.InsertedID)

if err := client.Disconnect(ctx); err != nil {
    log.Fatalf("Could not disconnect from MongoDB: %v", err)
}
fmt.Println("Disconnected from MongoDB!")

If successful, you will see the Inserted new document message with the document ID.

ClickHouse and Go

To work with ClickHouse, use the clickhouse-go driver.
Create a new directory to store the project files and navigate to it:

mkdir clickhouse-connect && cd clickhouse-connect

Create a go.mod file to store the dependencies:

go mod init golang-connect-clickhouse

Download the ClickHouse driver using the command:

go get github.com/ClickHouse/clickhouse-go/v2

Create a new file named main.go, where you will specify the connection data for ClickHouse:

package main

import (
    "database/sql"
    "log"

    _ "github.com/ClickHouse/clickhouse-go/v2"
)

func main() {
    dsn := "tcp://localhost:9000?username=user1&password=PasswordForuser175465&database=new_db"
    db, err := sql.Open("clickhouse", dsn)
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    if err := db.Ping(); err != nil {
        log.Fatal(err)
    }

    log.Println("Connected to ClickHouse!")
}

Database Connection in JavaScript

In JavaScript, all connections to external services are made using the Node.js platform. Make sure that you have Node.js and the npm package manager installed on your device.

MySQL and JavaScript

To work with MySQL, use the mysql2 driver.

Create a directory where we will store the project files:

mkdir js-mysql-connect && cd js-mysql-connect

Initialize the project:

npm init -y

Install the mysql2 library:

npm install mysql2

Use the following code to connect to MySQL:

const mysql = require('mysql2');

const connection_to_mysql = mysql.createConnection({
    host: 'localhost',
    user: 'root',
    password: 'PasswordForRoot74463',
    database: 'db1',
});

connection_to_mysql.connect((err) => {
    if (err) {
        console.error('Error connecting to MySQL:', err.message);
        return;
    }
    console.log('Successfully connected to MySQL Server!');

    connection_to_mysql.end((endErr) => {
        if (endErr) {
            console.error('Error closing the connection_to_mysql:', endErr.message);
        } else {
            console.log('Connection closed.');
        }
    });
});

PostgreSQL and JavaScript

Connecting to PostgreSQL is done using the pg library.
Create a directory where we will store the project files:

mkdir js-postgres-connect && cd js-postgres-connect

Initialize the project:

npm init -y

Install the pg library:

npm install pg

To connect to PostgreSQL, first import the pg library. Then create a config object with the database address (host), username (user), password, database name (database), and port. Use the new pg.Client class to pass the connection data.

We will create a table called cities and add two records into it. To do this, we will use the queryDatabase function, which contains the SQL queries.

const pg = require('pg');

const config = {
    host: '91.206.179.29',
    user: 'gen_user',
    password: 'PasswordForGenUser56467$',
    database: 'default_db',
    port: 5432,
};

const client = new pg.Client(config);

client.connect(err => {
    if (err) throw err;
    else {
        queryDatabase();
    }
});

function queryDatabase() {
    const query = `
        DROP TABLE IF EXISTS cities;
        CREATE TABLE cities (id serial PRIMARY KEY, name VARCHAR(80), population INTEGER);
        INSERT INTO cities (name, population) VALUES ('Berlin', 3645000);
        INSERT INTO cities (name, population) VALUES ('Paris', 2161000);
    `;

    client
        .query(query)
        .then(() => {
            console.log('Table created successfully!');
            client.end(console.log('Closed client connection'));
        })
        .catch(err => console.log(err))
        .then(() => {
            console.log('Finished execution, exiting now');
            process.exit();
        });
}

Use this command to run the code:

node connect-to-postgres.js

Redis and JavaScript

To work with Redis, use the ioredis library.

Create a directory to store the project files:

mkdir js-redis-connect && cd js-redis-connect

Initialize the project:

npm init -y

Install the ioredis library:

npm install ioredis

To connect to Redis, import the ioredis library. Then create a constant named redis and specify the Redis server address.
Inserting data, i.e., creating key-value objects, is done using an asynchronous function named setData, which takes two values, key and value, corresponding to the data format of the Redis system.

const Redis = require('ioredis');

const redis = new Redis({
    host: '91.206.179.29',
    port: 6379,
    password: 'UY+p8e?Kxmqqfa',
});

async function setData(key, value) {
    try {
        await redis.set(key, value);
        console.log('Data successfully set');
    } catch (error) {
        console.error('Error setting data:', error);
    }
}

async function getData(key) {
    try {
        const value = await redis.get(key);
        console.log('Data retrieved');
        return value;
    } catch (error) {
        console.error('Error getting data:', error);
    }
}

(async () => {
    await redis.select(1);
    await setData('user', 'alex');
    await getData('user');
    redis.disconnect();
})();

Run:

node connect-to-redis.js

MongoDB and JavaScript

To work with MongoDB, use the mongodb driver.

Create a directory for storing the project files:

mkdir js-mongodb-connect && cd js-mongodb-connect

Initialize the project:

npm init -y

Install the mongodb library:

npm install mongodb

To connect to MongoDB, import the mongodb library. Specify the database address in the constant uri and pass the address into the MongoClient class.

const { MongoClient } = require('mongodb');

const uri = "mongodb://91.206.179.29:27017";
const client = new MongoClient(uri, { useNewUrlParser: true, useUnifiedTopology: true });

async function connectToDatabase() {
    try {
        await client.connect();
        console.log("Successfully connected to MongoDB!");

        const database = client.db("myDatabase");
        const collection = database.collection("myCollection");

        const documents = await collection.find({}).toArray();
        console.log("Documents found:", documents);
    } catch (error) {
        console.error("Error connecting to MongoDB:", error);
    } finally {
        await client.close();
        console.log("Connection closed.");
    }
}

connectToDatabase();

ClickHouse and JavaScript

To work with ClickHouse, use the @clickhouse/client driver.
Create a directory where we will store the project files:

mkdir js-clickhouse-connect && cd js-clickhouse-connect

Initialize the project:

npm init -y

Install the @clickhouse/client library:

npm install @clickhouse/client

To connect to ClickHouse, use the code below, where we set the connection details and execute a simple SQL query that returns the first 10 records from the system table system.tables:

const { createClient } = require('@clickhouse/client');

const client = createClient({
    host: 'http://localhost:8123',
    username: 'default',
    password: 'PasswordforDefaultUser45435',
    database: 'default',
});

async function connectAndQuery() {
    try {
        const rows = await client
            .query({
                query: 'SELECT * FROM system.tables LIMIT 10',
                format: 'JSON',
            })
            .then((result) => result.json());

        console.log('Successfully connected to ClickHouse Server!');
        console.log('Query results:', rows);
    } catch (error) {
        console.error('Error connecting to ClickHouse or running the query:', error);
    } finally {
        console.log('Done.');
    }
}

connectAndQuery();

Conclusion

In today's article, we thoroughly explored how to connect to PostgreSQL, Redis, MongoDB, MySQL, and ClickHouse databases using Python, Go, and JavaScript. These languages can be used to create both web applications and microservices that utilize databases in their operation.
18 February 2025 · 23 min to read
Python

How to Parse HTML with Python

Parsing is the automatic search for various patterns (based on pre-defined structures) in text data sources to extract specific information. Although parsing is a broad term, it most commonly refers to the process of collecting and analyzing data from remote web resources.

In the Python programming language, you can create programs for parsing data from external websites using two key tools:

A standard HTTP request package
External HTML markup processing libraries

However, data processing capabilities are not limited to HTML documents. Thanks to a wide range of external libraries in Python, you can organize parsing for documents of any complexity, whether they are arbitrary text, popular markup languages (e.g., XML), or even rare programming languages. If there is no suitable parsing library available, you can implement it manually using low-level methods that Python provides by default, such as simple string searching or regular expressions, although this requires additional skills.

This guide will cover how to organize parsers in Python. We will focus on extracting data from HTML pages based on specified tags and attributes. We run all the examples in this guide using the Python 3.10.12 interpreter on a Hostman cloud server with Ubuntu 22.04 and pip 22.0.2 as the package manager.

Structure of an HTML Document

Any document written in HTML consists of two types of tags:

Opening: Defined within less-than (<) and greater-than (>) symbols, e.g., <div>.
Closing: Defined the same way, but with a forward slash (/) after the less-than symbol, e.g., </div>.

Each tag can have various attributes, the values of which are written in quotes after the equal sign. Some commonly used attributes include:

href: Link to a resource. E.g., href="https://hostman.com".
class: The class of an object. E.g., class="surface panel panel_closed".
id: Identifier of an object. E.g., id="menu".
Each tag, with or without attributes, is an element (object) of the so-called DOM (Document Object Model) tree, which practically any HTML interpreter (parser) builds. This produces a hierarchy of elements in which nested tags are child elements of their parent tags.

In a browser, for example, we access elements and their attributes through JavaScript scripts. In Python, we use separate libraries for this purpose. The difference is that after parsing an HTML document, a browser not only constructs the DOM tree but also renders it on the screen.

<!DOCTYPE html>
<html>
<head>
    <title>This is the page title</title>
</head>
<body>
    <h1>This is a heading</h1>
    <p>This is a simple text.</p>
</body>
</html>

The markup of this page is built from tags in a hierarchical structure without any attributes:

html
  head
    title
  body
    h1
    p

Such a document structure is more than enough to extract information: we can parse the data by reading what lies between opening and closing tags. However, real website tags have additional attributes that specify both the specific function of an element and its special styling (described in separate CSS files):

<!DOCTYPE html>
<html>
<body>
    <h1 class="h1_bright">This is a heading</h1>
    <p>This is simple text.</p>

    <div class="block" href="https://hostman.com/products/cloud-server">
        <div class="block__title">Cloud Services</div>
        <div class="block__information">Cloud Servers</div>
    </div>

    <div class="block" href="https://hostman.com/products/vps-server-hosting">
        <div class="block__title">VPS Hosting</div>
        <div class="block__information">Cloud Infrastructure</div>
    </div>

    <div class="block" href="https://hostman.com/services/app-platform">
        <div class="block__title">App Platform</div>
        <div class="block__information">Apps in the Cloud</div>
    </div>
</body>
</html>

Thus, in addition to explicitly specified tags, specific attributes can refine the search, extracting only the necessary elements from the DOM tree.
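To make the idea concrete, here is a minimal sketch using only the standard library's html.parser module (no third-party packages) that walks a fragment like the example above and collects the text of every element whose class is block__title. The TitleExtractor class name is made up for this illustration.

```python
from html.parser import HTMLParser

# A trimmed copy of the example markup above, inlined for a self-contained demo.
SAMPLE = """
<div class="block" href="https://hostman.com/products/cloud-server">
  <div class="block__title">Cloud Services</div>
</div>
<div class="block" href="https://hostman.com/products/vps-server-hosting">
  <div class="block__title">VPS Hosting</div>
</div>
"""

class TitleExtractor(HTMLParser):
    """Collects the text of every element whose class attribute is 'block__title'."""
    def __init__(self):
        super().__init__()
        self.inside_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if dict(attrs).get("class") == "block__title":
            self.inside_title = True

    def handle_endtag(self, tag):
        self.inside_title = False

    def handle_data(self, data):
        if self.inside_title and data.strip():
            self.titles.append(data.strip())

parser = TitleExtractor()
parser.feed(SAMPLE)
print(parser.titles)  # ['Cloud Services', 'VPS Hosting']
```

This is essentially what the low-level processors described below do internally, just far more robustly.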
HTML Data Parser Structure

Web pages can be of two types:

Static: the HTML markup remains unchanged while the site loads and is viewed. Parsing does not require emulating browser behavior.

Dynamic: the HTML markup is modified by JavaScript while the site loads and is viewed (single-page applications, SPA). Parsing requires emulating browser behavior.

Parsing static websites is relatively simple: after making a remote request, the necessary data is extracted from the received HTML document.

Parsing dynamic websites requires a more complex approach. After a remote request, both the HTML document itself and the JavaScript scripts controlling it are downloaded to the local machine. These scripts usually perform several remote requests of their own, loading additional content and modifying the HTML document while the page is being viewed. Because of this, parsing dynamic websites requires emulating the browser's behavior and user actions on the local machine; without this, the necessary data simply won't load.

Most modern websites load additional content with JavaScript scripts in one way or another. The technical implementations of modern websites vary so widely that they can't be classified as entirely static or entirely dynamic. Typically, general information is loaded first, while specific information is loaded later.

Most HTML parsers are designed for static pages. Systems that emulate browser behavior to generate dynamic content are much less common.

In Python, libraries (packages) intended for analyzing HTML markup can be divided into two groups:

Low-level processors: compact but syntactically complex packages with a complicated implementation that parse HTML (or XML) syntax and build a hierarchical tree of elements.

High-level libraries and frameworks: large but syntactically concise packages with a wide range of features for extracting formalized data from raw HTML documents.
The second group includes not only compact HTML parsers but also full-fledged data-scraping systems. Often, these packages use the low-level parsers (processors) from the first group as their parsing core.

Several low-level libraries are available for Python:

lxml: a low-level XML syntax processor that is also used for HTML parsing. It is based on the popular libxml2 library written in C.

html5lib: a Python library for HTML syntax parsing, written according to the HTML specification by WHATWG (the Web Hypertext Application Technology Working Group), which all modern browsers follow.

However, using high-level libraries is faster and easier; they have simpler syntax and a wider range of functions:

BeautifulSoup: a simple yet flexible library for Python that parses HTML and XML documents by building a full DOM tree of elements and extracting the necessary data.

Scrapy: a full-fledged framework for parsing data from HTML pages, built around autonomous "spiders" (web crawlers) with pre-defined instructions.

Selectolax: a fast HTML page parser that uses CSS selectors to extract information from tags.

Parsel: a Python library with its own selector syntax for extracting data from HTML, JSON, and XML documents.

requests-html: a Python library that mimics a browser, rendering pages (including their JavaScript) and querying them with browser-style CSS selectors.

This guide will review several of these high-level libraries.

Installing the pip Package Manager

We can install all parsing libraries (as well as many other packages) for Python through the standard package manager, pip, which needs to be installed separately.

First, update the list of available repositories:

sudo apt update

Then install pip using the APT package manager:

sudo apt install python3-pip -y

The -y flag automatically confirms all terminal prompts during the installation.
To verify that pip was installed correctly, check its version:

pip3 --version

The terminal will display the pip version and the installation path:

pip 22.0.2 from /usr/lib/python3/dist-packages/pip (python 3.10)

As shown, this guide uses pip version 22.0.2.

Installing the HTTP Requests Package

The default Python installation usually includes the Requests package, which allows making requests to remote servers. We will use it in the examples in this guide. However, in some cases it might not be installed; if so, you can install requests manually via pip:

pip install requests

If the system already has it, you will see the following message in the terminal:

Requirement already satisfied: requests in /usr/lib/python3/dist-packages (2.25.1)

Otherwise, the command will add requests to the list of packages available for import in Python scripts.

Using BeautifulSoup

To install BeautifulSoup version 4, use pip:

pip install beautifulsoup4

After this, the library will be available for import in Python scripts. However, it also requires the previously mentioned low-level HTML processors to work properly.

First, install lxml:

pip install lxml

Then install html5lib:

pip install html5lib

Later, you can specify one of these processors as the core parser for BeautifulSoup in your Python code.
Create a new file in your home directory:

nano bs.py

Add the following code:

import requests
from bs4 import BeautifulSoup

# Request the page 'https://hostman.com'
response = requests.get('https://hostman.com')

# Parse the HTML content of the page using the 'html5lib' parser
page = BeautifulSoup(response.text, 'html5lib')

# Extract the title of the page
pageTitle = page.find('title')
print(pageTitle)
print(pageTitle.string)
print("")

# Extract all <a> links on the page
pageLinks = page.find_all('a')

# Print the content of the first 3 links (if they exist)
for link in pageLinks[:3]:
    print(link.string)
print("")

# Find all div elements with a class starting with 'socials--'
social_links_containers = page.find_all('div', class_=lambda c: c and c.startswith('socials--'))

# Collect the links from these divs
for container in social_links_containers:
    links = container.find_all('a', href=True)
    for link in links:
        href = link['href']
        # Ignore links related to Cloudflare's email protection
        if href.startswith('/cdn-cgi/l/email-protection'):
            continue
        print(href)

Now run the script:

python3 bs.py

This will produce the following console output:

<title>Hostman - Cloud Service Provider with a Global Cloud Infrastructure</title>
Hostman - Cloud Service Provider with a Global Cloud Infrastructure

Partners
Tutorials
API

https://wa.me/35795959804
https://twitter.com/hostman_com
https://www.facebook.com/profile.php?id=61556075738626
https://github.com/hostman-cloud
https://www.linkedin.com/company/hostman-inc/about/
https://www.reddit.com/r/Hostman_com/

Of course, instead of html5lib, you can specify lxml:

page = BeautifulSoup(response.text, 'lxml')

However, it is best to use html5lib as the processor. Unlike lxml, which is primarily designed for working with XML markup, html5lib fully supports modern HTML5 standards.
Note that although the BeautifulSoup library has a concise syntax, it does not support browser emulation, meaning it cannot load content dynamically.

Using Scrapy

The Scrapy framework is implemented in a more object-oriented manner. In Scrapy, website parsing is based on three core entities:

Spiders: classes that contain the parsing details for specified websites, including URLs, element selectors (CSS or XPath), and page-browsing mechanisms.

Items: variables for storing extracted data; these are more complex forms of Python dictionaries with a special internal structure.

Pipelines: intermediate handlers for extracted data that can modify items and interact with external software (such as databases).

You can install Scrapy through the pip package manager:

pip install scrapy

After that, initialize a parser project, which creates a separate directory with its own folder structure and configuration files:

scrapy startproject parser

Now you can navigate to the newly created directory:

cd parser

Check the contents of the current directory:

ls

It contains a general configuration file and a directory with the project source files:

parser scrapy.cfg

Move to the source files directory:

cd parser

If you check its contents:

ls

You will see both special Python scripts, each performing its own function, and a separate directory for spiders:

__init__.py items.py middlewares.py pipelines.py settings.py spiders

Let's open the settings file:

nano settings.py

By default, most parameters are commented out with the hash symbol (#). For the parser to work correctly, you need to uncomment some of these parameters without changing the default values specified in the file:

USER_AGENT
ROBOTSTXT_OBEY
CONCURRENT_REQUESTS
DOWNLOAD_DELAY
COOKIES_ENABLED

Each specific project will require a more precise configuration of the framework. You can find all available parameters in the official documentation.
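The division of labor between items and pipelines described above can be sketched in plain Python. This is a conceptual illustration only, not Scrapy's actual classes: the names LinkItem and NormalizeUrlPipeline are made up, with LinkItem standing in for a scrapy.Item (a dict-like record of extracted fields) and NormalizeUrlPipeline standing in for a pipeline, whose process_item() method validates or transforms each record as it flows out of a spider.

```python
class LinkItem(dict):
    """Stands in for a scrapy.Item: a dict-like container of extracted fields."""

class NormalizeUrlPipeline:
    """Stands in for a Scrapy pipeline: cleans up each extracted record."""
    def process_item(self, item, spider=None):
        # Strip stray whitespace picked up during extraction.
        item["url"] = item["url"].strip()
        return item

# A spider would yield items; the framework would route them through pipelines.
pipeline = NormalizeUrlPipeline()
item = pipeline.process_item(LinkItem(url="  https://hostman.com  "))
print(item["url"])  # https://hostman.com
```

In a real project, item classes live in items.py and pipelines in pipelines.py, and pipelines are activated via the ITEM_PIPELINES setting.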
After that, you can generate a new spider:

scrapy genspider hostmanspider hostman.com

After running the above command, the console should display a message about the creation of a new spider:

Created spider 'hostmanspider' using template 'basic' in module: parser.spiders.hostmanspider

Now, if you check the contents of the spiders directory:

ls spiders

You will see the source file for the new spider:

__init__.py  __pycache__  hostmanspider.py

Let's open the script file:

nano spiders/hostmanspider.py

And fill it with the following code:

import scrapy  # Package from the Scrapy framework


class HostmanSpider(scrapy.Spider):  # The spider class inherits from scrapy.Spider
    name = 'hostmanspider'  # Name of the spider

    def start_requests(self):
        urls = ["https://hostman.com"]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        open("output", "w").close()  # Clear the contents of the 'output' file
        someFile = open("output", "a")  # Reopen the file for appending
        dataTitle = response.css("title::text").get()  # Extract the page title using a CSS selector
        dataA = response.css("a").getall()  # Extract all <a> elements using a CSS selector
        someFile.write(dataTitle + "\n\n")
        for i in range(3):  # Write only the first 3 links
            someFile.write(dataA[i] + "\n")
        someFile.close()

You can now run the created spider with the following command:

scrapy crawl hostmanspider

Running the spider will create an output file in the current directory.
To view the contents of this file, you can use:

cat output

The content of this file will look something like this:

Hostman - Cloud Service Provider with a Global Cloud Infrastructure

<a href="/partners/" itemprop="url" class="body4 medium nd-link-primary"><span itemprop="name">Partners</span></a>
<a href="/tutorials/" itemprop="url" class="body4 medium nd-link-primary"><span itemprop="name">Tutorials</span></a>
<a href="/api-docs/" itemprop="url" class="body4 medium nd-link-primary"><span itemprop="name">API</span></a>

You can find more detailed information on extracting data using selectors (both CSS and XPath) in the official Scrapy documentation.

Conclusion

Data parsing from remote sources in Python is made possible by two main components:

A package for making remote requests

Libraries for parsing data

These libraries range from simple ones, suitable only for parsing static websites, to more complex ones that can emulate browser behavior and, consequently, parse dynamic websites.

In Python, the most popular libraries for parsing static data are:

BeautifulSoup

Scrapy

These tools, much like JavaScript DOM functions (e.g., getElementsByClassName() with CSS selectors), allow us to extract data (attributes and text) from the DOM tree elements of any HTML document.
11 February 2025 · 13 min to read
