How to Read Excel Files in Python using Pandas

Kolawole Mangabo
Technical writer
Python
03.10.2024
Reading time: 8 min

Excel files are commonly used to organize, sort, and analyze data in a tabular format with rows and columns. They are widely applied in industries like data analysis, finance, and reporting.

Using Python, the pandas library allows for efficient manipulation of Excel files, enabling operations like reading and writing data. This article will cover how to use the read_excel function from pandas to read Excel files.

Installing Pandas

To begin, install pandas by running the following command:

pip install pandas

This will install pandas along with the required dependencies in your work environment. Additionally, the openpyxl module is needed for reading .xlsx files.

Why OpenPyXL?

Excel files come in different formats and extensions. To ensure compatibility when working with these files, pandas allows you to specify the engine you want to use. Below is a list of engines commonly used with Excel files:

  • OpenPyXL: Used for reading and writing .xlsx files (Excel 2007+).
  • XlsxWriter: Used only for writing .xlsx files; it cannot read them.
  • xlrd: Used for reading older .xls files (Excel 97-2003).
  • Pyxlsb: Used for reading .xlsb (binary Excel format) files.

OpenPyXL also supports Excel-specific features, such as formatting and formulas. OpenPyXL is an optional dependency of pandas, so pip install pandas does not install it automatically. Install it using the following command:

pip install openpyxl

While OpenPyXL can be used on its own to read Excel files, it is also integrated as an engine within pandas for reading and writing .xlsx files.
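Before reading a file, it can help to check which engine packages are present in your environment. The snippet below is a minimal sketch that uses only the standard library; the helper name engine_available is hypothetical and not part of pandas.

```python
import importlib.util


def engine_available(module_name):
    """Return True if the named top-level module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Module names of the reader engines mentioned above.
for module_name in ("openpyxl", "xlrd", "pyxlsb"):
    status = "installed" if engine_available(module_name) else "missing"
    print(f"{module_name}: {status}")
```

If openpyxl is reported as missing, reading an .xlsx file with pandas will fail until you install it.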

We will work with an Excel file that you can download here. Download the file and move it into your working environment.

Basic Usage of read_excel Function

The Excel file we are working with has the following structure:

[Image 1: screenshot of the Excel file]

It also has three worksheets: Orders, Returns, and Users.

To read this file, the read_excel function from pandas will be used.

The read_excel function in pandas is used to import data from Excel files into a pandas DataFrame, a powerful structure for analyzing and manipulating data. This function is highly versatile, allowing users to read data from specific sheets, columns, or ranges.

Here is how to use this function while specifying the engine:

import pandas as pd

df = pd.read_excel('SuperStoreUS-2015.xlsx', engine='openpyxl')

print(df)

This code imports the pandas library and uses the read_excel function to read the SuperStoreUS-2015.xlsx Excel file into a pandas DataFrame. The print(df) statement outputs the DataFrame contents, displaying the data from the Excel file. Below is the resulting output:

       Row ID Order Priority  Discount  Unit Price  Shipping Cost  ...  Ship Date     Profit Quantity ordered new    Sales Order ID

0      20847           High      0.01        2.84           0.93  ... 2015-01-08     4.5600                    4    13.01    88522

1      20228  Not Specified      0.02      500.98          26.00  ... 2015-06-15  4390.3665                   12  6362.85    90193

2      21776       Critical      0.06        9.48           7.29  ... 2015-02-17   -53.8096                   22   211.15    90192

3      24844         Medium      0.09       78.69          19.99  ... 2015-05-14   803.4705                   16  1164.45    86838

4      24846         Medium      0.08        3.28           2.31  ... 2015-05-13   -24.0300                    7    22.23    86838

The read_excel function is highly flexible and can be adapted to various usage scenarios. Next, we will explore how to use it for reading specific sheets and columns.

Reading Specific Sheets and Columns

Excel files can contain multiple sheets, each with many columns. The read_excel function takes the sheet_name argument to tell pandas which sheet to read. By default (sheet_name=0), read_excel loads only the first worksheet; passing sheet_name=None loads all worksheets into a dictionary of DataFrames. Here is how you can use the sheet_name argument:

df = pd.read_excel('SuperStoreUS-2015.xlsx', sheet_name="Returns")

print(df)

This will read the Returns sheet, and here is an example output:

      Order ID    Status

0           65  Returned

1          612  Returned

2          614  Returned

3          678  Returned

4          710  Returned

...        ...       ...

1629    182681  Returned

1630    182683  Returned

1631    182750  Returned

1632    182781  Returned

1633    182906  Returned

[1634 rows x 2 columns]

The sheet_name argument also accepts an integer, which is interpreted as a zero-indexed sheet position. For instance, pd.read_excel('SuperStoreUS-2015.xlsx', sheet_name=1) loads the second sheet, which is also Returns.
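To load every worksheet at once, you can pass sheet_name=None, which returns a dictionary keyed by sheet name. The sketch below builds a small two-sheet workbook in memory so it runs without the article's file. The data is made up for illustration, and the example assumes openpyxl is installed.

```python
import io

import pandas as pd

# Build a toy two-sheet workbook in memory (illustrative data, not the article's file).
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"Order ID": [1, 2], "Sales": [10.5, 20.0]}).to_excel(
        writer, sheet_name="Orders", index=False
    )
    pd.DataFrame({"Order ID": [2], "Status": ["Returned"]}).to_excel(
        writer, sheet_name="Returns", index=False
    )
buffer.seek(0)

# sheet_name=None loads all worksheets into a dict of DataFrames.
sheets = pd.read_excel(buffer, sheet_name=None, engine="openpyxl")

for name, frame in sheets.items():
    print(name, frame.shape)
```

With a real file, replace buffer with the file path, for example 'SuperStoreUS-2015.xlsx'.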

You can also choose to read specific columns from the Excel file. The read_excel function allows for selective column reading using the usecols parameter. It accepts various formats:

  • A string for Excel column letters or ranges (e.g., "A:C").
  • A list of integers for column positions.
  • A list of column names.

Here is an example using column names:

import pandas as pd

df = pd.read_excel('SuperStoreUS-2015.xlsx', usecols=['Row ID', 'Sales'])

print(df)

In this case, the usecols parameter specifies that only columns Row ID and Sales from the Excel file should be imported into the DataFrame. The code below does the same thing, but using Excel column letters:

import pandas as pd

df = pd.read_excel('SuperStoreUS-2015.xlsx', usecols='A,X')

print(df)

Here is the output:

      Row ID    Sales

0      20847    13.01

1      20228  6362.85

2      21776   211.15

3      24844  1164.45

4      24846    22.23

...      ...      ...

1947   19842   207.31

1948   19843   143.12

1949   26208    59.98

1950   24911   135.78

1951   25914   506.50

You can also use range selection to read columns by their position. In the code below, we are reading from Order Priority to Customer ID.

df = pd.read_excel('SuperStoreUS-2015.xlsx', usecols='B:F')

Here is an example output when reading columns B to F:

     Order Priority  Discount  Unit Price  Shipping Cost  Customer ID

0              High      0.01        2.84           0.93            3

1     Not Specified      0.02      500.98          26.00            5

2          Critical      0.06        9.48           7.29           11

3            Medium      0.09       78.69          19.99           14

4            Medium      0.08        3.28           2.31           14

Additionally, you can provide a callable that evaluates column names, reading only those for which the function returns True.
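As a rough illustration of the callable form, the sketch below keeps only columns whose names contain "Price" or "Cost". It uses a small in-memory workbook with made-up values that borrow a few column names from the dataset, and it assumes openpyxl is installed.

```python
import io

import pandas as pd

# Toy workbook in memory; values are illustrative only.
buffer = io.BytesIO()
pd.DataFrame(
    {
        "Row ID": [1, 2],
        "Unit Price": [2.84, 500.98],
        "Shipping Cost": [0.93, 26.00],
        "Sales": [13.01, 6362.85],
    }
).to_excel(buffer, index=False, engine="openpyxl")
buffer.seek(0)

# pandas calls the function once per column name and keeps the columns
# for which it returns True.
df = pd.read_excel(
    buffer,
    usecols=lambda name: "Price" in name or "Cost" in name,
    engine="openpyxl",
)

print(list(df.columns))
```

A callable is handy when the exact column letters or positions may shift between files but the naming pattern stays stable.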

Handling Missing Data in Excel Files

In Excel files, missing data refers to values that are absent, often represented by empty cells. When reading an Excel file into a pandas DataFrame, missing data is automatically identified and represented as NaN (Not a Number), which is pandas' placeholder for missing values.

Pandas offers several methods to handle missing data, such as:

  • dropna(): Removes rows or columns with missing values.
  • fillna(): Replaces missing values with a specified value (e.g., 0 or the mean of the column).
  • isna(): Detects missing values and returns a boolean DataFrame.

For example, using fillna on our Excel file will replace all missing values with 0:

df = pd.read_excel('SuperStoreUS-2015.xlsx')

df_cleaned = df.fillna(0)

Handling missing data is essential to ensure accurate analysis and prevent errors or biases in data-driven decisions.
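The three methods listed above can be compared side by side on a small DataFrame. This is a minimal sketch with made-up values rather than data from the Excel file.

```python
import numpy as np
import pandas as pd

# Toy data with one missing value in each column.
df = pd.DataFrame(
    {
        "Profit": [4.56, np.nan, -53.81],
        "Region": ["West", None, "East"],
    }
)

# isna(): count missing values per column.
missing_per_column = df.isna().sum()

# dropna(): keep only rows without any missing value.
rows_complete = df.dropna()

# fillna(): replace missing profits with the column mean.
profit_filled = df["Profit"].fillna(df["Profit"].mean())

print(missing_per_column)
print(rows_complete)
print(profit_filled)
```

Filling with 0 and filling with the mean can lead to different totals and averages, so the right choice depends on what a missing value means in your data.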

Reading and Analyzing an Excel File in Pandas

Let’s put what we have learned into practice. In this example, we will walk through reading an Excel file, performing some basic analysis, and exporting the manipulated data into various formats.

Specifically, we’ll calculate the sum, maximum, and minimum values of the Profit column for June 2015, and export the results to CSV, JSON, and a Python dictionary.

Step 1: Loading the Excel File

The first step is to load the Excel file using the read_excel function from pandas:

import pandas as pd

df = pd.read_excel('SuperStoreUS-2015.xlsx', usecols=['Ship Date', 'Profit'])

print(df.head())

This code reads only the Ship Date and Profit columns of the SuperStoreUS-2015.xlsx file into a pandas DataFrame and displays the first five rows.

Step 2: Calculating Profit for June 2015

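Next, we will filter the data to include only records from June 2015 and calculate the total, maximum, and minimum profit for that month. read_excel normally parses Excel date cells into datetime values on its own, as the Ship Date values in the earlier output suggest. The pd.to_datetime call below simply guarantees a datetime column, and its MM/DD/YYYY format only comes into play if the dates were stored as text. We then filter by year and month: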

df['Ship Date'] = pd.to_datetime(df['Ship Date'], format='%m/%d/%Y')

df_june_2015 = df[(df['Ship Date'].dt.year == 2015) & (df['Ship Date'].dt.month == 6)]

# Calculate the sum, max, and min for the Profit column

profit_sum = df_june_2015['Profit'].sum()

profit_max = df_june_2015['Profit'].max()

profit_min = df_june_2015['Profit'].min()

print(f"Total Profit in June 2015: {profit_sum}")

print(f"Maximum Profit in June 2015: {profit_max}")

print(f"Minimum Profit in June 2015: {profit_min}")

To round the printed values to two decimal places, you can use:

print(f"Total Profit in June 2015: {round(profit_sum, ndigits=2)}")

print(f"Maximum Profit in June 2015: {round(profit_max, ndigits=2)}")

print(f"Minimum Profit in June 2015: {round(profit_min, ndigits=2)}")
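The same filter-and-summarize step can be tried on a small DataFrame whose dates are stored as MM/DD/YYYY text, which is the case where the format argument matters. The values below are made up for illustration.

```python
import pandas as pd

# Toy data with dates stored as text in MM/DD/YYYY form.
df = pd.DataFrame(
    {
        "Ship Date": ["06/15/2015", "05/14/2015", "06/30/2015"],
        "Profit": [4390.37, 803.47, -24.03],
    }
)

# Parse the text into datetime values using the explicit format.
df["Ship Date"] = pd.to_datetime(df["Ship Date"], format="%m/%d/%Y")

# Keep only rows shipped in June 2015.
june = df[(df["Ship Date"].dt.year == 2015) & (df["Ship Date"].dt.month == 6)]

# Compute sum, max, and min in one call and round to two decimals.
summary = june["Profit"].agg(["sum", "max", "min"]).round(2)

print(summary)
```

Using agg with a list of function names returns a Series indexed by those names, which keeps the three statistics together.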

Step 3: Exporting the Manipulated Data

Once the profit for June 2015 has been calculated, we can export the filtered data to different formats, including CSV, JSON, and a Python dictionary.

# Export to CSV

df_june_2015.to_csv('SuperStoreUS_June2015_Profit.csv', index=False)

# Export to JSON

df_june_2015.to_json('SuperStoreUS_June2015_Profit.json', orient='records')

# Convert to Dictionary

data_dict = df_june_2015.to_dict(orient='records')

print(data_dict[:5])

In this step, the data is first exported to a CSV file and then to a JSON file. Finally, the DataFrame is converted into a list of Python dictionaries, one per row, with column names as keys.
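To see what the records orientation produces, here is a small sketch on made-up data. It calls to_json without a path so that the JSON is returned as a string, and it passes date_format="iso" so the dates are written as ISO 8601 strings.

```python
import json

import pandas as pd

# Toy data standing in for the filtered June 2015 rows.
df = pd.DataFrame(
    {
        "Ship Date": pd.to_datetime(["2015-06-15", "2015-06-30"]),
        "Profit": [4390.37, -24.03],
    }
)

# orient="records" yields a list with one dict per row.
records = df.to_dict(orient="records")

# Without a file path, to_json returns the JSON text instead of writing a file.
json_text = df.to_json(orient="records", date_format="iso")
parsed = json.loads(json_text)

print(records[0])
print(parsed[0])
```

In the dictionary form the dates remain pandas Timestamp objects, while in the JSON form they become plain strings.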

Conclusion

In this article, we have learned how to use the read_excel function from pandas to read and manipulate Excel files. Its sheet_name and usecols arguments let you load only the sheets and columns you need, which keeps the resulting DataFrame focused on the data you want to analyze.
