
Time Series Forecasting with ARIMA in Python 3

Hostman Team
Technical writer
Python
15.07.2024
Reading time: 23 min

Data analytics has long become an integral part of modern life. We encounter a massive flow of information daily, which we need to collect and interpret correctly. One method of data analysis is time series forecasting. A time series (dynamic series) is a sequence of data points or observations taken at regular intervals. Examples of time series include monthly sales, daily temperatures, annual profits, and so on. Time series forecasting is a scientific field where models are built to predict the future behavior of a process or phenomenon based on past observations recorded in the dynamic series.

In this guide, we will focus on using the ARIMA model, one of the most commonly applied approaches in time series analysis. We will thoroughly examine the process of using the ARIMA model in Python 3—from the initial stages of loading and processing data to the final stage of forecasting. We will also learn how to determine and interpret the parameters of the ARIMA model and how to evaluate its quality.

Whether you are new to data analysis or an experienced analyst, this guide aims to teach you how to apply the ARIMA model to time series forecasting, and to do so not just correctly but effectively and in an automated manner, using Python's extensive functionality.

Setting Up the Working Environment for Data Analysis in Python

Installing Python

First and foremost, you need to install Python itself—the programming language we will use for data analysis. You can download it from the official website, python.org, following the installation instructions provided there. After completing the installation, open the command line (on Windows) or terminal (on macOS/Linux) and enter:

python --version

If everything was done correctly, you will see the version number of the installed Python.

Setting Up the Development Environment

To work with Python, you can choose a development environment (IDE) that suits you. In this guide, we will work with Jupyter Notebook, which is very popular among data analysts. Other popular options include PyCharm, Visual Studio Code, and Spyder. To install Jupyter Notebook, enter the following in the command line:

pip install jupyter

Installing Necessary Python Packages

Python ships with a standard library of useful modules, but for more in-depth data analysis you will need additional third-party packages. In this guide, we will use:

  • pandas (for working with tabular data)

  • numpy (for numerical computations)

  • matplotlib (for data visualization)

  • statsmodels (for statistical models)

You can install these libraries using the pip3 install command in the terminal or command line:

pip3 install pandas numpy matplotlib statsmodels

We will also need the modules warnings (for managing warning messages) and itertools (for creating efficient looping constructs), both of which are part of the standard Python library, so you do not need to install them separately. To check the installed packages, use the command:

pip list

As a result, you will get a list of all installed modules and their versions.

Creating a Working Directory

Your working directory is the place on your computer where you will store all your Python scripts and project files. To create a new directory, open the terminal or command line and enter the following commands:

cd path_to_your_directory
mkdir Your_Project_Name
cd Your_Project_Name

Here, path_to_your_directory is the path to the location where the project folder will be created, and Your_Project_Name is the name of your project.

After successfully completing the above steps, you are ready to work on data analysis in Python. Your development environment is set up, your working directory is ready, and all necessary packages are installed.

Loading and Processing Data

Starting Jupyter Notebook

Let's start by launching Jupyter Notebook, our main tool for writing and testing Python code. In the command line (or terminal), navigate to your working directory and enter the following command:

jupyter notebook

A new tab with the Jupyter Notebook interface will open in your browser. To create a new document, select the "New" tab in the top right corner of the window and choose "Python 3" from the dropdown menu. You will be automatically redirected to a new tab where your notebook will be created.

Importing Libraries

The next step is to import the necessary Python libraries. Create a new cell in your notebook and insert the following code:

import warnings
import itertools
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

To run the code, press Shift+Enter: this executes the current cell and moves the focus to the next one, where you can continue writing. These libraries are now available in your project, and you can use their functionality for various data analysis tasks.

Loading Data

For time series forecasting in Python, we will use the Airline Passengers dataset. It tracks the monthly number of passengers on international airlines, in thousands, from 1949 to 1960, and is available as a CSV file on GitHub. To load the data from a CSV file via URL, use the pandas library:

url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv"
time_series = pd.read_csv(url)

If your CSV file is stored locally on your computer, use the following line of code to load it:

time_series = pd.read_csv('airline_passengers.csv')

Now the data is saved in a DataFrame named time_series. A DataFrame is the primary data structure in the pandas library: a two-dimensional table where each row is a separate observation and each column is a feature or variable of that observation. To verify that the data loaded correctly and to check its format, you can display the first few rows of the dataset:

print(time_series.head())

This code will output the first five rows of the loaded dataset, allowing you to quickly check that they were loaded correctly and look as expected. By default, the head() method returns five rows, but you can pass a different number to view more or fewer.

You can also view the last rows of the DataFrame:

print(time_series.tail())
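Besides head() and tail(), the info() method gives a quick structural overview of a DataFrame. The sketch below uses a small toy frame with the same column names as the airline dataset, purely for illustration; with the real data, call time_series.info() the same way:

```python
import pandas as pd

# Toy frame shaped like the airline dataset (same column names),
# just to illustrate the inspection methods
toy = pd.DataFrame({
    'Month': ['1949-01', '1949-02', '1949-03'],
    'Passengers': [112, 118, 132],
})

# info() lists each column's dtype and non-null count; note that
# Month comes in as a plain object (string) column until converted
toy.info()
print(toy.dtypes)
```

This is a convenient way to spot, before any analysis, that date columns still need type conversion.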

Data Processing

Before starting the analysis, the data needs to be preprocessed. Generally, this can involve many steps, but for our time series, we can limit ourselves to the following actions.

Checking for Missing Values

Handling missing values is an important step in the preprocessing of dynamic series. Missing values can cause issues in the analysis and distort forecasting results. To check for missing values, you can use the isnull() method from the pandas library:

print(time_series.isnull().sum())

If 0 is indicated for all columns, this means there are no missing values in the data. However, if missing values are found during execution, they should be handled. There are various ways to handle missing values, and the approach will depend on the nature of your data. For example, we can fill in missing values with the column's mean value:

time_series = time_series.fillna(time_series.mean())

To replace missing values only in certain columns, use this command:

time_series['Column 1'] = time_series['Column 1'].fillna(time_series['Column 1'].mean())
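For time series specifically, filling gaps with a global mean can flatten the local trend. Time-based interpolation is often a gentler alternative; here is a minimal sketch on a toy monthly series with one missing value (the real data, as checked above, has no gaps):

```python
import numpy as np
import pandas as pd

# Toy monthly series with one gap, standing in for the real data
idx = pd.date_range('1949-01-01', periods=5, freq='MS')
s = pd.Series([112.0, np.nan, 132.0, 129.0, 121.0], index=idx)

# Interpolate along the time axis instead of using a global mean,
# so the filled value respects the local trend
filled = s.interpolate(method='time')
print(filled.isnull().sum())  # 0
```

Which strategy is appropriate always depends on the nature of your data and on how many values are missing.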

Data Type Conversion 

Each column in a DataFrame has a specific data type. For dynamic series, the datetime type, designed specifically for storing dates and times, is particularly important. By default, pandas reads such information as text: even if a column contains dates, pandas will treat them as regular strings. In our case, we need to convert the Month column to datetime so that we can work with the temporal data:

time_series['Month'] = pd.to_datetime(time_series['Month'])

Setting the datetime Column as the Index

In pandas, each data row has its unique index (similar to a row number). However, sometimes it is more convenient to use a specific column from your data as an index. When working with time series, the most convenient choice for the index is the column containing the date or time. This allows for easy selection and analysis of data for specific time periods. In our case, we use the Month column as the index:

time_series.set_index('Month', inplace=True)

Rescaling Data

Another important step in data preprocessing is checking whether the data needs rescaling. If the range of your data is very large (for example, values running from thousands to millions), you may need to transform it. The airline passenger values stay within a modest range, so rescaling is probably unnecessary here, but it is always worth checking the data range before further steps. Here is an example of standardization with scikit-learn (install it with pip3 install scikit-learn if needed) for cases where the range is large:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
time_series[['Passengers']] = scaler.fit_transform(time_series[['Passengers']])

All the aforementioned steps are important in preparing data for further analysis. They help improve the quality of the time series and make the process of working with it simpler and more efficient. In this guide, we have covered only some data processing steps. But this stage can also include other actions, such as detecting and handling anomalies or outliers, creating new variables or features, and dividing the data into subgroups or categories.

Data Visualization

An important element when working with data is its visual representation. Using matplotlib, we can easily turn data into a visual chart, which helps us understand the structure of the time sequence. Visualization allows us to immediately see trends and seasonality in the data. A trend is the general direction of data movement over a long period. Seasonality is recurring data fluctuations in predictable time frames (week, month, quarter, year, etc.). Generally, a trend is associated with long-term data movement, while seasonality is associated with short-term, regular, and periodic changes.

For example, if you see that the number of passengers grows every year, this indicates an upward trend. If the number of passengers grows in the same months every year, this indicates annual seasonality.

To draw a chart, use the following lines of code:

plt.figure(figsize=(15,8))
plt.plot(time_series['Passengers'])
plt.title('Time Series Plot')
plt.xlabel('Date')
plt.ylabel('Passengers')
plt.show()

For our data, we get the following plot:

[Time series plot of monthly airline passenger numbers, 1949-1960]

Stationarity of Data

In time series analysis, it is crucial to pay attention to the concept of stationarity. Stationarity in a time series means that the series' characteristics (such as mean and variance) remain constant over time. Non-stationary series often lead to errors in further predictions.

The ARIMA model can adapt the series to stationarity "on its own" through a special model parameter (d). However, understanding whether your initial time series is stationary helps you better understand ARIMA's workings.

There are several methods to check for stationarity in a series:

  1. Visual Analysis: Start by plotting the data and observe the following aspects:

    • Mean: Fluctuations in this indicator over time may signal that the time series is not stationary.

    • Variance: If the variance changes over time, this also indicates non-stationarity.

    • Trend: A visible trend on the graph is another indication of non-stationarity.

    • Seasonality: Seasonal fluctuations on the graph can also suggest non-stationarity.

  2. Statistical Analysis: Perform statistical tests like the Dickey-Fuller test. This method provides a quantitative assessment of a time series' stationarity. The null hypothesis of the test assumes that the time series is non-stationary. If the p-value is less than the significance level of 0.05, the null hypothesis is rejected, and the series can be considered stationary.

Running the Dickey-Fuller test on our data might look like this:

from statsmodels.tsa.stattools import adfuller

print('Test result:')
df_result = adfuller(time_series['Passengers'])
df_labels = ['ADF Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used']
for result_value, label in zip(df_result, df_labels):
    print(label + ' : ' + str(result_value))

if df_result[1] <= 0.05:
    print("Strong evidence against the null hypothesis, the series is stationary.")
else:
    print("Weak evidence against the null hypothesis, the series is not stationary.")

Our time series is not stationary, but in the following sections, we will automatically search for parameters for the ARIMA model and find the necessary parameters to make the series stationary.

Test result:
ADF Test Statistic : 0.8153688792060482
p-value : 0.991880243437641
#Lags Used : 13
Number of Observations Used : 130
Weak evidence against the null hypothesis, the series is not stationary.

Even though we don't need to manually make the series stationary, it's useful to know which methods can be used to do so. There are many methods, including:

  • Differencing: One of the most common methods, differencing involves calculating the difference between consecutive observations in the time series.

  • Seasonal Differencing: A variation of regular differencing, applied to data with a seasonal component.

  • Log Transformation: Taking the logarithm of the data can help reduce variability in the series and make it more stationary.

Some time series may be particularly complex and require combining transformation methods. After transforming the series, you should recheck for stationarity using the Dickey-Fuller test to ensure the transformation was successful.

ARIMA Model

ARIMA (AutoRegressive Integrated Moving Average) is a statistical model used for analyzing and forecasting time series data.

  • AutoRegressive (AR): Uses the dependency between an observation and a number of lagged observations (e.g., predicting tomorrow's weather based on previous days' weather).

  • Integrated (I): Involves differencing the time series data to make it stationary.

  • Moving Average (MA): Models the error between the actual observation and the predicted value using a combination of past errors.

The ARIMA model is usually denoted as ARIMA(p, d, q), where p, d, and q are the model parameters:

  • p: The order of the autoregressive part (number of lagged observations included).

  • d: The degree of differencing (number of times the data is differenced to achieve stationarity).

  • q: The order of the moving average part (number of lagged forecast errors included).

Choosing the appropriate values for (p, d, q) involves analyzing autocorrelation and partial autocorrelation plots and applying information criteria.

Seasonal ARIMA Model

Seasonal ARIMA (SARIMA) extends ARIMA to account for seasonality in time series data. In many cases, time series exhibit clear seasonal patterns, such as ice cream sales increasing in summer and decreasing in winter. SARIMA captures these seasonal patterns.

SARIMA is typically denoted as SARIMA(p, d, q)(P, D, Q)m, where p, d, q are non-seasonal parameters, and P, D, Q are seasonal parameters:

  • p, d, q: The same as in ARIMA.

  • P: The order of seasonal autoregression (number of lagged seasons affecting the current season).

  • D: The degree of seasonal differencing (number of times seasonal trends are differenced).

  • Q: The order of seasonal moving average (number of lagged seasonal forecast errors included).

  • m: The length of the seasonal period (e.g., 12 for monthly data with yearly seasonality).

Like ARIMA, SARIMA is suitable for forecasting time series data but with the added capability of capturing and modeling seasonal patterns.

Although ARIMA, particularly seasonal ARIMA, may seem complex due to the need to carefully select numerous parameters, automating this process can simplify the task.

Defining Model Parameters

The first step in configuring an ARIMA model is determining the optimal parameter values for our specific dataset.

To tune the ARIMA parameters, we will use "grid search." The essence of this method is that it goes through all possible parameter combinations from a predefined grid of values and trains the model on each combination. After training the model on each combination, the model with the best performance is selected.

The more different parameter values, the more combinations need to be checked, and the longer the process will take. For our case, we will use only two possible values (0 and 1) for each parameter, resulting in a total of 8 combinations for the ARIMA parameters and 8 for the seasonal part (with a seasonal period length = 12). Thus, the total number of combinations to check is 64, leading to a relatively quick execution.
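The combination arithmetic is easy to verify with itertools directly:

```python
import itertools

# Two candidate values (0 and 1) for each of p, d and q
pdq = list(itertools.product(range(2), repeat=3))
print(len(pdq))             # 8 non-seasonal combinations
# The seasonal grid contributes another 8, so the search
# fits 8 * 8 = 64 candidate models in total
print(len(pdq) * len(pdq))  # 64
```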

It's important to remember that the goal is to find a balance between the time spent on the grid search and the quality of the final model, meaning finding parameter values that yield the highest quality while minimizing time costs.

Importing Necessary Packages

from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

Statsmodels provides us with methods for building ARIMA models, and itertools (which we imported earlier) is used to create combinations of possible parameter values.

Ignoring Warnings

When working with large datasets and complex computations like statistical analysis or machine learning, libraries and functions may generate warnings about potential issues or non-optimality. However, these warnings are often insignificant or irrelevant to your specific case. Therefore, we set the warnings filter to ignore:

warnings.filterwarnings("ignore")

Creating a Range of Parameters for Model Tuning

To determine the model parameters, we'll define the function search_optimal_sarima.

def search_optimal_sarima(time_series, seasonal_cycle):
    order_vals = diff_vals = ma_vals = range(0, 2)
    pdq_combinations = list(itertools.product(order_vals, diff_vals, ma_vals))
    seasonal_combinations = [(combo[0], combo[1], combo[2], seasonal_cycle) for combo in pdq_combinations]
       
    smallest_aic = float("inf")
    optimal_order_param = optimal_seasonal_param = None

    for order_param in pdq_combinations:
        for seasonal_param in seasonal_combinations:
            try:
                sarima_model = sm.tsa.statespace.SARIMAX(time_series,
                                                         order=order_param,
                                                         seasonal_order=seasonal_param,
                                                         enforce_stationarity=False,
                                                         enforce_invertibility=False)

                model_results = sarima_model.fit()
                if model_results.aic < smallest_aic:
                    smallest_aic = model_results.aic
                    optimal_order_param = order_param
                    optimal_seasonal_param = seasonal_param
            except Exception:
                continue

    print('ARIMA{}x{} - AIC:{}'.format(optimal_order_param, optimal_seasonal_param, smallest_aic))

seasonal_cycle_length = 12
search_optimal_sarima(time_series, seasonal_cycle_length)

The first three lines of code in our function create the parameter ranges. As we already know, the ARIMA model has three main parameters: p, d, and q. In the code above, each is drawn from range(0, 2), meaning it can take the value 0 or 1. The itertools.product() function then generates all possible combinations of these three parameters, such as (0, 0, 0), (0, 0, 1), (0, 1, 1), and so on.

Then we create additional combinations by adding the seasonal period to each of the pdq combinations. This allows the model to account for seasonal influences on the time series.

Finding the Best Parameters for the Model

Now we need to apply the parameters we determined earlier to automatically tune ARIMA models. When working with forecasting models, our task is to choose the model that best explains and predicts the data. However, selecting the best model is not always straightforward. The Akaike Information Criterion (AIC) helps us compare different models and determine which one is better. AIC helps evaluate how well the model fits the data, considering its complexity. So, the goal is to find the model with the lowest AIC value.

The code above iterates through all possible parameter combinations and uses the SARIMAX function to build the seasonal ARIMA model. The order parameter sets the main parameters (p, d, q), and seasonal_order sets the seasonal parameters of the model (P, D, Q, m).

For our data, we get the following result:

ARIMA(0, 1, 1)x(1, 1, 1, 12) - AIC:920.3192974989254

Building and Evaluating the SARIMAX Model

Once we have found the optimal parameters using grid search, we can use these parameters to train the SARIMAX model on our time series data. This helps us understand how well the model fits our data and provides an opportunity to adjust the model’s parameters if necessary.

First, we define the SARIMAX model with the previously found parameters:

from statsmodels.tsa.statespace.sarimax import SARIMAX
model = SARIMAX(time_series, order=(0, 1, 1), seasonal_order=(1, 1, 1, 12))

Next, fit the model:

results = model.fit()

Print the model summary:

print(results.summary())

The model summary is widely used to assess the quality of the parameter fit. Key aspects to pay attention to include:

  • Coefficients: They should be statistically significant. Check the p-values of the coefficients (P>|z|); they should be less than 0.05.

  • AIC (Akaike Information Criterion): A lower AIC value indicates a better model fit.

  • Ljung-Box (L1) (Q): This is the p-value for the Ljung-Box Q-statistic. If the p-value is greater than 0.05, the residuals are random, which is good.

  • Jarque-Bera (JB): This is a test for the normality of residuals. If Prob(JB) is greater than 0.05, the residuals are normally distributed, which is good.

  • Heteroskedasticity (H): This is a test for heteroskedasticity in the residuals. If Prob(H) (two-sided) is greater than 0.05, the residuals are homoscedastic, which is good. Heteroskedasticity occurs when the variance of your forecast errors changes depending on the time period, which means there is a non-uniformity in your data.

Ideally, your model should have statistically significant coefficients, a low AIC value, and residuals that are normally distributed and homoscedastic. Meeting these criteria indicates a good model.

For our model, we obtained the following output:

[SARIMAX model summary table]

Plot the model diagnostics:

results.plot_diagnostics(figsize=(12, 8))
plt.show()

This command generates four diagnostic plots:

  • Residuals Plot: A plot of model residuals over time. If the model is good, the residuals should be random, and the plot should look like white noise.

  • Q-Q Plot: A plot comparing the distribution of residuals to a standard normal distribution. If the points follow the diagonal line, it indicates that the residuals are normally distributed.

  • ACF Plot (Correlogram): A plot of the autocorrelation of residuals. If the model is good, the residuals should not be correlated with each other; autocorrelation bars that stay within the shaded confidence band indicate this.

  • Histogram of Residuals: A histogram of the distribution of residuals. If the model is good, the residuals should be normally distributed, and the histogram should resemble a bell curve.

These plots, along with the model summary, help us check how well the model fits our data and whether it was correctly specified. If the model is incorrect or unsuitable for the data, it may provide inaccurate forecasts, which could negatively impact decisions made based on these forecasts.

Our diagnostic plots look as follows:

[Four diagnostic plots: standardized residuals, histogram with density estimate, normal Q-Q plot, and correlogram]

The model we selected generally meets the requirements, but there is still potential for improving the parameters of the seasonal ARIMA model. Applying SARIMA to time series data often requires a careful approach, and it is always beneficial to conduct a thorough data analysis and spend more time on data preprocessing and exploratory analysis before applying time series models.

Static and Dynamic Forecasting

After successfully training the model, the next step is to generate forecasts and compare the predicted values with the actual data.

Static Forecasting

First, we generate forecasted values using the model, starting from a specific date and extending to the end of the dataset. The get_prediction method returns a prediction object from which we can extract forecasted values using predicted_mean:

st_pred = results.get_prediction(start=pd.to_datetime('1955-12-01'), dynamic=False)
forecast_values = st_pred.predicted_mean

Here, December 1955 is used as an example starting date, but you can adjust this date according to your needs.

Now we have the forecasted values that we can compare with the actual time series data. We will use the Mean Squared Error (MSE) as our metric for evaluating the accuracy of the forecast:

actual_values = time_series['1955-12-01':]['Passengers']
forecast_mse = ((forecast_values - actual_values) ** 2).mean()
print('Mean Squared Error of the forecast is {}'.format(round(forecast_mse, 2)))

MSE is a widely accepted metric for evaluating the performance of forecasting models. A lower MSE indicates a more accurate model. Of course, there is no perfect model, and there will always be some deviation between forecasts and actual data. In our case, the Mean Squared Error of the forecast is 170.37.
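Because MSE is expressed in squared units, it is often convenient to also report the root mean squared error (RMSE), which is back on the passenger scale. A toy illustration with made-up numbers standing in for the real series:

```python
import numpy as np

# Toy actual and forecasted values standing in for the real series
actual = np.array([300.0, 310.0, 325.0])
forecast = np.array([295.0, 315.0, 320.0])

mse = ((forecast - actual) ** 2).mean()
rmse = np.sqrt(mse)
print(f'MSE: {mse:.2f}, RMSE: {rmse:.2f}')  # MSE: 25.00, RMSE: 5.00
```

An RMSE of 5 here means the forecasts are off by about 5 (thousand) passengers on average, which is easier to interpret than the squared figure.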

Finally, we visualize the results to visually assess the accuracy of our forecasts compared to the actual data:

plt.figure(figsize=(15,8))

plt.plot(actual_values.index, actual_values, label='Actual Values', color='blue')
plt.plot(forecast_values.index, forecast_values, label='Forecasted Values', color='red')

plt.title('Actual and Forecasted Values')
plt.xlabel('Date')
plt.ylabel('Passengers')
plt.legend()

plt.show()

This code generates a plot showing the actual and forecasted passenger numbers over time. The red line represents the forecasted values, while the blue line shows the actual data.

[Plot of actual (blue) and forecasted (red) passenger values]

This visualization helps you understand how well the model predicts the data.

Dynamic Forecasting

Dynamic forecasting generally provides a more realistic view of future time series behavior because it incorporates forecasts into future predictions.

In static forecasting, the model uses the entire known dataset to forecast each subsequent value. Dynamic forecasting, however, uses the most recent forecasted values for future predictions, starting from a user-defined start date.

To perform dynamic forecasting, set the dynamic parameter to True:

dyn_pred = results.get_prediction(start=pd.to_datetime('1955-12-01'), dynamic=True)
dynamic_forecast_values = dyn_pred.predicted_mean

You can also calculate the Mean Squared Error for the dynamic forecast:

mse_dynamic_forecast = ((dynamic_forecast_values - actual_values) ** 2).mean()
print('Mean Squared Error of the dynamic forecast is {}'.format(round(mse_dynamic_forecast, 2)))

And plot the actual and dynamically forecasted values:

plt.figure(figsize=(15,8))

plt.plot(actual_values.index, actual_values, label='Actual Values', color='blue')
plt.plot(dynamic_forecast_values.index, dynamic_forecast_values, label='Dynamic Forecast', color='green')

plt.title('Actual and Dynamically Forecasted Values')
plt.xlabel('Date')
plt.ylabel('Passengers')
plt.legend()

plt.show()

[Plot of actual (blue) and dynamically forecasted (green) passenger values]

After performing static and dynamic forecasts, we can evaluate whether our time series model is successful. The next step is to attempt to predict future data in this time series.

Creating and Visualizing Forecasts

Now we can finally use the ARIMA model in Python to forecast future values.

To perform forecasting for a certain number of steps ahead, you can use the get_forecast method from the results model:

pred_future = results.get_forecast(steps=12)

We use the trained model (results) to get forecasts for the next 12 periods. Since our data includes information up to December 1960, we will generate predictions for the number of passengers each month for the year 1961.

We will print the forecasted mean values and confidence intervals:

print(f'Forecasted mean values:\n\n{pred_future.predicted_mean}')
print(f'\nConfidence intervals:\n\n{pred_future.conf_int()}')

We can also visualize our forecast:

fig = plt.figure()
plt.plot(pred_future.predicted_mean, label='Forecasted Mean Values')
plt.fill_between(pred_future.conf_int().index,
                 pred_future.conf_int().iloc[:, 0],
                 pred_future.conf_int().iloc[:, 1], color='k', alpha=.2)
plt.legend()  
plt.show()

[Plot of forecasted mean values for 1961 with shaded confidence interval]

This visualization is very useful for understanding what the model predicts. The forecasted mean values show the expected number of passengers each month in 1961, and the shaded area around the forecast represents the confidence interval.

Conclusion

In this tutorial, we discussed how to apply the ARIMA model for time series forecasting using Python. We covered the entire process from data loading and preprocessing to finding optimal parameters for the model, evaluating it, and ultimately forecasting future values.

Using ARIMA helps us understand the application of more advanced forecasting techniques. It is important to remember that the ARIMA model might not work for all time series, and the results will depend on the quality of your initial data and the preprocessing performed.

Now you can automate the forecasting of time series data using the ARIMA model and the Python programming language. We encourage you to practice and revisit this tutorial with different datasets to enhance your skills.

On our app platform you can find Python applications, such as Celery, Django, FastAPI and Flask. 

Python
15.07.2024
Reading time: 23 min

Similar

Python

The Walrus Operator in Python

The first question newcomers often ask about the walrus operator in Python is: why such a strange name? The answer lies in its appearance. Look at the Python walrus operator: :=. Doesn't it resemble a walrus lounging on a beach, with the symbols representing its "eyes" and "tusks"? That's how it earned the name. How the Walrus Operator Works Introduced in Python 3.8, the walrus operator allows you to assign a value to a variable while returning that value in a single expression. Here's a simple example: print(apples = 7) This would result in an error because print expects an expression, not an assignment. But with the walrus operator: print(apples := 7) The output will be 7. This one-liner assigns the value 7 to apples and returns it simultaneously, making the code compact and clear. Practical Examples Let’s look at a few examples of how to use the walrus operator in Python. Consider a program where users input phrases. The program stops if the user presses Enter. In earlier versions of Python, you'd write it like this: expression = input('Enter something or just press Enter: ') while expression != '': print('Great!') expression = input('Enter something or just press Enter: ') print('Bored? Okay, goodbye.') This works, but we can simplify it using the walrus operator, reducing the code from five lines to three: while (expression := input('Enter something or just press Enter: ')) != '': print('Great!') print('Bored? Okay, goodbye.') Here, the walrus operator allows us to assign the user input to expression directly inside the while loop, eliminating redundancy. Key Features of the Walrus Operator: The walrus operator only assigns values within other expressions, such as loops or conditions. It helps reduce code length while maintaining clarity, making your scripts more efficient and easier to read. Now let's look at another example of the walrus operator within a conditional expression, demonstrating its versatility in Python's modern syntax. 
Using the Walrus Operator with Conditional Constructs

Let's write a phrase, assign it to a variable, and then find a word in this phrase using a condition:

phrase = 'But all sorts of things and weather must be taken in together to make up a year and a sphere...'
word = phrase.find('things')
if word != -1:
    print(phrase[word:])

The slice phrase[word:] gives us the following output:

things and weather must be taken in together to make up a year and a sphere...

Now let's shorten the code using the walrus operator. Instead of:

word = phrase.find('things')
if word != -1:
    print(phrase[word:])

we can write:

if (word := phrase.find('things')) != -1:
    print(phrase[word:])

In this case, we saved only a little in volume, but we also reduced the number of lines. Note that, despite the reduced time for writing the code, the walrus operator doesn't always make code easier to read. In many cases, though, it's just a matter of habit, and with practice you'll learn to read code with "walruses" easily.

Using the Walrus Operator with Numeric Expressions

Lastly, let's look at an example from another area where the walrus operator helps optimize program performance: numerical operations. We will write a simple function to perform exponentiation:

def pow(number, power):
    print('Calling pow')
    result = 1
    while power:
        result *= number
        power -= 1
    return result

Now, let's enter the following in the interpreter:

>>> [pow(number, 2) for number in range(3) if pow(number, 2) % 2 == 0]

We get the following output:

Calling pow
Calling pow
Calling pow
Calling pow
Calling pow
[0, 4]

Now, let's rewrite the expression using the walrus operator:

>>> [p for number in range(3) if (p := pow(number, 2)) % 2 == 0]

Output:

Calling pow
Calling pow
Calling pow
[0, 4]

As we can see, the code hasn't shrunk much, but the number of function calls has dropped from five to three, meaning the program will run faster!
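One more numeric pattern worth knowing — a small sketch of our own, not from the article — is using the walrus operator to build a running total inside a list comprehension:

```python
prices = [10, 20, 30, 40]

total = 0
# Each element of the result is the running sum up to that point;
# total is updated by the walrus operator on every iteration.
running = [total := total + p for p in prices]

print(running)  # [10, 30, 60, 100]
```

Without := this would require an explicit for loop with an append() call, since an ordinary assignment cannot appear inside a comprehension.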
Conclusion

In conclusion, the walrus operator (:=), introduced in Python 3.8, streamlines code by allowing assignment and value retrieval in a single expression. This operator enhances readability and efficiency, particularly in loops and conditional statements. Through practical examples, we've seen how it reduces line counts and minimizes redundant function calls, leading to faster execution. With practice, developers can master the walrus operator, making their code cleaner and more concise.

On our app platform you can deploy Python applications, such as Celery, Django, FastAPI and Flask.
23 October 2024 · 4 min to read
Python

Python String Functions

As the name suggests, Python 3 string functions are designed to perform various operations on strings. There are several dozen string functions in the Python programming language. In this article, we will cover the most commonly used ones and several special functions that may be less popular but are still useful. They can be helpful not only for formatting but also for data validation.

List of Basic String Functions for Text Formatting

First, let's discuss string formatting functions, and to make the learning process more enjoyable, we will use texts generated by a neural network in our examples.

capitalize() — converts the first character of the string to uppercase, while all other characters will be in lowercase:

>>> phrase = 'the shortage of programmers increases the significance of DevOps. After the presentation, developers start offering their services one after another, competing with each other for DevOps.'
>>> phrase.capitalize()
'The shortage of programmers increases the significance of devops. after the presentation, developers start offering their services one after another, competing with each other for devops.'

casefold() — returns all characters of the string in lowercase:

>>> phrase = 'Cloud providers offer scalable computing resources and services over the internet, enabling businesses to innovate quickly. They support various applications, from storage to machine learning, while ensuring reliability and security.'
>>> phrase.casefold()
'cloud providers offer scalable computing resources and services over the internet, enabling businesses to innovate quickly. they support various applications, from storage to machine learning, while ensuring reliability and security.'
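A quick aside — a minimal sketch of our own, not from the article's neural-network texts — casefold() is more aggressive than lower() for some non-English characters, which matters for caseless comparisons:

```python
german = 'Straße'

# lower() keeps the German sharp s as-is,
# while casefold() maps it to 'ss' for caseless matching.
print(german.lower())     # straße
print(german.casefold())  # strasse

print(german.casefold() == 'STRASSE'.casefold())  # True
```

For plain English text the two behave identically, so the difference only shows up in international input.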
center() — this method center-aligns strings:

>>> text = 'Python is great for writing AI'
>>> newtext = text.center(40, '*')
>>> print(newtext)
*****Python is great for writing AI*****

A small explanation: the center() function takes two arguments. The first (the total length of the resulting string) is mandatory, while the second (the fill character) is optional. In the operation above, we used both. Our string consists of 30 characters, so the remaining 10 were filled with asterisks. If the second argument were omitted, spaces would fill the gaps instead.

upper() and lower() — convert all characters to uppercase and lowercase, respectively:

>>> text = 'Projects using Internet of Things technology are becoming increasingly popular in Europe.'
>>> text.lower()
'projects using internet of things technology are becoming increasingly popular in europe.'
>>> text.upper()
'PROJECTS USING INTERNET OF THINGS TECHNOLOGY ARE BECOMING INCREASINGLY POPULAR IN EUROPE.'

replace() — is used to replace a part of the string with another element:

>>> text.replace('Europe', 'the USA')
'Projects using Internet of Things technology are becoming increasingly popular in the USA.'

The replace() function also has an optional count argument that specifies the maximum number of replacements when the element occurs multiple times in the text. It is specified in the third position:

>>> text = 'hooray hooray hooray'
>>> text.replace('hooray', 'hip', 2)
'hip hip hooray'

strip() — removes any of the specified characters from both ends of a string, stopping as soon as it meets a character not in the given set:

>>> text = 'ole ole ole'
>>> text.strip('ole')
' ole '

Only the outer 'ole' fragments are removed, because stripping stops at the spaces, which are not in the character set. If an end does not start with any of the specified characters, that end remains unchanged:

>>> text.strip('ol')
'e ole ole'
>>> text.strip('le')
'ole ole o'
>>> text.strip('ura')
'ole ole ole'

title() — creates titles, capitalizing each word:

>>> texttitle = 'The 5G revolution: transforming connectivity. How next-gen networks are shaping our digital future'
>>> texttitle.title()
'The 5G Revolution: Transforming Connectivity. How Next-Gen Networks Are Shaping Our Digital Future'

expandtabs() — replaces tabs in the text with spaces, which helps with formatting:

>>> clublist = 'Milan\tReal\tBayern\tArsenal'
>>> print(clublist)
Milan   Real    Bayern  Arsenal
>>> clublist.expandtabs(1)
'Milan Real Bayern Arsenal'
>>> clublist.expandtabs(5)
'Milan     Real Bayern    Arsenal'

The argument sets the tab stop width: each tab is replaced with enough spaces to reach the next multiple of that number (the default is 8).

String Functions for Value Checking

Sometimes, it is necessary to count certain elements in a sequence or check whether a specific value appears in the text. The following string functions solve these and other tasks.

count() — counts substrings (individual elements) that occur in a string. Let's refer again to our neural network example:

>>> text = "Cloud technologies significantly accelerate work with neural networks and AI. These technologies are especially important for employees of large corporations operating in any field — from piloting spacecraft to training programmers."
>>> element = "o"
>>> number = text.count(element)
>>> print("The letter 'o' appears in the text", number, "time(s).")
The letter 'o' appears in the text 19 time(s).

As a substring, you can specify a sequence of characters (we'll use the text from the example above):

>>> element = "in"
>>> number = text.count(element)
>>> print("The combination 'in' appears in the text", number, "time(s).")
The combination 'in' appears in the text 5 time(s).

Additionally, the count() function has two optional numerical arguments that set the search boundaries for the specified element:

>>> element = "o"
>>> number = text.count(element, 20, 80)
>>> print("The letter 'o' appears in the specified text fragment", number, "time(s).")
The letter 'o' appears in the specified text fragment 2 time(s).

find() — searches for the specified value in the string and returns the smallest index at which it occurs.
Again, we will use the example above:

>>> print(text.find(element))
2

This output means that the first letter o is found at index 2 in the string — that is, the third character, because counting in Python starts from zero.

Now let's combine the two functions we've learned in one piece of code:

>>> text = "Cloud technologies significantly accelerate work with neural networks and AI. These technologies are especially important for employees of large corporations operating in any field — from piloting spacecraft to training programmers."
>>> element = "o"
>>> number = text.count(element, 20, 80)
>>> print("The letter 'o' appears in the specified text fragment", number, "time(s), and the first time in the whole text at", text.find(element), "position.")
The letter 'o' appears in the specified text fragment 2 time(s), and the first time in the whole text at 2 position.

index() — works similarly to find(), but raises an error if the specified value is absent:

Traceback (most recent call last):
  File "C:\Python\text.py", line 4, in <module>
    print(text.index(element))
ValueError: substring not found

Here's what the interpreter returns when using the find() function in the same case:

-1

This negative value indicates that the substring was not found.

enumerate() — a very useful built-in function that not only iterates through the elements of a list or tuple, returning their values, but also returns the ordinal number of each element:

team_scores = [78, 74, 56, 53, 49, 47, 44]
for number, score in enumerate(team_scores, 1):
    print(str(number) + '-th team scored ' + str(score) + ' points.')

To output the values with their ordinal numbers, we introduced two variables: number for the ordinal numbers and score for the values of the list; str() converts each number to a string for concatenation. And here's the output:

1-th team scored 78 points.
2-th team scored 74 points.
3-th team scored 56 points.
4-th team scored 53 points.
5-th team scored 49 points.
6-th team scored 47 points.
7-th team scored 44 points.

Note that the second argument of the enumerate() function is the number 1; otherwise, Python would start counting from zero.

len() — returns the length of an object, i.e., the number of elements that make up a particular sequence:

>>> len(team_scores)
7

This way, we counted the number of elements in the list from the example above. Now let's ask the neural network to write a string again and count the number of characters in it:

>>> network = 'It is said that artificial intelligence excludes the human factor. But do not forget that the human factor is still present in the media and government structures.'
>>> len(network)
163

Special String Functions in Python

join() — allows you to assemble a list into a single string:

>>> cities = ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix', 'Philadelphia', 'San Antonio']
>>> cities_str = ', '.join(cities)
>>> print('Cities in one line:', cities_str)
Cities in one line: New York, Los Angeles, Chicago, Houston, Phoenix, Philadelphia, San Antonio

print() — provides a printed representation of any object in Python:

>>> cities = ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix', 'Philadelphia', 'San Antonio']
>>> print(cities)
['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix', 'Philadelphia', 'San Antonio']

type() — returns the type of the object:

>>> type(cities)
<class 'list'>

We found out that the object from the previous example is a list. This is useful for beginners, as they may initially confuse lists with tuples, which have different functionality and are handled differently by the interpreter.

map() — is an efficient replacement for a for loop, allowing you to iterate over the elements of an iterable object, applying a function to each of them.
For example, let's convert a list of string values into integers using the int function:

>>> numbers_list = ['4', '7', '11', '12', '17']
>>> list(map(int, numbers_list))
[4, 7, 11, 12, 17]

As you can see, we "wrapped" the map() call in the list() function — this was necessary to avoid the following output:

>>> numbers_list = ['4', '7', '11', '12', '17']
>>> map(int, numbers_list)
<map object at 0x0000000002E272B0>

This is not an error: map() returns a lazy map object, and what you see is simply its representation. Wrapping the call in list() is the usual way to get the actual list of results.

Of course, we haven't covered all string functions in Python. Still, this set will already help you perform a large number of operations with strings and carry out various transformations (programmatic and mathematical).

On our app platform you can deploy Python applications, such as Celery, Django, FastAPI and Flask.
23 October 2024 · 9 min to read
Python

Deploying Python Applications with Gunicorn

In this article, we'll show how to set up an Ubuntu 20.04 server and install and configure the components required for deploying Python applications. We'll configure the WSGI server Gunicorn to interact with our application. Gunicorn will serve as an interface that converts client requests received via the HTTP protocol into Python function calls executed by the application. Then, we will configure Nginx as a reverse proxy in front of Gunicorn, forwarding incoming requests to it. A reverse proxy also makes it easier to add features such as SSL-secured HTTP connections, load balancing, and caching. These details can be helpful when working with cloud services like those provided by Hostman.

Creating a Python Virtual Environment

To begin, update all packages:

sudo apt update

Ubuntu provides a recent version of the Python interpreter by default. Let's check the installed version using the following command:

python3 --version

Example output:

Python 3.10.12

We'll set up a virtual environment to ensure that our project has its own dependencies, separate from other projects. First, install the packages that provide the venv module for creating virtual environments:

sudo apt-get install python3-venv python3-dev

Next, create a folder for your project and navigate into it:

mkdir myapp
cd myapp

Now, create a virtual environment:

python3 -m venv venv

And create a folder for your application code:

mkdir app

Your project directory should now contain two items: app and venv. You can verify this using the standard Linux command to list directory contents:

ls

Expected output:

app venv

Activate the virtual environment so that all subsequent components are installed locally for the project:

source venv/bin/activate

Installing and Configuring Gunicorn

Gunicorn (Green Unicorn) is a Python WSGI HTTP server for UNIX. It is compatible with various web frameworks, fast, easy to implement, and uses minimal server resources.
To install Gunicorn, run the following command:

pip install gunicorn

WSGI and Python

WSGI (Web Server Gateway Interface) is the standard interface between a Python application running on the server side and the web server itself, such as Nginx. A WSGI server interacts with the application, allowing you to run code when handling requests. Typically, the application is provided as a callable object named application in a Python module, which is made available to the server. By convention, this callable lives in a file called wsgi.py.

Let's create such a file using the nano text editor:

nano wsgi.py

Add the following code to the file:

from aiohttp import web

async def index(request):
    return web.Response(text="Welcome home!")

app = web.Application()
app.router.add_get('/', index)

In the code above, we import aiohttp, a library that provides an asynchronous HTTP client and server built on top of asyncio. HTTP requests are a classic example of where asynchronous handling is ideal, as they involve waiting for responses, during which other code can execute efficiently. The library allows new requests to be issued without waiting for the previous response to arrive. It's common to run aiohttp servers behind Nginx.

Running the Gunicorn Server

You can launch the server using the following command template:

gunicorn [OPTIONS] [WSGI_APP]

Here, [WSGI_APP] consists of $(MODULE_NAME):$(VARIABLE_NAME), and [OPTIONS] is a set of parameters for configuring Gunicorn. A simple command would look like this:

gunicorn wsgi:app

Once Gunicorn is managed by systemd (configured below), you can restart it with:

sudo systemctl restart gunicorn

Systemd Integration

systemd is a system and service manager that allows for strict control over processes, resources, and permissions. We'll create a socket that systemd will listen on, automatically starting Gunicorn in response to traffic.
Configuring the Gunicorn Service and Socket

First, create the service configuration file:

sudo nano /etc/systemd/system/gunicorn.service

Add the following content to the file:

[Unit]
Description=gunicorn daemon
Requires=gunicorn.socket
After=network.target

[Service]
Type=notify
User=someuser
Group=someuser
RuntimeDirectory=gunicorn
WorkingDirectory=/home/someuser/myapp
ExecStart=/path/to/venv/bin/gunicorn wsgi:app
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=mixed
TimeoutStopSec=5
PrivateTmp=true

[Install]
WantedBy=multi-user.target

Make sure to replace /path/to/venv/bin/gunicorn with the actual path to the Gunicorn executable within your virtual environment. It will likely look something like this: /home/someuser/myapp/venv/bin/gunicorn.

Next, create the socket configuration file:

sudo nano /etc/systemd/system/gunicorn.socket

Add the following content:

[Unit]
Description=gunicorn socket

[Socket]
ListenStream=/run/gunicorn.sock
SocketUser=www-data

[Install]
WantedBy=sockets.target

Enable and start the socket with:

systemctl enable --now gunicorn.socket

Configuring Gunicorn

Let's review some useful parameters for Gunicorn in Python 3. You can find all possible parameters in the official documentation.

Sockets

-b BIND, --bind=BIND — specifies the server socket. You can use formats like $(HOST) or $(HOST):$(PORT). Example:

gunicorn --bind=127.0.0.1:8080 wsgi:app

This command will run your application locally on port 8080.

Worker Processes

-w WORKERS, --workers=WORKERS — sets the number of worker processes. Typically, this number should be between 2 and 4 per server core. Example:

gunicorn --workers=2 wsgi:app

Process Type

-k WORKERCLASS, --worker-class=WORKERCLASS — specifies the type of worker process to run. By default, Gunicorn uses the sync worker type, a simple synchronous worker that handles one request at a time. Other worker types may require additional dependencies. Asynchronous worker processes are available using greenlets (via Eventlet or Gevent).
Greenlets are a cooperative multitasking implementation for Python. The corresponding parameter values are eventlet and gevent. We, however, will use the asynchronous worker type that ships with aiohttp:

gunicorn wsgi:app --bind localhost:8080 --worker-class aiohttp.GunicornWebWorker

Access Logging

You can enable access logging using the --access-logfile flag. Example:

gunicorn wsgi:app --access-logfile access.log

Error Logging

To specify an error log file, use the following command:

gunicorn wsgi:app --error-logfile error.log

You can also set the verbosity level of the error log output using the --log-level flag. Available log levels in Gunicorn are:

debug
info
warning
error
critical

By default, the info level is set, which omits debug-level information.

Installing and Configuring Nginx

First, install Nginx with the command:

sudo apt install nginx

Let's check that Nginx's user can connect to the socket created earlier:

sudo -u www-data curl --unix-socket /run/gunicorn.sock http

If successful, systemd will automatically start Gunicorn, and you'll see the response from the server in the terminal.

Nginx configuration involves adding config files for virtual hosts. Each proxy configuration should be stored in the /etc/nginx/sites-available directory. To enable a proxy server, create a symbolic link to it in /etc/nginx/sites-enabled. When Nginx starts, it automatically loads all configurations linked in this directory.

Create a new configuration file:

sudo nano /etc/nginx/sites-available/myconfig.conf

Then create a symbolic link with the command:

sudo ln -s /etc/nginx/sites-available/myconfig.conf /etc/nginx/sites-enabled

Nginx must be restarted after any changes to the configuration file to apply the new settings.
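The article doesn't show the contents of myconfig.conf, so here is a minimal sketch of what it might contain. The server_name is a hypothetical placeholder, and the proxy_pass assumes the /run/gunicorn.sock socket created in the systemd section:

```nginx
server {
    listen 80;
    server_name example.com;  # hypothetical domain — replace with your own

    location / {
        # Forward requests to Gunicorn via the systemd-managed Unix socket
        proxy_pass http://unix:/run/gunicorn.sock;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

Proxying through the Unix socket (rather than a TCP port) lets systemd's socket activation start Gunicorn on demand, as configured earlier.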
First, check the syntax of the configuration file:

sudo nginx -t

Then reload Nginx:

sudo nginx -s reload

Conclusion

Gunicorn is a robust and versatile WSGI server for deploying Python applications, offering flexibility with various worker types and integration options like Nginx for load balancing and reverse proxying. Its ease of installation and configuration, combined with detailed logging and scaling options, makes it an excellent choice for production environments. By using Gunicorn with frameworks like aiohttp and integrating it with Nginx, you can efficiently serve Python applications with improved performance and resource management.
23 October 2024 · 7 min to read
