
Merge Sort Algorithm - Java, C, and Python Implementation

Shahid Ali
Technical writer
Python Java
14.06.2024
Reading time: 7 min

Merge Sort is a classic sorting algorithm known for its efficiency and stability. It belongs to the category of divide and conquer algorithms, making it suitable for sorting large datasets. This tutorial will guide you through implementing Merge Sort in Java, C, and Python, providing insights into its time complexity, space complexity, and best practices.

Merge Sort Algorithm Overview

Merge Sort works by recursively dividing the input array into smaller subarrays until each subarray contains only one element. It then merges these subarrays back together in sorted order until the entire array is sorted. Because the array is halved about log n times and each level of merging touches all n elements, the running time is O(n log n) in the best, average, and worst cases, which makes Merge Sort a predictable choice for large inputs.
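
The merge step is where the actual ordering happens, so it can help to look at it in isolation. The following is a minimal Python sketch rather than one of the implementations from this tutorial; the function name merge_sorted is an assumption chosen for illustration. It combines two already-sorted lists into one sorted list in a single pass.

```python
def merge_sorted(left, right):
    """Merge two already-sorted lists into a new sorted list.

    Hypothetical helper for illustration; not part of the tutorial's code.
    """
    merged = []
    i = j = 0
    # Repeatedly take the smaller front element of the two lists
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these slices is non-empty
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sorted([5, 11, 12], [6, 7, 13]))  # [5, 6, 7, 11, 12, 13]
```

Each element is appended exactly once, so merging two lists with n elements in total takes O(n) time.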

Merge Sort Implementation in Java

The Java version follows that description directly: mergeSort splits the range array[left..right] at its midpoint, recursively sorts each half, and then calls merge to combine the two sorted halves.

Below is a simple Java implementation of Merge Sort:

Step 1: Define the Merge Sort Function

public class MergeSort {
    // This method sorts the array from index 'left' to 'right' using merge sort algorithm
    public void mergeSort(int[] array, int left, int right) {
        // Recurse only while the range holds more than one element; a single element is already sorted
        if (left < right) {
            // Calculate the midpoint of the array
            int mid = left + (right - left) / 2;

            // Recursively sort the first half
            mergeSort(array, left, mid);

            // Recursively sort the second half
            mergeSort(array, mid + 1, right);

            // Merge the two sorted halves
            merge(array, left, mid, right);
        }
    }
    
    // This method merges two subarrays of 'array'.
    // First subarray is array[left..mid]
    // Second subarray is array[mid+1..right]
    private void merge(int[] array, int left, int mid, int right) {
        // Calculate the sizes of the two subarrays to be merged
        int n1 = mid - left + 1;
        int n2 = right - mid;

        // Create temporary arrays for left and right subarrays
        int[] leftArray = new int[n1];
        int[] rightArray = new int[n2];

        // Copy data to temporary arrays
        for (int i = 0; i < n1; ++i)
            leftArray[i] = array[left + i];
        for (int j = 0; j < n2; ++j)
            rightArray[j] = array[mid + 1 + j];
        
        // Initial indexes of the first and second subarrays
        int i = 0, j = 0;

        // Initial index of the merged subarray
        int k = left;

        // Merge the temporary arrays back into the original array
        while (i < n1 && j < n2) {
            // Compare elements from leftArray and rightArray and copy the smaller element to the original array
            if (leftArray[i] <= rightArray[j]) {
                array[k] = leftArray[i];
                i++;
            } else {
                array[k] = rightArray[j];
                j++;
            }
            k++;
        }
        
        // Copy any remaining elements of leftArray, if any
        while (i < n1) {
            array[k] = leftArray[i];
            i++;
            k++;
        }
        
        // Copy any remaining elements of rightArray, if any
        while (j < n2) {
            array[k] = rightArray[j];
            j++;
            k++;
        }
    }
}

Step 2: Implement Merge Sort in Main Class

import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        int[] array = { 12, 11, 13, 5, 6, 7 };  // Initialize the array to be sorted
        MergeSort sorter = new MergeSort();    // Create an instance of the MergeSort class
        sorter.mergeSort(array, 0, array.length - 1);  // Call the mergeSort method on the array
        System.out.println("Sorted array: " + Arrays.toString(array));  // Print the sorted array
    }
}

Output:

Sorted array: [5, 6, 7, 11, 12, 13]

Merge Sort Implementation in C

The C implementation follows the same logic as the Java version, with differences only in syntax and memory handling. The temporary arrays L and R are declared as variable-length arrays, so the code needs a compiler that supports them (C99 or later), and they live on the stack. For very large inputs, allocating the temporary arrays on the heap is the safer choice.

Define the Merge Sort Function

#include <stdio.h>

// Function to merge two halves of the array
void merge(int arr[], int l, int m, int r) {
    int n1 = m - l + 1; // Size of the left subarray
    int n2 = r - m;     // Size of the right subarray

    int L[n1], R[n2];   // Temporary arrays to hold the two halves

    // Copy data to temporary arrays L[] and R[]
    for (int i = 0; i < n1; i++)
        L[i] = arr[l + i];
    for (int j = 0; j < n2; j++)
        R[j] = arr[m + 1 + j];

    int i = 0, j = 0, k = l; // Initial indices of the subarrays and merged array

    // Merge the temporary arrays back into arr[l..r]
    while (i < n1 && j < n2) {
        if (L[i] <= R[j]) {
            arr[k] = L[i];
            i++;
        } else {
            arr[k] = R[j];
            j++;
        }
        k++;
    }

    // Copy the remaining elements of L[], if any
    while (i < n1) {
        arr[k] = L[i];
        i++;
        k++;
    }

    // Copy the remaining elements of R[], if any
    while (j < n2) {
        arr[k] = R[j];
        j++;
        k++;
    }
}

// Function to implement MergeSort
void mergeSort(int arr[], int l, int r) {
    if (l < r) {
        int m = l + (r - l) / 2; // Find the middle point

        // Sort first and second halves
        mergeSort(arr, l, m);
        mergeSort(arr, m + 1, r);

        // Merge the sorted halves
        merge(arr, l, m, r);
    }
}

int main() {
    int arr[] = { 12, 11, 13, 5, 6, 7 }; // Input array
    int n = sizeof(arr) / sizeof(arr[0]); // Calculate the number of elements in the array

    // Call mergeSort on the array
    mergeSort(arr, 0, n - 1);

    // Print the sorted array
    printf("Sorted array: ");
    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]);
    printf("\n");

    return 0; // End of the program
}

Output:

Sorted array: 5 6 7 11 12 13

Merge Sort Implementation in Python

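Python's list slicing makes the implementation compact: each call copies the two halves into new lists, sorts them recursively, and then merges them back into the original list.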

Define the Merge Sort Function

def merge_sort(arr):
    # If the array has more than one element, it can be split and sorted
    if len(arr) > 1:
        mid = len(arr) // 2  # Find the middle point to divide the array
        left_half = arr[:mid]  # Left half from start to middle
        right_half = arr[mid:]  # Right half from middle to end
        
        # Recursively sort the two halves
        merge_sort(left_half)
        merge_sort(right_half)
        
        i = j = k = 0  # Initialize pointers for left_half (i), right_half (j), and merged array (k)
        
        # Merge the left and right halves back into the original array
        while i < len(left_half) and j < len(right_half):
            if left_half[i] <= right_half[j]:  # "<=" takes equal elements from the left half first, keeping the sort stable
                arr[k] = left_half[i]
                i += 1
            else:
                arr[k] = right_half[j]
                j += 1
            k += 1
        
        # Copy any remaining elements of left_half, if any
        while i < len(left_half):
            arr[k] = left_half[i]
            i += 1
            k += 1
        
        # Copy any remaining elements of right_half, if any
        while j < len(right_half):
            arr[k] = right_half[j]
            j += 1
            k += 1

# Sample array to be sorted
array = [12, 11, 13, 5, 6, 7]
merge_sort(array)  # Call merge_sort on the array
print("Sorted array:", array)  # Print the sorted array

Output:

Sorted array: [5, 6, 7, 11, 12, 13]
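
Stability is easiest to see with records that share a sort key. The sketch below is an illustrative variant, not the tutorial's function: the name merge_sort_by and its key parameter are assumptions introduced for this example. It returns a new list and uses "<=" in the merge so that, on ties, the element from the left half (which came earlier in the input) is taken first.

```python
def merge_sort_by(items, key):
    """Return a new list sorted by key(item); equal keys keep input order.

    Hypothetical variant for illustration; not part of the tutorial's code.
    """
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort_by(items[:mid], key)
    right = merge_sort_by(items[mid:], key)

    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # "<=" prefers the left half on ties, which preserves input order
        if key(left[i]) <= key(right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


records = [("pear", 2), ("apple", 1), ("fig", 2), ("kiwi", 1)]
print(merge_sort_by(records, key=lambda r: r[1]))
# [('apple', 1), ('kiwi', 1), ('pear', 2), ('fig', 2)]
```

Among the records with key 1, "apple" still precedes "kiwi", and among those with key 2, "pear" still precedes "fig", exactly as in the input. Replacing "<=" with "<" would take the right-hand element on ties and lose this guarantee.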

Performance Analysis and Complexity

Merge Sort runs in O(n log n) time in the best, average, and worst cases. Its space complexity is O(n), because merging needs auxiliary arrays proportional to the size of the input. These characteristics make Merge Sort efficient for large datasets, but it uses more memory than in-place sorting algorithms.
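
One way to make the O(n log n) figure concrete is to count comparisons. The following is a rough sketch under stated assumptions: merge_sort_count is a hypothetical instrumented variant written for this illustration, and the input sizes and random seed are arbitrary. It prints the number of comparisons next to n * ceil(log2 n), which serves as an upper bound for this top-down scheme.

```python
import math
import random


def merge_sort_count(arr):
    """Return (sorted copy of arr, number of element comparisons made).

    Hypothetical instrumented variant for illustration only.
    """
    if len(arr) <= 1:
        return list(arr), 0
    mid = len(arr) // 2
    left, left_count = merge_sort_count(arr[:mid])
    right, right_count = merge_sort_count(arr[mid:])

    merged = []
    comparisons = left_count + right_count
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1  # one comparison per loop iteration
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


random.seed(0)  # arbitrary seed so repeated runs use the same data
for n in (8, 64, 1024):
    data = [random.random() for _ in range(n)]
    result, count = merge_sort_count(data)
    assert result == sorted(data)
    # Columns: n, comparisons made, n * ceil(log2 n)
    print(n, count, n * math.ceil(math.log2(n)))
```

The comparison count stays at or below the bound in the last column for every size, and grows only slightly faster than n itself as the input gets larger.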

Conclusion

In conclusion, Merge Sort is a versatile sorting algorithm suitable for a wide range of applications. Its stability and predictable O(n log n) running time make it a valuable tool in software development.


Similar

Python

Functions in Python

Functions in Python are blocks of reusable code that you can access by calling the function name and passing arguments. Using functions in Python significantly simplifies a programmer's work because, instead of writing code repeatedly, one can simply call a function. How to Create a Function in Pyhton Let's start with an example and then move on to the explanation: def multiply(first, second):    return first * second We have just written a function that performs a simple task: it multiplies the values (arguments) passed to it. These values can then be entered after the function name in the program to get the product of the factors. Now, enter the following in IDLE: >>> multiply(7, 8) Arguments can include not only whole numbers but also decimal numbers, for example: >>> multiply(7.4, 8.2)60.68 Now, let's break down the code. Here, we define a Python function using the def keyword and the function name. In parentheses, we specify parameters that will accept various arguments from user input. A colon must follow the closing parenthesis, after which a new line with indentation starts the function body, describing what the function does. If you're writing code in an editor, the indentation will be added automatically. We used the return operator, which explicitly returns arguments. Note that after return, there is an instruction on what the program should do with the arguments. In this case, it multiplies them. Practical Example of Using Python Functions Here, we will demonstrate how Python functions help optimize routine tasks. The following example is simplified but illustrative. By understanding how functions work, you can learn to solve your own tasks, which will become more complex and interesting as you progress in the language. Let's say we opened a bookstore and purchased a cash register, and the cashier had already issued receipts for the first customers. 
Initially, a receipt might look like this: print("Learn Now, LLC") print("Programming Book", end=" ") print(1, end=" pcs. ") print(50, end=" euro") print("\nAdvanced Programming Book", end=" ") print(1, end=" pcs. ") print(100, end=" euro") print("\nTotal:", 150, end=" euro") print("\nThank you for your purchase!") Output: Learn Now, LLC Programming Book 1 pcs. 50 euro Advanced Programming Book 1 pcs. 100 euro Total: 150 euro Thank you for your purchase! Now, imagine that a whole stack of books has been purchased, and the number of customers is increasing daily. While you manually calculate the total for one customer, others start getting impatient. This is where automation comes in.  Let's say someone buys seven different books, with some books purchased in multiple copies: def check(book_attr): total = 0 print("Learn Now, LLC") for book in book_attr: a = book[0] b = book[1] c = book[2] print(f"{a} ({b} pcs.) - {c} euro") total += b * c print(f"\nTotal: {total} euro") print("Thank you for your purchase!") book_attr = [ ("Programming Book", 2, 50), ("Advanced Programming Book", 2, 100), ("Programming Book 80 lvl", 2, 195), ("Beginner's Guide to Python", 1, 120), ("You Can Become a Programmer", 1, 98), ("Functional Programming in Python", 1, 95), ("Secrets of Clean Code", 1, 80), ] As we can see, new variables appeared, and the purchase list was placed in a separate block. Now, when generating a new receipt, all we need to do for automatic total calculation is enter the book names, quantities, and prices per unit. Once all items are entered, we call our function with the parameter formatted as a tuple above: check(book_attr) This produces the following output: Learn Now, LLC Programming Book (2 pcs.) - 50 euro Advanced Programming Book (2 pcs.) - 100 euro Programming Book 80 lvl (2 pcs.) - 195 euro Beginner's Guide to Python (1 pcs.) - 120 euro You Can Become a Programmer (1 pcs.) - 98 euro Functional Programming in Python (1 pcs.) 
- 95 euro Secrets of Clean Code (1 pcs.) - 80 euro Total: 1083 euro Thank you for your purchase! That's it! The total amount was calculated automatically. Let’s break down the code: The variable total stores the purchase total and changes as new values are added to the tuple. A for loop is used to define a set of variables that store the following values: a: product name b: quantity c: price per unit Next, we give the print command. The letter f in print statements (which is itself a built-in function, by the way) means that f-strings are used. For now, it's enough to know that they are a convenient formatting method, and the code is self-explanatory. The next line should not be surprising: it calculates the total by multiplying the quantity of each item by its price and adding the result to the running total. Finally, we use another f-string for text formatting, and we have already discussed the tuple block that stores the necessary data for purchase calculations. Features of Functions in Python Key Advantages: No need to repeat specific blocks of code, which can sometimes be quite large. Functions can be called as many times as needed, even consecutively. When divided into multiple functional blocks, large programs become much easier to track. There are almost no downsides to functions in Python, except that they may not always be convenient. In some cases, it is easier to use generators, as certain functions (e.g., filter) may return iterators, requiring additional code to process them. For example, if we enter the following in IDLE: >>> numbers = [2, 4, 6, 8, 10, 12, 14] >>> filter(lambda num: num >= 10, numbers) We get this result: <filter object at 0x00000000030C3220> To correctly display elements that meet the condition, we need to wrap this expression as follows: >>> list(filter(lambda num: num >= 10, numbers))[10, 12, 14] Built-in Functions in Python You have almost certainly used them in your first Python lesson. 
Here’s an example: print("Hello, World!") The print function is a built-in function, and "Hello, World!" is its argument. Python has hundreds, even thousands, of built-in functions, especially when additional libraries are included. You don't need to know all of them; you can always check the documentation if you encounter an unfamiliar function. However, you will need to learn some common built-in functions, as these core elements are essential for writing any useful program. Here are some commonly used built-in functions: len returns the length (number of elements) of a sequence such as a string, list, tuple, range, or array: flowers = ["bellflower", "cornflower", "buttercup", "forget-me-not", "daisy"]len(flowers) Output: 5 str converts numbers into strings (since Python does not allow direct concatenation of strings and numbers): year = 2008"Euro " + str(year) Output: 'Euro 2008' int converts strings into integers. It also rounds floating-point numbers to the nearest integer, always towards zero: int(554.995) Output: 554 float converts integer values into floating-point numbers, which can be useful for certain calculations: float(55) Output: 55.0 tuple converts lists into tuples: flowers = ["bellflower", "cornflower", "buttercup", "forget-me-not", "daisy"]tuple(flowers) Output: ('bellflower', 'cornflower', 'buttercup', 'forget-me-not', 'daisy') dict allows you to create dictionaries. Here’s an example of creating a dictionary from a list of tuples using dict: clubs = [('Barcelona', 1), ('Juventus', 3), ('Liverpool', 2), ('Real Madrid', 5), ('Bayern München', 4)]dict(clubs) Output: {'Barcelona': 1, 'Juventus': 3, 'Liverpool': 2, 'Real Madrid': 5, 'Bayern München': 4} range creates number sequences, which can be useful for iterating through numeric values: for number in range(0, 30, 3): print(number) Output: 0 3 6 9 12 15 18 21 24 27 The range function takes three parameters: The first two define the range limits. The third (optional) parameter specifies the step. 
In this case, numbers from 0 to 30 are printed in steps of 3. The upper bound is not included in the output. To include it, the range should be extended slightly: for number in range(0, 31, 3): print(number) Output: 0 3 … 27 30 Using the Result of One Function in Another Python Function Finally, let’s look at another interesting technique. Since functions in Python are objects, they can be passed as arguments to other functions and referenced. def check(company="Learn Now"): """Allows inserting different company names in the receipt""" print(f"{company}, LLC") Let’s enter the name of another company: check("Enlightenment") Output: Enlightenment, LLC Now, let’s pass the created function to the built-in help function to learn what it does: help(check) Output: Help on function check in module __main__: check(company='Learn Now') Allows inserting different company names in the receipt As we can see, it is quite simple. What We Learned In this tutorial, we explored how functions work in Python 3 and learned how to create and use them. We discussed built-in tools and examined an example of passing functions as objects to other functions. By studying functions more deeply, you will appreciate their usefulness even when writing relatively small applications.
02 April 2025 · 8 min to read
Python

Comments in Python 3

Comments in a program are parts of the code that are ignored by the interpreter or compiler. They are used to: Make the code more readable; Explain what the code does and why; Prevent parts of the code from executing during testing/execution; Leave notes about things that need to be done/modified/removed. Overall, comments are meant to make a programmer's life easier—they play no role for the computer. However, in some programming methodologies, such as extreme programming, it is believed that if code needs comments, then the code is poorly written. In this article, you will learn how to write comments in Python 3 and what Docstrings and PEP are. Comments in Python Different programming languages use different syntax for comments. Often, it's a double slash (//). In Python 3, comments in the code start with the # symbol. For example: # The code prints "Hello, World!" to the consoleprint("Hello, World!") You can also place a comment on the same line as the code: print("Hello, World!")  # The code prints "Hello, World!" to the console Comments should be useful to the reader. For example, this comment is not helpful: # This code clearly does somethingprint("Hello, World!") A good comment should explain or describe the code and its purpose. Some developers believe that comments should describe the programmer’s intent. In general, it is best to think of comments as a form of code documentation. If they are not useful, they should be removed. You can also use comments to disable parts of the code to prevent them from executing. This can be useful for testing and debugging. Suppose we need to comment the following code: db_lp = sqlite3.connect('login_password.db') cursor_db = db_lp.cursor() sql_create = '''CREATE TABLE passwords( login TEXT PRIMARY KEY, password TEXT NOT NULL);''' cursor_db.execute(sql_create) db_lp.commit() cursor_db.close() db_lp.close() The goal of commenting is to make this block of code understandable. 
For example, we can comment it like this: db_lp = sqlite3.connect('login_password.db') # Creating the login_password database cursor_db = db_lp.cursor() # Cursor object for executing SQL queries # SQL query to create the "passwords" table in the database sql_create = '''CREATE TABLE passwords( login TEXT PRIMARY KEY, password TEXT NOT NULL);''' cursor_db.execute(sql_create) # Executing the sql_create query db_lp.commit() # Committing changes # Closing the Cursor and database cursor_db.close() db_lp.close() Manually commenting Python code can be inconvenient. To format a block of code as single-line comments, you can use keyboard shortcuts: PyCharm: Ctrl + / Visual Studio Code: To comment/uncomment a line Ctrl + /, for a block of code Shift + Alt + A Eclipse: To comment/uncomment a line Ctrl + /, for a block of code Ctrl + Shift + / Visual Studio: Ctrl + K then Ctrl + C to comment a block of code, and Ctrl + K then Ctrl + U to uncomment it Docstring in Python A Docstring is a string literal placed immediately after the declaration of a module, function, class, or other structure. It is a convenient way to document code, making it accessible for reference. Docstrings were introduced in Python in 2001 and are described in PEP 257. What is PEP? Python's development follows a structured process involving creating, discussing, selecting, and implementing PEP (Python Enhancement Proposal) documents. PEPs contain proposals for language development, including new features, modifications to existing ones, etc. One of the most well-known and useful PEP documents is PEP 8, which outlines guidelines and conventions for writing Python code. If you plan to write in Python, familiarize yourself with these conventions. Since there are many rules, special tools exist to help enforce them. Some useful tools are listed below. Now, back to Docstring. A Docstring is the first statement in an object's declaration. 
Here’s an example: def function(x, y, z): """ Docstring of this function """ def inner_function(): """ Docstring of the nested function """ The syntax for a Docstring is three double quotes at the beginning and end. You can also use single quotes or fewer than three quotes, but PEP 257 recommends using three double quotes. You can access an object’s Docstring using the __doc__ method: def subtraction(a, b): """Function subtracts b from a""" return a - b print(subtraction.__doc__) Output: Function subtracts b from a You can also use the __doc__ property to get information about built-in Python methods, such as print: print(print.__doc__) Output: print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False) Prints the values to a stream, or to sys.stdout by default. Optional keyword arguments: file: a file-like object (stream); defaults to the current sys.stdout. sep: string inserted between values, default a space. end: string appended after the last value, default a newline. flush: whether to forcibly flush the stream. String literals placed anywhere in the Python code can also serve as documentation. The Python bytecode compiler will not recognize them, and they will not be accessible at runtime via __doc__. However, there are two additional types of Docstrings that can be extracted using documentation tools. Additional Docstrings Additional Docstrings are string literals ignored by the Python compiler but recognized by Docutils tools. They are placed immediately after a Docstring. Example: def function(arg): """This is the Docstring of this function. It will be available via __doc__.""" """ This is an additional Docstring. It will be ignored by the compiler but recognized by Docutils. """ pass Attribute Docstrings Attribute Docstrings are string literals placed immediately after a simple assignment at the module, class, or __init__ method level. Example: def f(x): """This is the Docstring of this function. 
It will be available via __doc__""" return x**2 f.a = 1 """ This is an Attribute Docstring for the attribute f.a, it will be ignored by the compiler but recognized by Docutils. """ Here are the main PEP 257 guidelines for using docstrings: Leave a blank line after all Docstrings. The script's Docstring should serve as a "usage message," potentially displayed to the user if incorrect arguments are provided. It should describe functionality, parameter syntax, environment variables, and files used. The module's Docstring should list important objects with a one-line explanation for each. The function/method Docstring should describe behavior, arguments, return values, possible exceptions, and constraints. The class Docstring should include methods, instance variables, and describe the class behavior. The constructor (__init__) should have its own separate Docstring. If a class is a subclass and mostly inherits behavior from a parent class, its documentation should mention this and describe any differences. Useful Tools Here are some tools to help with PEP 8 and comments in Python 3: pycodestyle — Checks if your code follows PEP 8. Black — Formats code according to PEP 8 (mostly). Doxygen, PyDoc, pdoc — Automatically generate documentation from Docstrings.
02 April 2025 · 6 min to read
Python

How to Install and Set Up PyTorch

PyTorch is a free, open-source deep learning library. With its help, a computer can detect objects, classify images, generate text, and perform other complex tasks. PyTorch is also a rich tool ecosystem that supports and accelerates AI development and research. In this article, we will cover only the basics: we will learn how to install PyTorch and verify that it works. To work with PyTorch, you will need: At least 1 GB of RAM. Installed Python 3 and pip.  A configured local development environment. Deep knowledge of machine learning is not required for this tutorial. It is assumed that you are familiar with basic Python terms and concepts. Installing PyTorch We will be working in a Windows environment but using the command line. This makes the tutorial almost universal—you can use the same commands on Linux and macOS. First, create a workspace where you will work with Torch Python. Navigate to the directory where you want to place the new folder and create it: mkdir pytorch Inside the pytorch directory, create a new virtual environment. This is necessary to isolate projects and, if needed, use different library versions. python3 -m venv virtualpytorch To activate the virtual environment, first go to the newly created directory: cd virtualpytorch Inside, there is a scripts folder (on Windows) or bin (on other OS). Navigate to it: cd scripts Activate the virtual environment using a bat file by running the following command in the terminal: activate.bat The workspace is now ready. The next step is to install the PyTorch library. The easiest way to find the installation command is to check the official website. There is a convenient form where you select the required parameters. As an example, install the stable version for Windows using CPU via pip. Select these parameters in the form, and you will get the necessary command: pip3 install torch torchvision torchaudio Copy and execute the pip install torch command in the Windows command line. 
You are also installing two sub-libraries: torchvision – contains popular datasets, model architectures, and image transformations for computer vision. torchaudio – a library for processing audio and signals using PyTorch, providing input/output functions, signal processing, datasets, model implementations, and application components. This is the standard setup often used when first exploring the library. The method described above is not the only way to install PyTorch. If Anaconda is installed on Windows, you can use its graphical interface. If your computer has NVIDIA GPUs, you can select the CUDA version instead of CPU. In that case, the installation command will be different. All possible local installation methods are listed in the official documentation. You can also find commands for installing older versions of the library there. To install them, just select the required version and install it the same way as the current package builds. You don't need to write a script to check if the library is working. The Python interpreter has enough capabilities to perform basic operations. If you have successfully installed PyTorch in the previous steps, then launching the Python interpreter won’t be an issue. Run the following command in the command line: python Then enter the following code: import torch x = torch.rand(5, 3) print(x) You should see an output similar to this: tensor([[0.0925, 0.3696, 0.4949], [0.0240, 0.2642, 0.1545], [0.7274, 0.4975, 0.0753], [0.4438, 0.9685, 0.5022], [0.4757, 0.6715, 0.4298]]) Now, you can move on to solving more complex tasks. PyTorch Usage Example To make learning basic concepts more engaging, let’s do it in practice. For example, let’s create a neural network using PyTorch that can recognize the digit shown in an image. 
Prerequisites To create a neural network, we need to import eight modules: import torch import torchvision import torch.nn.functional as F import matplotlib.pyplot as plt import torch.nn as nn import torch.optim as optim from torchvision import transforms, datasets All of these are standard PyTorch libraries plus Matplotlib. They handle image processing, optimization, neural network construction, and graph visualization. Loading and Transforming Data We will train the neural network on the MNIST dataset, which contains 70,000 images of handwritten digits. 60,000 images will be used for training. 10,000 images will be used for testing. Each image is 28 × 28 pixels. Each image has a label representing the digit (e.g., 1, 2, 5, etc.). train = datasets.MNIST("", train=True, download=True, transform=transforms.Compose([transforms.ToTensor()])) test = datasets.MNIST("", train=False, download=True, transform=transforms.Compose([transforms.ToTensor()])) trainset = torch.utils.data.DataLoader(train, batch_size=15, shuffle=True) testset = torch.utils.data.DataLoader(test, batch_size=15, shuffle=True) First, we divide the data into training and testing sets by setting train=True/False. The test set must contain data that the machine has not seen before. Otherwise, the neural network’s performance would be biased. Setting shuffle=True helps reduce bias and overfitting. Imagine that the dataset contains many consecutive "1"s. If the machine gets too good at recognizing only the digit 1, it might struggle to recognize other numbers. Shuffling the data prevents the model from overfitting specific patterns and ensures a more generalized learning process. 
Definition and Initialization of the Neural Network The next step is defining the neural network: class NeuralNetwork(nn.Module): def __init__(self): super().__init__() self.fc1 = nn.Linear(784, 86) self.fc2 = nn.Linear(86, 86) self.fc3 = nn.Linear(86, 86) self.fc4 = nn.Linear(86, 10) def forward(self, x): x = F.relu(self.fc1(x)) x = F.relu(self.fc2(x)) x = F.relu(self.fc3(x)) x = self.fc4(x) return F.log_softmax(x, dim=1) model = NeuralNetwork() The neural network consists of four layers: one input layer, two hidden layers, and one output layer. The Linear type represents a simple neural network. For each layer, it is necessary to specify the number of inputs and outputs. The output number of one layer becomes the input for the next layer. The input layer has 784 nodes. This is the result of multiplying 28 × 28 (the image size in pixels). The first hidden layer has 86 output nodes, so the input to the next layer must be 86 as well.The same logic applies further. 86 is an arbitrary number—you can use a different value. The output layer contains 10 nodes because the images represent digits from 0 to 9. Each time data passes through a layer, it is processed by an activation function. There are several activation functions. In this example, we use ReLU (Rectified Linear Unit). This function returns 0 if the value is negative or the value itself if it is positive. The softmax function is used at the output layer to normalize values. For example, it might return an 80% probability that the digit in the image is 1, or a 30% probability that the digit is 5, and so on. The highest probability is selected as the final prediction. Training The next step is training. 
optimizer = optim.Adam(model.parameters(), lr=0.001) EPOCHS = 3 for epoch in range(EPOCHS): for data in trainset: X, y = data model.zero_grad() output = model(X.view(-1, 28 * 28)) loss = F.nll_loss(output, y) loss.backward() optimizer.step() print(loss) The optimizer calculates the difference (loss) between the actual data and the prediction, adjusts the weights, recalculates the loss, and continues the cycle until the loss is minimized. Training Verification Here, we compare the actual values with the predictions made by the model. For this tutorial, the accuracy is high because the neural network effectively recognizes each digit. correct = 0 total = 0 with torch.no_grad(): for data in testset: data_input, target = data output = model(data_input.view(-1, 784)) for idx, i in enumerate(output): if torch.argmax(i) == target[idx]: correct += 1 total += 1 print('Accuracy: %d %%' % (100 * correct / total)) To verify that the neural network works, pass it an image of a digit from the test set: plt.imshow(X[1].view(28,28)) plt.show() print(torch.argmax(model(X[1].view(-1, 784))[0])) The output should display the digit shown in the provided image. 
Final Script

Here’s the full script you can run to see how the neural network works:

import torch
import torchvision
import torch.nn.functional as F
import matplotlib.pyplot as plt
import torch.nn as nn
import torch.optim as optim
from torchvision import transforms, datasets

# Download MNIST and convert the images to tensors
train = datasets.MNIST("", train=True, download=True, transform=transforms.Compose([transforms.ToTensor()]))
test = datasets.MNIST("", train=False, download=True, transform=transforms.Compose([transforms.ToTensor()]))

trainset = torch.utils.data.DataLoader(train, batch_size=15, shuffle=True)
testset = torch.utils.data.DataLoader(test, batch_size=15, shuffle=True)

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 86)
        self.fc2 = nn.Linear(86, 86)
        self.fc3 = nn.Linear(86, 86)
        self.fc4 = nn.Linear(86, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return F.log_softmax(x, dim=1)

model = NeuralNetwork()

# Training
optimizer = optim.Adam(model.parameters(), lr=0.001)
EPOCHS = 3

for epoch in range(EPOCHS):
    for data in trainset:
        X, y = data
        model.zero_grad()
        output = model(X.view(-1, 28 * 28))
        loss = F.nll_loss(output, y)
        loss.backward()
        optimizer.step()
    print(loss)

# Verification on the test set
correct = 0
total = 0

with torch.no_grad():
    for data in testset:
        data_input, target = data
        output = model(data_input.view(-1, 784))
        for idx, i in enumerate(output):
            if torch.argmax(i) == target[idx]:
                correct += 1
            total += 1

print('Accuracy: %d %%' % (100 * correct / total))

# Show one image from the last test batch and the digit predicted for it
plt.imshow(data_input[1].view(28, 28))
plt.show()
print(torch.argmax(model(data_input[1].view(-1, 784))[0]))

Each time you run the script, it trains the network and then takes a random image from the test set (the test loader is shuffled) and analyzes the digit depicted on it. After the process is completed, it displays the recognition accuracy as a percentage, the image itself, and the digit recognized by the neural network.
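As a rough sanity check on the layer sizes used in the script, the sketch below counts the trainable parameters of each layer in plain Python. It assumes every Linear layer has one weight per input-output pair plus one bias per output, which is the default for nn.Linear. The names LAYER_SIZES, linear_params, and describe are hypothetical helpers, not part of PyTorch.

```python
# Layer sizes taken from the network above: 784 -> 86 -> 86 -> 86 -> 10
LAYER_SIZES = [784, 86, 86, 86, 10]

def linear_params(n_in, n_out):
    # One weight per input-output pair, plus one bias per output (assumed)
    return n_in * n_out + n_out

def describe(sizes):
    # Pair each size with the next one: the output of a layer feeds the next layer
    rows = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        rows.append((n_in, n_out, linear_params(n_in, n_out)))
    return rows

layers = describe(LAYER_SIZES)
for n_in, n_out, count in layers:
    print(f"Linear({n_in}, {n_out}): {count} parameters")

total = sum(count for _, _, count in layers)
print("Total:", total)  # 83344
```

Changing 86 to another value only requires editing LAYER_SIZES, which shows why the hidden size is a free choice as long as adjacent layers agree.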
Conclusion

PyTorch is a powerful open-source machine learning platform that accelerates the transition from research prototypes to production deployments. With it, you can solve various tasks in the fields of artificial intelligence and neural networks.

You don’t need deep knowledge of machine learning to begin working with PyTorch. It is enough to know the basic concepts to repeat and even modify popular procedures like image recognition to suit your needs. A big advantage of PyTorch is the large user community that writes tutorials and shares examples of using the library.

Object recognition in images is one of the simplest and most popular tasks in PyTorch for beginners. However, the capabilities of the library are not limited to this. To create powerful neural networks, you need a lot of training data. This data can be stored, for example, in S3-compatible object storage such as Hostman, with instant data access via API or web interface. This is an excellent solution for storing large volumes of information.
01 April 2025 · 10 min to read
