
How to Read Excel Files in Python using Pandas

Kolawole Mangabo
Technical writer
Python
03.10.2024
Reading time: 8 min

Excel files are commonly used to organize, sort, and analyze data in a tabular format with rows and columns. They are widely applied in industries like data analysis, finance, and reporting.

Using Python, the pandas library allows for efficient manipulation of Excel files, enabling operations like reading and writing data. This article will cover how to use the read_excel function from pandas to read Excel files.

Installing Pandas

To begin, install pandas by running the following command:

pip install pandas

This will install pandas along with the required dependencies in your work environment. Additionally, the openpyxl module is needed for reading .xlsx files.

Why OpenPyXL?

Excel files come in different formats and extensions. To ensure compatibility when working with these files, pandas allows you to specify the engine you want to use. Below are some of the engines pandas can use with Excel files:

  • OpenPyXL: Used for reading and writing .xlsx files (Excel 2007+).
  • XlsxWriter: Used only for writing .xlsx files; it cannot read them.
  • xlrd: Used for reading older .xls files (Excel 97-2003).
  • Pyxlsb: Used for reading .xlsb (binary Excel format) files.

OpenPyXL also supports Excel-specific features, such as formatting and formulas. It is an optional dependency of pandas, so pip install pandas does not install it automatically. Install it with the following command:

pip install openpyxl

While OpenPyXL can be used on its own to read Excel files, it is also integrated as an engine within pandas for reading and writing .xlsx files.
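As a minimal sketch of selecting the engine explicitly, the snippet below writes a tiny workbook to a temporary folder and reads it back with engine="openpyxl". The file name demo.xlsx and the sample rows are made up for illustration, and the sketch assumes both pandas and openpyxl are installed.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data, used only so there is a workbook to read.
sample = pd.DataFrame({"Order ID": [1, 2, 3], "Sales": [13.01, 6362.85, 211.15]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.xlsx"
    # Write the workbook with openpyxl as the writer engine.
    sample.to_excel(path, index=False, engine="openpyxl")
    # Read it back, naming the reader engine explicitly.
    df = pd.read_excel(path, engine="openpyxl")

print(df)
```

If the engine argument is left out, pandas chooses an engine based on the file, so naming it is mostly useful when you want to be explicit or when the extension is misleading.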

We will work with an Excel file that you can download here. Download the file and move it into your working environment.

Basic Usage of read_excel Function

The Excel file we are working with has the following structure:

Image1

It also has three worksheets: Orders, Returns, and Users.

To read this file, the read_excel function from pandas will be used.

The read_excel function in pandas is used to import data from Excel files into a pandas DataFrame, a powerful structure for analyzing and manipulating data. This function is highly versatile, allowing users to read data from specific sheets, columns, or ranges.

Here is how to use this function. No engine is specified here; pandas picks one based on the file, which for .xlsx files means openpyxl:

import pandas as pd 

df = pd.read_excel('SuperStoreUS-2015.xlsx')

print(df)

This code imports the pandas library and uses the read_excel function to read the SuperStoreUS-2015.xlsx Excel file into a pandas DataFrame. The print(df) statement outputs the DataFrame contents, displaying the data from the Excel file. Below is the resulting output:

       Row ID Order Priority  Discount  Unit Price  Shipping Cost  ...  Ship Date     Profit Quantity ordered new    Sales Order ID

0      20847           High      0.01        2.84           0.93  ... 2015-01-08     4.5600                    4    13.01    88522

1      20228  Not Specified      0.02      500.98          26.00  ... 2015-06-15  4390.3665                   12  6362.85    90193

2      21776       Critical      0.06        9.48           7.29  ... 2015-02-17   -53.8096                   22   211.15    90192

3      24844         Medium      0.09       78.69          19.99  ... 2015-05-14   803.4705                   16  1164.45    86838

4      24846         Medium      0.08        3.28           2.31  ... 2015-05-13   -24.0300                    7    22.23    86838

The read_excel function is highly flexible and can be adapted to various usage scenarios. Next, we will explore how to use it for reading specific sheets and columns.

Reading Specific Sheets and Columns

Excel files can contain multiple sheets and many columns. The read_excel function takes the sheet_name argument to tell pandas which sheet to read. By default (sheet_name=0), read_excel loads only the first worksheet; passing sheet_name=None loads all worksheets into a dictionary of DataFrames keyed by sheet name. Here is how you can use the sheet_name argument:

df = pd.read_excel('SuperStoreUS-2015.xlsx', sheet_name="Returns")

print(df)

This will read the Returns sheet, and here is an example output:

      Order ID    Status

0           65  Returned

1          612  Returned

2          614  Returned

3          678  Returned

4          710  Returned

...        ...       ...

1629    182681  Returned

1630    182683  Returned

1631    182750  Returned

1632    182781  Returned

1633    182906  Returned

[1634 rows x 2 columns]

The sheet_name argument also accepts integers, which are interpreted as zero-indexed sheet positions. For instance, pd.read_excel('SuperStoreUS-2015.xlsx', sheet_name=1) will load the Returns sheet as well, because it is the second sheet in the workbook.
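To illustrate reading several sheets at once, here is a small sketch that builds a two-sheet workbook in a temporary folder and loads it with sheet_name=None and with a list. The workbook demo.xlsx and its rows are hypothetical stand-ins, not the SuperStore file, and pandas plus openpyxl are assumed to be installed.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.xlsx"
    # Build a small workbook with two sheets (hypothetical data).
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        pd.DataFrame({"Order ID": [65, 612]}).to_excel(
            writer, sheet_name="Orders", index=False
        )
        pd.DataFrame({"Order ID": [65], "Status": ["Returned"]}).to_excel(
            writer, sheet_name="Returns", index=False
        )

    # sheet_name=None returns a dict that maps each sheet name to a DataFrame.
    sheets = pd.read_excel(path, sheet_name=None)
    # A list may mix names and positions; the dict keys match what was requested.
    subset = pd.read_excel(path, sheet_name=["Orders", 1])

print(list(sheets))
print(subset[1])
```

Each value in the returned dictionary is an ordinary DataFrame, so sheets["Returns"] can be used exactly like the result of a single-sheet read.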

You can also choose to read specific columns from the Excel file. The read_excel function allows for selective column reading using the usecols parameter. It accepts various formats:

  • A string for Excel column letters or ranges (e.g., "A:C").
  • A list of integers for column positions.
  • A list of column names.

Here is an example using column names:

import pandas as pd

df = pd.read_excel('SuperStoreUS-2015.xlsx', usecols=['Row ID', 'Sales'])

print(df)

In this case, the usecols parameter specifies that only columns Row ID and Sales from the Excel file should be imported into the DataFrame. The code below does the same thing, but using Excel column letters:

import pandas as pd

df = pd.read_excel('SuperStoreUS-2015.xlsx', usecols='A,X')

print(df)

Here is the output:

      Row ID    Sales

0      20847    13.01

1      20228  6362.85

2      21776   211.15

3      24844  1164.45

4      24846    22.23

...      ...      ...

1947   19842   207.31

1948   19843   143.12

1949   26208    59.98

1950   24911   135.78

1951   25914   506.50

You can also use range selection to read columns by their position. In the code below, we are reading from Order Priority to Customer ID.

df = pd.read_excel('SuperStoreUS-2015.xlsx', usecols='B:F')

Here is an example output when reading columns B to F:

     Order Priority  Discount  Unit Price  Shipping Cost  Customer ID

0              High      0.01        2.84           0.93            3

1     Not Specified      0.02      500.98          26.00            5

2          Critical      0.06        9.48           7.29           11

3            Medium      0.09       78.69          19.99           14

4            Medium      0.08        3.28           2.31           14

Additionally, you can provide a callable that evaluates column names, reading only those for which the function returns True.
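A callable passed to usecols might look like the sketch below, which keeps only the columns whose header starts with "Ship". The sample rows, the Ship Mode values, and the file name demo.xlsx are invented for illustration; pandas and an .xlsx writer engine such as openpyxl are assumed to be installed.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical rows that mimic a few columns of an orders sheet.
sample = pd.DataFrame({
    "Row ID": [20847, 20228],
    "Ship Date": pd.to_datetime(["2015-01-08", "2015-06-15"]),
    "Ship Mode": ["Express Air", "Regular Air"],
    "Profit": [4.56, 4390.3665],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.xlsx"
    sample.to_excel(path, index=False)
    # The callable receives each column name and returns True to keep it.
    df = pd.read_excel(path, usecols=lambda name: name.startswith("Ship"))

print(df.columns.tolist())
```

This form is handy when the exact column names are not known in advance but follow a pattern.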

Handling Missing Data in Excel Files

In Excel files, missing data refers to values that are absent, often represented by empty cells. When reading an Excel file into a pandas DataFrame, missing data is automatically identified and represented as NaN (Not a Number), which is pandas' placeholder for missing values.

Pandas offers several methods to handle missing data, such as:

  • dropna(): Removes rows or columns with missing values.
  • fillna(): Replaces missing values with a specified value (e.g., 0 or the mean of the column).
  • isna(): Detects missing values and returns a boolean DataFrame.

For example, using fillna on our Excel file will replace all missing values with 0:

df = pd.read_excel('SuperStoreUS-2015.xlsx')

df_cleaned = df.fillna(0)

Handling missing data is essential to ensure accurate analysis and prevent errors or biases in data-driven decisions.
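The three methods can be combined as in the following sketch. It uses a small in-memory DataFrame with made-up values in place of the Excel data, and the choice of fill values (a label for text, the column mean for numbers) is only one possible strategy.

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for data read from Excel.
df = pd.DataFrame({
    "Order Priority": ["High", None, "Medium"],
    "Profit": [4.56, np.nan, -24.03],
})

# Count the missing values in each column.
missing_per_column = df.isna().sum()

# Drop only the rows where Profit is missing.
df_no_missing_profit = df.dropna(subset=["Profit"])

# Fill each column with a value that suits its type.
df_filled = df.fillna({
    "Order Priority": "Not Specified",
    "Profit": df["Profit"].mean(),
})

print(missing_per_column)
print(df_filled)
```

Filling every gap with 0 is simple, but for a column such as Profit it changes sums and averages, so a per-column choice is often safer.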

Reading and Analyzing an Excel File in Pandas

Let’s put what we have learned into practice. In this practical example, we will walk through reading an Excel file, performing some basic analysis, and exporting the manipulated data into various formats.

Specifically, we’ll calculate the sum, maximum, and minimum values of the Profit column for June 2015, and export the filtered rows to CSV, JSON, and a Python list of dictionaries.

Step 1: Loading the Excel File

The first step is to load the Excel file using the read_excel function from pandas:

import pandas as pd

df = pd.read_excel('SuperStoreUS-2015.xlsx', usecols=['Ship Date', 'Profit'])

print(df.head())

This code reads only the Ship Date and Profit columns of the SuperStoreUS-2015.xlsx file into a pandas DataFrame and displays the first five rows.

Step 2: Calculating Profit for June 2015

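Next, we will filter the data to include only records from June 2015 and calculate the total, maximum, and minimum profit for that month. The dates appear as MM/DD/YYYY in the spreadsheet, so we pass the Ship Date column through pd.to_datetime to make sure it has a datetime type before filtering by year and month. If read_excel has already parsed the column as dates, as the 2015-01-08 style values in the earlier output suggest, this call should leave the values unchanged: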

df['Ship Date'] = pd.to_datetime(df['Ship Date'], format='%m/%d/%Y')

df_june_2015 = df[(df['Ship Date'].dt.year == 2015) & (df['Ship Date'].dt.month == 6)]

# Calculate the sum, max, and min for the Profit column

profit_sum = df_june_2015['Profit'].sum()

profit_max = df_june_2015['Profit'].max()

profit_min = df_june_2015['Profit'].min()

print(f"Total Profit in June 2015: {profit_sum}")

print(f"Maximum Profit in June 2015: {profit_max}")

print(f"Minimum Profit in June 2015: {profit_min}")

To make the printed values easier to read, round them to two decimal places:

print(f"Total Profit in June 2015: {round(profit_sum, ndigits=2)}")

print(f"Maximum Profit in June 2015: {round(profit_max, ndigits=2)}")

print(f"Minimum Profit in June 2015: {round(profit_min, ndigits=2)}")
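The three calculations can also be collapsed into a single agg call. The sketch below uses a small in-memory DataFrame with invented rows in place of the Excel data; the variable names is_june_2015 and summary are illustrative.

```python
import pandas as pd

# Hypothetical rows standing in for the Ship Date and Profit columns.
df = pd.DataFrame({
    "Ship Date": pd.to_datetime(["2015-06-15", "2015-06-20", "2015-05-14"]),
    "Profit": [4390.3665, -53.8096, 803.4705],
})

# Boolean mask that is True for rows shipped in June 2015.
is_june_2015 = (df["Ship Date"].dt.year == 2015) & (df["Ship Date"].dt.month == 6)

# Compute sum, max, and min in one pass, then round to two decimals.
summary = df.loc[is_june_2015, "Profit"].agg(["sum", "max", "min"]).round(2)

print(summary)
```

The result is a Series indexed by the function names, so individual values are available as summary["sum"], summary["max"], and summary["min"].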

Step 3: Exporting the Manipulated Data

Once the profit for June 2015 has been calculated, we can export the filtered data to different formats, including CSV, JSON, and a Python dictionary.

# Export to CSV

df_june_2015.to_csv('SuperStoreUS_June2015_Profit.csv', index=False)

# Export to JSON

df_june_2015.to_json('SuperStoreUS_June2015_Profit.json', orient='records')

# Convert to Dictionary

data_dict = df_june_2015.to_dict(orient='records')

print(data_dict[:5])

In this step, the data is first exported to a CSV file and then to a JSON file. Finally, the DataFrame is converted into a Python list of dictionaries, with each row represented as one dictionary.
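To see what these exports produce without touching the file system, the sketch below applies them to a small in-memory DataFrame with made-up rows. It passes date_format="iso" to to_json, which is an addition not used in the code above, so that dates are written as readable strings rather than relying on the library default.

```python
import json

import pandas as pd

# Hypothetical rows standing in for the filtered June 2015 data.
df = pd.DataFrame({
    "Ship Date": pd.to_datetime(["2015-06-15", "2015-06-20"]),
    "Profit": [4390.3665, -53.8096],
})

# orient="records" yields a list with one dictionary per row.
records = df.to_dict(orient="records")

# With no path argument, to_json returns the JSON text instead of writing a file.
json_text = df.to_json(orient="records", date_format="iso")
parsed = json.loads(json_text)

print(type(records), records[0]["Profit"])
print(parsed[0]["Ship Date"])
```

Note that the dictionaries from to_dict keep pandas Timestamp objects for the dates, while the JSON output stores them as text.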

Conclusion

In this article, we have learned how to use the read_excel function from pandas to read Excel files, select specific sheets and columns, handle missing values, and export the results to other formats.

