How to Read Excel Files in Python using Pandas

Kolawole Mangabo
Technical writer
Python
03.10.2024
Reading time: 8 min

Excel files are commonly used to organize, sort, and analyze data in a tabular format with rows and columns. They are widely applied in industries like data analysis, finance, and reporting.

In Python, the pandas library makes it straightforward to read, write, and manipulate Excel files. This article covers how to use the read_excel function from pandas to read Excel files.

Installing Pandas

To begin, install pandas by running the following command:

pip install pandas

This will install pandas along with the required dependencies in your work environment. Additionally, the openpyxl module is needed for reading .xlsx files.

Why OpenPyXL?

Excel files come in different formats and extensions. To ensure compatibility when working with these files, pandas allows you to specify the engine you want to use. Below is a list of engines commonly used with pandas for reading or writing Excel files:

  • OpenPyXL: Used for reading and writing .xlsx files (Excel 2007+).
  • XlsxWriter: Primarily used for writing .xlsx files.
  • xlrd: Used for reading older .xls files (Excel 97-2003).
  • Pyxlsb: Used for reading .xlsb (binary Excel format) files.

OpenPyXL also supports Excel-specific features, such as formatting and formulas. It is an optional dependency of pandas, so pip install pandas does not install it automatically. You can install it using the following command:

pip install openpyxl

While OpenPyXL can be used on its own to read Excel files, it is also integrated as an engine within pandas for reading and writing .xlsx files.
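As a minimal sketch of selecting an engine explicitly, the snippet below writes a tiny workbook to a temporary folder and reads it back with engine="openpyxl". The file name demo.xlsx and the sample values are made up for illustration, and the example assumes openpyxl is installed.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Build a tiny workbook so the example runs without the article's dataset.
demo = pd.DataFrame({"Order ID": [88522, 90193], "Sales": [13.01, 6362.85]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.xlsx"  # hypothetical file name
    demo.to_excel(path, index=False, engine="openpyxl")

    # Ask pandas to use openpyxl explicitly instead of inferring the engine.
    df = pd.read_excel(path, engine="openpyxl")

print(df)
```

If the engine argument is omitted, pandas chooses an engine based on the file extension, so passing it is mostly useful when you want to be explicit or the extension is misleading.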

We will work with an Excel file that you can download here. Download the file and move it into your working environment.

Basic Usage of read_excel Function

The Excel file we are working with has the following structure:

Image1

It also has three worksheets: Orders, Returns, and Users.

To read this file, the read_excel function from pandas will be used.

The read_excel function in pandas is used to import data from Excel files into a pandas DataFrame, a powerful structure for analyzing and manipulating data. This function is highly versatile, allowing users to read data from specific sheets, columns, or ranges.

Here is the most basic call. No engine is specified, so pandas infers it from the file extension (openpyxl for .xlsx files):

import pandas as pd 

df = pd.read_excel('SuperStoreUS-2015.xlsx')

print(df)

This code imports the pandas library and uses the read_excel function to read the SuperStoreUS-2015.xlsx Excel file into a pandas DataFrame. The print(df) statement outputs the DataFrame contents, displaying the data from the Excel file. Below is the resulting output:

       Row ID Order Priority  Discount  Unit Price  Shipping Cost  ...  Ship Date     Profit Quantity ordered new    Sales Order ID

0      20847           High      0.01        2.84           0.93  ... 2015-01-08     4.5600                    4    13.01    88522

1      20228  Not Specified      0.02      500.98          26.00  ... 2015-06-15  4390.3665                   12  6362.85    90193

2      21776       Critical      0.06        9.48           7.29  ... 2015-02-17   -53.8096                   22   211.15    90192

3      24844         Medium      0.09       78.69          19.99  ... 2015-05-14   803.4705                   16  1164.45    86838

4      24846         Medium      0.08        3.28           2.31  ... 2015-05-13   -24.0300                    7    22.23    86838

The read_excel function is highly flexible and can be adapted to various usage scenarios. Next, we will explore how to use it for reading specific sheets and columns.

Reading Specific Sheets and Columns

Excel workbooks can contain multiple sheets, each with many columns. The read_excel function takes the sheet_name argument to tell pandas which sheet to read. By default (sheet_name=0), read_excel loads only the first worksheet; passing sheet_name=None loads all worksheets into a dictionary of DataFrames keyed by sheet name. Here is how you can use the sheet_name argument:

df = pd.read_excel('SuperStoreUS-2015.xlsx', sheet_name="Returns")

print(df)

This will read the Returns sheet, and here is an example output:

      Order ID    Status

0           65  Returned

1          612  Returned

2          614  Returned

3          678  Returned

4          710  Returned

...        ...       ...

1629    182681  Returned

1630    182683  Returned

1631    182750  Returned

1632    182781  Returned

1633    182906  Returned

[1634 rows x 2 columns]

The sheet_name argument also accepts integers, which are treated as zero-indexed sheet positions. For instance, pd.read_excel('SuperStoreUS-2015.xlsx', sheet_name=1) loads the Returns sheet as well, since it is the second sheet in the workbook.
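To illustrate how sheet_name changes what read_excel returns, here is a small sketch that builds a two-sheet workbook and reads it in two ways. The workbook contents and the demo.xlsx name are invented for the example; only the sheet names mirror the article's file.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.xlsx"  # hypothetical file name

    # Write two small sheets into one workbook.
    with pd.ExcelWriter(path) as writer:
        pd.DataFrame({"Order ID": [65, 612]}).to_excel(
            writer, sheet_name="Orders", index=False
        )
        pd.DataFrame({"Order ID": [65], "Status": ["Returned"]}).to_excel(
            writer, sheet_name="Returns", index=False
        )

    # No sheet_name: only the first sheet is read, as a single DataFrame.
    first_sheet = pd.read_excel(path)

    # sheet_name=None: every sheet is read, as a dict of DataFrames.
    all_sheets = pd.read_excel(path, sheet_name=None)

print(list(first_sheet.columns))
print(list(all_sheets))
```

Each value in all_sheets is an ordinary DataFrame, so all_sheets["Returns"] can be used exactly like the result of reading that sheet directly.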

You can also choose to read specific columns from the Excel file. The read_excel function allows for selective column reading using the usecols parameter. It accepts various formats:

  • A string for Excel column letters or ranges (e.g., "A:C").
  • A list of integers for column positions.
  • A list of column names.

Here is an example using column names:

import pandas as pd

df = pd.read_excel('SuperStoreUS-2015.xlsx', usecols=['Row ID', 'Sales'])

print(df)

In this case, the usecols parameter specifies that only columns Row ID and Sales from the Excel file should be imported into the DataFrame. The code below does the same thing, but using Excel column letters:

import pandas as pd

df = pd.read_excel('SuperStoreUS-2015.xlsx', usecols='A,X')

print(df)

Here is the output:

      Row ID    Sales

0      20847    13.01

1      20228  6362.85

2      21776   211.15

3      24844  1164.45

4      24846    22.23

...      ...      ...

1947   19842   207.31

1948   19843   143.12

1949   26208    59.98

1950   24911   135.78

1951   25914   506.50

You can also use range selection to read columns by their position. In the code below, we are reading from Order Priority to Customer ID.

df = pd.read_excel('SuperStoreUS-2015.xlsx', usecols='B:F')

Here is an example output when reading columns B to F:

     Order Priority  Discount  Unit Price  Shipping Cost  Customer ID

0              High      0.01        2.84           0.93            3

1     Not Specified      0.02      500.98          26.00            5

2          Critical      0.06        9.48           7.29           11

3            Medium      0.09       78.69          19.99           14

4            Medium      0.08        3.28           2.31           14

Additionally, you can provide a callable that evaluates column names, reading only those for which the function returns True.
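The callable and integer-position forms of usecols can be sketched as follows. The sample columns are a made-up subset of the dataset's headers, written to a temporary workbook so the snippet runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd

demo = pd.DataFrame({
    "Row ID": [20847, 20228],
    "Unit Price": [2.84, 500.98],
    "Shipping Cost": [0.93, 26.00],
    "Sales": [13.01, 6362.85],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.xlsx"  # hypothetical file name
    demo.to_excel(path, index=False)

    # Callable: keep a column only if its header mentions Price or Cost.
    by_callable = pd.read_excel(
        path, usecols=lambda name: "Price" in name or "Cost" in name
    )

    # List of integers: zero-based column positions (first and fourth column).
    by_position = pd.read_excel(path, usecols=[0, 3])

print(list(by_callable.columns))
print(list(by_position.columns))
```

A callable is handy when the exact column set is not known in advance, for example when the layout differs slightly between files.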

Handling Missing Data in Excel Files

In Excel files, missing data refers to values that are absent, often represented by empty cells. When reading an Excel file into a pandas DataFrame, empty cells are automatically identified and represented as NaN (Not a Number), which is pandas' placeholder for missing values.

Pandas offers several methods to handle missing data, such as:

  • dropna(): Removes rows or columns with missing values.
  • fillna(): Replaces missing values with a specified value (e.g., 0 or the mean of the column).
  • isna(): Detects missing values and returns a boolean DataFrame.

For example, using fillna on our Excel file will replace all missing values with 0:

df = pd.read_excel('SuperStoreUS-2015.xlsx')

df_cleaned = df.fillna(0)

Handling missing data is essential to ensure accurate analysis and prevent errors or biases in data-driven decisions.
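The three methods listed above can be combined as in the sketch below. It uses a small hand-made DataFrame instead of the Excel file, and the replacement values are arbitrary choices for illustration. Filling every column with 0 can be misleading for text columns, so the example fills each column with a value of its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Profit": [4.56, np.nan, -53.81],
    "Order Priority": ["High", None, "Critical"],
})

# Count missing values per column.
missing_per_column = df.isna().sum()

# Fill each column with a value that suits its type.
filled = df.fillna({"Profit": 0, "Order Priority": "Not Specified"})

# Alternatively, drop only the rows where Profit is missing.
dropped = df.dropna(subset=["Profit"])

print(missing_per_column)
print(filled)
print(dropped)
```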

Reading and Analyzing an Excel File in Pandas

Let's put what we have learned into practice. In this example, we will walk through reading an Excel file, performing some basic analysis, and exporting the filtered data into various formats.

Specifically, we'll calculate the sum, maximum, and minimum values of the Profit column for June 2015, and export the filtered rows to CSV, JSON, and a list of Python dictionaries.

Step 1: Loading the Excel File

The first step is to load the Excel file using the read_excel function from pandas:

import pandas as pd

df = pd.read_excel('SuperStoreUS-2015.xlsx', usecols=['Ship Date', 'Profit'])

print(df.head())

This code reads only the Ship Date and Profit columns of the SuperStoreUS-2015.xlsx file into a pandas DataFrame and displays the first five rows.

Step 2: Calculating Profit for June 2015

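Next, we will filter the data to include only records from June 2015 and calculate the total, maximum, and minimum profit for that month. read_excel usually parses Excel date cells into datetime values on its own, but if Ship Date arrives as text in MM/DD/YYYY format, it has to be converted first. The pd.to_datetime call below covers that case and is harmless when the column is already a datetime. We then filter by year and month: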

df['Ship Date'] = pd.to_datetime(df['Ship Date'], format='%m/%d/%Y')

df_june_2015 = df[(df['Ship Date'].dt.year == 2015) & (df['Ship Date'].dt.month == 6)]

# Calculate the sum, max, and min for the Profit column

profit_sum = df_june_2015['Profit'].sum()

profit_max = df_june_2015['Profit'].max()

profit_min = df_june_2015['Profit'].min()

print(f"Total Profit in June 2015: {profit_sum}")

print(f"Maximum Profit in June 2015: {profit_max}")

print(f"Minimum Profit in June 2015: {profit_min}")

To round the values to two decimal places, the print statements can be written as:

print(f"Total Profit in June 2015: {round(profit_sum, ndigits=2)}")

print(f"Maximum Profit in June 2015: {round(profit_max, ndigits=2)}")

print(f"Minimum Profit in June 2015: {round(profit_min, ndigits=2)}")
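As an alternative sketch, the same three statistics can be computed in a single call with agg. The data below is invented so the snippet runs without the Excel file; the column names match the article's dataset.

```python
import pandas as pd

df = pd.DataFrame({
    "Ship Date": pd.to_datetime(["2015-06-15", "2015-06-20", "2015-05-14"]),
    "Profit": [100.0, -20.5, 803.47],
})

# Boolean mask for rows shipped in June 2015.
in_june = (df["Ship Date"].dt.year == 2015) & (df["Ship Date"].dt.month == 6)

# agg with a list of function names returns a Series indexed by those names.
summary = df.loc[in_june, "Profit"].agg(["sum", "max", "min"]).round(2)

print(summary)
```

Individual values are then available by name, for example summary["sum"].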

Step 3: Exporting the Manipulated Data

Once the profit for June 2015 has been calculated, we can export the filtered data to different formats, including CSV, JSON, and a list of Python dictionaries.

# Export to CSV

df_june_2015.to_csv('SuperStoreUS_June2015_Profit.csv', index=False)

# Export to JSON

df_june_2015.to_json('SuperStoreUS_June2015_Profit.json', orient='records')

# Convert to a list of dictionaries (one per row)

data_dict = df_june_2015.to_dict(orient='records')

print(data_dict[:5])

In this step, the data is first exported to a CSV file and then to a JSON file. Finally, the DataFrame is converted with to_dict(orient='records') into a list of Python dictionaries, one dictionary per row.
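One detail worth checking when exporting a DataFrame that contains dates: depending on the pandas version and its defaults, to_json may write datetime values as numeric epoch timestamps rather than readable strings. The sketch below, built on a one-row sample DataFrame, passes date_format="iso" to request ISO 8601 strings and shows the shape of the to_dict result.

```python
import json

import pandas as pd

df = pd.DataFrame({
    "Ship Date": pd.to_datetime(["2015-06-15"]),
    "Profit": [4390.37],
})

# Without a path argument, to_json returns the JSON text as a string.
iso_json = df.to_json(orient="records", date_format="iso")
parsed = json.loads(iso_json)

# to_dict(orient="records") returns a list with one dict per row.
records = df.to_dict(orient="records")

print(iso_json)
print(records)
```

Note that to_dict keeps the dates as pandas Timestamp objects, so they need converting (for example with str or isoformat) before the result is passed to json.dumps.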

Conclusion

In this article, we have learned how to use the read_excel function from pandas to read and manipulate Excel files. With arguments such as sheet_name and usecols, it lets you load only the sheets and columns you need before filtering and analyzing the data.

Check out our app platform to find Python applications, such as Celery, Django, FastAPI and Flask. 


Similar

Python

How to Install pip on Windows

pip is a utility that turns Python package installation and management into a straightforward task. From Python beginners to coding wizards, having this utility on your Windows computer is a true game-changer. It effortlessly facilitates the setup of crucial frameworks and libraries for your development needs. Automating package management with pip frees up your time and reduces the complications linked to manual installations. Follow this guide to become proficient in configuring pip and overseeing your Python packages seamlessly. pip Setup Process for Windows Here are the guidelines to set up pip on a Windows machine. Step 1: Confirm Installation Verify Python is operational on your device before starting the pip setup. To carry out this operation, run command prompt and apply: python --version   If Python's not present on your system, download it from the official site. Step 2: Download get-pip.py Python's standard installation package automatically includes pip. However, in case of accidental removal, grab the get-pip.py script.  You have a couple of options: either visit the pip.py webpage, or use the curl command for a quick install: curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py Note: Installing Python again to get pip is also an option. However, it can sometimes lead to conflicts with other dependencies or settings. Your existing Python setup stays unchanged with this script. Step 3: Run get-pip.py Move to the script’s location through the command prompt and apply: python get-pip.py This will smoothly install pip on your device. Step 4: Confirm pip Installation Validate the installation by executing: pip --version Applying this command ensures pip is installed on the system. Step 5: Add pip to System PATH If the command doesn't execute properly, update your system PATH with these instructions to incorporate pip: Access Properties by right-clicking on My Computer or This PC from the drop-down menu. Opt for Advanced system settings. 
Select Environment Variables. Head over to System Variables, spot the Path variable, and choose Edit. Insert the Python Scripts directory into your system PATH, for example, C:\Python39\Scripts. Alternative Ways for pip Installation on Windows Let's discuss a few other ways to effortlessly get pip running on Windows. Via Built-in ensurepip Module From Python 3.4 onward, there's an awesome built-in module named ensurepip. With this tool, pip installation is simplified, eliminating the need for the get-pip.py script. Step 1: Run ensurepip Input the command below to set up pip: python -m ensurepip --default-pip Step 2: Verify pip Installation Check pip version through: pip --version Python Installer Approach for pip Installation Ensure the pip checkbox is marked during the Python setup. Here's how: Step 1: Download Installer Fire up your favorite browser, go to the official Python website, and acquire the most recent installation file. Step 2: Launch the Installer Launch the installer you've downloaded and remember to pick the Add Python to PATH option while setting up. Step 3: Install pip While progressing through the setup, don't forget to enable the Install pip option. Step 4: Validate pip is Installed When the setup wraps up, check pip installation via: pip --version Adjusting pip Version: Upgrade or Downgrade pip can be adjusted to suit your requirements by upgrading or downgrading. Here's how: Upgrading pip To give pip a fresh upgrade, execute: python -m pip install --upgrade pip Downgrading pip To roll back pip, apply: python -m pip install pip==<version> Enter the desired version number to install instead of <version> (e.g., 21.0). Resolving pip Installation Issues: Essential Commands Let's discover common pip installation issues and their fixes: Issue 1: "pip" is not recognized as an internal or external command Solution: This implies the pip path isn't set in your system PATH. Simply follow the instructions in "Step 5" to fix this. 
Issue 2: Permission Denied Solution: Elevate your command prompt privileges by right-clicking the Command Prompt icon and choosing Run as administrator. Afterward, rerun the commands. Issue 3: Missing Dependencies Solution: Sometimes, you'll run into trouble because of missing dependencies. To correct this, manually install the essential dependencies with pip. For example: pip install package_name Swap out package_name for the appropriate dependency. Utilizing Virtual Environments Employing virtual environments keeps dependencies distinct and avoids any conflicts. Here's how to utilize a virtual environment with pip: Creating a Virtual Environment python -m venv env_name Replace env_name with your desired environment name. Initiating Your Virtual Environment env_name\Scripts\activate Standard pip Commands To explore pip's usage, check these essential commands: Installing a Package pip install package_name Modify package_name to accurately reflect the package you're aiming to install. Uninstalling a Package pip uninstall package_name Showing Installed Packages pip list Showing Package Information pip show package_name Optimal Strategies for Package Management Employ virtual environments to handle dependencies efficiently in multiple projects. Regularly inspect and upgrade your packages to keep everything running smoothly. Prepare requirements files to ease the management of dependencies in your projects. Securing pip Installation Ensuring the protection of packages handled by pip is critical. Here are some tips to keep your environment secure: Maintain project isolation to avoid conflicts and secure installations. Check the trustworthiness and verification of package sources before installing. Always refer to official repositories and examine reviews if they are available. Consistently update pip and your packages to stay protected with the latest security patches and improvements. Periodically review your dependencies for known vulnerabilities. 
Tools such as pip-audit can assist in identifying and resolving security concerns. Adhere to secure coding standards and steer clear of deprecated or insecure packages. Integrating pip with IDEs pip can be effortlessly embedded into various Integrated Development Environments (IDEs), significantly boosting your development efficiency: VS Code: Utilize the built-in terminal for direct pip command and package management within the editor. PyCharm: Streamline package management by setting up pip configurations via the project interpreter. This simplifies the process of installing and managing packages customized to your project's specific needs. Jupyter Notebook: Employ magic commands in the notebook interface for direct package installation. This provides a smooth and integrated experience for managing dependencies while you work on your interactive notebooks.  Conclusion Windows offers several methods to set up pip, catering to different preferences and requirements. No matter if you select the .py script, use Python's built-in ensurepip module, or enable pip during the initial setup, these approaches will make sure pip is properly configured on your system. This all-in-one guide empowers you to handle and install Python packages with ease. Don't forget, keeping pip updated is essential for ensuring the security and efficiency of your Python setup. Routinely check for updates and keep pip upgraded. In addition, on our application platform you can find Python apps, such as Celery, Django, FastAPI and Flask.
15 January 2025 · 6 min to read
Python

How to Split a String Using the split() Method in Python

Working with strings is integral to many programming tasks, whether it involves processing user input, analyzing log files, or developing web applications. One of the fundamental tools that simplifies string manipulation in Python is the split() method. This method allows us to easily divide strings into parts based on specified criteria, making data processing and analysis more straightforward. In this article, we'll take a detailed look at the split() method, its syntax, and usage features. You'll learn how to use this method for solving everyday tasks and see how powerful it can be when applied correctly. Regardless of your programming experience level, you'll find practical tips and techniques to help you improve your string-handling skills in Python. What is the split() Method? The split() method is one of the core tools for working with strings in Python. It is designed to split a string into individual parts based on a specified delimiter, creating a list from these parts. This method is particularly useful for dividing text into words, extracting parameters from a string, or processing data separated by special characters, such as commas or tabs. The key idea behind the split() method is to transform a single string into a set of smaller, more manageable elements. This significantly simplifies data processing and allows programmers to perform analysis and transformation tasks more quickly and efficiently. Syntax of split() The split() method is part of Python's standard library and is applied directly to a string. Its basic syntax is as follows: str.split(sep=None, maxsplit=-1) Let’s break down the parameters of the split() method: sep (separator) This is an optional parameter that specifies the character or sequence of characters used as the delimiter for splitting the string. If sep is not provided or is set to None, the method defaults to splitting the string by whitespace (including spaces, tabs, and newline characters). 
If the string starts or ends with the delimiter, it is handled in a specific way. maxsplit This optional parameter defines the maximum number of splits to perform. By default, maxsplit is -1, which means there is no limit, and the string will be split completely. If maxsplit is set to a positive number, the method will split the string only the specified number of times, leaving the remaining part of the string as the last element in the resulting list. These parameters make it possible to customize split() to meet the specific requirements of your task. Let’s explore practical applications of split() with various examples to demonstrate its functionality and how it can be useful in daily data manipulation tasks. Examples of Using the split() Method To better understand how the split() method works, let's look at several practical examples that demonstrate its capabilities and applicability in various scenarios. Splitting a String by Spaces The most common use of the split() method is to break a string into words. By default, if no separator is specified, split() divides the string by whitespace characters. text = "Hello world from Python" words = text.split() print(words) Output: ['Hello', 'world', 'from', 'Python'] Splitting a String by a Specific Character If the data in the string is separated by another character, such as commas, you can specify that character as the sep argument. vegetable_list = "carrot,tomato,cucumber" vegetables = vegetable_list.split(',') print(vegetables) Output: ['carrot', 'tomato', 'cucumber'] Splitting a String a Specified Number of Times Sometimes, it’s necessary to limit the number of splits. The maxsplit parameter allows you to specify the maximum number of splits to be performed. 
text = "one#two#three#four" result = text.split('#', 2) print(result) Output: ['one', 'two', 'three#four'] In this example, the string was split into two parts, and the remaining portion after the second separator, 'three#four', was kept in the last list element. These examples demonstrate how flexible and useful the split() method can be in Python. Depending on your tasks, you can adapt its use to handle more complex string processing scenarios. Using the maxsplit Parameter The maxsplit parameter provides the ability to limit the number of splits a string will undergo. This can be useful when you only need a certain number of elements and do not require the entire string to be split. Let's take a closer look at how to use this parameter in practice. Limiting the Number of Splits Imagine you have a string containing a full file path, and you only need to extract the drive and the folder: path = "C:/Users/John/Documents/report.txt" parts = path.split('/', 2) print(parts) Output: ['C:', 'Users', 'John/Documents/report.txt'] Using maxsplit for Log File Processing Consider a string representing a log entry, where each part of the entry is separated by spaces. You are only interested in the first two fields—date and time. log_entry = "2024-10-23 11:15:32 User login successful" date_time = log_entry.split(' ', 2) print(date_time[:2]) Output: ['2024-10-23', '11:15:32'] In this case, we split the string twice and extract only the date and time, ignoring the rest of the entry. Application to CSV Data Sometimes, data may contain delimiter characters that you want to ignore after a certain point. csv_data = "Name,Email,Phone,Address" columns = csv_data.split(',', 2) print(columns) Output: ['Name', 'Email', 'Phone,Address'] Here, we limit the number of splits to keep the fields 'Phone' and 'Address' combined. The maxsplit parameter adds flexibility and control to the split() method, making it ideal for more complex data processing scenarios. 
Working with Delimiters Let’s examine how the split() method handles delimiters, including its default behavior and how to work with consecutive and multiple delimiters. Splitting by Default When no explicit delimiter is provided, the split() method splits the string by whitespace characters (spaces, tabs, and newlines). Additionally, consecutive spaces will be interpreted as a single delimiter, which is particularly useful when working with texts that may contain varying numbers of spaces between words. text = "Python is a versatile language" words = text.split() print(words) Output: ['Python', 'is', 'a', 'versatile', 'language'] Using a Single Delimiter Character If the string contains a specific delimiter, such as a comma or a colon, you can explicitly specify it as the sep argument. data = "red,green,blue,yellow" colors = data.split(',') print(colors) Output: ['red', 'green', 'blue', 'yellow'] In this case, the method splits the string wherever a comma is encountered. Working with Consecutive and Multiple Delimiters It’s important to note that when using a single delimiter character, split() does not treat consecutive delimiters as one. Each occurrence of the delimiter results in a new element in the resulting list, even if the element is empty. data = "one,,two,,,three" items = data.split(',') print(items) Output: ['one', '', 'two', '', '', 'three'] Splitting a String by Multiple Characters There are cases where you need to split a string using multiple delimiters or complex splitting rules. In such cases, it is recommended to use the re module and the re.split() function, which supports regular expressions. import re beverage_data = "coffee;tea juice|soda" beverages = re.split(r'[;|\s]', beverage_data) print(beverages) Output: ['coffee', 'tea', 'juice', 'soda'] In this example, a regular expression is used to split the string by several types of delimiters. 
Tips for Using the split() Method The split() method is a powerful and flexible tool for working with textual data in Python. To fully leverage its capabilities and avoid common pitfalls, here are some useful recommendations: Consider the Type of Delimiters When choosing a delimiter, make sure it matches the nature of the data. For instance, if the data contains multiple spaces, it might be more appropriate to use split() without explicitly specifying delimiters to avoid empty strings in the list. Use maxsplit for Optimization If you know that you only need a certain number of elements after splitting, use the maxsplit parameter to improve performance. This will also help avoid unexpected results when splitting long strings. Use Regular Expressions for Complex Cases The split() method with regular expressions enables solving more complex splitting tasks, such as when data contains multiple types of delimiters. Including the re library for this purpose significantly expands the method’s capabilities. Handle Empty Values When splitting a string with potentially missing values (e.g., when there are consecutive delimiters), make sure your code correctly handles empty strings or None. data = "value1,,value3" result = [item for item in data.split(',') if item] Validate Input Data Always consider potential errors, such as incorrect delimiters or unexpected data formats. Adding checks for values before calling split() can prevent many issues related to incorrect string splitting. Suitability for Use Remember that split() is unsuitable for processing more complex data structures, such as nested strings with quotes or data with escaped delimiters. In such cases, consider using specialized modules, such as csv for handling CSV formats. Following these tips, you can effectively use the split() method and solve textual data problems in Python. Understanding the subtleties of string splitting will help you avoid errors and make your code more reliable and understandable. 
Conclusion The split() method is an essential part of string handling in Python, providing developers with flexible and powerful tools for text splitting and data processing. In this article, we explored various aspects of using the split() method, including its syntax, working with parameters and delimiters, as well as practical examples and tips for its use. Check out our app platform to find Python applications, such as Celery, Django, FastAPI and Flask.
13 January 2025 · 8 min to read
Python

How to Convert a List to a Dictionary in Python

Python offers several fundamental data structures for storing data. Among the most popular are: List: Values with indices. Dictionary: Values with keys. Converting data from one type to another is essential to any dynamically typed programming language. Python, of course, is no exception. This guide will explain in detail what lists and dictionaries are and demonstrate various ways to convert one type to another. All examples in this article were executed using the Python interpreter version 3.10.12 on the Ubuntu 22.04 operating system, running on a Hostman cloud server. The list Type A list in Python is an ordered data structure of the "index-value" type. To create a list, use square brackets with values separated by commas: my_list = [False, True, 2, 'three', 4, 5] The list structure can be displayed in the console: print(my_list) The output will look like this: [False, True, 2, 'three', 4, 5] Accessing list values is done via indices: print(my_list[0]) # Output: False print(my_list[1]) # Output: True print(my_list[2]) # Output: 2 print(my_list[3]) # Output: three print(my_list[4]) # Output: 4 print(my_list[5]) # Output: 5 The dict Type A dictionary in Python is an unordered data structure of the "key-value" type. 
To create a dictionary, use curly braces with keys and values separated by colons and each pair separated by commas: my_dict = { 'James': '357 99 056 050', 'Natalie': '357 96 540 432', 'Kate': '357 96 830 726' } You can display the dictionary structure in the console as follows: print(my_dict) The output will look like this: {'James': '357 99 056 050', 'Natalie': '357 96 540 432', 'Kate': '357 96 830 726'} Accessing dictionary values is done via keys: print(my_dict['James']) # Output: 357 99 056 050 print(my_dict['Natalie']) # Output: 357 96 540 432 print(my_dict['Kate']) # Output: 357 96 830 726 Converting a List to a Dictionary You can convert a list to a dictionary in several ways: Use the dict.fromkeys() function, which creates a new dictionary with keys from the list. Use a dictionary comprehension with auxiliary functions and conditional operators. The latter option provides more flexibility for generating new dictionaries from existing lists. Creating Dictionary Keys from a List Using dict.fromkeys() The simplest way to create a dictionary from a list is to take the elements of a list instance and make them the keys of a dict instance. Optionally, you can add a default value for all keys in the new dictionary. This can be achieved using the standard dict.fromkeys() function. With this method, you can set a default value for all keys but not for individual keys. 
Here is an example of creating such a dictionary with keys from a list: objects = ['human', 'cat', 'alien', 'car'] # list of objects objects_states = dict.fromkeys(objects, 'angry') # create a dictionary with a default value for all keys objects_states_empty = dict.fromkeys(objects) # create a dictionary without specifying default values print(objects_states) # output the created dictionary with values print(objects_states_empty) # output the created dictionary without values Console output: {'human': 'angry', 'cat': 'angry', 'alien': 'angry', 'car': 'angry'} {'human': None, 'cat': None, 'alien': None, 'car': None} Creating a Dictionary from a List Using Dictionary Comprehension Another way to turn a list into dictionary keys is by using dictionary comprehension. This method is more flexible and allows for greater customization of the new dictionary. In its simplest form, the comprehension iterates over the list and copies all its elements as keys into a new dictionary, assigning them a specified default value. Here’s how to create a dictionary from a list using dictionary comprehension: objects = ['human', 'cat', 'alien', 'car'] objects_states = {obj: 'angry' for obj in objects} # dictionary comprehension with a string as the default value objects_states_empty = {obj: None for obj in objects} # dictionary comprehension with a default value of None print(objects_states) print(objects_states_empty) Console output: {'human': 'angry', 'cat': 'angry', 'alien': 'angry', 'car': 'angry'} {'human': None, 'cat': None, 'alien': None, 'car': None} In Python, the None object is a special value (null in most programming languages) that represents the absence of a value. The None object has a type of NoneType: print(type(None))  # Output: <class 'NoneType'> Creating a Dictionary from a List Using Dictionary Comprehension and the zip() Function A more advanced method is to use two lists to generate a dictionary: one for the keys and the other for their values. 
For this purpose, Python provides the zip() function, which allows iteration over multiple objects simultaneously. In simple loops, we can use this function like this:

objects = ['human', 'cat', 'alien', 'car']
states = ['walking', 'purring', 'hiding', 'driving']

for obj, state in zip(objects, states):
    print(obj, state)

The console output will be:

human walking
cat purring
alien hiding
car driving

Thanks to this function, dictionary comprehension can simultaneously use elements from one list as keys and elements from another as values. In this case, the syntax for dictionary comprehension is not much different from a simple iteration:

objects = ['human', 'cat', 'alien', 'car']  # list of future dictionary keys
states = ['walking', 'purring', 'hiding', 'driving']  # list of future dictionary values
objects_states = {obj: state for obj, state in zip(objects, states)}  # dictionary comprehension iterating over both lists

print(objects_states)

Console output:

{'human': 'walking', 'cat': 'purring', 'alien': 'hiding', 'car': 'driving'}

A natural question arises: what happens if one of the lists is shorter than the other?

objects = ['human', 'cat', 'alien', 'car']
states = ['walking', 'purring']
objects_states = {obj: state for obj, state in zip(objects, states)}

print(objects_states)

The output will be:

{'human': 'walking', 'cat': 'purring'}

Thus, iteration in the dictionary comprehension stops at the shortest list.
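
If silently dropping the unmatched keys is not what you want, one possible alternative is itertools.zip_longest from the standard library, which pads the shorter list instead of stopping early. The sketch below is only an illustration: the fill value 'unknown' and the variable name padded_states are assumptions made for this example, not part of the original code.

```python
from itertools import zip_longest

objects = ['human', 'cat', 'alien', 'car']
states = ['walking', 'purring']  # deliberately shorter than objects

# zip_longest keeps iterating until the longest list is exhausted and
# substitutes fillvalue for the missing elements of the shorter list
padded_states = {
    obj: state
    for obj, state in zip_longest(objects, states, fillvalue='unknown')
}

print(padded_states)
# {'human': 'walking', 'cat': 'purring', 'alien': 'unknown', 'car': 'unknown'}
```

This only helps when the list of values is the shorter one; if the list of keys were shorter, the fill value itself would end up as a key.
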
A dictionary comprehension that simply pairs up the elements returned by zip() can be written in a very compact form using the dict() constructor:

objects = ['human', 'cat', 'alien', 'car']
states = ['walking', 'purring', 'hiding', 'driving']
objects_states = dict(zip(objects, states))  # create a dictionary from two lists without a for loop

print(objects_states)

The console output will be the same as in the previous examples:

{'human': 'walking', 'cat': 'purring', 'alien': 'hiding', 'car': 'driving'}

Creating a Dictionary with zip() and Conditional Logic

In real-world applications, logic is often more complex than the simple examples shown earlier. Sometimes, you need to convert lists into dictionaries while applying specific conditions. For instance, some elements might need modification before inclusion in the dictionary or might not be included at all.

This can be achieved using conditions in dictionary comprehensions. For example, we can exclude specific elements from the resulting dictionary:

objects = ['human', 'cat', 'alien', 'car']
states = ['walking', 'purring', 'hiding', 'driving']

# Protect Earth from unknown extraterrestrial influence
objects_states = {obj: state for obj, state in zip(objects, states) if obj != 'alien'}

print(objects_states)

Console output:

{'human': 'walking', 'cat': 'purring', 'car': 'driving'}

We can refine the selection criteria further by introducing multiple conditions:

objects = ['human', 'cat', 'alien', 'car']
states = ['walking', 'purring', 'hiding', 'driving']

# Exclude the alien and the cat, who might be a disguised visitor from another galaxy
objects_states = {obj: state for obj, state in zip(objects, states) if obj != 'alien' if obj != 'cat'}

print(objects_states)

Console output:

{'human': 'walking', 'car': 'driving'}

When a dictionary comprehension contains several if clauses in a row, they behave as if connected by the logical and operator: an element is included only if every condition is true.
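
To check this behavior, the short sketch below builds the same dictionary twice, once with two chained if clauses and once with a single if that uses and explicitly. The variable names chained and combined are chosen for this illustration only.

```python
objects = ['human', 'cat', 'alien', 'car']
states = ['walking', 'purring', 'hiding', 'driving']

# Two separate if clauses, evaluated one after the other
chained = {
    obj: state
    for obj, state in zip(objects, states)
    if obj != 'alien'
    if obj != 'cat'
}

# One if clause that joins both conditions with `and`
combined = {
    obj: state
    for obj, state in zip(objects, states)
    if obj != 'alien' and obj != 'cat'
}

print(chained == combined)  # True
print(combined)             # {'human': 'walking', 'car': 'driving'}
```

The explicit and form is often easier to read, and it is the only option when conditions need to be joined with or.
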
You can make dictionary generation even more flexible by using a conditional expression (if ... else) to compute each value:

objects = ['human', 'cat', 'alien', 'car']
states = ['walking', 'purring', 'hiding', 'driving']

# Mark the suspicious 'cat' appropriately and slightly modify the other values
objects_states = {
    obj: ('[SUSPICIOUS]' if obj == 'cat' else 'calmly ' + state)
    for obj, state in zip(objects, states)
}

print(objects_states)

Console output:

{'human': 'calmly walking', 'cat': '[SUSPICIOUS]', 'alien': 'calmly hiding', 'car': 'calmly driving'}

Creating a Complex Dictionary from a Single List

In the earlier examples, we created dictionaries from two separate lists. But what if the keys and values needed for the new dictionary are contained within a single list? In such cases, the logic of the dictionary comprehension needs to be adjusted:

# Keys and values are stored sequentially in one list
objects_and_states = [
    'human', 'walking',
    'cat', 'purring',
    'alien', 'hiding',
    'car', 'driving'
]

# The `range` function specifies the start, end, and step for iteration: range(START, STOP, STEP)
objects_states = {
    objects_and_states[i]: objects_and_states[i + 1]
    for i in range(0, len(objects_and_states), 2)
}

print(objects_states)

Console output:

{'human': 'walking', 'cat': 'purring', 'alien': 'hiding', 'car': 'driving'}

Sometimes, a list might contain nested dictionaries as elements. The values of these nested dictionaries can also be used to create a new dictionary.
Here’s how the logic changes in such cases:

objects = [
    {'name': 'human', 'state': 'walking', 'location': 'street'},
    {'name': 'cat', 'state': 'purring', 'location': 'windowsill'},
    {'name': 'alien', 'state': 'hiding', 'location': 'spaceship'},
    {'name': 'car', 'state': 'driving', 'location': 'highway'}
]

# Extract 'name' as key and 'state' as value
objects_states = {obj['name']: obj['state'] for obj in objects}

print(objects_states)

Console output:

{'human': 'walking', 'cat': 'purring', 'alien': 'hiding', 'car': 'driving'}

This approach enables handling more complex data structures, such as lists of dictionaries, by targeting specific key-value pairs from each nested dictionary.

Converting a Dictionary to a List

Converting a dictionary into a list in Python is a straightforward task, often better described as extracting data. From a single dictionary, you can derive several types of lists:

  • A list of keys
  • A list of values
  • A list of key-value pairs

Here’s how it can be done:

objects_states = {
    'human': 'walking',
    'cat': 'purring',
    'alien': 'hiding',
    'car': 'driving'
}

# Convert dictionary components to lists using the `list()` function
objects_keys = list(objects_states.keys())  # List of keys
objects_values = list(objects_states.values())  # List of values
objects_items = list(objects_states.items())  # List of key-value pairs

print(objects_keys)
print(objects_values)
print(objects_items)

Console output:

['human', 'cat', 'alien', 'car']
['walking', 'purring', 'hiding', 'driving']
[('human', 'walking'), ('cat', 'purring'), ('alien', 'hiding'), ('car', 'driving')]

Conclusion

Lists and dictionaries are fundamental data structures in Python, each offering distinct ways of storing and accessing data. A dictionary carries more information per element than a list: each value is stored together with a key that identifies it, whereas a list stores only values, which are accessed by their position (index).
Converting a dictionary into a list is straightforward, requiring no additional data since you’re simply extracting keys, values, or their pairs. Converting a list into a dictionary, on the other hand, requires additional data or rules to map the list elements to dictionary keys and values.

There are several ways to convert a list to a dictionary. In the table, "Common" means every key receives the same value, and "Unique" means each key receives its own value:

| Tool | Key values | Syntax |
|---|---|---|
| dict.fromkeys() | Common | new_dict = dict.fromkeys(old_list) |
| Dictionary Comprehension | Common | new_dict = {new_key: 'any value' for new_key in old_list} |
| Dict Comp + zip() | Unique | new_dict = {new_key: old_val for new_key, old_val in zip(list1, list2)} |
| Dict Comp + zip() + if | Unique | new_dict = {new_key: old_val for new_key, old_val in zip(list1, list2) if ...} |
| Dict Comp + zip() + if-else | Unique | new_dict = {new_key: (... if ... else ...) for new_key, old_val in zip(list1, list2)} |

Complex lists may require more intricate dictionary comprehension syntax. Techniques shown in this guide, such as using zip() and range() for iterations, help handle such cases.

Converting a dictionary to a list is also possible in several ways, but it is much simpler:

| Tool | Extracts | Syntax |
|---|---|---|
| dict.keys() | Keys | list(old_dict.keys()) |
| dict.values() | Values | list(old_dict.values()) |
| dict.items() | Key-Value Pairs | list(old_dict.items()) |

Python offers flexible and efficient ways to convert structured data types between lists and dictionaries, enabling powerful manipulation and access.
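
As a closing illustration of how the two directions fit together, the sketch below converts a dictionary to a list of key-value pairs and then back again. It relies on the fact that dict() accepts an iterable of two-element pairs; the names pairs and restored are used only for this example.

```python
objects_states = {
    'human': 'walking',
    'cat': 'purring',
    'alien': 'hiding',
    'car': 'driving'
}

# Dictionary -> list of (key, value) tuples
pairs = list(objects_states.items())

# List of (key, value) tuples -> dictionary
restored = dict(pairs)

print(pairs[0])                    # ('human', 'walking')
print(restored == objects_states)  # True
```
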
13 January 2025 · 11 min to read
