The filter() Function in Python

Hostman Team
Technical writer
Python
21.10.2024
Reading time: 6 min

Python has consistently ranked among the most popular programming languages. According to the TIOBE Software Index, Python was the most popular language in 2021. In this article, we’ll explore the filter() function, a built-in tool for selecting elements from a collection based on a condition.

What is filter()?

filter() is a built-in Python function, meaning it does not require importing additional libraries.

Syntax

The function takes two arguments: a function and an iterable object.

filter(function, iterable)
  • function is a function with a single argument. This function is used to filter values.

  • iterable is an object that can be iterated over, such as a list, tuple, dictionary, etc. It can also include generator or iterator objects. The filter() function accepts only one iterable object.

The filter() function takes the elements of the provided iterable one at a time and passes each to the given function, which returns a value (True, False, or something else, such as a number or string). filter() evaluates the returned value, and if it is "truthy" (not necessarily equal to True, but considered true), the element is included in the resulting iterator. If the value is not truthy, the element is excluded. The result is an iterator that yields only the elements for which the function returned a truthy value.
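The truthiness rule can be checked with a function that returns something other than a Boolean. In this minimal sketch, the name remainder is a hypothetical helper, not part of the original examples; it returns the remainder of division by 3, so filter() keeps every number for which that remainder is non-zero.

```python
def remainder(n):
    # Returns an int, not a bool: 0 is falsy, 1 and 2 are truthy
    return n % 3

values = [3, 4, 5, 6, 7]
kept = list(filter(remainder, values))
print(kept)  # [4, 5, 7]
```

The multiples of 3 are dropped because the function returns 0 for them, and 0 is treated as false.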

To get elements evaluated as False, use the itertools.filterfalse() function.
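As a small illustration, filter() and itertools.filterfalse() can be applied with the same function to split a list into the elements that pass and the elements that fail. The helper name is_even is an assumption made for this sketch.

```python
from itertools import filterfalse

def is_even(n):
    # Hypothetical predicate used only for this illustration
    return n % 2 == 0

numbers = [1, 2, 3, 4, 5, 6]
evens = list(filter(is_even, numbers))       # elements where is_even is truthy
odds = list(filterfalse(is_even, numbers))   # elements where is_even is falsy
print(evens)  # [2, 4, 6]
print(odds)   # [1, 3, 5]
```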

The filter() function can be faster than an equivalent for loop, although the difference depends on the filtering function and the size of the data. Another advantage is that filter() returns an iterator, which produces elements on demand and is therefore more memory-efficient than building a full list. This behavior was introduced in Python 3. In Python 2, the filter() function returns a list.
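Because the result is an iterator, elements are produced on demand and can be consumed only once. The following sketch (variable names are illustrative) shows both effects:

```python
numbers = [1, 2, 3, 4, 5, 6]
evens = filter(lambda n: n % 2 == 0, numbers)

first = next(evens)   # the input is scanned only as far as the first match
rest = list(evens)    # consumes the remaining matches
again = list(evens)   # the iterator is now exhausted
print(first, rest, again)  # 2 [4, 6] []
```

If the filtered values are needed more than once, one option is to store them in a list right away.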

Now that we’ve covered the basics, let's look at how filter() works through various examples.

Using filter() With a Custom Function

One of the simplest examples is filtering out even numbers so that only the odd ones remain.

numbers = [1, 2, 3, 4, 5, 7, 10, 11]

def filter_num(num):
    # Keep odd numbers: True when dividing by 2 leaves a remainder
    return num % 2 != 0

out_filter = filter(filter_num, numbers)
print("Filtered list:", list(out_filter))

In this case, we pass a custom function (filter_num) and a list of numbers (numbers) to filter(). The result will be:

Filtered list: [1, 3, 5, 7, 11]

Our custom function checks if each number is odd. If there is a non-zero remainder when dividing by 2, the function returns True, meaning the element is added to the resulting iterator. Since filter() returns an object of type <class 'filter'>, we need to convert the output to a list to see the result. This example can also be implemented using a lambda function:

filter(lambda n: n % 2 != 0, numbers)
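As with the named function, the lambda version returns an iterator, so it still has to be wrapped in list() to display the values. The sketch below also shows an equivalent list comprehension for comparison; that alternative is not part of the original example.

```python
numbers = [1, 2, 3, 4, 5, 7, 10, 11]

# filter() with a lambda, converted to a list for display
odd_from_filter = list(filter(lambda n: n % 2 != 0, numbers))

# The same selection written as a list comprehension
odd_from_comprehension = [n for n in numbers if n % 2 != 0]

print(odd_from_filter)  # [1, 3, 5, 7, 11]
```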

Finding the Intersection of Two Arrays

Input data:

arr1 = ['1', '2', '3', '4', 5, 6, 7]
arr2 = [1, '2', 3, '4', '5', '6', 7]

We write a function to find the intersection:

def intersection(arr1, arr2):
    # Keep each element of arr2 that also appears in arr1
    out = list(filter(lambda it: it in arr1, arr2))
    return out

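The function takes two lists as input. Using a lambda function, it keeps each element of arr2 that also appears in arr1. The comparison is type-sensitive, so the string '1' and the integer 1 are treated as different values.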

Calling the function and displaying the result:

out = intersection(arr1, arr2)
print("Filtered list:", out)

The result:

Filtered list: ['2', '4', 7]
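Each membership test against a list scans that list, which can become slow for large inputs. One possible variation, assuming all elements are hashable (as the strings and integers here are), converts the first list to a set before filtering. The name intersection_with_set is hypothetical and used only for this sketch.

```python
arr1 = ['1', '2', '3', '4', 5, 6, 7]
arr2 = [1, '2', 3, '4', '5', '6', 7]

def intersection_with_set(first, second):
    # Build the set once so each membership test avoids scanning a list
    lookup = set(first)
    return list(filter(lambda it: it in lookup, second))

print(intersection_with_set(arr1, arr2))  # ['2', '4', 7]
```

The result keeps the order of the second list, just like the list-based version.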

Using filter() With a Lambda Function

The Python filter() function can also accept lambda functions. For example, let’s create a palindrome detector:

words = ["cat", "rewire", "level", "book", "stats", "list"]
# A word is a palindrome if it equals its own reverse
palindromes = list(filter(lambda word: word == word[::-1], words))
print("Palindromes:", palindromes)

The result:

Palindromes: ['level', 'stats']

The lambda function checks if a word is the same when written in reverse. If it is, the function returns True.

Using filter() to Filter Outliers in a Dataset

We import a library for statistical computations and set up a small sample that contains a few outliers:

import statistics as st
sample = [10, 8, 10, 8, 2, 7, 9, 3, 34, 9, 5, 9, 25]

We calculate the mean:

mean = st.mean(sample)

Mean: 10.69

In normally distributed samples, outliers are often defined as values that deviate from the mean by more than two standard deviations. We calculate the standard deviation and the lower and upper bounds:

stdev = st.stdev(sample)
low = mean - 2*stdev
high = mean + 2*stdev

Then we filter the sample, keeping only the values within these bounds:

clean_sample = list(filter(lambda x: low <= x <= high, sample))

Result:

[10, 8, 10, 8, 2, 7, 9, 3, 9, 5, 9, 25]

Clearly, the value 34 was an outlier. Now, the new mean is 8.75.

If we perform another iteration of this method, the value 25 will also be filtered out, leaving us with:

Sample without outliers: [10, 8, 10, 8, 2, 7, 9, 3, 9, 5, 9]

The new mean is 7.273, which is significantly different from the original.
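The repeated filtering described above can be sketched as a loop that stops once a pass removes nothing. The function name remove_outliers and its parameter k are assumptions made for this illustration, not part of the original walkthrough.

```python
import statistics as st

def remove_outliers(sample, k=2):
    """Repeat the k-sigma filter until no more values are removed."""
    current = list(sample)
    while len(current) >= 2:  # st.stdev() needs at least two values
        mean = st.mean(current)
        stdev = st.stdev(current)
        low, high = mean - k * stdev, mean + k * stdev
        kept = list(filter(lambda x: low <= x <= high, current))
        if len(kept) == len(current):
            break  # nothing was removed on this pass
        current = kept
    return current

sample = [10, 8, 10, 8, 2, 7, 9, 3, 34, 9, 5, 9, 25]
clean = remove_outliers(sample)
print(clean)                      # [10, 8, 10, 8, 2, 7, 9, 3, 9, 5, 9]
print(round(st.mean(clean), 3))   # 7.273
```

For this sample the loop removes 34 on the first pass and 25 on the second, and a third pass finds nothing left to remove.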

Working With None

To understand how filter() handles None, let’s look at the following example:

list_ = [0, 1, 'Hello', '', None, [], [1,2,3], 0.1, 0.0, False]
print(list(filter(None, list_)))

If None is passed as the function in filter(), it filters out all falsy elements (that is, elements that evaluate to False on their own). In this case, the result will be:

[1, 'Hello', [1, 2, 3], 0.1]

Here, the elements 0, '', None, [], 0.0, and False are filtered out because they have a logical value of False.
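Passing None behaves like passing the built-in bool as the function, since both keep an element only when it is truthy. A quick check:

```python
list_ = [0, 1, 'Hello', '', None, [], [1, 2, 3], 0.1, 0.0, False]

with_none = list(filter(None, list_))
with_bool = list(filter(bool, list_))

print(with_none == with_bool)  # True
```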

Using filter() With a List of Dictionaries

The function can also work with more complex data structures. For example, we may have a list of dictionaries and want to select elements of the list based on the key-value pairs inside each dictionary.

Let’s take a list of books in a bookstore:

books = [
    {"Title": "Angels and Demons", "Author": "Dan Brown", "Price": 9},
    {"Title": "Harry Potter and the Philosopher's Stone", "Author": "J.K. Rowling", "Price": 7},
    {"Title": "Anna Karenina", "Author": "Leo Tolstoy", "Price": 5},
    {"Title": "Dead Souls", "Author": "Nikolai Gogol", "Price": 4}
]

We will filter books by price. We’ll write a function that retrieves all books costing more than 5:

def cost(book):
    return book["Price"] > 5

Here, the function simply checks each book’s price and returns True if it meets the condition. To display the book titles, we iterate through the filtered object:

filtered_object = filter(cost, books)
for row in filtered_object:
    # Each row is already a dictionary, so its keys can be read directly
    print(row["Title"])

Result:

Angels and Demons
Harry Potter and the Philosopher's Stone
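If the price limit needs to vary, one option is a small factory function that builds the filtering function for a given limit. The name costs_more_than is hypothetical and introduced only for this sketch.

```python
books = [
    {"Title": "Angels and Demons", "Author": "Dan Brown", "Price": 9},
    {"Title": "Harry Potter and the Philosopher's Stone", "Author": "J.K. Rowling", "Price": 7},
    {"Title": "Anna Karenina", "Author": "Leo Tolstoy", "Price": 5},
    {"Title": "Dead Souls", "Author": "Nikolai Gogol", "Price": 4},
]

def costs_more_than(limit):
    # Returns a one-argument function suitable for filter()
    def predicate(book):
        return book["Price"] > limit
    return predicate

titles = [book["Title"] for book in filter(costs_more_than(4), books)]
print(titles)
```

With a limit of 4, the list of titles also includes "Anna Karenina".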

Filtering NaN Values

Suppose we have the following sample:

sample = [10.1, 8.3, 10.4, 8.8, float("nan"), 7.2, float("nan")]

If we try to compute something like the mean or standard deviation on this sample, we will get nan (not a number). NaN values can appear for various reasons, so one option is to remove them from the data.

We use the isnan() function from the math module, which checks if a value is NaN:

import math
import statistics as st
sample = [10.1, 8.3, 10.4, 8.8, float("nan"), 7.2, float("nan")]
def not_nan(x):
    # True for ordinary numbers, False for NaN
    return not math.isnan(x)

Now, when we call:

st.mean(filter(not_nan, sample))

We get a result of 8.96.

Alternatively, we can simplify this by using the filterfalse() function, which retains elements where the condition is False:

from itertools import filterfalse
st.mean(filterfalse(math.isnan, sample))

The result is the same: 8.96.
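One reason to rely on math.isnan() is that NaN never compares equal to anything, including itself, so an equality-based filter does not remove it. The sketch below illustrates the difference; the variable names are chosen for this example only.

```python
import math

sample = [10.1, 8.3, 10.4, 8.8, float("nan"), 7.2, float("nan")]
nan = float("nan")

# x != nan is True for every value, including NaN itself, so nothing is removed
by_equality = list(filter(lambda x: x != nan, sample))

# math.isnan() detects NaN reliably
by_isnan = list(filter(lambda x: not math.isnan(x), sample))

print(len(by_equality))  # 7
print(by_isnan)          # [10.1, 8.3, 10.4, 8.8, 7.2]
```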

Conclusion

As we’ve seen, Python’s filter() function can be used in various ways. We covered some of the main applications, but as you continue to work creatively, you’ll likely discover many other ways to use this powerful function.
