How to remove duplicates from a list in Python

In Python, a list is a versatile collection of objects that allows duplicates. But sometimes we need to make a list unique, either to streamline our data or to perform certain operations. Here, we are going to look at several ways to remove duplicates from a list in Python. So, let’s get started!

Why we might need to remove duplicates from a list

We don’t want to waste resources operating on the same items twice. For example, suppose we have 100 authors participating in a project, but some of them work in the same organisation. We want to display the authors and their corresponding affiliations, but we don’t want to show the same affiliation more than once.

Some ways to remove duplicates

Note: The following methods work for lists of any data type, such as numbers, strings, or class instances.
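One caveat worth noting: the set()- and dict-based methods below require the list elements to be hashable, while the loop-based methods do not. A quick sketch of the difference, using a made-up list of nested lists:

```python
# Unhashable elements (e.g. nested lists) work with the loop-based
# approaches but raise TypeError with set() or dict.fromkeys().
pairs = [[1, 2], [3, 4], [1, 2]]

# The loop-based approach works for any element type:
result = []
for item in pairs:
    if item not in result:
        result.append(item)
print(result)  # [[1, 2], [3, 4]]

# set() only accepts hashable elements:
try:
    unique = set(pairs)
except TypeError as exc:
    print("set() failed:", exc)
```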

Using the traditional loop

# initializing the list
sam_list = [11, 13, 15, 16, 13, 15, 16, 11]
print("The list is: " + str(sam_list))

# remove duplicates from the list
result = []
for i in sam_list:
    if i not in result:
        result.append(i)

# printing the list after removal
print("The list after removing duplicates: " + str(result))

Running the snippet above, the output will look like this:

The list is: [11, 13, 15, 16, 13, 15, 16, 11]

The list after removing duplicates: [11, 13, 15, 16]


Using set()

A set, by definition, cannot contain duplicate elements. Using this property, we can remove duplicates from a list quickly. Note, however, that a set does not preserve the original order of the elements.

# initializing the list
sam_list = [11, 15, 13, 16, 13, 15, 16, 11]
print("The list is: " + str(sam_list))

# remove duplicates from the list
sam_list = list(set(sam_list))

# printing the list after removal
# note: the original ordering may be lost
print("The list after removing duplicates: " + str(sam_list))
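If a predictable order is still needed after using set(), one option (assuming the elements are comparable with each other) is to sort the deduplicated result:

```python
sam_list = [11, 15, 13, 16, 13, 15, 16, 11]

# deduplicate with set(), then sort ascending for a stable ordering
result = sorted(set(sam_list))
print(result)  # [11, 13, 15, 16]
```

This gives a sorted order rather than the original order; the next method preserves the original order instead.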


Using collections.OrderedDict.fromkeys()

This method is both fast and order-preserving. The fromkeys() method creates a dictionary whose keys are the elements of the list. Since keys in a dictionary cannot be duplicated, fromkeys() removes the duplicates on its own, and OrderedDict remembers the order in which keys were first inserted. Converting the keys back to a list gives the deduplicated result.

# removing duplicates from a list using collections.OrderedDict.fromkeys()
from collections import OrderedDict

# initializing the list
sam_list = [11, 15, 13, 16, 13, 15, 16, 11]
print("The list is: " + str(sam_list))

# remove duplicates from the list, preserving order
result = list(OrderedDict.fromkeys(sam_list))

# printing the list after removal
print("The list after removing duplicates: " + str(result))
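On Python 3.7 and later, regular dictionaries also preserve insertion order, so the same trick works with the built-in dict and no import:

```python
sam_list = [11, 15, 13, 16, 13, 15, 16, 11]

# dict keys are unique and (since Python 3.7) keep insertion order
result = list(dict.fromkeys(sam_list))
print(result)  # [11, 15, 13, 16]
```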


Using a list comprehension

A list comprehension builds a list from a for loop written inside square brackets. This method is similar to the traditional loop discussed above, but instead of a separate for statement, the loop is written inside the brackets of the list.

We use the for loop inside the brackets and add an if condition that filters out duplicate values.

# initialising the list
sam_list = [11, 13, 15, 16, 13, 15, 16, 11]
print("The list is: " + str(sam_list))

# remove duplicates from the list
result = []
[result.append(x) for x in sam_list if x not in result]

# printing the list after removal
print("The list after removing duplicates: " + str(result))
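A side note: using a list comprehension only for its side effect (calling result.append) is generally considered unidiomatic. A common alternative, sketched here, tracks already-seen elements in a set, which makes the membership test much faster on large lists:

```python
sam_list = [11, 13, 15, 16, 13, 15, 16, 11]

seen = set()
# seen.add(x) returns None, so "not seen.add(x)" is always True;
# it only runs (and records x) when x has not been seen before
result = [x for x in sam_list if x not in seen and not seen.add(x)]
print(result)  # [11, 13, 15, 16]
```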


Using list comprehension and enumerate()

When a list comprehension is combined with the enumerate() function, we can remove duplicates from a Python list while maintaining the original order: elements that have already occurred are simply skipped.

In the code below, enumerate() yields each element i together with its index n. The slice sam_list[:n] contains every element seen so far, so the condition i not in sam_list[:n] keeps an element only the first time it appears; later occurrences are ignored.

# initialising the list
sam_list = [11, 15, 13, 16, 13, 15, 16, 11]
print("The list is: " + str(sam_list))

# remove duplicates from the list
result = [i for n, i in enumerate(sam_list) if i not in sam_list[:n]]

# printing the list after removal
print("The list after removing duplicates: " + str(result))
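For reuse, the enumerate() technique can be wrapped in a small helper function (the name remove_duplicates is our own choice, not a built-in):

```python
def remove_duplicates(items):
    """Return a new list with duplicates removed, keeping first occurrences."""
    # items[:n] holds everything seen before index n
    return [x for n, x in enumerate(items) if x not in items[:n]]

print(remove_duplicates([11, 15, 13, 16, 13, 15, 16, 11]))  # [11, 15, 13, 16]
print(remove_duplicates(["a", "b", "a"]))  # ['a', 'b']
```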


Using the pandas module

If the pandas library is available, its drop_duplicates() method removes duplicate values from a Series while keeping the first occurrence of each:

import pandas as pd

original_list = [1, 1, 2, 3, 4, 4]
new_list = pd.Series(original_list).drop_duplicates().tolist()
print("the original list is {} \nthe new list without duplicates is {}".format(original_list, new_list))

Running the snippet above, the output will look like this:

the original list is [1, 1, 2, 3, 4, 4]
the new list without duplicates is [1, 2, 3, 4]
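pandas also provides pd.unique(), which returns the distinct values in order of first appearance as a NumPy array:

```python
import pandas as pd

original_list = [1, 1, 2, 3, 4, 4]
# pd.unique preserves first-occurrence order; .tolist() converts
# the resulting NumPy array back into a plain Python list
new_list = pd.unique(pd.Series(original_list)).tolist()
print(new_list)  # [1, 2, 3, 4]
```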

Key Considerations

While the methods described above provide effective ways to remove duplicates from a list, there are a few additional considerations to keep in mind:

  • Data Type Compatibility: Some methods may have limitations when working with specific data types. Ensure that the chosen method is compatible with the elements in your list. 
  • Order Preservation: If maintaining the original order of elements is crucial, consider using methods such as list comprehension with enumeration or the collections module’s OrderedDict class. 
  • Performance: For large lists or time-sensitive operations, it’s important to choose a method that offers optimal performance. Consider conducting performance tests to determine the most efficient approach.
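As a rough sketch of such a performance test (absolute timings will vary by machine and data, so treat the numbers as indicative only):

```python
import timeit

# a test list with every value duplicated once
sam_list = list(range(200)) * 2

def with_loop():
    # O(n^2): membership test scans the growing result list
    result = []
    for x in sam_list:
        if x not in result:
            result.append(x)
    return result

def with_dict():
    # O(n): dict keys are unique and preserve insertion order
    return list(dict.fromkeys(sam_list))

def with_set():
    # O(n), but does not preserve the original order
    return list(set(sam_list))

for func in (with_loop, with_dict, with_set):
    elapsed = timeit.timeit(func, number=100)
    print(f"{func.__name__}: {elapsed:.4f}s")
```

On large lists, the dict- and set-based approaches are typically orders of magnitude faster than the quadratic loop.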

Voilà! Hope this makes your life much easier!
