+4 votes
in Programming Languages by (77.0k points)

I used the pandas concat() function to concatenate two dataframes. Since the dataframes had a common column, it created the new dataframe with duplicate columns. How can I remove the duplicate column from the resulting dataframe?

1 Answer

+2 votes
by (354k points)
Best answer

There are several ways to delete a duplicate column from a dataframe. One of the simplest is to flag the repeated column labels with the duplicated() method of df.columns and then keep only the columns that are not flagged.

Here is an example of that.

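import pandas as pd

df1 = pd.DataFrame({"name": ['AA', 'BB', 'CC', 'DD', 'EE', 'HH', 'II'], "age": [34, 12, 56, 43, 23, 41, 52]})
df2 = pd.DataFrame({"name": ['AA', 'BB', 'CC', 'DD', 'EE', 'FF', 'GG'], "income": [3434, 1122, 2156, 4334, 54523, 4321, 6541]})
# Concatenating side by side (axis=1) produces two columns labeled 'name'.
df = pd.concat([df1, df2], axis=1)
# duplicated() marks every repeat of a label as True; ~ inverts the mask,
# so only the first occurrence of each label is kept.
df = df.loc[:, ~df.columns.duplicated()]
print(df)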

The above code will print the following output. Both df1 and df2 have a 'name' column, but the final dataframe keeps only the first one (from df1). That is why rows 5 and 6 show 'HH' and 'II' rather than 'FF' and 'GG' from df2.

  name  age  income
0   AA   34    3434
1   BB   12    1122
2   CC   56    2156
3   DD   43    4334
4   EE   23   54523
5   HH   41    4321
6   II   52    6541
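
Keep in mind that columns.duplicated() compares column labels only, not the values, and that concat with axis=1 pairs rows by position. If that is not what you want, here is a rough sketch of two alternatives using the same data. The variable names by_position and by_key are just for illustration, and the choice of how="outer" is an assumption about what you need; use how="inner" if you only want names present in both dataframes.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ['AA', 'BB', 'CC', 'DD', 'EE', 'HH', 'II'], "age": [34, 12, 56, 43, 23, 41, 52]})
df2 = pd.DataFrame({"name": ['AA', 'BB', 'CC', 'DD', 'EE', 'FF', 'GG'], "income": [3434, 1122, 2156, 4334, 54523, 4321, 6541]})

# Option 1: drop the shared column from one dataframe before concatenating,
# so a duplicate label is never created. Rows are still paired by position.
by_position = pd.concat([df1, df2.drop(columns="name")], axis=1)

# Option 2: align rows on the shared 'name' column instead of by position.
# how="outer" keeps names found in only one dataframe; their missing
# values are filled with NaN.
by_key = pd.merge(df1, df2, on="name", how="outer")

print(by_position)
print(by_key)
```

Option 1 gives the same result as the duplicated() approach above. Option 2 is usually the safer choice when the two dataframes do not list the same names in the same order.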


...