Difficulty: Easy
Tags: Pandas

Description

DataFrame customers
+-------------+--------+
| Column Name | Type   |
+-------------+--------+
| customer_id | int    |
| name        | object |
| email       | object |
+-------------+--------+

There are some duplicate rows in the DataFrame based on the email column.

Write a solution to remove these duplicate rows and keep only the first occurrence.

The result format is in the following example.

 

Example 1:
Input:
+-------------+---------+---------------------+
| customer_id | name    | email               |
+-------------+---------+---------------------+
| 1           | Ella    | emily@example.com   |
| 2           | David   | michael@example.com |
| 3           | Zachary | sarah@example.com   |
| 4           | Alice   | john@example.com    |
| 5           | Finn    | john@example.com    |
| 6           | Violet  | alice@example.com   |
+-------------+---------+---------------------+
Output:  
+-------------+---------+---------------------+
| customer_id | name    | email               |
+-------------+---------+---------------------+
| 1           | Ella    | emily@example.com   |
| 2           | David   | michael@example.com |
| 3           | Zachary | sarah@example.com   |
| 4           | Alice   | john@example.com    |
| 6           | Violet  | alice@example.com   |
+-------------+---------+---------------------+
Explanation:
Alice (customer_id = 4) and Finn (customer_id = 5) both use john@example.com, so only the first occurrence of this email is retained.
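
To make the "first occurrence" rule concrete, the sketch below flags which rows count as repeats. It assumes pandas is installed; the email values are illustrative stand-ins, chosen so that customers 4 and 5 share an address as in Example 1.

```python
import pandas as pd

# Illustrative addresses: rows 4 and 5 (positions 3 and 4) share one email.
emails = pd.Series(
    [
        "emily@example.com",
        "michael@example.com",
        "sarah@example.com",
        "john@example.com",
        "john@example.com",
        "alice@example.com",
    ],
    name="email",
)

# keep="first" marks every repeat after the first occurrence as True.
is_repeat = emails.duplicated(keep="first")
print(is_repeat.tolist())
# [False, False, False, False, True, False]
```

Only the second row carrying the shared address is flagged, which is why Finn (customer_id = 5) is the row that is dropped.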

Solutions

Solution 1

Python3

import pandas as pd


def dropDuplicateEmails(customers: pd.DataFrame) -> pd.DataFrame:
    # drop_duplicates keeps the first row for each email by default (keep='first').
    return customers.drop_duplicates(subset=['email'])
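
As a quick check, the solution can be run against data shaped like Example 1. This is a minimal sketch rather than part of the original solution: the email values are illustrative stand-ins, and `keep="first"` is written out only to make the default behaviour visible.

```python
import pandas as pd


def dropDuplicateEmails(customers: pd.DataFrame) -> pd.DataFrame:
    # keep="first" is the pandas default; it is spelled out here for clarity.
    return customers.drop_duplicates(subset=["email"], keep="first")


# Sample data modelled on Example 1 (addresses are illustrative).
customers = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5, 6],
        "name": ["Ella", "David", "Zachary", "Alice", "Finn", "Violet"],
        "email": [
            "emily@example.com",
            "michael@example.com",
            "sarah@example.com",
            "john@example.com",
            "john@example.com",
            "alice@example.com",
        ],
    }
)

result = dropDuplicateEmails(customers)
print(result["customer_id"].tolist())
# [1, 2, 3, 4, 6]
```

Note that `drop_duplicates` returns a new DataFrame and leaves `customers` unchanged; the surviving rows also keep their original index labels unless `ignore_index=True` is passed.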