Groupby in Pandas | Lecture 5

December 10, 2024

Groupby in Pandas

The groupby operation in Pandas is a powerful tool for analyzing and summarizing data. It allows you to split your data into groups based on a column or index, apply a function to each group, and combine the results.

1. Basic Syntax

grouped = df.groupby("column_name")

groupby() groups the data based on the values in the specified column(s).
It returns a DataFrameGroupBy object, which you can aggregate, transform, or iterate over.

2. Common Operations with groupby

a) Aggregation

Aggregation functions like sum(), mean(), count(), etc., can be applied to grouped data.

Example:

import pandas as pd

# Creating a sample DataFrame

data = {

"Department": ["HR", "IT", "HR", "IT", "Sales", "HR", "Sales"],

"Employee": ["Alice", "Bob", "Charlie", "David", "Eva", "Frank", "Grace"],

"Salary": [50000, 60000, 45000, 70000, 40000, 52000, 39000]

}

df = pd.DataFrame(data)

# Group by Department and calculate the total Salary

grouped = df.groupby("Department")["Salary"].sum()

print(grouped)

Output:

Department

HR 147000

IT 130000

Sales 79000

Name: Salary, dtype: int64

b) Multiple Aggregations

Use agg() to apply multiple functions at once.

Example:

# Apply multiple aggregations

grouped = df.groupby("Department")["Salary"].agg(["mean", "sum", "max"])

print(grouped)

Output:

mean sum max

Department

HR 49000.000 147000 52000

IT 65000.000 130000 70000

Sales 39500.000 79000 40000

c) Filtering Groups

Use filter() to keep groups based on a condition.

Example:

# Keep groups with total salary > 100000

filtered = df.groupby("Department").filter(lambda x: x["Salary"].sum() > 100000)

print(filtered)

d) Transforming Groups

Use transform() to apply a function to each group and return a Series with the same index as the original DataFrame.

Example:

# Calculate the percentage of each salary within its department

df["Salary_Percent"] = df.groupby("Department")["Salary"].transform(lambda x: x / x.sum())

print(df)

e) Iterating Over Groups

You can iterate through each group using a for loop.

Example:

# Iterate through groups

for dept, group in df.groupby("Department"):

print(f"Department: {dept}")

print(group)

3. Grouping by Multiple Columns

You can group by more than one column.

Example:

# Group by multiple columns

grouped = df.groupby(["Department", "Employee"])["Salary"].sum()

print(grouped)

4. Grouping with Custom Functions

You can group using a custom function applied to the index or column values.

Example:

# Group by first letter of Employee names

grouped = df.groupby(lambda x: df.loc[x, "Employee"][0])["Salary"].sum()

print(grouped)

5. Resetting Index After Grouping

Use reset_index() to convert the result back to a DataFrame.

Example:

# Convert grouped data back to DataFrame

grouped = df.groupby("Department")["Salary"].sum().reset_index()

print(grouped)

Summary Table of Common Functions

Function	Purpose
groupby()	Group data based on a column or index
agg()	Apply multiple aggregation functions
filter()	Filter groups based on a condition
transform()	Transform data while retaining original structure
reset_index()	Reset the index of grouped data

groupby is a flexible tool for aggregating, analyzing, and transforming data. Let me know if you'd like to dive deeper into any of these examples!

Search This Blog

data analysis training in jaipur

Groupby in Pandas | Lecture 5

Comments

Post a Comment

Popular posts from this blog

NumPy: A Comprehensive Guide with Examples | Lecture 1

Ecommerce Purchases Data Analysis Exercises (Pandas Practice)

Handling Missing Data in Python DataFrames | Lecture 4