Groupby in Pandas | Lecture 5

 

Groupby in Pandas

The groupby operation in Pandas is a powerful tool for analyzing and summarizing data. It allows you to split your data into groups based on a column or index, apply a function to each group, and combine the results.


1. Basic Syntax

grouped = df.groupby("column_name")

  • groupby() groups the data based on the values in the specified column(s).
  • It returns a DataFrameGroupBy object, which you can aggregate, transform, or iterate over.

2. Common Operations with groupby

a) Aggregation

Aggregation functions like sum(), mean(), count(), etc., can be applied to grouped data.

Example:

import pandas as pd

 

# Creating a sample DataFrame

data = {

    "Department": ["HR", "IT", "HR", "IT", "Sales", "HR", "Sales"],

    "Employee": ["Alice", "Bob", "Charlie", "David", "Eva", "Frank", "Grace"],

    "Salary": [50000, 60000, 45000, 70000, 40000, 52000, 39000]

}

df = pd.DataFrame(data)

 

# Group by Department and calculate the total Salary

grouped = df.groupby("Department")["Salary"].sum()

print(grouped)

Output:

Department

HR       147000

IT       130000

Sales     79000

Name: Salary, dtype: int64


b) Multiple Aggregations

Use agg() to apply multiple functions at once.

Example:

# Apply multiple aggregations

grouped = df.groupby("Department")["Salary"].agg(["mean", "sum", "max"])

print(grouped)

Output:

                mean     sum    max

Department                         

HR        49000.000  147000  52000

IT        65000.000  130000  70000

Sales     39500.000   79000  40000


c) Filtering Groups

Use filter() to keep groups based on a condition.

Example:

# Keep groups with total salary > 100000

filtered = df.groupby("Department").filter(lambda x: x["Salary"].sum() > 100000)

print(filtered)


d) Transforming Groups

Use transform() to apply a function to each group and return a Series with the same index as the original DataFrame.

Example:

# Calculate the percentage of each salary within its department

df["Salary_Percent"] = df.groupby("Department")["Salary"].transform(lambda x: x / x.sum())

print(df)


e) Iterating Over Groups

You can iterate through each group using a for loop.

Example:

# Iterate through groups

for dept, group in df.groupby("Department"):

    print(f"Department: {dept}")

    print(group)


3. Grouping by Multiple Columns

You can group by more than one column.

Example:

# Group by multiple columns

grouped = df.groupby(["Department", "Employee"])["Salary"].sum()

print(grouped)


4. Grouping with Custom Functions

You can group using a custom function applied to the index or column values.

Example:

# Group by first letter of Employee names

grouped = df.groupby(lambda x: df.loc[x, "Employee"][0])["Salary"].sum()

print(grouped)


5. Resetting Index After Grouping

Use reset_index() to convert the result back to a DataFrame.

Example:

# Convert grouped data back to DataFrame

grouped = df.groupby("Department")["Salary"].sum().reset_index()

print(grouped)


Summary Table of Common Functions

Function

Purpose

groupby()

Group data based on a column or index

agg()

Apply multiple aggregation functions

filter()

Filter groups based on a condition

transform()

Transform data while retaining original structure

reset_index()

Reset the index of grouped data


groupby is a flexible tool for aggregating, analyzing, and transforming data. Let me know if you'd like to dive deeper into any of these examples!

 

Comments

Popular posts from this blog

NumPy: A Comprehensive Guide with Examples | Lecture 1

Ecommerce Purchases Data Analysis Exercises (Pandas Practice)

Handling Missing Data in Python DataFrames | Lecture 4