Handling Missing Data in Python DataFrames | Lecture 4

Missing data is a common issue in data analysis, and pandas provides several ways to handle it. Here's an overview with examples:


1. Identifying Missing Data

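Missing values in a DataFrame are usually represented as NaN (Not a Number) in numeric columns or as None in object (text) columns; pandas treats both as missing. Use these methods to detect them:

  • isnull(): Returns True for missing values.
  • notnull(): Returns True for non-missing values (the inverse of isnull()).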

Example:

```python
import numpy as np
import pandas as pd

# Creating a DataFrame with missing values
data = {
    "Name": ["Alice", "Bob", "Charlie", None],
    "Age": [25, np.nan, 30, 22],
    "Score": [85, 90, np.nan, 88],
}
df = pd.DataFrame(data)

# Detect missing values (True where a value is missing)
print(df.isnull())

# Check for non-missing values (True where a value is present)
print(df.notnull())
```
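A short sketch of why the dedicated methods are needed: the Series below is a hypothetical example mixing None and NaN in one object column. isna() and notna() are aliases of isnull() and notnull(), and an equality test against np.nan cannot find missing values because NaN is never equal to anything, including itself.

```python
import numpy as np
import pandas as pd

# Hypothetical object column containing both kinds of missing marker
s = pd.Series(["a", None, np.nan, "b"])

# isna() is an alias of isnull(); both None and NaN count as missing
print(s.isna().tolist())   # [False, True, True, False]

# Comparing with == does not work: NaN never equals anything, even itself
print((s == np.nan).any())  # False
```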


2. Dropping Missing Data

  • dropna(): Removes rows or columns with missing values.

Example:

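```python
# Drop rows with any missing values
df_dropped_rows = df.dropna()

# Drop columns with any missing values
df_dropped_columns = df.dropna(axis=1)

print(df_dropped_rows)     # only Alice's row has no missing values
print(df_dropped_columns)  # every column has a gap, so no columns remain
```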
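Dropping every row with any gap is often too aggressive. The sketch below, which recreates the sample data so it runs on its own, shows three standard dropna() parameters that make it more selective: subset limits the check to certain columns, thresh keeps rows with at least that many non-missing values, and how="all" drops a row only if all of its values are missing.

```python
import numpy as np
import pandas as pd

# Same sample data as in section 1
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", None],
    "Age": [25, np.nan, 30, 22],
    "Score": [85, 90, np.nan, 88],
})

# Drop a row only when the Age column is missing
only_age = df.dropna(subset=["Age"])

# Keep rows that have at least 3 non-missing values
at_least_three = df.dropna(thresh=3)

# Drop a row only when every value in it is missing
all_missing = df.dropna(how="all")

print(only_age)
print(at_least_three)
print(all_missing)
```

Note that how and thresh are alternatives; pandas does not accept both in the same call.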


3. Filling Missing Data

  • fillna(): Replaces missing values with a specified value or strategy.

Example:

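```python
# Fill missing values with a fixed value (returns a new DataFrame)
df_filled_fixed = df.fillna(0)
print(df_filled_fixed)

# Fill missing values with the column mean (this overwrites the Age column of df)
df["Age"] = df["Age"].fillna(df["Age"].mean())
print(df)
```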
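A single fixed value rarely suits every column. As a sketch on a fresh copy of the sample data, fillna() also accepts a dict mapping each column to its own fill value, and ffill() carries the last valid value forward. ffill() is the current replacement for fillna(method="ffill"), which is deprecated since pandas 2.1. The choice of mean for Age, median for Score, and "Unknown" for Name is only illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as in section 1
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", None],
    "Age": [25, np.nan, 30, 22],
    "Score": [85, 90, np.nan, 88],
})

# A dict maps each column to its own fill value
filled = df.fillna({
    "Name": "Unknown",
    "Age": df["Age"].mean(),
    "Score": df["Score"].median(),
})

# Forward fill: copy the last valid value down each column
forward = df.ffill()

print(filled)
print(forward)
```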


4. Interpolating Missing Data

  • interpolate(): Estimates missing values from their neighbors (linear interpolation by default). It is meant for numeric columns.

Example:

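```python
# Interpolate only the numeric columns; interpolating object (text) columns
# is deprecated in recent pandas releases
numeric_cols = df.select_dtypes(include="number").columns
df_interpolated = df.copy()
df_interpolated[numeric_cols] = df[numeric_cols].interpolate()
print(df_interpolated)
```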
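The behavior of interpolate() is easiest to see on small hypothetical Series. This sketch shows that the default linear method treats values as evenly spaced, that a leading NaN is left alone because it has no earlier neighbor, and that method="index" uses the index values as x-coordinates instead of positions.

```python
import numpy as np
import pandas as pd

# Default linear interpolation: values treated as evenly spaced
s = pd.Series([1.0, np.nan, np.nan, 4.0])
print(s.interpolate())  # gaps become 2.0 and 3.0

# A leading NaN has no earlier value to interpolate from, so it stays NaN
lead = pd.Series([np.nan, 2.0, np.nan, 6.0]).interpolate()

# Unevenly spaced index: positions vs. actual index values
uneven = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 4])
by_position = uneven.interpolate()             # halfway between: 5.0
by_index = uneven.interpolate(method="index")  # 1/4 of the way from 0 to 4: 2.5

print(lead)
print(by_position)
print(by_index)
```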


5. Replacing Specific Values

  • replace(): Replaces specific values, including NaN.

Example:

```python
# Replace NaN with a specific value
# (numeric columns that receive a string become object dtype)
df_replaced = df.replace(np.nan, "Unknown")
print(df_replaced)
```
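replace() also works in the opposite direction, which is common with raw files that mark gaps with placeholder values. The sketch below assumes hypothetical data where -1 and "?" stand for "missing"; a nested dict of the form {column: {old_value: new_value}} converts them to NaN only in the intended columns, so the other missing-data methods can then recognize them.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data using -1 and "?" as missing-value placeholders
raw = pd.DataFrame({
    "Age": [25, -1, 30],
    "City": ["Paris", "?", "Lyon"],
})

# Replace the placeholders with NaN, column by column
cleaned = raw.replace({"Age": {-1: np.nan}, "City": {"?": np.nan}})

print(cleaned)
print(cleaned.isna().sum())
```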


6. Checking for Missing Data in the Entire DataFrame

You can check whether the DataFrame contains any missing values at all, and how many each column has:

```python
# Check if any values are missing
print(df.isnull().any().any())

# Count missing values per column
print(df.isnull().sum())
```
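Raw counts are harder to compare across datasets of different sizes. As a sketch on a fresh copy of the sample data, the mean of the boolean mask gives the share of missing values per column, summing twice gives the total number of missing cells, and any(axis=1) selects the rows that need attention.

```python
import numpy as np
import pandas as pd

# Same sample data as in section 1
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", None],
    "Age": [25, np.nan, 30, 22],
    "Score": [85, 90, np.nan, 88],
})

# Percentage of missing values per column
missing_pct = df.isna().mean() * 100

# Total number of missing cells in the DataFrame
total_missing = int(df.isna().sum().sum())

# Rows that contain at least one missing value
rows_with_missing = df[df.isna().any(axis=1)]

print(missing_pct)
print(total_missing)
print(rows_with_missing)
```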


Summary Table of Methods

| Method        | Purpose                                |
|---------------|----------------------------------------|
| isnull()      | Detect missing values                  |
| notnull()     | Detect non-missing values              |
| dropna()      | Drop rows/columns with missing values  |
| fillna()      | Fill missing values                    |
| interpolate() | Interpolate missing values             |
| replace()     | Replace specific values                |


These tools help clean and prepare data for analysis: detect the gaps first, then decide whether to drop, fill, interpolate, or replace them.
