Handling Missing Data in Python DataFrames | Lecture 4
Missing data is a common issue in data analysis, and pandas provides several ways to handle it. Here's an overview with examples:
1. Identifying Missing Data
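Missing data in a DataFrame is usually represented as NaN (Not a Number); pandas also treats None (and NaT for datetimes) as missing. You can use these methods to detect them:
- isnull(): Returns True for missing values.
- notnull(): Returns True for non-missing values (the inverse of isnull()).
isna() and notna() are aliases for these same two methods.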
Example:
import pandas as pd
import numpy as np
# Creating a DataFrame with missing values
data = {
"Name": ["Alice", "Bob",
"Charlie", None],
"Age": [25, np.nan, 30, 22],
"Score": [85, 90, np.nan, 88]
}
df = pd.DataFrame(data)
# Detect missing values
print(df.isnull())
# Check for non-missing values
print(df.notnull())
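To make "missing" concrete, here is a small sketch using a made-up Series that mixes several kinds of "empty" values. It assumes a recent pandas version and shows that None, NaN and NaT are all detected, while an empty string or a zero is treated as an ordinary value.

```python
import numpy as np
import pandas as pd

# Made-up column mixing several kinds of "empty" values
s = pd.Series([None, np.nan, pd.NaT, "", 0], dtype="object")

# isna() is an alias of isnull(); notna() is an alias of notnull()
missing = s.isna()
print(missing.tolist())  # → [True, True, True, False, False]

# Empty strings and zeros are ordinary values, not missing ones
print(int(missing.sum()))  # → 3
```

If empty strings should count as missing in your data, convert them to NaN first (for example with replace()).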
2. Dropping Missing Data
- dropna(): Removes rows (the default, axis=0) or columns (axis=1) that contain missing values.
Example:
# Drop rows with any missing values
df_dropped_rows = df.dropna()
# Drop columns with any missing values
df_dropped_columns = df.dropna(axis=1)
print(df_dropped_rows)
print(df_dropped_columns)
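With the sample data above, df.dropna() keeps only Alice's row and df.dropna(axis=1) removes every column, because each column has at least one gap. dropna() has parameters for a less aggressive cleanup; the sketch below uses how, thresh and subset on a small made-up DataFrame (the data is illustrative only).

```python
import numpy as np
import pandas as pd

# Made-up data: row 0 is complete, row 3 is entirely missing
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", None],
    "Age": [25, np.nan, np.nan, np.nan],
    "Score": [85, 90, np.nan, np.nan],
})

# how="all": drop a row only when every value in it is missing
all_missing_dropped = df.dropna(how="all")
print(all_missing_dropped.index.tolist())  # → [0, 1, 2]

# thresh=2: keep rows that have at least 2 non-missing values
at_least_two = df.dropna(thresh=2)
print(at_least_two.index.tolist())  # → [0, 1]

# subset=[...]: only the listed columns are checked for missing values
has_age = df.dropna(subset=["Age"])
print(has_age.index.tolist())  # → [0]
```

Like the other examples, these calls return a new DataFrame and leave df unchanged.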
3. Filling Missing Data
- fillna(): Replaces missing values with a specified value or strategy.
Example:
# Fill every missing value with a fixed value
df_filled_fixed = df.fillna(0)
print(df_filled_fixed)
# Fill missing ages with the column mean (this updates df itself,
# so later examples see the filled Age column)
df["Age"] = df["Age"].fillna(df["Age"].mean())
print(df)
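A single fill value rarely suits every column. The sketch below, on a fresh copy of the same sample data, passes a dict to fillna() so that each column gets its own replacement, and uses ffill() to carry the last valid value forward. The choice of median for Age and mean for Score is just an illustrative assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", None],
    "Age": [25, np.nan, 30, 22],
    "Score": [85, 90, np.nan, 88],
})

# A dict maps each column to its own fill value
filled = df.fillna({
    "Name": "Unknown",
    "Age": df["Age"].median(),    # median of 25, 30, 22 is 25.0
    "Score": df["Score"].mean(),  # mean of 85, 90, 88 is 87.666...
})
print(filled.loc[3, "Name"])  # → Unknown
print(filled.loc[1, "Age"])   # → 25.0

# Forward fill: copy the last valid value down into each gap
forward_filled = df.ffill()
print(forward_filled.loc[2, "Score"])  # → 90.0
print(forward_filled.loc[3, "Name"])   # → Charlie
```

Forward filling makes sense for ordered data such as time series; for unordered records a summary statistic is usually the safer choice.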
4. Interpolating Missing Data
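- interpolate(): Fills missing values by estimating them from neighboring values (linear interpolation by default). It is meant for numeric data, so apply it to numeric columns.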
Example:
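# Interpolate missing values in the numeric columns only
# (recent pandas versions deprecate interpolating text columns like "Name")
df_interpolated = df[["Age", "Score"]].interpolate()
print(df_interpolated)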
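To see what "linear" means, here is a minimal sketch on two made-up Series. It shows that gaps are filled on a straight line between the surrounding values, that a leading NaN stays missing under the default forward direction, and that limit caps how many consecutive NaNs are filled.

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, np.nan, np.nan, 40.0])

# Default method="linear": evenly spaced values between the neighbors
print(s.interpolate().tolist())  # → [10.0, 20.0, 30.0, 40.0]

# A leading NaN has no earlier neighbor, so it is left as NaN
leading = pd.Series([np.nan, 1.0, np.nan, 3.0])
print(leading.interpolate().tolist())  # → [nan, 1.0, 2.0, 3.0]

# limit=1: fill at most one consecutive NaN in each gap
print(s.interpolate(limit=1).tolist())  # → [10.0, 20.0, nan, 40.0]
```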
5. Replacing Specific Values
- replace(): Replaces specific values, including NaN. Replacing NaN with a string turns a numeric column into an object (text) column.
Example:
# Replace NaN with a specific value
df_replaced = df.replace(np.nan,
"Unknown")
print(df_replaced)
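replace() is also handy in the opposite direction: raw data often marks gaps with placeholder values. The sketch below assumes a made-up dataset where -1 means "unknown age" and "N/A" means "unknown city", and turns both into NaN so the other missing-data methods can see them. The dict form restricts each placeholder to its own column.

```python
import numpy as np
import pandas as pd

# Made-up raw data using placeholder values instead of NaN
raw = pd.DataFrame({
    "Age": [25, -1, 30],
    "City": ["Paris", "N/A", "Rome"],
})

# Replace -1 only in "Age" and "N/A" only in "City" with NaN
cleaned = raw.replace({"Age": -1, "City": "N/A"}, np.nan)

print(cleaned.isna().sum().tolist())  # → [1, 1]
print(cleaned["Age"].dtype)           # → float64 (NaN cannot live in an int column)
```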
6. Checking for Missing Data in the Entire DataFrame
You can check whether any values are missing and count them per column:
# Check if any values are missing
print(df.isnull().any().any())
# Count missing values per column
print(df.isnull().sum())
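Two related summaries are often useful before choosing between dropping and filling: the share of missing values per column and the rows that contain gaps. A short sketch on a fresh copy of the sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", None],
    "Age": [25, np.nan, 30, 22],
    "Score": [85, 90, np.nan, 88],
})

# Share of missing values per column, as a percentage
missing_pct = df.isna().mean() * 100
print(missing_pct.tolist())  # → [25.0, 25.0, 25.0]

# Rows that contain at least one missing value
rows_with_gaps = df[df.isna().any(axis=1)]
print(rows_with_gaps.index.tolist())  # → [1, 2, 3]

# Total number of missing cells in the whole DataFrame
print(int(df.isna().sum().sum()))  # → 3
```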
Summary Table of Methods
| Method | Purpose |
| --- | --- |
| isnull() | Detect missing values |
| notnull() | Detect non-missing values |
| dropna() | Drop rows/columns with missing values |
| fillna() | Fill missing values |
| interpolate() | Interpolate missing values |
| replace() | Replace specific values |
These tools help clean and prepare data for analysis. Let me know if you need help applying any of them!