DataFrames in Python with examples | Lecture 3

December 10, 2024

This lecture is a guide to pandas DataFrames in Python, with examples.

What is a DataFrame?

A DataFrame is a two-dimensional, labeled data structure provided by the pandas library.
It can be thought of as a table, similar to a spreadsheet, SQL table, or a dictionary of Series objects.
A DataFrame can be built from many sources, including dictionaries, lists of lists, NumPy arrays, and CSV files.

Key Features

Labeled axes: Rows and columns have labels (index and column names).
Heterogeneous data: Can contain different types of data (integers, floats, strings, etc.).
Size mutable: Rows and columns can be added or deleted.
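The three features above can be seen in a few lines. This is a minimal sketch with made-up sample data; the row labels 'a' and 'b' and the 'Height' column are assumptions chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the features above
df = pd.DataFrame(
    {"Name": ["Alice", "Bob"], "Age": [25, 30], "Height": [1.65, 1.80]},
    index=["a", "b"],  # custom row labels
)

# Labeled axes: rows and columns both carry labels
print(df.index.tolist())
print(df.columns.tolist())

# Heterogeneous data: each column has its own dtype
print(df.dtypes)

# Size mutable: add a column, then drop a row
df["City"] = ["New York", "Chicago"]
df = df.drop(index="a")
print(df.shape)
```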

How to Create a DataFrame?

1. From a Dictionary

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)

Output:

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago

2. From a List of Lists

data = [
    ['Alice', 25, 'New York'],
    ['Bob', 30, 'Los Angeles'],
    ['Charlie', 35, 'Chicago']
]
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)

3. From a Dictionary of Series

data = {
    'Name': pd.Series(['Alice', 'Bob', 'Charlie']),
    'Age': pd.Series([25, 30, 35]),
    'City': pd.Series(['New York', 'Los Angeles', 'Chicago'])
}
df = pd.DataFrame(data)
print(df)

4. From a CSV File

df = pd.read_csv('data.csv') # Replace 'data.csv' with your file path

print(df)
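To try read_csv without a file on disk, the CSV text can be passed through io.StringIO. This is a small sketch; the CSV contents and the name df_csv are assumptions made up for the example.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real 'data.csv' file
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
"""

# read_csv accepts a file path or any file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
print(df_csv.dtypes)  # 'Age' is parsed as an integer column
```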

5. From a NumPy Array

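import numpy as np

# A NumPy array holds a single dtype, so mixing names and numbers
# stores every value (including the ages) as a string.
data = np.array([
    ['Alice', 25, 'New York'],
    ['Bob', 30, 'Los Angeles'],
    ['Charlie', 35, 'Chicago']
])
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
df['Age'] = df['Age'].astype(int)  # convert the ages back to integers
print(df)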

Accessing Data in a DataFrame

1. Access Columns

print(df['Name']) # Access 'Name' column

2. Access Rows by Index

print(df.loc[0]) # Access row with index 0

print(df.iloc[1]) # Access row at position 1 (the second row)

3. Access Specific Elements

print(df.loc[0, 'Name']) # Access specific element by label

print(df.iloc[0, 0]) # Access specific element by position
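With the default 0, 1, 2 index, loc and iloc look interchangeable. The difference shows up once the row labels are not integers. A minimal sketch, assuming a hypothetical frame named people with the made-up labels 'x', 'y', 'z':

```python
import pandas as pd

# Hypothetical data with string row labels
people = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=["x", "y", "z"],
)

print(people.loc["y"])         # row selected by its label
print(people.iloc[1])          # row selected by its position (same row here)
print(people.loc["y", "Age"])  # element by row label and column label
print(people.iloc[1, 1])       # element by row position and column position
```

Here people.loc[1] would raise a KeyError, because 1 is a position and not one of the labels.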

4. Slicing

print(df[0:2]) # Access first two rows

print(df[['Name', 'City']]) # Access multiple columns
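One detail worth knowing when slicing rows: position-based slices exclude the end, while label-based slices with loc include it. The sketch below uses a hypothetical frame named scores to show the difference.

```python
import pandas as pd

# Hypothetical data with the default integer index 0, 1, 2
scores = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Score": [90, 85, 95]}
)

by_position = scores.iloc[0:2]  # positions 0 and 1 (end excluded)
by_brackets = scores[0:2]       # same rows as iloc[0:2]
by_label = scores.loc[0:2]      # labels 0, 1 and 2 (end included)

print(len(by_position), len(by_brackets), len(by_label))
```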

Operations on DataFrames

1. Basic Statistics

print(df.describe()) # Summary statistics for numerical columns

2. Sorting

df_sorted = df.sort_values(by='Age', ascending=False)

print(df_sorted)

3. Filtering

df_filtered = df[df['Age'] > 30]

print(df_filtered)
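Filters can be combined with & (and) and | (or). Each condition needs its own parentheses, because these operators bind more tightly than comparisons such as >. A short sketch with a hypothetical frame named people:

```python
import pandas as pd

# Hypothetical sample data
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# Both conditions must hold
over_25_not_chicago = people[(people["Age"] > 25) & (people["City"] != "Chicago")]

# Membership test against a list of allowed values
in_cities = people[people["City"].isin(["New York", "Chicago"])]

print(over_25_not_chicago)
print(in_cities)
```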

4. Adding New Columns

df['Salary'] = [50000, 60000, 70000] # Add a new column

print(df)

5. Deleting Columns

df = df.drop('City', axis=1) # Drop the 'City' column

print(df)

6. Renaming Columns

df = df.rename(columns={'Age': 'Years'})

print(df)

7. Group By

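data = {
    'Name': ['Alice', 'Bob', 'Alice', 'Bob'],
    'Subject': ['Math', 'Math', 'Science', 'Science'],
    'Score': [90, 85, 95, 80]
}
df = pd.DataFrame(data)

# Select the numeric column before averaging; in pandas 2.x, calling
# mean() on a group that still contains the text column 'Subject' raises a TypeError.
grouped = df.groupby('Name')['Score'].mean()
print(grouped)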
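A group can also be summarised with several functions at once using agg. This sketch reuses the same score data under the hypothetical name scores.

```python
import pandas as pd

# Same data as the Group By example, under a separate name
scores = pd.DataFrame({
    "Name": ["Alice", "Bob", "Alice", "Bob"],
    "Subject": ["Math", "Math", "Science", "Science"],
    "Score": [90, 85, 95, 80],
})

# One row per name, one column per aggregation
summary = scores.groupby("Name")["Score"].agg(["mean", "max", "count"])
print(summary)
```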

Handling Missing Data

1. Check for Missing Data

print(df.isnull()) # Check for null values

print(df.isnull().sum()) # Count null values per column

2. Fill Missing Data

# Assumes df has a numeric 'Age' column that contains missing values
df['Age'] = df['Age'].fillna(df['Age'].mean()) # Fill missing values with the column mean

3. Drop Rows/Columns with Missing Data

df = df.dropna() # Drop rows with missing data
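The snippets above only show an effect when the data actually has gaps. Below is a self-contained sketch with a hypothetical frame named people in which one age and one city are missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age and one missing city
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, np.nan, 35],
    "City": ["New York", None, "Chicago"],
})

# Count missing values per column
missing_per_column = people.isnull().sum()
print(missing_per_column)

# Fill the missing age with the mean of the known ages
filled = people.copy()
filled["Age"] = filled["Age"].fillna(filled["Age"].mean())

# Drop rows (default) or columns (axis=1) that contain missing values
complete_rows = people.dropna()
complete_cols = people.dropna(axis=1)
print(filled)
print(complete_rows)
print(complete_cols)
```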

Merging, Joining, and Concatenating

1. Merging

df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})

df2 = pd.DataFrame({'ID': [1, 2], 'Age': [25, 30]})

merged = pd.merge(df1, df2, on='ID')

print(merged)
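By default, merge keeps only the keys found in both frames (an inner join). The how parameter changes this. A sketch with hypothetical frames left and right whose IDs only partly overlap:

```python
import pandas as pd

# Hypothetical frames: ID 3 exists only in left, ID 4 only in right
left = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Alice", "Bob", "Charlie"]})
right = pd.DataFrame({"ID": [1, 2, 4], "Age": [25, 30, 40]})

inner = pd.merge(left, right, on="ID")                   # IDs in both frames
left_join = pd.merge(left, right, on="ID", how="left")   # every row of left
outer = pd.merge(left, right, on="ID", how="outer", indicator=True)  # all IDs

print(inner)
print(left_join)  # Age is NaN where right has no matching ID
print(outer)      # the '_merge' column records where each row came from
```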

2. Concatenating

df1 = pd.DataFrame({'Name': ['Alice', 'Bob']})

df2 = pd.DataFrame({'Name': ['Charlie', 'David']})

concatenated = pd.concat([df1, df2])

print(concatenated)
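Note that concat keeps each frame's original row labels, so the result above has the index 0, 1, 0, 1. Passing ignore_index=True renumbers the rows. A short sketch, using the hypothetical names df_a and df_b:

```python
import pandas as pd

df_a = pd.DataFrame({"Name": ["Alice", "Bob"]})
df_b = pd.DataFrame({"Name": ["Charlie", "David"]})

stacked = pd.concat([df_a, df_b])                       # keeps the original labels
renumbered = pd.concat([df_a, df_b], ignore_index=True)  # fresh 0..n-1 labels

print(stacked.index.tolist())
print(renumbered.index.tolist())
```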

Key Methods and Attributes

Method/Attribute	Description	Example
head(n)	First n rows	df.head(3)
tail(n)	Last n rows	df.tail(3)
info()	Summary of the index, column dtypes, and non-null counts	df.info()
shape	Tuple of (rows, columns)	df.shape
columns	Index of column labels	df.columns
value_counts()	Count of each unique value in a column	df['Column'].value_counts()
apply()	Apply a function to each value in a column	df['Column'].apply(lambda x: x**2)
pivot_table()	Create a pivot table (aggregates with the mean by default)	df.pivot_table(index='Column1', values='Column2')
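The last three entries in the table are easier to follow on real values. This sketch uses a hypothetical frame named staff; the 'Level' column is an assumption added only so the pivot table has two dimensions.

```python
import pandas as pd

# Hypothetical sample data
staff = pd.DataFrame({
    "Department": ["HR", "IT", "IT", "HR"],
    "Level": ["Junior", "Junior", "Senior", "Senior"],
    "Salary": [50000, 60000, 70000, 55000],
})

# How many rows fall in each department
dept_counts = staff["Department"].value_counts()

# Apply a function to every value in a column (salary in thousands)
salary_k = staff["Salary"].apply(lambda x: x / 1000)

# Average salary for each Department/Level combination
pivot = staff.pivot_table(
    index="Department", columns="Level", values="Salary", aggfunc="mean"
)

print(dept_counts)
print(salary_k)
print(pivot)
```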

Practical Example

Employee Salary Analysis

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Department': ['HR', 'IT', 'IT', 'HR'],
    'Salary': [50000, 60000, 70000, 55000]
}
df = pd.DataFrame(data)

# Average salary by department
avg_salary = df.groupby('Department')['Salary'].mean()
print("Average Salary by Department:\n", avg_salary)

# Employees earning above 55,000
high_earners = df[df['Salary'] > 55000]
print("\nHigh Earners:\n", high_earners)

Output:

Average Salary by Department:
 Department
HR    52500.0
IT    65000.0
Name: Salary, dtype: float64

High Earners:
       Name  Department  Salary
1      Bob          IT   60000
2  Charlie          IT   70000

This covers the core operations for working with DataFrames in Python: creating them, accessing and transforming data, handling missing values, and combining tables.
