Posts

Showing posts from December, 2024

Groupby in Pandas | Lecture 5

The groupby operation in Pandas is a powerful tool for analyzing and summarizing data. It allows you to split your data into groups based on a column or index, apply a function to each group, and combine the results.

1. Basic Syntax

    grouped = df.groupby("column_name")

groupby() groups the data based on the values in the specified column(s). It returns a DataFrameGroupBy object, which you can aggregate, transform, or iterate over.

2. Common Operations with groupby

a) Aggregation

Aggregation functions like sum(), mean(), count(), etc., can be applied to grouped data.

Example:

    import pandas as pd

    # Creating a sample DataFrame
    data = {
        "Department": ["HR", "IT", "HR", "IT", "Sales", "HR", "Sales"],
        "Employee": ["Alice", "Bob", "Charlie", "David", "Eva",...
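Since the excerpt's example is cut off, here is a minimal, self-contained sketch of the split, apply, combine pattern it describes. The DataFrame contents and the "Salary" column are hypothetical values chosen for illustration; they are not the data from the original post.

```python
import pandas as pd

# Hypothetical sample data (assumed for illustration, not from the post)
df = pd.DataFrame({
    "Department": ["HR", "IT", "HR", "IT", "Sales"],
    "Salary": [50000, 70000, 55000, 65000, 60000],
})

# Split: group the rows by the values in "Department"
grouped = df.groupby("Department")

# Apply + combine: each aggregation returns one value per group
total_salary = grouped["Salary"].sum()
mean_salary = grouped["Salary"].mean()
headcount = grouped["Salary"].count()

# A grouped object can also be iterated as (group key, sub-DataFrame) pairs
for name, group in grouped:
    print(name, len(group))

print(total_salary)
```

Selecting a column before aggregating (grouped["Salary"]) keeps the result to a single Series indexed by department, which is usually easier to read than aggregating every column at once.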

Handling Missing Data in Python DataFrames | Lecture 4

Missing data is a common issue in data analysis, and Pandas provides several ways to handle it. Here's an overview with examples:

1. Identifying Missing Data

Missing data in a DataFrame is usually represented as NaN (Not a Number); None is also treated as missing. You can use these methods to detect missing values:

isnull(): Returns True for missing values and False otherwise.
notnull(): Returns True for non-missing values and False for missing ones.

Example:

    import pandas as pd
    import numpy as np

    # Creating a DataFrame with missing values
    data = {
        "Name": ["Alice", "Bob", "Charlie", None],
        "Age": [25, np.nan, 30, 22],
        "Score": [85, 90, np.nan, 88]
    }
    df = pd.DataFrame(data)

    # Detect missing values
    print(df.isnull())

    # Check for non-missing values
    print(df.notnull())

2. Dropping Missing Data

dropna(): Removes rows or columns with missing values.

Example:

    # Drop rows with any ...
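To make the detection and dropping steps concrete, the sketch below reuses the excerpt's sample DataFrame. The variable names (missing_per_column, complete_rows, rows_with_age) are illustrative assumptions, and this is not a reconstruction of the truncated example in the post.

```python
import numpy as np
import pandas as pd

# Same sample data as in the excerpt
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", None],
    "Age": [25, np.nan, 30, 22],
    "Score": [85, 90, np.nan, 88],
})

# isnull() gives a boolean mask; summing it counts missing values per column
missing_per_column = df.isnull().sum()

# notnull() is the element-wise inverse of isnull()
masks_are_inverse = (df.notnull() == ~df.isnull()).all().all()

# dropna() with default arguments drops every row that has any missing value
complete_rows = df.dropna()

# subset= restricts the check to the listed columns
rows_with_age = df.dropna(subset=["Age"])

print(missing_per_column)
print(complete_rows)
```

Note that dropna() returns a new DataFrame and leaves df unchanged, so the result has to be assigned if you want to keep it.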

DataFrames in Python with examples | Lecture 3

Here’s a detailed guide to DataFrames in Python with examples:

What is a DataFrame?

A DataFrame is a two-dimensional, labeled data structure provided by the pandas library. It can be thought of as a table, similar to a spreadsheet, SQL table, or a dictionary of Series objects. A DataFrame is highly flexible and can handle data in various formats.

Key Features

Labeled axes: Rows and columns have labels (index and column names).
Heterogeneous data: Can contain different types of data (integers, floats, strings, etc.).
Size mutable: Rows and columns can be added or deleted.

How to Create a DataFrame?

1. From a Dictionary

    import pandas as pd

    data = {
        'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago']
    }
    df = pd.DataFrame(data)
    print(df)

Output...
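The key features listed in this excerpt can be checked directly on the dictionary-based DataFrame. The sketch below is a minimal illustration; the added "Country" column and its value are hypothetical and do not come from the original post.

```python
import pandas as pd

# Same dictionary as in the excerpt
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
}
df = pd.DataFrame(data)

# Labeled axes: the row index and the column names
row_labels = list(df.index)
column_labels = list(df.columns)

# Heterogeneous data: each column carries its own dtype
print(df.dtypes)

# Size mutable: add a column (hypothetical "Country"), then drop one
df['Country'] = 'USA'
df_without_city = df.drop(columns=['City'])

print(df_without_city)
```

Here drop() returns a new DataFrame rather than modifying df, while the column assignment changes df in place.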