DEV Community

Codes With Pankaj


A Comprehensive Guide to Data Manipulation with Pandas: Selecting, Filtering, Sorting, and Modifying Data

Introduction:
Pandas is a widely used Python library for analyzing and manipulating tabular data. This guide covers four core operations that data analysts and data scientists use every day: selecting rows and columns, filtering data, sorting data, and adding or deleting columns.

1. Selecting Rows and Columns

A. Selecting Columns

Pandas allows you to select specific columns from your dataset. Here's how to do it:

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 22],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)

# Select a single column (returns a Series)
name_column = df['Name']

# Select multiple columns by passing a list of names (returns a DataFrame)
name_age = df[['Name', 'Age']]
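The difference between single and double brackets is easy to miss, so here is a minimal sketch that checks the type each form returns. It rebuilds the sample DataFrame so it runs on its own; the name `name_frame` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

# Single brackets with one label return a one-dimensional Series
name_column = df['Name']

# Double brackets (a list of labels) return a DataFrame, even for one column
name_frame = df[['Name']]

# .loc can do the same selection: all rows (:) and the listed column labels
name_age = df.loc[:, ['Name', 'Age']]

print(type(name_column).__name__)  # Series
print(type(name_frame).__name__)   # DataFrame
print(name_age.shape)              # (3, 2)
```

This matters in practice because some methods behave differently on a Series than on a DataFrame, so it helps to know which one you are holding.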

B. Selecting Rows

You can select rows by a condition, by index label, or by integer position:

# Select rows based on a condition
young_people = df[df['Age'] < 30]

# Select a row by index label
# (with the default index, label 1 happens to be the second row)
row_by_label = df.loc[1]

# Select a row by integer position
row_by_position = df.iloc[0]  # Selects the first row
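With the default index, labels and positions coincide, which hides the difference between `loc` and `iloc`. The sketch below uses a hypothetical custom index (10, 20, 30), which is not part of the original example, to make the distinction visible.

```python
import pandas as pd

# The index labels 10, 20, 30 are an assumption made for illustration
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22]},
                  index=[10, 20, 30])

# .loc looks up the index label 20, which is Bob's row
by_label = df.loc[20]

# .iloc looks up integer position 0, which is Alice's row
by_position = df.iloc[0]

# Slices differ too: .loc includes the end label, .iloc excludes the end position
label_slice = df.loc[10:20]   # rows labeled 10 and 20
position_slice = df.iloc[0:1]  # only the first row

print(by_label['Name'])      # Bob
print(by_position['Name'])   # Alice
print(len(label_slice))      # 2
print(len(position_slice))   # 1
```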

2. Filtering Data

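Filtering keeps only the rows that match a condition, which lets you narrow a large dataset down to the records you care about. A comparison on a column produces a boolean Series, and indexing the DataFrame with it returns the rows where the value is True: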

# Filter data based on a condition
new_yorkers = df[df['City'] == 'New York']

# Combine multiple conditions with & (and); wrap each condition in parentheses
young_new_yorkers = df[(df['Age'] < 30) & (df['City'] == 'New York')]
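Beyond `&`, a few other filtering idioms come up often. The following is a minimal sketch, using the same sample data, of "or" (`|`), negation (`~`), membership tests with `isin`, and the string-based `query` method. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

# | means "or": rows where either condition holds
young_or_ny = df[(df['Age'] < 25) | (df['City'] == 'New York')]

# ~ negates a condition: everyone who is not in New York
not_ny = df[~(df['City'] == 'New York')]

# isin tests membership in a list of values
west_coast = df[df['City'].isin(['San Francisco', 'Los Angeles'])]

# query expresses the same kind of filter as a string
young_new_yorkers = df.query("Age < 30 and City == 'New York'")

print(list(young_or_ny['Name']))        # ['Alice', 'Charlie']
print(list(not_ny['Name']))             # ['Bob', 'Charlie']
print(list(west_coast['Name']))         # ['Bob', 'Charlie']
print(list(young_new_yorkers['Name']))  # ['Alice']
```

Python's `and`, `or`, and `not` keywords do not work element-wise on a Series, which is why the operators `&`, `|`, and `~` are used outside of `query`.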

3. Sorting Data

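Sorting puts rows in order by one or more columns, which makes it easier to spot the largest and smallest values. sort_values sorts in ascending order by default and returns a new DataFrame, leaving the original unchanged: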

# Sort by a single column
sorted_by_age = df.sort_values(by='Age')

# Sort by multiple columns
sorted_by_city_and_age = df.sort_values(by=['City', 'Age'])
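As a short sketch of the options `sort_values` accepts, the example below sorts in descending order, mixes sort directions across columns, and resets the index afterwards. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

# ascending=False sorts from largest to smallest
oldest_first = df.sort_values(by='Age', ascending=False)

# A list of booleans sets the direction per column:
# City A to Z, then Age from oldest to youngest within each city
city_then_age = df.sort_values(by=['City', 'Age'], ascending=[True, False])

# Sorting keeps the original index labels; reset_index renumbers from 0
renumbered = oldest_first.reset_index(drop=True)

print(list(oldest_first['Name']))   # ['Bob', 'Alice', 'Charlie']
print(list(oldest_first.index))     # [1, 0, 2]
print(list(renumbered.index))       # [0, 1, 2]
print(list(city_then_age['Name']))  # ['Charlie', 'Alice', 'Bob']
```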

4. Adding and Deleting Columns

You can easily add and delete columns in your Pandas DataFrame:

A. Adding Columns

To add a new column:

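# Add a new column 'Gender' (the list must have one value per row)
df['Gender'] = ['Female', 'Male', 'Male']

# Use existing columns to compute a new one
# (approximate birth year, assuming the ages were recorded in 2023)
df['Birth_Year'] = 2023 - df['Age']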
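Bracket assignment modifies the DataFrame in place and appends the column at the end. Two alternatives, sketched below, are `assign`, which returns a new DataFrame, and `insert`, which places a column at a chosen position. The `ID` and `Is_Adult` columns and their values are hypothetical, and the birth-year calculation uses the current year rather than a hard-coded one, so it is still only an approximation.

```python
from datetime import date

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22]})

# assign returns a new DataFrame and leaves df untouched
with_flag = df.assign(Is_Adult=df['Age'] >= 18)

# Compute an approximate birth year from the current year
with_year = df.assign(Birth_Year=date.today().year - df['Age'])

# insert adds a column in place at a given position (0 = first column)
df.insert(0, 'ID', [101, 102, 103])

print(list(with_flag.columns))  # ['Name', 'Age', 'Is_Adult']
print(list(df.columns))         # ['ID', 'Name', 'Age']
```

`assign` is convenient in method chains because it does not mutate the original data.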

B. Deleting Columns

To delete a column:

# Delete the 'City' column (drop returns a new DataFrame, so reassign it)
df = df.drop(columns=['City'])

# Alternative: del removes the column in place
del df['Age']
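The two deletion styles differ in whether they change the original DataFrame. The sketch below illustrates this, and also shows `errors='ignore'` and `pop`; the `Country` column is a hypothetical name that deliberately does not exist in the data.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

# drop returns a new DataFrame; df itself still has 'City'
without_city = df.drop(columns=['City'])

# errors='ignore' skips labels that do not exist instead of raising KeyError
unchanged = df.drop(columns=['Country'], errors='ignore')

# pop removes the column in place and returns it as a Series
ages = df.pop('Age')

print(list(without_city.columns))  # ['Name', 'Age']
print(list(unchanged.columns))     # ['Name', 'Age', 'City']
print(list(ages))                  # [25, 30, 22]
print(list(df.columns))            # ['Name', 'City']
```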

Read more: https://codeswithpankaj.medium.com/python-pandas-dataframe-6b7eb73a9393

Conclusion:
Pandas lets you select, filter, sort, and modify tabular data in a few lines of code. Whether you're analyzing financial data, working with sensor readings, or processing survey responses, these operations form the basis of most data-cleaning and analysis workflows.

This guide covers only a small part of what Pandas can do. See the official Pandas documentation for more advanced operations and functions.
