Data Analytics using Python

Data Manipulation with Pandas

Pandas is a popular Python library for data manipulation and analysis. It provides powerful data structures for working with structured data, including Series (one-dimensional labeled arrays) and DataFrame (two-dimensional labeled data tables). In this post, we will cover the basics of data manipulation with Pandas, including reading data into a DataFrame, selecting and filtering data, sorting data, and aggregating data.

Reading Data into a DataFrame

The first step in data manipulation with Pandas is to read data into a DataFrame. Pandas provides several functions for reading data from various sources, including CSV files, Excel files, SQL databases, and more. Here’s an example of how to read data from a CSV file into a DataFrame:

import pandas as pd

data = pd.read_csv('data.csv')

In this example, we have imported Pandas and used the read_csv function to read data from a CSV file named “data.csv” into a DataFrame named “data”. We can then use this DataFrame to manipulate and analyze the data.
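To try this without a file on disk, the following minimal sketch reads CSV text from an in-memory buffer instead. The column names and values are made up for illustration; only read_csv, shape, dtypes, and head are Pandas features.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'data.csv'.
csv_text = """column1,column2,category
3,10.5,a
-1,7.25,b
5,3.0,a
"""

# read_csv accepts a file path or any file-like object.
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)   # (number of rows, number of columns)
print(data.dtypes)  # inferred type of each column
print(data.head())  # first rows of the DataFrame
```

Checking the shape and column types right after loading is a quick way to confirm the file was parsed as expected.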

Selecting and Filtering Data

Once we have data in a DataFrame, we can select and filter the data based on various criteria. Pandas provides several ways to select and filter data, including the loc and iloc indexers and boolean indexing.

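The loc indexer selects data by row labels and column names. It is used with square brackets rather than called like a function. A single column can also be selected directly by name with square brackets on the DataFrame. Here’s an example: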

# Select a single row by its label
row = data.loc[0]

# Select a single column by name
column = data['column_name']

# Select multiple rows and columns
subset = data.loc[[0, 1, 2], ['column1', 'column2']]

In this example, we have used loc to select a single row and a subset of rows and columns, and plain bracket indexing to select a single column. The labels 0, 1, and 2 work here because read_csv assigns a default integer index.
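Because loc works on labels, it behaves the same way when the row labels are not integers. The sketch below assumes a small hypothetical DataFrame with string labels "x", "y", and "z"; the data is invented for illustration.

```python
import pandas as pd

# Hypothetical data with string row labels instead of the default 0, 1, 2.
data = pd.DataFrame(
    {"column1": [3, -1, 5], "column2": [10.5, 7.25, 3.0]},
    index=["x", "y", "z"],
)

row = data.loc["y"]                          # one row, returned as a Series
value = data.loc["y", "column2"]             # a single cell
subset = data.loc[["x", "z"], ["column1"]]   # chosen rows and columns
label_slice = data.loc["x":"y"]              # label slices include the end label
```

Note that a label slice such as "x":"y" includes both endpoints, unlike ordinary Python slicing.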

The iloc indexer selects data by integer position, counting rows and columns from 0, regardless of their labels. Here’s an example:

# Select the first row
row = data.iloc[0]

# Select the first column
column = data.iloc[:, 0]

# Select multiple rows and columns
subset = data.iloc[[0, 1, 2], [0, 1]]

In this example, we have used iloc to select the first row, the first column, and a subset made of the first three rows and first two columns of the DataFrame.
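The difference between loc and iloc becomes visible when row labels do not match row positions. The following sketch uses a hypothetical DataFrame whose labels are 10, 20, and 30.

```python
import pandas as pd

# Hypothetical data whose integer labels do not match row positions.
data = pd.DataFrame({"column1": [3, -1, 5]}, index=[10, 20, 30])

first_by_position = data.iloc[0]  # first row, whatever its label is
row_by_label = data.loc[10]       # row whose label is 10 (the same row here)
position_slice = data.iloc[0:2]   # positions 0 and 1; the end is excluded
```

Here data.loc[0] would raise a KeyError, because no row carries the label 0, while data.iloc[0] always returns the first row.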

Boolean indexing allows us to filter data based on a condition. Here’s an example:

# Filter rows where column1 > 0

filtered_data = data[data['column1'] > 0]

In this example, we have used boolean indexing to filter the rows in the DataFrame where the value of “column1” is greater than 0.
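Conditions can also be combined. The sketch below, on invented data, shows the element-wise operators and the isin method; the "category" column is an assumption added for illustration.

```python
import pandas as pd

# Hypothetical data for illustration.
data = pd.DataFrame(
    {"column1": [3, -1, 5, 0], "category": ["a", "b", "a", "c"]}
)

# Combine conditions with & (and), | (or), ~ (not); wrap each in parentheses.
positive_a = data[(data["column1"] > 0) & (data["category"] == "a")]

# isin tests membership in a list of values.
b_or_c = data[data["category"].isin(["b", "c"])]

# A boolean mask also works with loc, which can pick columns at the same time.
positive_values = data.loc[data["column1"] > 0, "column1"]
```

The parentheses matter: & and | bind more tightly than comparison operators, so leaving them out usually raises an error.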

Sorting Data

Pandas also provides functions for sorting data in a DataFrame. We can sort data by one or more columns, in ascending or descending order. Here’s an example:

# Sort data by a single column
sorted_data = data.sort_values('column1')

# Sort data by multiple columns
sorted_data = data.sort_values(['column1', 'column2'])

# Sort data in descending order
sorted_data = data.sort_values('column1', ascending=False)

In this example, we have used the sort_values function to sort the data in the DataFrame by one or more columns, in ascending or descending order.
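When sorting by several columns, each column can have its own direction. A minimal sketch on invented data:

```python
import pandas as pd

# Hypothetical data for illustration.
data = pd.DataFrame(
    {"column1": [2, 1, 2, 1], "column2": [5, 8, 9, 3]}
)

# column1 ascending, then column2 descending within each column1 value.
sorted_data = data.sort_values(
    ["column1", "column2"], ascending=[True, False]
)

# sort_values returns a new DataFrame and keeps the original row labels;
# reset_index(drop=True) renumbers the rows from 0.
renumbered = sorted_data.reset_index(drop=True)
```

The original DataFrame is left unchanged, so the result has to be assigned to a variable to be kept.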

Aggregating Data

Finally, Pandas provides functions for aggregating data in a DataFrame, such as calculating the mean, median, sum, count, and more. Here’s an example:

# Calculate the mean of a column
mean = data['column1'].mean()

# Calculate the sum of a column
# (named total to avoid shadowing Python's built-in sum)
total = data['column1'].sum()

# Count the non-missing values in a column
count = data['column1'].count()

# Calculate the median of a column
median = data['column1'].median()

In this example, we have used various functions to calculate the mean, sum, count, and median of a column in the DataFrame.
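Several of these statistics can be computed in one call with agg, and the same statistics can be computed per group with groupby. This is a sketch on invented data; the "category" column is an assumption added for illustration.

```python
import pandas as pd

# Hypothetical data for illustration.
data = pd.DataFrame(
    {"category": ["a", "b", "a", "b"], "column1": [1.0, 2.0, 3.0, 6.0]}
)

# Several statistics for one column in a single call.
summary = data["column1"].agg(["mean", "sum", "count", "median"])

# The same idea per group: one row of results for each category.
per_category = data.groupby("category")["column1"].agg(["mean", "sum"])
```

Here summary is a Series indexed by statistic name, and per_category is a DataFrame with one row per category and one column per statistic.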

In this post, we have covered the basics of data manipulation with Pandas, including reading data into a DataFrame, selecting and filtering data, sorting data, and aggregating data. Pandas provides a powerful and flexible toolset for working with structured data in Python, and is widely used in data science, machine learning, and other fields. You should now have a working understanding of these basics, enough to start loading, exploring, and summarizing your own data in Python.
