Data Manipulation with Pandas

Pandas is a popular Python library for data manipulation and analysis. It provides powerful data structures for working with structured data, including Series (one-dimensional labeled arrays) and DataFrame (two-dimensional labeled data tables). In this post, we will cover the basics of data manipulation with Pandas, including reading data into a DataFrame, selecting and filtering data, sorting data, and aggregating data.

Reading Data into a DataFrame

The first step in data manipulation with Pandas is to read data into a DataFrame. Pandas provides several functions for reading data from various sources, including CSV files, Excel files, SQL databases, and more. Here’s an example of how to read data from a CSV file into a DataFrame:

python
import pandas as pd

data = pd.read_csv('data.csv')

In this example, we have imported Pandas and used the read_csv function to read data from a CSV file named “data.csv” into a DataFrame named “data”. We can then use this DataFrame to manipulate and analyze the data.
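To try this without a file on disk, the sketch below feeds read_csv an in-memory string instead of a path. The CSV contents and column names are made up for illustration; they are not from any real dataset.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file such as 'data.csv'
csv_text = """column1,column2,column3
3,a,1.5
-1,b,2.0
2,c,0.5
"""

# read_csv accepts a file path or any file-like object
data = pd.read_csv(io.StringIO(csv_text))

print(data.head())   # first rows of the DataFrame
print(data.shape)    # (number of rows, number of columns)
print(data.dtypes)   # type inferred for each column
```

Checking head, shape, and dtypes right after loading is a quick way to confirm that the file was parsed as expected.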

Selecting and Filtering Data

Once we have data in a DataFrame, we can select and filter it based on various criteria. Pandas provides several ways to do this, including the loc and iloc indexers and boolean indexing.

The loc indexer selects data by row labels and column names. Note that it is used with square brackets rather than called like a function. Here’s an example:

python
# Select a single row
row = data.loc[0]

# Select a single column
column = data['column_name']

# Select multiple rows and columns
subset = data.loc[[0, 1, 2], ['column1', 'column2']]

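In this example, we have used loc to select a single row and a subset of rows and columns, and plain bracket indexing (data['column_name']) to select a single column. The labels 0, 1, and 2 work here because a DataFrame read without an explicit index gets a default integer index starting at 0.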
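Because loc works on labels, it behaves the same way when the index is not made of integers. Here is a minimal sketch, assuming a small hand-built DataFrame with string row labels (the data is hypothetical):

```python
import pandas as pd

# Hypothetical data with string row labels instead of the default 0, 1, 2
data = pd.DataFrame(
    {"column1": [10, 20, 30], "column2": [1.5, 2.5, 3.5]},
    index=["a", "b", "c"],
)

# Select the row labelled "b" (returned as a Series)
row = data.loc["b"]

# Select rows "a" and "c" and only column1 (returned as a DataFrame)
subset = data.loc[["a", "c"], ["column1"]]

# Label slices include both endpoints, unlike ordinary Python slices
sliced = data.loc["a":"b", "column1"]
```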

The iloc indexer selects data by the integer position of rows and columns, counting from 0. Here’s an example:

python
# Select a single row
row = data.iloc[0]

# Select a single column
column = data.iloc[:, 0]

# Select multiple rows and columns
subset = data.iloc[[0, 1, 2], [0, 1]]

In this example, we have used the iloc indexer to select the first row, the first column, and a subset of rows and columns by position.
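The difference between loc and iloc only becomes visible when row labels and row positions disagree. The following sketch assumes a hypothetical DataFrame whose integer index is deliberately out of order:

```python
import pandas as pd

# Hypothetical data: the index labels are 2, 0, 1, in that order
data = pd.DataFrame({"column1": [10, 20, 30]}, index=[2, 0, 1])

# iloc[0] is the row in the first position, whatever its label
by_position = data.iloc[0]

# loc[0] is the row whose label is 0, which sits in the second position
by_label = data.loc[0]

# Position slices exclude the end, like ordinary Python slices
first_two = data.iloc[0:2]
```

Here by_position holds the value 10 and by_label holds 20, even though both were requested with 0.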

Boolean indexing allows us to filter data based on a condition. Here’s an example:

python
# Filter rows where column1 > 0
filtered_data = data[data['column1'] > 0]

In this example, we have used boolean indexing to filter the rows in the DataFrame where the value of “column1” is greater than 0.
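Conditions can also be combined. The sketch below, using made-up data, shows the element-wise operators & (and), | (or), and ~ (not). Each condition needs its own parentheses, because these operators bind more tightly than comparisons.

```python
import pandas as pd

# Hypothetical data for illustration
data = pd.DataFrame(
    {"column1": [3, -1, 2, 0], "column2": ["x", "y", "x", "y"]}
)

# Rows where column1 > 0 AND column2 == "x"
both = data[(data["column1"] > 0) & (data["column2"] == "x")]

# Rows where column1 > 2 OR column2 == "y"
either = data[(data["column1"] > 2) | (data["column2"] == "y")]

# Rows where column1 > 0 is NOT true
negated = data[~(data["column1"] > 0)]
```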

Sorting Data

Pandas also provides functions for sorting data in a DataFrame. We can sort data by one or more columns, in ascending or descending order. Here’s an example:

python
# Sort data by a single column
sorted_data = data.sort_values('column1')

# Sort data by multiple columns
sorted_data = data.sort_values(['column1', 'column2'])

# Sort data in descending order
sorted_data = data.sort_values('column1', ascending=False)

In this example, we have used the sort_values function to sort the data in the DataFrame by one or more columns, in ascending or descending order.
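When sorting by several columns, the ascending parameter also accepts a list with one flag per column. A short sketch with hypothetical data; it also shows that sort_values returns a new DataFrame and keeps the original row labels:

```python
import pandas as pd

# Hypothetical data for illustration
data = pd.DataFrame({"column1": [2, 1, 2, 1], "column2": [5, 6, 7, 8]})

# column1 ascending, then column2 descending within equal column1 values
sorted_data = data.sort_values(["column1", "column2"], ascending=[True, False])

# The original DataFrame is unchanged; the result keeps the old row labels
print(sorted_data.index.tolist())  # [3, 1, 2, 0]

# reset_index(drop=True) renumbers the rows 0, 1, 2, ... after sorting
sorted_reset = sorted_data.reset_index(drop=True)
```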

Aggregating Data

Finally, Pandas provides functions for aggregating data in a DataFrame, such as calculating the mean, median, sum, count, and more. Here’s an example:

python
# Calculate the mean of a column
mean = data['column1'].mean()

# Calculate the sum of a column
# (named total to avoid shadowing Python's built-in sum)
total = data['column1'].sum()

# Calculate the count of a column
count = data['column1'].count()

# Calculate the median of a column
median = data['column1'].median()


In this example, we have used Series methods to calculate the mean, sum, count, and median of a column in the DataFrame. By default, these methods skip missing values.
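Several aggregations can also be computed in one call with agg, which takes a list of function names. This is a minimal sketch on made-up data that includes a missing value, to show that it is left out of every result:

```python
import pandas as pd

# Hypothetical column with one missing value (None becomes NaN)
data = pd.DataFrame({"column1": [1.0, 2.0, None, 5.0]})

# One call, four statistics; the result is a Series indexed by function name
summary = data["column1"].agg(["mean", "sum", "count", "median"])

print(summary["count"])   # 3.0, the missing value is not counted
print(summary["sum"])     # 8.0
print(summary["median"])  # 2.0
```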

In this post, we have covered the basics of data manipulation with Pandas: reading data into a DataFrame, selecting and filtering data, sorting data, and aggregating data. Pandas provides a powerful and flexible toolset for working with structured data in Python and is widely used in data science, machine learning, and other fields. With these basics in hand, you can start working with data in Python more effectively.

Author and Assistant Professor in Finance, Ardent fan of Arsenal FC. Always believe "The only good is knowledge and the only evil is ignorance - Socrates"