Guide · Intermediate · 20 min

Introduction to Pandas

Learn the basics of pandas, Python's powerful data analysis library.

Last updated: Jan 28, 2026

Pandas is a library for working with tabular data: data organized in labeled rows and columns, like a spreadsheet. It underpins most data analysis work in Python. It is well suited to financial data such as price histories, which are naturally tables indexed by date.

Importing Pandas

```python
import pandas as pd  # "pd" is the standard alias

# Now use pd.something() to access pandas functions
```

DataFrames: The Core Concept

A DataFrame is a two-dimensional table, much like a spreadsheet. It holds rows and columns of data, with labels for both the rows (the index) and the columns.

```python
# Create from dictionary
data = {
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["NYC", "LA", "Chicago"]
}
df = pd.DataFrame(data)
print(df)

#       name  age     city
# 0    Alice   25      NYC
# 1      Bob   30       LA
# 2  Charlie   35  Chicago
```
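Each column of a DataFrame is a pandas Series, and all columns share one row index. The minimal sketch below reuses the same sample data to inspect those pieces. It also shows that the same table can be built row by row from a list of dicts; the name `df_from_rows` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["NYC", "LA", "Chicago"],
})

print(df.index)          # Row labels: RangeIndex(start=0, stop=3, step=1)
print(list(df.columns))  # Column labels: ['name', 'age', 'city']
print(type(df["age"]))   # A single column is a Series
print(df.dtypes)         # One data type per column

# The same table, built from one dict per row
rows = [
    {"name": "Alice", "age": 25, "city": "NYC"},
    {"name": "Bob", "age": 30, "city": "LA"},
    {"name": "Charlie", "age": 35, "city": "Chicago"},
]
df_from_rows = pd.DataFrame(rows)
print(df_from_rows.equals(df))  # True
```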

Reading Data from Files

```python
# Read CSV file
df = pd.read_csv("data.csv")

# Read Excel file (requires an Excel engine such as openpyxl to be installed)
df = pd.read_excel("data.xlsx")

# Read from URL
df = pd.read_csv("https://example.com/data.csv")

# Quick look at the data
print(df.head())      # First 5 rows
print(df.tail())      # Last 5 rows
print(df.shape)       # (rows, columns)
print(df.columns)     # Column names
df.info()             # Column dtypes and non-null counts (prints directly, returns None)
print(df.describe())  # Summary statistics for numeric columns
```
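The file names and URL above are placeholders. To try the inspection calls without a file, you can pass CSV text held in memory to `read_csv` through `io.StringIO`. The small price table below is made-up sample data.

The `parse_dates` and `index_col` arguments turn the date column into a `DatetimeIndex`, which is a common setup for financial time series.

```python
import io

import pandas as pd

# Made-up sample data standing in for a real CSV file (one close price is missing)
csv_text = """date,ticker,close
2024-01-02,AAA,100.0
2024-01-03,AAA,101.5
2024-01-04,AAA,
2024-01-05,AAA,102.0
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

print(df.shape)         # (4, 2): "date" became the index, not a column
print(df.head(2))       # First 2 rows
df.info()               # The close column shows 3 non-null values
print(df.isna().sum())  # Missing values per column
```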

Selecting Data

```python
# Uses the name/age/city DataFrame created earlier

# Select a column (returns Series)
ages = df["age"]
print(ages)

# Select multiple columns (returns DataFrame)
subset = df[["name", "age"]]

# Select rows by position with iloc
first_row = df.iloc[0]      # First row, as a Series
first_three = df.iloc[0:3]  # Positions 0, 1, 2 (end position excluded)

# Select rows by label with loc
row = df.loc[0]  # Row labeled 0 (same as iloc[0] only with the default 0, 1, 2, ... index)

# Select a specific cell
value = df.iloc[0, 1]     # Row position 0, column position 1
value = df.loc[0, "age"]  # Row label 0, column "age"
```
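The difference between `loc` and `iloc` only becomes visible once the index is something other than 0, 1, 2. The minimal sketch below uses names as the row labels via `set_index`.

- `loc` looks rows up by label, and its slices include the end label.
- `iloc` uses integer positions, and its slices exclude the end position, like Python lists.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["NYC", "LA", "Chicago"],
}).set_index("name")  # Row labels are now names instead of 0, 1, 2

print(df.loc["Bob", "age"])  # Label lookup: 30
print(df.iloc[1, 0])         # Position lookup (row 1, column 0 is "age"): 30

# Label slices include the end label; position slices exclude the end position
print(df.loc["Alice":"Bob"])  # Alice and Bob
print(df.iloc[0:1])           # Alice only
```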

Filtering Data

```python
# Filter rows based on condition
adults = df[df["age"] >= 18]
print(adults)

# Multiple conditions (use & for AND, | for OR)
young_nyc = df[(df["age"] < 30) & (df["city"] == "NYC")]

# Filter using isin()
cities = ["NYC", "LA"]
filtered = df[df["city"].isin(cities)]

# Filter using string methods
df[df["name"].str.startswith("A")]
df[df["name"].str.contains("li")]
```
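A condition such as `df["age"] < 30` produces a boolean Series, called a mask. Indexing with the mask keeps the rows where it is True.

Python's `and` and `or` cannot combine whole Series; trying it raises a `ValueError`. That is why pandas uses `&`, `|` and `~` (NOT). The parentheses are required because `&` binds more tightly than `<` and `==`.

The sketch below also shows `DataFrame.query`, which expresses the same filter as a string.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["NYC", "LA", "Chicago"],
})

mask = df["age"] < 30
print(mask.tolist())  # [True, False, False]

# The same filter written three ways
a = df[(df["age"] < 30) & (df["city"] == "NYC")]
b = df[mask & df["city"].eq("NYC")]
c = df.query("age < 30 and city == 'NYC'")

# ~ negates a mask: everyone NOT in NYC or LA
others = df[~df["city"].isin(["NYC", "LA"])]
print(others["name"].tolist())  # ['Charlie']
```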

Basic Operations

```python
# Math on columns
df["age_in_months"] = df["age"] * 12

# Aggregate functions
print(df["age"].mean())    # Average
print(df["age"].sum())     # Total
print(df["age"].min())     # Minimum
print(df["age"].max())     # Maximum
print(df["age"].std())     # Standard deviation

# Value counts
print(df["city"].value_counts())
# NYC       1
# LA        1
# Chicago   1

# Sort
df_sorted = df.sort_values("age", ascending=False)
```
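Column arithmetic is vectorized: each expression operates on whole columns at once, matched up row by row, so no loop is needed. The minimal sketch below uses made-up closing prices and volumes. It derives one column from another (a percentage change, as you might compute for daily returns) and one from two columns together. The column names `return` and `dollar_volume` are only illustrative.

```python
import pandas as pd

# Made-up prices for illustration only
prices = pd.DataFrame({
    "close": [100.0, 110.0, 99.0],
    "volume": [1_000, 1_500, 1_200],
})

# Percent change from the previous row; the first row has no predecessor, so it is NaN
prices["return"] = prices["close"].pct_change()

# A column computed from two existing columns
prices["dollar_volume"] = prices["close"] * prices["volume"]

print(prices)
```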

Handling Missing Data

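```python
# Check for missing values
print(df.isna().sum())  # Count NaN per column

# Drop rows with any missing values
df_clean = df.dropna()

# Fill missing values
df_filled = df.fillna(0)  # Fill every NaN with 0
df_filled = df.fillna({"age": df["age"].mean()})  # Fill NaN in "age" with that column's mean

# Fill forward/backward (fillna(method=...) is deprecated in recent pandas)
df_filled = df.ffill()  # Forward fill: copy the last valid value down
df_filled = df.bfill()  # Backward fill: copy the next valid value up
```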
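On a small frame with gaps, the fill strategies behave quite differently. The minimal sketch below uses made-up values:

- `ffill` carries the last known value forward, a common choice for prices on days with no quote.
- `bfill` pulls the next value back, and leaves a trailing NaN when nothing follows it.
- Passing a dict to `fillna` fills each column with its own value.

```python
import numpy as np
import pandas as pd

# Made-up data with gaps
df = pd.DataFrame({
    "price": [10.0, np.nan, np.nan, 13.0],
    "qty": [1.0, np.nan, 3.0, np.nan],
})

print(df.isna().sum().tolist())  # [2, 2]

forward = df.ffill()   # price: 10, 10, 10, 13
backward = df.bfill()  # price: 10, 13, 13, 13 (last qty stays NaN)
by_column = df.fillna({"price": df["price"].mean(), "qty": 0})  # price mean is 11.5

print(by_column)
```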

Grouping Data

```python
# Group by one column
grouped = df.groupby("city")["age"].mean()
print(grouped)
# city
# Chicago    35.0
# LA         30.0
# NYC        25.0
# Name: age, dtype: float64

# Multiple aggregations
stats = df.groupby("city")["age"].agg(["mean", "min", "max", "count"])
print(stats)
```
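With one person per city, the sample groups above are trivially small. A slightly larger made-up dataset makes the split-apply-combine pattern clearer.

The sketch below also uses named aggregation (keyword arguments to `agg`), which lets you choose the output column names. The names `mean_age`, `oldest` and `people` are illustrative.

```python
import pandas as pd

# Made-up sample data with several rows per city
df = pd.DataFrame({
    "city": ["NYC", "LA", "NYC", "LA", "Chicago"],
    "age": [25, 30, 41, 22, 35],
})

# Each keyword becomes an output column: (input column, aggregation)
summary = df.groupby("city").agg(
    mean_age=("age", "mean"),
    oldest=("age", "max"),
    people=("age", "count"),
)
print(summary)  # Groups are sorted by key: Chicago, LA, NYC
```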

Saving Data

```python
# Save to CSV
df.to_csv("output.csv", index=False)

# Save to Excel (requires an Excel engine such as openpyxl to be installed)
df.to_excel("output.xlsx", index=False)
```
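The `index=False` argument matters when you read the file back. Without it, the default 0, 1, 2 index is written out as an extra unnamed first column. The minimal sketch below round-trips a frame through a CSV file in a temporary directory, so nothing is left behind.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [25, 30]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "output.csv"

    df.to_csv(path, index=False)
    without_index = pd.read_csv(path)

    df.to_csv(path)  # index=True is the default
    with_index = pd.read_csv(path)

print(list(without_index.columns))  # ['name', 'age']
print(list(with_index.columns))     # ['Unnamed: 0', 'name', 'age']
print(without_index.equals(df))     # True
```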

Practice

  1. Load a CSV file and display basic statistics
  2. Filter a dataset to show only rows matching certain criteria
  3. Calculate the average value per category using groupby
  4. Handle missing values by filling them with the column mean
  5. Create a new column based on calculations from existing columns

Tags

pandas, dataframe, data-analysis, library