5 Python Projects for Beginner Data Analysts to Build Their Portfolio in 2026 (With Code)

Reading tutorials is a great way to start learning Python. But at some point, you have to close the tutorial and build something yourself.

That is the moment most beginners dread. And it is also the moment that separates people who learn Python from people who get hired with it.

Many analysts land their first role without a single line of professional experience on their resume because their projects did the talking instead. A well-built portfolio of data analyst projects can help you close the gap between "I am learning" and "I am ready to work."

Most entry-level and early-career data analysts should aim for three to five well-developed projects rather than many small ones. Each project should showcase a different skill, such as exploratory analysis, visualization, forecasting, or business decision-making. Quality matters more than quantity, especially if you can explain your thinking deeply.

In this article, you will find 5 beginner-friendly Python projects with starter code you can run today. Most of them are built around a Nigerian context so the data feels real and relevant. Each one also ties directly to the Python skills covered in our earlier tutorials on Pandas, NumPy, EDA, and data visualization.

Prerequisites: You should be comfortable with basic Python and Pandas before starting these projects. If you need a refresher, read our guide on 5 Steps to Perform Exploratory Data Analysis in Python and 5 NumPy Functions Every Data Analyst Should Know first.

Project 1: Student Grade Analyzer

What You Will Build

A Python script that reads a list of student names and their exam scores, calculates their averages, assigns letter grades, identifies the top and bottom performers, and produces a summary chart.

This is one of the most beginner-friendly projects you can start with because the data is simple and the goal is very clear.

What You Will Learn

  • Loading data into a Pandas DataFrame
  • Using NumPy to calculate mean, median, and standard deviation
  • Writing conditional logic with np.where() to assign grades
  • Creating a bar chart with Matplotlib

Skills Covered

Pandas, NumPy, Matplotlib, Conditional Logic

Starter Code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Step 1: Create the dataset
data = {
    "Student": ["Ada", "Emeka", "Fatima", "Chidi", "Ngozi",
                 "Tunde", "Amara", "Bello", "Grace", "Uche"],
    "Maths":   [85, 92, 78, 65, 88, 95, 60, 73, 90, 55],
    "English": [76, 88, 82, 91, 70, 84, 65, 79, 87, 72],
    "Science": [90, 85, 79, 88, 76, 92, 68, 80, 85, 62]
}

df = pd.DataFrame(data)

# Step 2: Calculate each student's average score
df["Average"] = df[["Maths", "English", "Science"]].mean(axis=1).round(1)

# Step 3: Assign letter grades
df["Grade"] = np.where(df["Average"] >= 80, "A",
              np.where(df["Average"] >= 70, "B",
              np.where(df["Average"] >= 60, "C", "F")))

# Step 4: Sort by average score
df = df.sort_values("Average", ascending=False).reset_index(drop=True)
print(df[["Student", "Average", "Grade"]])

# Step 5: Visualise the results
plt.figure(figsize=(10, 5))
colors = ["gold" if g == "A" else "steelblue" for g in df["Grade"]]
plt.bar(df["Student"], df["Average"], color=colors)
plt.axhline(y=df["Average"].mean(), color="red", linestyle="--",
            label=f"Class Average: {df['Average'].mean():.1f}")
plt.title("Student Performance Summary")
plt.xlabel("Student")
plt.ylabel("Average Score (%)")
plt.legend()
plt.tight_layout()
plt.show()
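The "What You Will Learn" list mentions NumPy's mean, median, and standard deviation, but the starter script only calls Pandas' .mean(). Here is a minimal sketch of the NumPy side, using the same scores. One thing to watch: np.std() returns the population standard deviation by default (ddof=0), while Pandas' .std() returns the sample version (ddof=1), so the two can disagree slightly.

```python
import numpy as np
import pandas as pd

# Same subject scores as the starter dataset
scores = {
    "Maths":   [85, 92, 78, 65, 88, 95, 60, 73, 90, 55],
    "English": [76, 88, 82, 91, 70, 84, 65, 79, 87, 72],
    "Science": [90, 85, 79, 88, 76, 92, 68, 80, 85, 62],
}

rows = []
for subject, values in scores.items():
    arr = np.array(values)
    rows.append({
        "Subject": subject,
        "Mean": np.mean(arr),
        "Median": np.median(arr),
        # ddof=0 by default: population standard deviation
        "Std": np.std(arr),
    })

stats = pd.DataFrame(rows).set_index("Subject").round(2)
print(stats)
```

In Maths, for example, the mean (78.1) sits below the median (81.5), a sign that a few low scores are pulling the average down.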

How to Extend This Project

Once the basic version is working, push yourself further. Add a pass or fail column. Calculate the class rank for each student. Export the final DataFrame as a CSV file using df.to_csv("results.csv", index=False). These small additions make your project significantly more impressive to a recruiter or client.
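One way to sketch those three additions is shown below. The pass mark of 60 is an assumption chosen to match the C/F boundary in the grading step; use whatever rule your school applies. rank(method="min") gives tied students the same rank.

```python
import numpy as np
import pandas as pd

data = {
    "Student": ["Ada", "Emeka", "Fatima", "Chidi", "Ngozi",
                "Tunde", "Amara", "Bello", "Grace", "Uche"],
    "Maths":   [85, 92, 78, 65, 88, 95, 60, 73, 90, 55],
    "English": [76, 88, 82, 91, 70, 84, 65, 79, 87, 72],
    "Science": [90, 85, 79, 88, 76, 92, 68, 80, 85, 62],
}
df = pd.DataFrame(data)
df["Average"] = df[["Maths", "English", "Science"]].mean(axis=1).round(1)

# Assumed pass mark, matching the C/F boundary used for letter grades
PASS_MARK = 60
df["Result"] = np.where(df["Average"] >= PASS_MARK, "Pass", "Fail")

# Rank 1 = highest average; method="min" gives tied students the same rank
df["Rank"] = df["Average"].rank(ascending=False, method="min").astype(int)

# Sort by rank and save the final table for your repository
df = df.sort_values("Rank").reset_index(drop=True)
df.to_csv("results.csv", index=False)
print(df[["Student", "Average", "Result", "Rank"]])
```

With the starter scores every student clears 60 (the lowest average is 63.0), so try lowering a few scores to see the "Fail" branch work.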

Project 2: Nigerian Market Price Tracker

What You Will Build

A tool that tracks the prices of common food items across five Nigerian cities, compares the prices, identifies the cheapest and most expensive cities for each item, and produces a grouped bar chart.

This project teaches you one of the most valuable real-world data analysis skills: comparing grouped data across multiple categories.

What You Will Learn

  • Organising and reshaping data in Pandas
  • Grouping and aggregating data with groupby()
  • Comparing categories visually using Seaborn
  • Finding minimum and maximum values across columns

Skills Covered

Pandas, Seaborn, GroupBy, Aggregation, Data Comparison

Starter Code

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Step 1: Create the dataset (prices in Naira per kg or unit)
data = {
    "City":   ["Lagos", "Abuja", "Kano", "Port Harcourt", "Ibadan"] * 4,
    "Item":   ["Rice"] * 5 + ["Tomatoes"] * 5 + ["Garri"] * 5 + ["Beans"] * 5,
    "Price":  [
        1800, 1950, 1700, 1900, 1650,   # Rice
        800,  900,  700,  850,  750,    # Tomatoes
        600,  650,  580,  630,  570,    # Garri
        1200, 1350, 1100, 1250, 1100    # Beans
    ]
}

df = pd.DataFrame(data)

# Step 2: Show the average price per item across all cities
avg_prices = df.groupby("Item")["Price"].mean().sort_values(ascending=False)
print("Average Prices (Naira):")
print(avg_prices.round(0))

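# Step 3: Find the cheapest and most expensive city for each item
# (idxmin()/idxmax() return the first match on a tie, e.g. Kano and Ibadan for beans)
cheapest = df.loc[df.groupby("Item")["Price"].idxmin(), ["Item", "City", "Price"]]
print("\nCheapest City Per Item:")
print(cheapest)

most_expensive = df.loc[df.groupby("Item")["Price"].idxmax(), ["Item", "City", "Price"]]
print("\nMost Expensive City Per Item:")
print(most_expensive)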

# Step 4: Visualise price comparison across cities
plt.figure(figsize=(12, 6))
sns.barplot(data=df, x="Item", y="Price", hue="City", palette="Set2")
plt.title("Food Price Comparison Across Nigerian Cities (₦)")
plt.xlabel("Food Item")
plt.ylabel("Price (Naira)")
plt.legend(title="City", bbox_to_anchor=(1.05, 1), loc="upper left")
plt.tight_layout()
plt.show()
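The "Organising and reshaping data" skill from the list above is easiest to see with a pivot table, which turns the long one-row-per-city-and-item layout into a grid with one row per item and one column per city. A minimal sketch using the same prices; the "Spread" column is a name introduced here for illustration, holding the gap between the dearest and cheapest city.

```python
import pandas as pd

data = {
    "City":  ["Lagos", "Abuja", "Kano", "Port Harcourt", "Ibadan"] * 4,
    "Item":  ["Rice"] * 5 + ["Tomatoes"] * 5 + ["Garri"] * 5 + ["Beans"] * 5,
    "Price": [
        1800, 1950, 1700, 1900, 1650,   # Rice
        800,  900,  700,  850,  750,    # Tomatoes
        600,  650,  580,  630,  570,    # Garri
        1200, 1350, 1100, 1250, 1100,   # Beans
    ],
}
df = pd.DataFrame(data)

# Reshape: one row per item, one column per city
price_table = df.pivot_table(index="Item", columns="City", values="Price", aggfunc="mean")

# Spread = most expensive city minus cheapest city, in Naira
# (the right-hand side only sees the city columns, before "Spread" exists)
price_table["Spread"] = price_table.max(axis=1) - price_table.min(axis=1)

print(price_table.sort_values("Spread", ascending=False))
```

With this data, rice shows the widest gap (₦300 between Abuja and Ibadan) and garri the narrowest (₦80).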

How to Extend This Project

Try adding a "Month" column and tracking how prices change over several months. This turns a simple price comparison into a time series analysis, which is a much more advanced and impressive skill to demonstrate.
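As a rough sketch of that extension, the example below uses a small set of invented monthly rice prices for two cities (replace them with prices you actually collect) and computes the month-on-month percentage change with pct_change().

```python
import pandas as pd

# Hypothetical monthly rice prices in Naira: invented placeholder numbers
monthly = pd.DataFrame({
    "Month": pd.to_datetime(["2026-01-01", "2026-02-01", "2026-03-01"] * 2),
    "City":  ["Lagos"] * 3 + ["Kano"] * 3,
    "Item":  ["Rice"] * 6,
    "Price": [1800, 1850, 1950, 1700, 1720, 1800],
})

# One row per month, one column per city
trend = monthly.pivot_table(index="Month", columns="City", values="Price")

# Month-on-month percentage change for each city
pct = trend.pct_change().mul(100).round(1)
print(pct)
```

A positive number means prices rose from the previous month; the first month is blank (NaN) because there is nothing earlier to compare it with.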

Project 3: Personal Expense Tracker

What You Will Build

An expense tracker that reads a month of personal spending data, breaks spending down by category, calculates what percentage of total spending went to each category, and produces a clear pie chart and category summary.

This project is valuable because it solves a real personal finance problem, which means you can explain it clearly in an interview or to a client.

What You Will Learn

  • Filtering data with Pandas
  • Calculating percentages from raw numbers
  • Grouping transactions by category
  • Creating pie charts and summary statistics

Skills Covered

Pandas, Matplotlib, Filtering, Percentage Calculations, Grouping

Starter Code

import pandas as pd
import matplotlib.pyplot as plt

# Step 1: Create the dataset
data = {
    "Date": pd.date_range(start="2026-05-01", periods=20, freq="D"),
    "Category": [
        "Food", "Transport", "Food", "Entertainment", "Bills",
        "Food", "Transport", "Clothing", "Food", "Bills",
        "Entertainment", "Food", "Transport", "Bills", "Food",
        "Clothing", "Transport", "Food", "Entertainment", "Bills"
    ],
    "Amount": [
        4500, 1200, 3800, 5000, 15000,
        3200, 900, 12000, 4100, 8500,
        3500, 2900, 1100, 7000, 5200,
        9000, 800, 3600, 2000, 14000
    ]
}

df = pd.DataFrame(data)

# Step 2: Total spending by category
category_totals = df.groupby("Category")["Amount"].sum().sort_values(ascending=False)
print("Spending by Category (Naira):")
print(category_totals)

# Step 3: What percentage of total spending is each category?
total_spent = df["Amount"].sum()
print(f"\nTotal Spent: ₦{total_spent:,}")
category_pct = (category_totals / total_spent * 100).round(1)
print("\nPercentage of Total Spending:")
print(category_pct)

# Step 4: Pie chart
plt.figure(figsize=(8, 8))
plt.pie(
    category_totals,
    labels=category_totals.index,
    autopct="%1.1f%%",
    startangle=140,
    colors=["#FF6B6B", "#4ECDC4", "#45B7D1", "#96CEB4", "#FFEAA7"]
)
plt.title("May 2026 Expense Breakdown")
plt.show()

# Step 5: Identify the biggest single expense
biggest = df.loc[df["Amount"].idxmax()]
print(f"\nBiggest single expense: ₦{biggest['Amount']:,} on {biggest['Category']}")
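The starter code groups and sums but never filters, even though filtering is on the list of skills. A minimal sketch of boolean filtering on the same data: each condition produces a True/False mask, and df[mask] keeps only the True rows. The ₦5,000 threshold and the date cut-off are arbitrary choices for illustration.

```python
import pandas as pd

# Same spending data as the starter code
df = pd.DataFrame({
    "Date": pd.date_range(start="2026-05-01", periods=20, freq="D"),
    "Category": [
        "Food", "Transport", "Food", "Entertainment", "Bills",
        "Food", "Transport", "Clothing", "Food", "Bills",
        "Entertainment", "Food", "Transport", "Bills", "Food",
        "Clothing", "Transport", "Food", "Entertainment", "Bills",
    ],
    "Amount": [
        4500, 1200, 3800, 5000, 15000,
        3200, 900, 12000, 4100, 8500,
        3500, 2900, 1100, 7000, 5200,
        9000, 800, 3600, 2000, 14000,
    ],
})

# One condition: a True/False mask, then keep only the True rows
food = df[df["Category"] == "Food"]
print(f"Food: {len(food)} transactions, ₦{food['Amount'].sum():,} in total")

# Two conditions: combine with & (and) or | (or), each wrapped in brackets
large_non_bills = df[(df["Amount"] > 5000) & (df["Category"] != "Bills")]
print("\nNon-bill expenses above ₦5,000:")
print(large_non_bills)

# Dates: compare the datetime column with a date string
first_week = df[df["Date"] <= "2026-05-07"]
print(f"\nSpent in the first week of May: ₦{first_week['Amount'].sum():,}")
```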

How to Extend This Project

Add a monthly budget for each category and compare actual spending against the budget. Calculate overspending or savings for each area. This makes the project relevant for business reporting, not just personal finance.
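A minimal sketch of that budget comparison follows. The budget figures are invented for illustration; Pandas lines the budget and the actual totals up by category name, so the order you type them in does not matter.

```python
import numpy as np
import pandas as pd

# Same categories and amounts as the starter code (dates left out here)
df = pd.DataFrame({
    "Category": [
        "Food", "Transport", "Food", "Entertainment", "Bills",
        "Food", "Transport", "Clothing", "Food", "Bills",
        "Entertainment", "Food", "Transport", "Bills", "Food",
        "Clothing", "Transport", "Food", "Entertainment", "Bills",
    ],
    "Amount": [
        4500, 1200, 3800, 5000, 15000,
        3200, 900, 12000, 4100, 8500,
        3500, 2900, 1100, 7000, 5200,
        9000, 800, 3600, 2000, 14000,
    ],
})
category_totals = df.groupby("Category")["Amount"].sum()

# Hypothetical monthly budget per category in Naira (invented figures)
budget = pd.Series({
    "Food": 30000, "Transport": 5000, "Bills": 40000,
    "Entertainment": 8000, "Clothing": 15000,
})

# Budget and actual side by side; positive Difference = money saved
report = pd.DataFrame({"Budget": budget, "Actual": category_totals})
report["Difference"] = report["Budget"] - report["Actual"]
report["Status"] = np.where(report["Difference"] >= 0, "Under budget", "Over budget")
print(report.sort_values("Difference"))
```

Against these made-up budgets, Bills, Entertainment, and Clothing come out over budget, while Food and Transport come in under.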

Project 4: Customer Survey Analyzer

What You Will Build

A full exploratory data analysis of a customer satisfaction survey. The project reads survey responses, cleans the data, summarises ratings by category, identifies which areas customers rate lowest, and produces a correlation heatmap of the satisfaction scores.

Hands-on projects are one of the most effective ways to learn data analytics and get noticed by recruiters. They force you to apply tools to real problems, build storytelling instincts, and create tangible portfolio pieces you can link on your resume or GitHub.

This project demonstrates that storytelling skill more than any other on this list.

What You Will Learn

  • Performing a full EDA workflow on survey data
  • Identifying missing values and handling them cleanly
  • Using value_counts() to summarise categorical responses
  • Building a correlation heatmap with Seaborn

Skills Covered

Pandas, Seaborn, EDA, Missing Value Handling, Correlation Analysis

Starter Code

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Step 1: Create the dataset (survey scores out of 10)
np.random.seed(42)
n = 50

df = pd.DataFrame({
    "Customer_ID": range(1, n + 1),
    "Product_Quality":   np.random.randint(5, 11, n),
    "Delivery_Speed":    np.random.randint(3, 11, n),
    "Customer_Service":  np.random.randint(4, 11, n),
    "Value_for_Money":   np.random.randint(4, 11, n),
    "Overall_Rating":    np.random.randint(4, 11, n),
    "Would_Recommend":   np.random.choice(["Yes", "No", "Maybe"], n)
})

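# Introduce a few missing values to make it realistic
# (cast these columns to float first: an integer column cannot hold NaN)
nan_cols = ["Delivery_Speed", "Customer_Service"]
df[nan_cols] = df[nan_cols].astype(float)
df.loc[[5, 12, 28], "Delivery_Speed"] = np.nan
df.loc[[3, 19], "Customer_Service"] = np.nan

# Step 2: Data quality check, then fill each gap with its column mean
print("Missing values:\n", df.isnull().sum())
df = df.fillna(df.mean(numeric_only=True))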

# Step 3: Average score per category
score_cols = ["Product_Quality", "Delivery_Speed", "Customer_Service",
              "Value_for_Money", "Overall_Rating"]
avg_scores = df[score_cols].mean().sort_values(ascending=False).round(2)
print("\nAverage Satisfaction Scores:")
print(avg_scores)

# Step 4: Recommendation breakdown
print("\nWould Recommend?")
print(df["Would_Recommend"].value_counts())

# Step 5: Correlation heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(df[score_cols].corr(), annot=True, fmt=".2f",
            cmap="YlGnBu", linewidths=0.5)
plt.title("Customer Satisfaction Correlation Heatmap")
plt.tight_layout()
plt.show()

How to Extend This Project

Filter the data to compare scores from customers who said "Yes" versus "No" to the recommendation question. This kind of segmented comparison is exactly what business analysts do in real companies.
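A minimal sketch of that comparison, regenerating the same random survey (without the missing values, for brevity). groupby() averages every score column within each answer group, and subtracting the "No" row from the "Yes" row shows where satisfied and dissatisfied customers differ most. Because the starter scores are random, expect the gaps to be small and not meaningful; with real responses, this is where the story appears.

```python
import numpy as np
import pandas as pd

# Same generated survey as the starter code (missing values omitted)
np.random.seed(42)
n = 50
df = pd.DataFrame({
    "Customer_ID": range(1, n + 1),
    "Product_Quality":  np.random.randint(5, 11, n),
    "Delivery_Speed":   np.random.randint(3, 11, n),
    "Customer_Service": np.random.randint(4, 11, n),
    "Value_for_Money":  np.random.randint(4, 11, n),
    "Overall_Rating":   np.random.randint(4, 11, n),
    "Would_Recommend":  np.random.choice(["Yes", "No", "Maybe"], n),
})
score_cols = ["Product_Quality", "Delivery_Speed", "Customer_Service",
              "Value_for_Money", "Overall_Rating"]

# Average of each score within each answer group ("Yes", "No", "Maybe")
segment = df.groupby("Would_Recommend")[score_cols].mean().round(2)
print(segment)

# How much higher "Yes" customers score each area than "No" customers
gap = (segment.loc["Yes"] - segment.loc["No"]).sort_values(ascending=False)
print("\nYes minus No:")
print(gap)
```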

Project 5: Sales Performance Dashboard

What You Will Build

A multi-chart sales analysis that reads six months of sales data, compares performance across product categories and regions, identifies trends, flags the best and worst performing months, and presents everything in a clean four-panel dashboard.

A strong project shows you can translate messy data into business insights, design clear visualizations, and tell a story a non-technical audience can understand. This project does exactly that.

This is the most complete and impressive project on this list. It ties together everything from our EDA guide, our NumPy functions article, and our data visualization guide into one coherent output.

What You Will Learn

  • Building multi-panel dashboards with Matplotlib subplots
  • Grouping and aggregating sales data by time period
  • Identifying monthly trends with line charts
  • Comparing regional performance with bar and box plots

Skills Covered

Pandas, NumPy, Matplotlib, Seaborn, Subplots, GroupBy, Time Series

Starter Code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Step 1: Create six months of sales data
np.random.seed(10)
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
regions = ["Lagos", "Abuja", "Kano", "PHC"]
categories = ["Electronics", "Clothing", "Food", "Furniture"]

rows = []
for month in months:
    for region in regions:
        for category in categories:
            rows.append({
                "Month": month,
                "Region": region,
                "Category": category,
                "Revenue": np.random.randint(200000, 900000),
                "Units_Sold": np.random.randint(50, 500)
            })

df = pd.DataFrame(rows)

# Step 2: Summary statistics
print("Total Revenue by Region (₦):")
print(df.groupby("Region")["Revenue"].sum().sort_values(ascending=False))

# Step 3: Build a four-panel dashboard
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle("ShopNG Sales Performance Dashboard (Jan–Jun 2026)",
             fontsize=14, fontweight="bold")

# Panel 1: Monthly revenue trend
monthly_rev = df.groupby("Month")["Revenue"].sum().reindex(months)
axes[0, 0].plot(months, monthly_rev / 1_000_000, marker="o", color="steelblue", linewidth=2)
axes[0, 0].set_title("Monthly Revenue Trend")
axes[0, 0].set_ylabel("Revenue (₦ Millions)")
axes[0, 0].set_xlabel("Month")

# Panel 2: Revenue by Region
region_rev = df.groupby("Region")["Revenue"].sum().sort_values()
axes[0, 1].barh(region_rev.index, region_rev / 1_000_000, color="coral")
axes[0, 1].set_title("Total Revenue by Region")
axes[0, 1].set_xlabel("Revenue (₦ Millions)")

# Panel 3: Revenue by Category
cat_rev = df.groupby("Category")["Revenue"].sum().sort_values(ascending=False)
axes[1, 0].bar(cat_rev.index, cat_rev / 1_000_000, color="mediumseagreen")
axes[1, 0].set_title("Revenue by Product Category")
axes[1, 0].set_ylabel("Revenue (₦ Millions)")
axes[1, 0].tick_params(axis="x", rotation=15)

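# Panel 4: Units sold distribution by region
# (hue + legend=False avoids seaborn 0.13's warning about palette without hue)
sns.boxplot(data=df, x="Region", y="Units_Sold", hue="Region", palette="Set3",
            legend=False, ax=axes[1, 1])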
axes[1, 1].set_title("Units Sold Distribution by Region")
axes[1, 1].set_ylabel("Units Sold")

plt.tight_layout()
plt.show()

# Step 4: Best and worst month
best_month = monthly_rev.idxmax()
worst_month = monthly_rev.idxmin()
print(f"\nBest month:  {best_month} (₦{monthly_rev[best_month]:,.0f})")
print(f"Worst month: {worst_month} (₦{monthly_rev[worst_month]:,.0f})")

How to Extend This Project

Add a profit column by subtracting a cost from each revenue figure. Calculate the profit margin per category and add a fifth panel showing which product category is the most profitable. This transforms the project from a sales report into a full business intelligence dashboard.
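A minimal sketch of the profit step follows, regenerating the same sales data. The cost ratios are invented for illustration (the starter data has no cost figures); swap in real costs if you have them.

```python
import numpy as np
import pandas as pd

# Same generated sales data as the starter code
np.random.seed(10)
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
regions = ["Lagos", "Abuja", "Kano", "PHC"]
categories = ["Electronics", "Clothing", "Food", "Furniture"]

rows = []
for month in months:
    for region in regions:
        for category in categories:
            rows.append({
                "Month": month,
                "Region": region,
                "Category": category,
                "Revenue": np.random.randint(200000, 900000),
                "Units_Sold": np.random.randint(50, 500),
            })
df = pd.DataFrame(rows)

# Hypothetical cost as a share of revenue per category (invented figures)
COST_RATIO = {"Electronics": 0.80, "Clothing": 0.55, "Food": 0.70, "Furniture": 0.60}
df["Cost"] = df["Revenue"] * df["Category"].map(COST_RATIO)
df["Profit"] = df["Revenue"] - df["Cost"]

# Total revenue and profit per category, then margin as a percentage
by_cat = df.groupby("Category")[["Revenue", "Profit"]].sum()
by_cat["Margin_%"] = (by_cat["Profit"] / by_cat["Revenue"] * 100).round(1)
by_cat = by_cat.sort_values("Margin_%", ascending=False)
print(by_cat)
print(f"\nHighest margin: {by_cat.index[0]}")
print(f"Highest total profit: {by_cat['Profit'].idxmax()}")
```

Because each category uses one fixed cost ratio here, its margin simply equals one minus that ratio (Clothing comes out at 45%); per-sale cost data makes the comparison more interesting. For the fifth panel, one option is to switch to plt.subplots(2, 3) and draw by_cat["Margin_%"] as a bar chart on one of the new axes.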

How to Share These Projects on GitHub

Building the project is only half the work. Making it visible to employers and clients matters just as much.

Here is a simple process to follow once your project is ready:

Step 1: Create a free account on GitHub if you do not already have one.

Step 2: Create a new repository for each project with a clear name like student-grade-analyzer-python or nigerian-market-price-tracker.

Step 3: Upload your Jupyter Notebook (.ipynb file) and any datasets you used.

Step 4: Write a short README file that explains what the project does, what tools it uses, and what insights it produced. Three to five sentences is enough.

Step 5: Add the GitHub link to your LinkedIn profile, your CV, and any job applications you send.
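Because every starter script in this article builds its data in code, it helps to save the data and the finished chart into the repository yourself, so Steps 3 and 4 have something to upload and show. A minimal sketch; the outputs folder and file names are placeholders you can rename.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical output folder inside your project repository
out_dir = Path("outputs")
out_dir.mkdir(exist_ok=True)

df = pd.DataFrame({"Student": ["Ada", "Emeka", "Fatima"],
                   "Average": [83.7, 88.3, 79.7]})

# Save the data so the repository contains what the notebook used
df.to_csv(out_dir / "results.csv", index=False)

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(df["Student"], df["Average"], color="steelblue")
ax.set_title("Student Performance Summary")
ax.set_ylabel("Average Score (%)")
# bbox_inches="tight" trims extra white space around the chart
fig.savefig(out_dir / "performance.png", dpi=150, bbox_inches="tight")
plt.close(fig)
```

In the README you can then embed the chart with a standard Markdown image line such as ![Student performance](outputs/performance.png).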

A smaller set of strong projects is easier to discuss in interviews. You do not need all five projects from this article before you start applying. Two or three well-documented, clearly explained projects are enough to open doors.

Conclusion

The gap between learning Python and using Python professionally is not technical. It is practical. Employers do not hire people who have read tutorials. They hire people who have built things.

Start with Project 1 today. It is small enough to finish in an afternoon and big enough to teach you something meaningful. Then move through the list in order. By the time you complete all five, you will have a portfolio that shows range, real-world thinking, and the ability to work with Nigerian business data, which is a genuine competitive advantage for roles in Nigerian companies and remote positions that serve African markets.

If you want structured, step-by-step guidance that takes you from Python basics all the way through to your first real data analysis project, check out our complete course at jacobisah.selar.com. It is built specifically for beginners and intermediate learners who want to move from tutorials to real work.

Published on JacobIsah Programming Hub | enemzy.blogspot.com
