Retail Sales Data Analysis: Customer, Product, and Transaction Analytics

SQL-based retail data analysis project focusing on customer behavior, product performance, sales trends, and revenue insights using real-world transaction data.

View on GitHub →
Retail Sales Data Analysis: Customer, Product, and Transaction Analytics

Overview

In today’s competitive retail market, understanding customer behavior, product performance, and sales trends is critical for business growth. In this case study, SQL is used to analyze retail data from three tables:

  • Customer — Contains customer demographics
  • Transactions — Tracks purchase details and returns
  • Product — Holds product category and sub-category information

Business Questions & SQL Solutions

Q1. What is the total number of rows in each table?

SELECT 'Customer' AS table_name, COUNT(*) AS row_count FROM customer
UNION ALL
SELECT 'Transactions', COUNT(*) FROM transactions
UNION ALL
SELECT 'Product', COUNT(*) FROM prod_cat_info;

Q2. How many transactions are returns?

SELECT COUNT(*) AS return_count
FROM transactions
WHERE qty < 0;

Q3. What is the time range of the transaction data?

SELECT
  MIN(tran_date) AS earliest_date,
  MAX(tran_date) AS latest_date
FROM transactions;

Q4. Which channel is most frequently used for transactions?

SELECT Store_type, COUNT(*) AS transaction_count
FROM transactions
GROUP BY Store_type
ORDER BY transaction_count DESC;

Q5. Find top 5 product subcategories by sales & return percentage

SELECT TOP 5
  p.prod_subcat,
  SUM(CASE WHEN t.qty > 0 THEN t.total_amt ELSE 0 END) AS total_sales,
  ROUND(
    100.0 * SUM(CASE WHEN t.qty < 0 THEN 1 ELSE 0 END) / COUNT(*), 2
  ) AS return_pct
FROM transactions t
JOIN prod_cat_info p
  ON t.prod_cat_code = p.prod_cat_code
  AND t.prod_subcat_code = p.prod_sub_cat_code
GROUP BY p.prod_subcat
ORDER BY total_sales DESC;

Key Findings

  • Electronics and Clothing are the top revenue-generating categories.
  • Flagship stores account for the highest combined revenue across categories.
  • A small segment of customers (those with 10+ purchases) drives a disproportionate share of revenue — a classic Pareto pattern.
  • Return rates vary significantly by sub-category, suggesting quality or expectation mismatches in certain product lines.

Conclusion

SQL remains one of the most powerful tools for retail analytics. By structuring business questions as queries, this project demonstrates how raw transaction data can be transformed into actionable insights without any external BI tool.