This project demonstrates my SQL problem-solving skills through the analysis of data for Zomato, a popular food delivery company in India. The project involves setting up the database, importing data, handling null values, and solving a variety of business problems using complex SQL queries.
- Database Setup: Creation of the
zomato_db
database and the required tables. - Data Import: Inserting sample data into the tables.
- Data Cleaning: Handling null values and ensuring data integrity.
- Business Problems: Solving 20 specific business problems using SQL queries.
CREATE DATABASE zomato_db;
DROP TABLE IF EXISTS deliveries;
DROP TABLE IF EXISTS Orders;
DROP TABLE IF EXISTS customers;
DROP TABLE IF EXISTS restaurants;
DROP TABLE IF EXISTS riders;
-- 2. Creating Tables
CREATE TABLE restaurants (
restaurant_id SERIAL PRIMARY KEY,
restaurant_name VARCHAR(100) NOT NULL,
city VARCHAR(50),
opening_hours VARCHAR(50)
);
CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
customer_name VARCHAR(100) NOT NULL,
reg_date DATE
);
CREATE TABLE riders (
rider_id SERIAL PRIMARY KEY,
rider_name VARCHAR(100) NOT NULL,
sign_up DATE
);
CREATE TABLE Orders (
order_id SERIAL PRIMARY KEY,
customer_id INT,
restaurant_id INT,
order_item VARCHAR(255),
order_date DATE NOT NULL,
order_time TIME NOT NULL,
order_status VARCHAR(20) DEFAULT 'Pending',
total_amount DECIMAL(10, 2) NOT NULL,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
FOREIGN KEY (restaurant_id) REFERENCES restaurants(restaurant_id)
);
CREATE TABLE deliveries (
delivery_id SERIAL PRIMARY KEY,
order_id INT,
delivery_status VARCHAR(20) DEFAULT 'Pending',
delivery_time TIME,
rider_id INT,
FOREIGN KEY (order_id) REFERENCES Orders(order_id),
FOREIGN KEY (rider_id) REFERENCES riders(rider_id)
);
Before performing analysis, I ensured that the data was clean and free from null values where necessary. For instance:
UPDATE orders
SET total_amount = COALESCE(total_amount, 0);
1. Write a query to find the top 5 most frequently ordered dishes by customer called "Arjun Mehta" in the last 1 year.
SELECT
customer_name,
dishes,
total_orders
FROM -- table name
(SELECT
c.customer_id,
c.customer_name,
o.order_item as dishes,
COUNT(*) as total_orders,
DENSE_RANK() OVER(ORDER BY COUNT(*) DESC) as rank
FROM orders as o
JOIN
customers as c
ON c.customer_id = o.customer_id
WHERE
o.order_date >= CURRENT_DATE - INTERVAL '1 Year'
AND
c.customer_name = 'Arjun Mehta'
GROUP BY 1, 2, 3
ORDER BY 1, 4 DESC) as t1
WHERE rank <= 5
-- Question: Identify the time slots during which the most orders are placed. based on 2-hour intervals.
Approach 1:
-- Approach 1
SELECT
FLOOR(EXTRACT(HOUR FROM order_time)/2)*2 as start_time,
FLOOR(EXTRACT(HOUR FROM order_time)/2)*2 + 2 as end_time,
COUNT(*) as total_orders
FROM orders
GROUP BY 1, 2
ORDER BY 3 DESC;
Approach 2:
SELECT
CASE
WHEN EXTRACT(HOUR FROM order_time) BETWEEN 0 AND 1 THEN '00:00 - 02:00'
WHEN EXTRACT(HOUR FROM order_time) BETWEEN 2 AND 3 THEN '02:00 - 04:00'
WHEN EXTRACT(HOUR FROM order_time) BETWEEN 4 AND 5 THEN '04:00 - 06:00'
WHEN EXTRACT(HOUR FROM order_time) BETWEEN 6 AND 7 THEN '06:00 - 08:00'
WHEN EXTRACT(HOUR FROM order_time) BETWEEN 8 AND 9 THEN '08:00 - 10:00'
WHEN EXTRACT(HOUR FROM order_time) BETWEEN 10 AND 11 THEN '10:00 - 12:00'
WHEN EXTRACT(HOUR FROM order_time) BETWEEN 12 AND 13 THEN '12:00 - 14:00'
WHEN EXTRACT(HOUR FROM order_time) BETWEEN 14 AND 15 THEN '14:00 - 16:00'
WHEN EXTRACT(HOUR FROM order_time) BETWEEN 16 AND 17 THEN '16:00 - 18:00'
WHEN EXTRACT(HOUR FROM order_time) BETWEEN 18 AND 19 THEN '18:00 - 20:00'
WHEN EXTRACT(HOUR FROM order_time) BETWEEN 20 AND 21 THEN '20:00 - 22:00'
WHEN EXTRACT(HOUR FROM order_time) BETWEEN 22 AND 23 THEN '22:00 - 00:00'
END AS time_slot,
COUNT(order_id) AS order_count
FROM Orders
GROUP BY time_slot
ORDER BY order_count DESC;
-- Question: Find the average order value per customer who has placed more than 750 orders. -- Return customer_name, and aov(average order value)
SELECT
-- o.customer_id,
c.customer_name,
AVG(o.total_amount) as aov
FROM orders as o
JOIN customers as c
ON c.customer_id = o.customer_id
GROUP BY 1
HAVING COUNT(order_id) > 750;
-- Question: List the customers who have spent more than 100K in total on food orders. -- return customer_name, and customer_id!
SELECT
-- o.customer_id,
c.customer_name,
SUM(o.total_amount) as total_spent
FROM orders as o
JOIN customers as c
ON c.customer_id = o.customer_id
GROUP BY 1
HAVING SUM(o.total_amount) > 100000;
-- Question: Write a query to find orders that were placed but not delivered. -- Return each restuarant name, city and number of not delivered orders
-- Approach 1
SELECT
r.restaurant_name,
COUNT(o.order_id) as cnt_not_delivered_orders
FROM orders as o
LEFT JOIN
restaurants as r
ON r.restaurant_id = o.restaurant_id
LEFT JOIN
deliveries as d
ON d.order_id = o.order_id
WHERE d.delivery_id IS NULL
GROUP BY 1
ORDER BY 2 DESC
-- Approach 2
SELECT
r.restaurant_name,
COUNT(*)
FROM orders as o
LEFT JOIN
restaurants as r
ON r.restaurant_id = o.restaurant_id
WHERE
o.order_id NOT IN (SELECT order_id FROM deliveries)
GROUP BY 1
ORDER BY 2 DESC
-- Rank restaurants by their total revenue from the last year, including their name, -- total revenue, and rank within their city.
WITH ranking_table
AS
(
SELECT
r.city,
r.restaurant_name,
SUM(o.total_amount) as revenue,
RANK() OVER(PARTITION BY r.city ORDER BY SUM(o.total_amount) DESC) as rank
FROM orders as o
JOIN
restaurants as r
ON r.restaurant_id = o.restaurant_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY 1, 2
)
SELECT
*
FROM ranking_table
WHERE rank = 1;
-- Identify the most popular dish in each city based on the number of orders.
SELECT *
FROM
(SELECT
r.city,
o.order_item as dish,
COUNT(order_id) as total_orders,
RANK() OVER(PARTITION BY r.city ORDER BY COUNT(order_id) DESC) as rank
FROM orders as o
JOIN
restaurants as r
ON r.restaurant_id = o.restaurant_id
GROUP BY 1, 2
) as t1
WHERE rank = 1;
-- Find customers who haven’t placed an order in 2024 but did in 2023.
SELECT DISTINCT customer_id FROM orders
WHERE
EXTRACT(YEAR FROM order_date) = 2023
AND
customer_id NOT IN
(SELECT DISTINCT customer_id FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2024);
-- Calculate and compare the order cancellation rate for each restaurant between the -- current year and the previous year.
WITH cancel_ratio_23 AS (
SELECT
o.restaurant_id,
COUNT(o.order_id) AS total_orders,
COUNT(CASE WHEN d.delivery_id IS NULL THEN 1 END) AS not_delivered
FROM orders AS o
LEFT JOIN deliveries AS d
ON o.order_id = d.order_id
WHERE EXTRACT(YEAR FROM o.order_date) = 2023
GROUP BY o.restaurant_id
),
cancel_ratio_24 AS (
SELECT
o.restaurant_id,
COUNT(o.order_id) AS total_orders,
COUNT(CASE WHEN d.delivery_id IS NULL THEN 1 END) AS not_delivered
FROM orders AS o
LEFT JOIN deliveries AS d
ON o.order_id = d.order_id
WHERE EXTRACT(YEAR FROM o.order_date) = 2024
GROUP BY o.restaurant_id
),
last_year_data AS (
SELECT
restaurant_id,
total_orders,
not_delivered,
ROUND((not_delivered::numeric / total_orders::numeric) * 100, 2) AS cancel_ratio
FROM cancel_ratio_23
),
current_year_data AS (
SELECT
restaurant_id,
total_orders,
not_delivered,
ROUND((not_delivered::numeric / total_orders::numeric) * 100, 2) AS cancel_ratio
FROM cancel_ratio_24
)
SELECT
c.restaurant_id AS restaurant_id,
c.cancel_ratio AS current_year_cancel_ratio,
l.cancel_ratio AS last_year_cancel_ratio
FROM current_year_data AS c
JOIN last_year_data AS l
ON c.restaurant_id = l.restaurant_id;
-- Determine each rider's average delivery time.
SELECT
o.order_id,
o.order_time,
d.delivery_time,
d.rider_id,
d.delivery_time - o.order_time AS time_difference,
EXTRACT(EPOCH FROM (d.delivery_time - o.order_time +
CASE WHEN d.delivery_time < o.order_time THEN INTERVAL '1 day' ELSE
INTERVAL '0 day' END))/60 as time_difference_insec
FROM orders AS o
JOIN deliveries AS d
ON o.order_id = d.order_id
WHERE d.delivery_status = 'Delivered';
-- Calculate each restaurant's growth ratio based on the total number of delivered orders since its joining
WITH growth_ratio
AS
(
SELECT
o.restaurant_id,
EXTRACT(YEAR FROM o.order_date) as year,
EXTRACT(MONTH FROM o.order_date) as month,
COUNT(o.order_id) as cr_month_orders,
LAG(COUNT(o.order_id), 1) OVER(PARTITION BY o.restaurant_id ORDER BY EXTRACT(YEAR FROM o.order_date),
EXTRACT(MONTH FROM o.order_date)) as prev_month_orders
FROM orders as o
JOIN
deliveries as d
ON o.order_id = d.order_id
WHERE d.delivery_status = 'Delivered'
GROUP BY 1, 2, 3
ORDER BY 1, 2
)
SELECT
restaurant_id,
month,
prev_month_orders,
cr_month_orders,
ROUND(
(cr_month_orders::numeric-prev_month_orders::numeric)/prev_month_orders::numeric * 100
,2)
as growth_ratio
FROM growth_ratio;
-- Customer Segmentation: Segment customers into 'Gold' or 'Silver' groups based on their total spending -- compared to the average order value (AOV). If a customer's total spending exceeds the AOV, -- label them as 'Gold'; otherwise, label them as 'Silver'. Write an SQL query to determine each segment's -- total number of orders and total revenue
SELECT
cx_category,
SUM(total_orders) as total_orders,
SUM(total_spent) as total_revenue
FROM
(SELECT
customer_id,
SUM(total_amount) as total_spent,
COUNT(order_id) as total_orders,
CASE
WHEN SUM(total_amount) > (SELECT AVG(total_amount) FROM orders) THEN 'Gold'
ELSE 'silver'
END as cx_category
FROM orders
group by 1
) as t1
GROUP BY 1;
-- Calculate each rider's total monthly earnings, assuming they earn 8% of the order amount.
SELECT
d.rider_id,
TO_CHAR(o.order_date, 'mm-yy') as month,
SUM(total_amount) as revenue,
SUM(total_amount)* 0.08 as riders_earning
FROM orders as o
JOIN deliveries as d
ON o.order_id = d.order_id
GROUP BY 1, 2
ORDER BY 1, 2;
-- Find the number of 5-star, 4-star, and 3-star ratings each rider has. -- riders receive this rating based on delivery time. -- If orders are delivered less than 15 minutes of order received time the rider get 5 star rating, -- if they deliver 15 and 20 minute they get 4 star rating -- if they deliver after 20 minute they get 3 star rating.
SELECT
rider_id,
stars,
COUNT(*) as total_stars
FROM
(
SELECT
rider_id,
delivery_took_time,
CASE
WHEN delivery_took_time < 15 THEN '5 star'
WHEN delivery_took_time BETWEEN 15 AND 20 THEN '4 star'
ELSE '3 star'
END as stars
FROM
(
SELECT
o.order_id,
o.order_time,
d.delivery_time,
EXTRACT(EPOCH FROM (d.delivery_time - o.order_time +
CASE WHEN d.delivery_time < o.order_time THEN INTERVAL '1 day'
ELSE INTERVAL '0 day' END
))/60 as delivery_took_time,
d.rider_id
FROM orders as o
JOIN deliveries as d
ON o.order_id = d.order_id
WHERE delivery_status = 'Delivered'
) as t1
) as t2
GROUP BY 1, 2
ORDER BY 1, 3 DESC;
-- Analyze order frequency per day of the week and identify the peak day for each restaurant.
SELECT * FROM
(
SELECT
r.restaurant_name,
-- o.order_date,
TO_CHAR(o.order_date, 'Day') as day,
COUNT(o.order_id) as total_orders,
RANK() OVER(PARTITION BY r.restaurant_name ORDER BY COUNT(o.order_id) DESC) as rank
FROM orders as o
JOIN
restaurants as r
ON o.restaurant_id = r.restaurant_id
GROUP BY 1, 2
ORDER BY 1, 3 DESC
) as t1
WHERE rank = 1;
-- Calculate the total revenue generated by each customer over all their orders.
SELECT
o.customer_id,
c.customer_name,
SUM(o.total_amount) as CLV
FROM orders as o
JOIN customers as c
ON o.customer_id = c.customer_id
GROUP BY 1, 2;
-- Identify sales trends by comparing each month's total sales to the previous month.
SELECT
EXTRACT(YEAR FROM order_date) as year,
EXTRACT(MONTH FROM order_date) as month,
SUM(total_amount) as total_sale,
LAG(SUM(total_amount), 1) OVER(ORDER BY EXTRACT(YEAR FROM order_date), EXTRACT(MONTH FROM order_date)) as prev_month_sale
FROM orders
GROUP BY 1, 2;
-- Evaluate rider efficiency by determining average delivery times and identifying those with the lowest and highest averages.
WITH new_table
AS
(
SELECT
*,
d.rider_id as riders_id,
EXTRACT(EPOCH FROM (d.delivery_time - o.order_time +
CASE WHEN d.delivery_time < o.order_time THEN INTERVAL '1 day' ELSE
INTERVAL '0 day' END))/60 as time_deliver
FROM orders as o
JOIN deliveries as d
ON o.order_id = d.order_id
WHERE d.delivery_status = 'Delivered'
),
riders_time
AS
(
SELECT
riders_id,
AVG(time_deliver) avg_time
FROM new_table
GROUP BY 1
)
SELECT
MIN(avg_time),
MAX(avg_time)
FROM riders_time;
-- Track the popularity of specific order items over time and identify seasonal demand spikes.
SELECT
order_item,
seasons,
COUNT(order_id) as total_orders
FROM
(
SELECT
*,
EXTRACT(MONTH FROM order_date) as month,
CASE
WHEN EXTRACT(MONTH FROM order_date) BETWEEN 4 AND 6 THEN 'Spring'
WHEN EXTRACT(MONTH FROM order_date) > 6 AND
EXTRACT(MONTH FROM order_date) < 9 THEN 'Summer'
ELSE 'Winter'
END as seasons
FROM orders
) as t1
GROUP BY 1, 2
ORDER BY 1, 3 DESC;
SELECT
r.city,
SUM(total_amount) as total_revenue,
RANK() OVER(ORDER BY SUM(total_amount) DESC) as city_rank
FROM orders as o
JOIN
restaurants as r
ON o.restaurant_id = r.restaurant_id
GROUP BY 1;
This project highlights my ability to handle complex SQL queries and provides solutions to real-world business problems in the context of a food delivery service like Zomato. The approach taken here demonstrates a structured problem-solving methodology, data manipulation skills, and the ability to derive actionable insights from data.
All customer names and data used in this project are computer-generated using AI and random functions. They do not represent real data associated with Zomato or any other entity. This project is solely for learning and educational purposes, and any resemblance to actual persons, businesses, or events is purely coincidental.