SQL Optimization: LEFT JOIN + MAX(CASE WHEN) vs Multiple EXISTS Subqueries
The Problem
When you need to check multiple conditions across a related table, using separate EXISTS subqueries for each condition causes the database to scan the related table multiple times.
Example Scenario
We have an orders system and want to know which payment methods were used for each order:
Tables:
- orders - customer orders
- payments - payment transactions (one order can have multiple payments)
Goal: For each order, flag whether it used: - Credit card payment - PayPal payment - Gift card payment - Crypto payment
❌ SLOW WAY: Multiple EXISTS Subqueries
SELECT
o.order_id,
o.customer_name,
o.order_date,
o.total_amount,
-- Each of these scans the payments table separately!
CASE WHEN EXISTS (
SELECT 1
FROM payments p
WHERE p.order_id = o.order_id
AND p.payment_type = 'CREDIT_CARD'
) THEN 1 ELSE 0 END AS has_credit_card,
CASE WHEN EXISTS (
SELECT 1
FROM payments p
WHERE p.order_id = o.order_id
AND p.payment_type = 'PAYPAL'
) THEN 1 ELSE 0 END AS has_paypal,
CASE WHEN EXISTS (
SELECT 1
FROM payments p
WHERE p.order_id = o.order_id
AND p.payment_type = 'GIFT_CARD'
) THEN 1 ELSE 0 END AS has_gift_card,
CASE WHEN EXISTS (
SELECT 1
FROM payments p
WHERE p.order_id = o.order_id
AND p.payment_type = 'CRYPTO'
) THEN 1 ELSE 0 END AS has_crypto
FROM orders o
WHERE o.order_date >= '2024-01-01'
What Happens:
For EACH order:
Order #1001 → Scan payments table for 'CREDIT_CARD' [Scan 1]
Order #1001 → Scan payments table for 'PAYPAL' [Scan 2]
Order #1001 → Scan payments table for 'GIFT_CARD' [Scan 3]
Order #1001 → Scan payments table for 'CRYPTO' [Scan 4]
Order #1002 → Scan payments table for 'CREDIT_CARD' [Scan 5]
Order #1002 → Scan payments table for 'PAYPAL' [Scan 6]
... and so on
Total scans: 4 scans × number of orders
✅ FAST WAY: Single LEFT JOIN + MAX(CASE WHEN)
SELECT
o.order_id,
o.customer_name,
o.order_date,
o.total_amount,
-- All checks happen on already-joined data
MAX(CASE WHEN p.payment_type = 'CREDIT_CARD' THEN 1 ELSE 0 END) AS has_credit_card,
MAX(CASE WHEN p.payment_type = 'PAYPAL' THEN 1 ELSE 0 END) AS has_paypal,
MAX(CASE WHEN p.payment_type = 'GIFT_CARD' THEN 1 ELSE 0 END) AS has_gift_card,
MAX(CASE WHEN p.payment_type = 'CRYPTO' THEN 1 ELSE 0 END) AS has_crypto
FROM orders o
LEFT JOIN payments p ON p.order_id = o.order_id
WHERE o.order_date >= '2024-01-01'
GROUP BY
o.order_id,
o.customer_name,
o.order_date,
o.total_amount
What Happens:
Step 1: LEFT JOIN creates combined table (ONE TIME)
orders: payments:
order_id=1001 JOIN order_id=1001, payment_type='CREDIT_CARD'
→ order_id=1001, payment_type='PAYPAL'
order_id=1002 JOIN order_id=1002, payment_type='GIFT_CARD'
→
Result table in memory:
order_id | customer_name | payment_type
---------|---------------|---------------
1001 | Alice Smith | CREDIT_CARD
1001 | Alice Smith | PAYPAL ← Same order, multiple rows
1002 | Bob Jones | GIFT_CARD
1003 | Carol Lee | NULL ← No payments yet
Step 2: GROUP BY order_id organizes rows
Group 1 (order_id=1001):
├─ row: payment_type='CREDIT_CARD'
└─ row: payment_type='PAYPAL'
Group 2 (order_id=1002):
└─ row: payment_type='GIFT_CARD'
Group 3 (order_id=1003):
└─ row: payment_type=NULL
Step 3: MAX(CASE WHEN) evaluates rows within each group
For Group 1 (order_id=1001):
rows = ['CREDIT_CARD', 'PAYPAL']
MAX(CASE WHEN payment_type = 'CREDIT_CARD' THEN 1 ELSE 0 END)
# Evaluates to: max([1, 0]) = 1
MAX(CASE WHEN payment_type = 'PAYPAL' THEN 1 ELSE 0 END)
# Evaluates to: max([0, 1]) = 1
MAX(CASE WHEN payment_type = 'GIFT_CARD' THEN 1 ELSE 0 END)
# Evaluates to: max([0, 0]) = 0
MAX(CASE WHEN payment_type = 'CRYPTO' THEN 1 ELSE 0 END)
# Evaluates to: max([0, 0]) = 0
Total scans: 1 scan (the initial LEFT JOIN)
Python Analogy
EXISTS approach (inefficient):
# Query the database multiple times for each order
for order in orders:
has_credit = db.query("SELECT 1 FROM payments WHERE order_id=? AND type='CREDIT_CARD'", order.id)
has_paypal = db.query("SELECT 1 FROM payments WHERE order_id=? AND type='PAYPAL'", order.id)
has_gift = db.query("SELECT 1 FROM payments WHERE order_id=? AND type='GIFT_CARD'", order.id)
has_crypto = db.query("SELECT 1 FROM payments WHERE order_id=? AND type='CRYPTO'", order.id)
LEFT JOIN approach (efficient):
# Join once, then work with data in memory
joined_data = db.join(orders, payments) # ONE database operation
for order_id, payment_rows in joined_data.groupby('order_id'):
# Just iterating over data already in memory
has_credit = max(1 if row.type == 'CREDIT_CARD' else 0 for row in payment_rows)
has_paypal = max(1 if row.type == 'PAYPAL' else 0 for row in payment_rows)
has_gift = max(1 if row.type == 'GIFT_CARD' else 0 for row in payment_rows)
has_crypto = max(1 if row.type == 'CRYPTO' else 0 for row in payment_rows)
Big O Complexity
EXISTS approach: - O(n × 4 × m) where n = orders, m = payments per scan - Re-scans the payments table 4 times for every single order
LEFT JOIN approach: - O(n × m) for the join + O(n × k) for aggregation where k = payments per order - Scans the payments table ONCE total
Performance Impact
For 10,000 orders with average 2 payments each:
| Method | Table Scans | Approximate Time |
|---|---|---|
| EXISTS | 40,000 scans | Very slow |
| LEFT JOIN | 1 scan | Fast |
Expected speedup: 5-10x depending on data size and indexes
Key Takeaway
LEFT JOIN fetches all the data ONCE into a combined result set. After that, the database works with data already in memory - no more table scans needed. The MAX(CASE WHEN) is just aggregating rows that are already sitting there.