Skip to content

SQL Optimization: LEFT JOIN + MAX(CASE WHEN) vs Multiple EXISTS Subqueries

The Problem

When you need to check multiple conditions across a related table, using separate EXISTS subqueries for each condition causes the database to scan the related table multiple times.

Example Scenario

We have an orders system and want to know which payment methods were used for each order:

Tables: - orders - customer orders - payments - payment transactions (one order can have multiple payments)

Goal: For each order, flag whether it used: - Credit card payment - PayPal payment - Gift card payment - Crypto payment


❌ SLOW WAY: Multiple EXISTS Subqueries

SELECT
    o.order_id,
    o.customer_name,
    o.order_date,
    o.total_amount,

    -- Each of these scans the payments table separately!
    CASE WHEN EXISTS (
        SELECT 1
        FROM payments p
        WHERE p.order_id = o.order_id
          AND p.payment_type = 'CREDIT_CARD'
    ) THEN 1 ELSE 0 END AS has_credit_card,

    CASE WHEN EXISTS (
        SELECT 1
        FROM payments p
        WHERE p.order_id = o.order_id
          AND p.payment_type = 'PAYPAL'
    ) THEN 1 ELSE 0 END AS has_paypal,

    CASE WHEN EXISTS (
        SELECT 1
        FROM payments p
        WHERE p.order_id = o.order_id
          AND p.payment_type = 'GIFT_CARD'
    ) THEN 1 ELSE 0 END AS has_gift_card,

    CASE WHEN EXISTS (
        SELECT 1
        FROM payments p
        WHERE p.order_id = o.order_id
          AND p.payment_type = 'CRYPTO'
    ) THEN 1 ELSE 0 END AS has_crypto

FROM orders o
WHERE o.order_date >= '2024-01-01'

What Happens:

For EACH order:
  Order #1001 → Scan payments table for 'CREDIT_CARD'  [Scan 1]
  Order #1001 → Scan payments table for 'PAYPAL'       [Scan 2]
  Order #1001 → Scan payments table for 'GIFT_CARD'    [Scan 3]
  Order #1001 → Scan payments table for 'CRYPTO'       [Scan 4]

  Order #1002 → Scan payments table for 'CREDIT_CARD'  [Scan 5]
  Order #1002 → Scan payments table for 'PAYPAL'       [Scan 6]
  ... and so on

Total scans: 4 scans × number of orders


✅ FAST WAY: Single LEFT JOIN + MAX(CASE WHEN)

SELECT
    o.order_id,
    o.customer_name,
    o.order_date,
    o.total_amount,

    -- All checks happen on already-joined data
    MAX(CASE WHEN p.payment_type = 'CREDIT_CARD' THEN 1 ELSE 0 END) AS has_credit_card,
    MAX(CASE WHEN p.payment_type = 'PAYPAL' THEN 1 ELSE 0 END) AS has_paypal,
    MAX(CASE WHEN p.payment_type = 'GIFT_CARD' THEN 1 ELSE 0 END) AS has_gift_card,
    MAX(CASE WHEN p.payment_type = 'CRYPTO' THEN 1 ELSE 0 END) AS has_crypto

FROM orders o
LEFT JOIN payments p ON p.order_id = o.order_id
WHERE o.order_date >= '2024-01-01'
GROUP BY
    o.order_id,
    o.customer_name,
    o.order_date,
    o.total_amount

What Happens:

Step 1: LEFT JOIN creates combined table (ONE TIME)

orders:                  payments:
order_id=1001     JOIN   order_id=1001, payment_type='CREDIT_CARD'
                  →      order_id=1001, payment_type='PAYPAL'

order_id=1002     JOIN   order_id=1002, payment_type='GIFT_CARD'

Result table in memory:

order_id | customer_name | payment_type
---------|---------------|---------------
1001     | Alice Smith   | CREDIT_CARD
1001     | Alice Smith   | PAYPAL        ← Same order, multiple rows
1002     | Bob Jones     | GIFT_CARD
1003     | Carol Lee     | NULL          ← No payments yet

Step 2: GROUP BY order_id organizes rows

Group 1 (order_id=1001):
  ├─ row: payment_type='CREDIT_CARD'
  └─ row: payment_type='PAYPAL'

Group 2 (order_id=1002):
  └─ row: payment_type='GIFT_CARD'

Group 3 (order_id=1003):
  └─ row: payment_type=NULL

Step 3: MAX(CASE WHEN) evaluates rows within each group

For Group 1 (order_id=1001):

rows = ['CREDIT_CARD', 'PAYPAL']

MAX(CASE WHEN payment_type = 'CREDIT_CARD' THEN 1 ELSE 0 END)
# Evaluates to: max([1, 0]) = 1

MAX(CASE WHEN payment_type = 'PAYPAL' THEN 1 ELSE 0 END)
# Evaluates to: max([0, 1]) = 1

MAX(CASE WHEN payment_type = 'GIFT_CARD' THEN 1 ELSE 0 END)
# Evaluates to: max([0, 0]) = 0

MAX(CASE WHEN payment_type = 'CRYPTO' THEN 1 ELSE 0 END)
# Evaluates to: max([0, 0]) = 0

Total scans: 1 scan (the initial LEFT JOIN)


Python Analogy

EXISTS approach (inefficient):

# Query the database multiple times for each order
for order in orders:
    has_credit = db.query("SELECT 1 FROM payments WHERE order_id=? AND type='CREDIT_CARD'", order.id)
    has_paypal = db.query("SELECT 1 FROM payments WHERE order_id=? AND type='PAYPAL'", order.id)
    has_gift = db.query("SELECT 1 FROM payments WHERE order_id=? AND type='GIFT_CARD'", order.id)
    has_crypto = db.query("SELECT 1 FROM payments WHERE order_id=? AND type='CRYPTO'", order.id)

LEFT JOIN approach (efficient):

# Join once, then work with data in memory
joined_data = db.join(orders, payments)  # ONE database operation

for order_id, payment_rows in joined_data.groupby('order_id'):
    # Just iterating over data already in memory
    has_credit = max(1 if row.type == 'CREDIT_CARD' else 0 for row in payment_rows)
    has_paypal = max(1 if row.type == 'PAYPAL' else 0 for row in payment_rows)
    has_gift = max(1 if row.type == 'GIFT_CARD' else 0 for row in payment_rows)
    has_crypto = max(1 if row.type == 'CRYPTO' else 0 for row in payment_rows)

Big O Complexity

EXISTS approach: - O(n × 4 × m) where n = orders, m = payments per scan - Re-scans the payments table 4 times for every single order

LEFT JOIN approach: - O(n × m) for the join + O(n × k) for aggregation where k = payments per order - Scans the payments table ONCE total


Performance Impact

For 10,000 orders with average 2 payments each:

Method Table Scans Approximate Time
EXISTS 40,000 scans Very slow
LEFT JOIN 1 scan Fast

Expected speedup: 5-10x depending on data size and indexes


Key Takeaway

LEFT JOIN fetches all the data ONCE into a combined result set. After that, the database works with data already in memory - no more table scans needed. The MAX(CASE WHEN) is just aggregating rows that are already sitting there.