AWS Storage Explained¶
What is EC2?¶
AWS has warehouses full of powerful physical computers. Each one is powerful enough that software can split it into 10-50 fake computers, all running at the same time. Each fake computer behaves exactly like a real computer: it has its own CPU, RAM, disk, and network, and it runs its own Linux. You rent one of these fake computers. That fake computer is an EC2 instance.
AWS buys thousands of massive physical servers. Each server is extremely powerful — 256 CPUs, 2TB RAM. Way too much for one person. So AWS runs a hypervisor on each physical server, which splits it into 50 smaller fake computers. Each fake computer gets allocated some CPUs, some RAM, some disk. Each one runs its own Linux completely independently. They can’t see each other. Each one thinks it’s a real physical computer.
You rent one of those fake computers. That’s EC2. You get your own Linux, your own CPUs, your own RAM. But underneath, you’re sharing a physical box with 49 other people’s EC2s.
Now — when you STOP your EC2, that fake computer ceases to exist. It’s just a process that got killed. AWS frees up those CPUs and RAM for someone else immediately. The physical machine carries on running other people’s EC2s.
When you START your EC2 again, AWS picks any physical server that has spare capacity and creates a new fake computer on it. Could be a completely different physical machine in a different part of the datacenter. Fresh fake computer, fresh internal disk, nothing on it. Whether your data survives that depends entirely on which storage option you picked, and that's what the rest of this page is about.
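The stop/start behavior above can be sketched as a toy placement model. This is purely illustrative, nothing here reflects AWS's real scheduler; the host names, CPU counts, and first-fit policy are all made up for the example:

```python
# Toy model of EC2 stop/start placement (an illustration, NOT AWS internals).
# Each host has limited capacity; stopping an instance frees its CPUs
# immediately, and starting again may land the instance on a different host.

class Host:
    def __init__(self, name, cpus):
        self.name = name
        self.free_cpus = cpus

class Cloud:
    def __init__(self, hosts):
        self.hosts = hosts
        self.placement = {}  # instance_id -> Host it currently runs on

    def start(self, instance_id, cpus):
        # "AWS picks any physical server that has spare capacity" --
        # here modeled as simple first-fit.
        for host in self.hosts:
            if host.free_cpus >= cpus:
                host.free_cpus -= cpus
                self.placement[instance_id] = host
                return host.name
        raise RuntimeError("no capacity")

    def stop(self, instance_id, cpus):
        # The fake computer ceases to exist; its CPUs are freed at once.
        host = self.placement.pop(instance_id)
        host.free_cpus += cpus

cloud = Cloud([Host("rack-A", cpus=4), Host("rack-B", cpus=8)])
first = cloud.start("i-123", cpus=4)   # fills rack-A
cloud.start("i-456", cpus=2)           # rack-A is full, goes to rack-B
cloud.stop("i-123", cpus=4)            # rack-A's CPUs are freed
cloud.start("i-789", cpus=3)           # someone else grabs most of rack-A
second = cloud.start("i-123", cpus=4)  # restart lands on rack-B instead
print(first, second)                   # prints: rack-A rack-B
```

The point of the sketch: nothing ties a stopped instance to its old physical machine, which is why the restarted fake computer starts with a blank internal disk.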
The Storage Options¶
Instance Store¶
Remember that physical server running 50 EC2s? That physical server has a real hard drive inside it. AWS carves out a slice of that physical drive and gives it to your EC2. Because it’s on the SAME physical machine, it’s incredibly fast — no network involved, direct connection. But when you stop your EC2, that fake computer dies, and that slice of disk gets wiped and given to the next EC2 that spins up on that physical machine. It’s gone forever. You can’t get it back.
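The ephemerality can be shown with a tiny sketch. Again, this is a stand-in model, not real AWS behavior; the slice IDs and functions are invented for illustration:

```python
# Toy sketch of instance store semantics (illustrative only): the disk
# slice lives inside the physical host, and stopping the instance wipes
# it before the slice is handed to the next tenant.

host_disk_slices = {}  # slice_id -> bytes a tenant has written

def attach_instance_store(slice_id):
    host_disk_slices[slice_id] = b""  # fresh, empty slice

def write(slice_id, data):
    host_disk_slices[slice_id] += data

def stop_instance(slice_id):
    # Instance dies -> slice is wiped and returned to the host's pool.
    host_disk_slices[slice_id] = b""

attach_instance_store("slice-0")
write("slice-0", b"scratch data for a huge temporary sort")
stop_instance("slice-0")
attach_instance_store("slice-0")    # next tenant gets an empty slice
print(host_disk_slices["slice-0"])  # empty -- the data is gone forever
```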
EBS (Elastic Block Store)¶
AWS has OTHER physical machines in the datacenter whose only job is storing disks. Not running EC2s — just storing data. These are full computers running Linux, with custom AWS software on top that manages connections, decides where data sits on the physical drives, and handles replication. You never interact with that OS directly.
Your EC2 (fake computer on machine A) connects to that storage machine (machine B) over a network cable. Your OS sees it as a local disk, but actually every read and write travels over the network to machine B. The volume attaches to only one EC2 at a time, but it isn't tied to any physical host: when your EC2 stops and restarts on a completely different physical machine, it just reconnects to machine B over the network. Your data is always there.
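The contrast with instance store can be sketched the same way. A toy model only, where a plain dict stands in for machine B and the network hop; the volume ID and class are invented for the example:

```python
# Toy contrast (illustrative, not AWS internals): an EBS-style volume
# lives on a separate storage machine, so it survives the instance
# stopping and restarting on a different physical host.

storage_machine_b = {}  # volume_id -> contents, lives on "machine B"

class Instance:
    def __init__(self, physical_host, volume_id):
        self.physical_host = physical_host
        self.volume_id = volume_id  # reconnect to the same volume by ID

    def write(self, data):
        storage_machine_b[self.volume_id] = data  # crosses the network

    def read(self):
        return storage_machine_b[self.volume_id]  # crosses the network

vm = Instance(physical_host="rack-A", volume_id="vol-abc")
vm.write("my database files")
del vm  # instance stopped; the fake computer is gone

# Restart lands on a totally different physical host -- same volume ID,
# so it reconnects to machine B and the data is still there.
vm = Instance(physical_host="rack-Q", volume_id="vol-abc")
print(vm.read())  # prints: my database files
```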
EFS (Elastic File System)¶
Same concept as EBS — separate physical machines dedicated to storage, running Linux with custom software. The difference is the software running on those storage machines. EBS storage software says “one connection only.” EFS storage software says “thousands of connections simultaneously welcome.” So 500 EC2s can all read and write to the same EFS volume at the same time. The tradeoff is that coordinating thousands of simultaneous connections adds overhead, making it slightly slower than EBS.
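Under the hood EFS speaks NFS, so mounting it looks like mounting any network filesystem. A sketch of the documented NFSv4.1 mount pattern; the filesystem ID `fs-12345678`, the region, and the mount point are placeholders, not real values:

```shell
# Mount a (hypothetical) EFS filesystem over NFSv4.1. This exact command
# can be run on hundreds of EC2s at once -- they all see the same files.
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1 \
  fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs
```

An EBS volume, by contrast, shows up as a block device that only one instance can have attached at a time.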
S3 (Simple Storage Service)¶
Completely different concept. You’re not “connecting” to a disk at all. You’re talking to a web service over HTTP — same protocol as when you open a website in a browser. You say “store this file” and AWS internally decides how to store it — splits it across hundreds of drives, replicates it across multiple datacenters for redundancy, indexes it. You never see any of that. You just get a URL back. Because of this, you can NEVER say “change byte 500 in this file.” Objects in S3 are immutable: you can GET a whole file, or even just a byte range of it, but to change anything you PUT a whole new copy. But it’s massively cheap and infinitely scalable because AWS spreads the load across thousands of machines internally.
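Those object semantics can be sketched with a stand-in store. This is not the real S3 API (no boto3 here); the function names, bucket dict, and keys are invented to mirror the whole-object-PUT / ranged-GET behavior described above:

```python
# Toy sketch of S3's object semantics (a stand-in, not the real API):
# you can GET a whole object or a byte range of it, but writes always
# replace the entire object -- there is no "change byte 500" operation.

bucket = {}  # key -> object bytes

def put_object(key, body: bytes):
    bucket[key] = body  # whole-object replace, always

def get_object(key, byte_range=None):
    body = bucket[key]
    if byte_range is not None:  # HTTP Range-style partial READ is fine
        start, end = byte_range
        return body[start:end + 1]
    return body

put_object("logs/2024.txt", b"hello world")
print(get_object("logs/2024.txt", byte_range=(0, 4)))  # b'hello'

# To "edit" the object, you upload a whole new copy:
put_object("logs/2024.txt", b"hello there")
print(get_object("logs/2024.txt"))  # b'hello there'
```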
When to Use Each¶
Instance Store¶
When you need the absolute fastest disk possible and you don’t care about losing the data. Example — you’re processing a massive dataset, you need to sort 5TB of data temporarily, you write results somewhere else (S3), then throw it away. Or caching — storing temporary cache data that can be rebuilt if lost.
EBS¶
Almost everything that needs persistent storage on a single EC2. Your OS lives here by default. Your database lives here — MySQL, PostgreSQL, MongoDB. Anything where one server needs fast reliable disk that survives restarts. This is the default choice 90% of the time.
EFS¶
When multiple EC2s need to read and write the SAME files simultaneously. Example — you have 20 web servers all serving user-uploaded images. User uploads a photo to server 1. Server 2 needs to serve that same photo. With EBS you can’t — it’s attached to server 1 only. With EFS all 20 servers mount the same volume and all see the same files instantly.
S3¶
When you’re not running a database or filesystem — just storing files long term. Backups, images, videos, static website files, logs, anything you access via URL. Also when you need to store billions of files — EBS and EFS would be astronomically expensive at that scale. S3 is dirt cheap and infinitely scalable.