# Defender — Objective 3: Use jq
## What is jq?
jq is a command-line JSON processor. Think of it as grep/awk/sed but specifically for JSON. It lets you extract fields, filter records, reshape data, and aggregate — all from the terminal, no scripting needed.
Install: `brew install jq` (macOS) or `apt install jq` (Linux).
CloudTrail logs are nested JSON. Without jq you'd be scrolling through thousands of lines manually. With jq you can write a one-liner to extract exactly what you need.
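A one-line taste of that, using a toy record (the values here are invented, not from the logs):

```shell
# Extract a single field from a JSON object — no scripting, just a filter
echo '{"eventName": "GetObject", "eventTime": "2018-11-28T23:03:12Z"}' | jq -r '.eventName'
# → GetObject
```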
## Step 1 — Decompress the Logs

CloudTrail stores logs as `.json.gz` (gzip-compressed JSON). First, decompress everything:

| Part | What it does |
|---|---|
| `find .` | Search starting from the current directory, recursing into all subdirectories |
| `-type f` | Only match files — skip directories |
| `-exec gunzip {} \;` | For each file found, run `gunzip <filename>`. `{}` is replaced by the actual filename; `\;` terminates the `-exec` expression |

This finds every `.json.gz` file in any subdirectory and decompresses it in place: each `.json.gz` file becomes a `.json` file with the same name minus the `.gz` extension.
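Assembled from the parts in the table, the full command is:

```shell
# Recursively decompress every log file under the current directory, in place
find . -type f -exec gunzip {} \;
```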
## Step 2 — Pretty-Print All Events

| Part | What it does |
|---|---|
| `find . -type f -exec cat {} \;` | Print the raw contents of every JSON file |
| `\|` | Shell pipe — feed all of that output into jq |
| `jq '.'` | The identity filter — parses the JSON and pretty-prints it with indentation and colour |
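Assembled, the command reads:

```shell
# Dump every log file and pretty-print the combined JSON stream
find . -type f -exec cat {} \; | jq '.'
```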
jq filter syntax — essential reference:

| Filter | What it does |
|---|---|
| `.` | Identity — output the full input, pretty-printed |
| `.foo` | Extract field `foo` from the current object |
| `.foo.bar` | Nested field — `bar` inside `foo` |
| `.foo[]` | Iterate over array `foo` — each element becomes a separate output |
| `.Records[]` | Iterate over every record in CloudTrail's `Records` array |
| `\|` | Pipe within jq — pass the output of the left filter into the right filter |
| `select(cond)` | Only pass through elements where the condition is true (like SQL `WHERE`) |
| `[.a, .b]` | Build a new array from specific fields |
| `@tsv` | Format an array as tab-separated values |
| `fromjson` | Parse a JSON string (an escaped JSON value) into actual JSON |
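A few of these filters in action on a toy record (the field names and values below are invented for illustration, not taken from the logs):

```shell
# Toy CloudTrail-shaped record — values are made up
json='{"Records":[{"eventName":"ListBuckets","ip":"1.2.3.4"},{"eventName":"GetObject","ip":"5.6.7.8"}]}'

# select(): keep only the ListBuckets event
echo "$json" | jq -c '.Records[] | select(.eventName == "ListBuckets")'
# → {"eventName":"ListBuckets","ip":"1.2.3.4"}

# [.a, .b] piped to @tsv: one tab-separated line per record (-r for raw output)
echo "$json" | jq -r '.Records[] | [.eventName, .ip] | @tsv'

# fromjson: unwrap a JSON value that was stored as an escaped string
echo '{"payload":"{\"k\":1}"}' | jq -c '.payload | fromjson'
# → {"k":1}
```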
## Step 3 — Extract Just Event Names

Breaking down `.Records[]|.eventName`:

| Part | What it does |
|---|---|
| `.Records[]` | Iterate over the `Records` array — each CloudTrail event becomes a separate jq input |
| `\|` | Pipe — pass each event to the next filter |
| `.eventName` | From each event, extract just the `eventName` field |
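As a full pipeline over the decompressed logs:

```shell
# One event name per line, across every log file
find . -type f -exec cat {} \; | jq '.Records[]|.eventName'
```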
Output:
The events come out in no particular order because `find` visits the files in arbitrary order.
## Step 4 — Add Timestamps and Sort

Full breakdown:

| Part | What it does |
|---|---|
| `jq -cr` | Two flags: `-c` compact (one result per line, no pretty-printing) and `-r` raw (output bare strings, not JSON-quoted strings) |
| `.Records[]` | Iterate over all events |
| `[.eventTime, .eventName]` | Build an array with just these two fields from each event |
| `\|@tsv` | Format that two-element array as TSV — elements separated by a tab |
| `\| sort` | Shell sort — since the first column is an ISO 8601 timestamp, alphabetical order equals chronological order |
Why `-r` is required for `@tsv`: without `-r`, jq wraps its output in quotes (it is still emitting a JSON string). With `-r`, you get raw text that `sort` can work with.

Why `@tsv` instead of JSON? Two reasons:

1. `sort` operates on plain text lines — TSV gives you one record per line, sortable by the first column
2. You can paste TSV output directly into Excel — it auto-splits on tabs
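Put together, the Step 4 pipeline is:

```shell
# Timestamp + event name per line, sorted chronologically
find . -type f -exec cat {} \; | \
  jq -cr '.Records[]|[.eventTime, .eventName]|@tsv' | sort
```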
## Step 5 — Full Context Per Event

```shell
find . -type f -exec cat {} \; | \
  jq -cr '.Records[]|[.eventTime, .sourceIPAddress, .userIdentity.arn, .userIdentity.accountId, .userIdentity.type, .eventName]|@tsv' | sort
```
What each field tells you:
| Field | Why it matters |
|---|---|
| `.eventTime` | When the call happened — lets you reconstruct a timeline |
| `.sourceIPAddress` | Which IP made the call — a non-AWS IP on a service role is a red flag |
| `.userIdentity.arn` | The exact identity — tells you which role or user was used |
| `.userIdentity.accountId` | Which account the identity belongs to |
| `.userIdentity.type` | `AssumedRole`, `IAMUser`, `AWSService`, or `Anonymous` |
| `.eventName` | What action was performed |
Nested field access: `.userIdentity.arn` works because `userIdentity` is a JSON object — you chain dot notation to drill into nested fields.
## Understanding the Three Event Categories
**ANONYMOUS_PRINCIPAL events:**

These look alarming but are not. `ANONYMOUS_PRINCIPAL` means the request was made with no AWS credentials at all — an unauthenticated HTTP request. Since the flaws2.cloud website is hosted on public S3 buckets, every browser page load generates `GetObject` events logged as `ANONYMOUS_PRINCIPAL`.

`ANONYMOUS_PRINCIPAL` = someone's browser loading the website. Filter them out — they're web traffic, not API calls.

As an analyst, treat these as background noise. The attacker was also browsing the site: note the same IP 104.102.221.250 appearing in both `ANONYMOUS_PRINCIPAL` events and in later `AssumedRole` events — that's the attacker doing recon before exploitation.
**AWSService events:**

AWS infrastructure making internal API calls. When a Lambda function cold-starts, AWS internally calls `sts:AssumeRole` to issue it temporary credentials — that appears as `AWSService`. When API Gateway invokes Lambda, that appears as `AWSService` too. These are internal plumbing events.

Filter these out as well. They tell you about normal infrastructure activity, not what any person or attacker did.
**AssumedRole events from non-AWS IPs — this is what you hunt:**

An ECS task role showing up with `sourceIPAddress: 104.102.221.250` (not an AWS IP) means the credentials were stolen from the container and used externally. The role is valid and working — the anomaly is *where* it is being used from.
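One way to surface exactly these events is to combine `select()` from the filter reference with the Step 5 fields. This is a sketch, not a complete detection — a real investigation would also cross-check the source IPs against AWS's published IP ranges:

```shell
# Keep only AssumedRole calls, with enough context to spot foreign IPs.
# (Real triage would check sourceIPAddress against AWS's published IP
# ranges; here you simply eyeball the output for non-AWS addresses.)
find . -type f -exec cat {} \; | \
  jq -cr '.Records[]
          | select(.userIdentity.type == "AssumedRole")
          | [.eventTime, .sourceIPAddress, .eventName]
          | @tsv' | sort
```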
## The Full Attack Timeline
| Time (UTC) | IP | Identity | Event | What it means |
|---|---|---|---|---|
| 22:31:59 | AWS internal | AWSService | AssumeRole | ECS task starting — legitimate |
| 23:02:56–23:03:18 | 104.102.221.250 | ANONYMOUS_PRINCIPAL | GetObject ×13 | Attacker browsing the site |
| 23:03:12 | 34.234.236.212 | level1/level1 | CreateLogStream | Lambda running (AWS internal IP — normal) |
| 23:03:13 | apigateway | AWSService | Invoke | API Gateway invoking Lambda |
| 23:04:54 | 104.102.221.250 | level1/level1 | ListObjects | Attacker using stolen Lambda credentials |
| 23:05:53 | 104.102.221.250 | level1/level1 | ListImages | Attacker enumerating ECR |
| 23:06:17 | 104.102.221.250 | level1/level1 | BatchGetImage | Attacker pulling the container image |
| 23:06:33 | 104.102.221.250 | level1/level1 | GetDownloadUrlForLayer | Attacker downloading image layers |
| 23:09:28 | 104.102.221.250 | level3/d190d14a... | ListBuckets | Attacker using stolen ECS credentials |
Reading the story: same IP 104.102.221.250 doing both browser requests (ANONYMOUS) and authenticated CLI calls (AssumedRole) minutes later = one attacker doing recon then exploitation. The level1 role running from that IP is impossible unless credentials were stolen — Lambda runs on 34.234.236.212, an AWS IP.