# Defender — Objective 3: Use jq
## What is jq?
jq is a command-line JSON processor. Think of it as grep/awk/sed but specifically for JSON. It lets you extract fields, filter records, reshape data, and aggregate — all from the terminal, no scripting needed.
Install: `brew install jq` (macOS) or `apt install jq` (Linux).
CloudTrail logs are nested JSON. Without jq you'd be scrolling through thousands of lines manually. With jq you can write a one-liner to extract exactly what you need.
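A one-line taste of that, using a toy record (the values here are invented, not from the logs):

```shell
# Extract a single field from a JSON object — no scripting, just a filter
echo '{"eventName": "GetObject", "eventTime": "2018-11-28T23:03:12Z"}' | jq -r '.eventName'
# → GetObject
```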
## Step 1 — Decompress the Logs

CloudTrail stores logs as `.json.gz` (gzip-compressed JSON). First, decompress everything:

| Part | What it does |
|---|---|
| `find .` | Search starting from the current directory, recursing into all subdirectories |
| `-type f` | Only match files — skip directories |
| `-exec gunzip {} \;` | For each file found, run `gunzip <filename>`. `{}` is replaced by the actual filename; `\;` terminates the `-exec` expression |

This finds every `.json.gz` file in any subdirectory and decompresses it in place: each `.json.gz` file becomes a `.json` file with the same name minus the `.gz` extension.
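Assembled from the parts in the table, the full command is:

```shell
# Recursively decompress every log file under the current directory, in place
find . -type f -exec gunzip {} \;
```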
## Step 2 — Pretty-Print All Events

| Part | What it does |
|---|---|
| `find . -type f -exec cat {} \;` | Print the raw contents of every JSON file |
| `\|` | Shell pipe — feed all of that output into jq |
| `jq '.'` | The identity filter — parses the JSON and pretty-prints it with indentation and colour |
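Assembled, the command reads:

```shell
# Dump every log file and pretty-print the combined JSON stream
find . -type f -exec cat {} \; | jq '.'
```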
jq filter syntax — essential reference:

| Filter | What it does |
|---|---|
| `.` | Identity — output the full input, pretty-printed |
| `.foo` | Extract field `foo` from the current object |
| `.foo.bar` | Nested field — `bar` inside `foo` |
| `.foo[]` | Iterate over array `foo` — each element becomes a separate output |
| `.Records[]` | Iterate over every record in CloudTrail's `Records` array |
| `\|` | Pipe within jq — pass the output of the left filter into the right filter |
| `select(cond)` | Only pass through elements where the condition is true (like SQL `WHERE`) |
| `[.a, .b]` | Build a new array from specific fields |
| `@tsv` | Format an array as tab-separated values |
| `fromjson` | Parse a JSON string (an escaped JSON value) into actual JSON |
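A few of these filters in action on a toy record (the field names and values below are invented for illustration, not taken from the logs):

```shell
# Toy CloudTrail-shaped record — values are made up
json='{"Records":[{"eventName":"ListBuckets","ip":"1.2.3.4"},{"eventName":"GetObject","ip":"5.6.7.8"}]}'

# select(): keep only the ListBuckets event
echo "$json" | jq -c '.Records[] | select(.eventName == "ListBuckets")'
# → {"eventName":"ListBuckets","ip":"1.2.3.4"}

# [.a, .b] piped to @tsv: one tab-separated line per record (-r for raw output)
echo "$json" | jq -r '.Records[] | [.eventName, .ip] | @tsv'

# fromjson: unwrap a JSON value that was stored as an escaped string
echo '{"payload":"{\"k\":1}"}' | jq -c '.payload | fromjson'
# → {"k":1}
```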
## Step 3 — Extract Just Event Names

Breaking down `.Records[]|.eventName`:

| Part | What it does |
|---|---|
| `.Records[]` | Iterate over the `Records` array — each CloudTrail event becomes a separate jq input |
| `\|` | Pipe — pass each event to the next filter |
| `.eventName` | From each event, extract just the `eventName` field |
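As a full pipeline over the decompressed logs:

```shell
# One event name per line, across every log file
find . -type f -exec cat {} \; | jq '.Records[]|.eventName'
```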
Output:
The events come out in no particular order because `find` visits the files in arbitrary order.
## Step 4 — Add Timestamps and Sort

Full breakdown:

| Part | What it does |
|---|---|
| `jq -cr` | Two flags: `-c` compact (one result per line, no pretty-printing) and `-r` raw (output bare strings, not JSON-quoted strings) |
| `.Records[]` | Iterate over all events |
| `[.eventTime, .eventName]` | Build an array with just these two fields from each event |
| `\|@tsv` | Format that two-element array as TSV — elements separated by a tab |
| `\| sort` | Shell sort — since the first column is an ISO 8601 timestamp, alphabetical order equals chronological order |
Why `-r` is required for `@tsv`: without `-r`, jq wraps its output in quotes (it is still emitting a JSON string). With `-r`, you get raw text that `sort` can work with.

Why `@tsv` instead of JSON? Two reasons:

1. `sort` operates on plain text lines — TSV gives you one record per line, sortable by the first column
2. You can paste TSV output directly into Excel — it auto-splits on tabs
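Put together, the Step 4 pipeline is:

```shell
# Timestamp + event name per line, sorted chronologically
find . -type f -exec cat {} \; | \
  jq -cr '.Records[]|[.eventTime, .eventName]|@tsv' | sort
```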
## Step 5 — Full Context Per Event

```shell
find . -type f -exec cat {} \; | \
  jq -cr '.Records[]|[.eventTime, .sourceIPAddress, .userIdentity.arn, .userIdentity.accountId, .userIdentity.type, .eventName]|@tsv' | sort
```
What each field tells you:
| Field | Why it matters |
|---|---|
| `.eventTime` | When the call happened — lets you reconstruct a timeline |
| `.sourceIPAddress` | Which IP made the call — a non-AWS IP on a service role is a red flag |
| `.userIdentity.arn` | The exact identity — tells you which role or user was used |
| `.userIdentity.accountId` | Which account the identity belongs to |
| `.userIdentity.type` | `AssumedRole`, `IAMUser`, `AWSService`, or `Anonymous` |
| `.eventName` | What action was performed |
Nested field access: `.userIdentity.arn` works because `userIdentity` is a JSON object — you chain dot notation to drill into nested fields.
## Understanding the Three Event Categories
**ANONYMOUS_PRINCIPAL events:**

These look alarming but are not. `ANONYMOUS_PRINCIPAL` means the request was made with no AWS credentials at all — an unauthenticated HTTP request. Since the flaws2.cloud website is hosted on public S3 buckets, every browser page load generates `GetObject` events logged as `ANONYMOUS_PRINCIPAL`.

`ANONYMOUS_PRINCIPAL` = someone's browser loading the website. Filter them out — they're web traffic, not API calls.

As an analyst, treat these as background noise. The attacker was also browsing the site: note the same IP 104.102.221.250 appearing in both `ANONYMOUS_PRINCIPAL` events and in later `AssumedRole` events — that's the attacker doing recon before exploitation.
**AWSService events:**

AWS infrastructure making internal API calls. When a Lambda function cold-starts, AWS internally calls `sts:AssumeRole` to issue it temporary credentials — that appears as `AWSService`. When API Gateway invokes Lambda, that appears as `AWSService` too. These are internal plumbing events.

Filter these out as well. They tell you about normal infrastructure activity, not what any person or attacker did.
**AssumedRole events from non-AWS IPs — this is what you hunt:**

An ECS task role showing up with `sourceIPAddress: 104.102.221.250` (not an AWS IP) means the credentials were stolen from the container and used externally. The role is valid and working — the anomaly is *where* it is being used from.
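One way to surface exactly these events is to combine `select()` from the filter reference with the Step 5 fields. This is a sketch, not a complete detection — a real investigation would also cross-check the source IPs against AWS's published IP ranges:

```shell
# Keep only AssumedRole calls, with enough context to spot foreign IPs.
# (Real triage would check sourceIPAddress against AWS's published IP
# ranges; here you simply eyeball the output for non-AWS addresses.)
find . -type f -exec cat {} \; | \
  jq -cr '.Records[]
          | select(.userIdentity.type == "AssumedRole")
          | [.eventTime, .sourceIPAddress, .eventName]
          | @tsv' | sort
```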
## The Full Attack Timeline
| Time (UTC) | IP | Identity | Event | What it means |
|---|---|---|---|---|
| 22:31:59 | AWS internal | AWSService | AssumeRole | ECS task starting — legitimate |
| 23:02:56–23:03:18 | 104.102.221.250 | ANONYMOUS_PRINCIPAL | GetObject ×13 | Attacker browsing the site |
| 23:03:12 | 34.234.236.212 | level1/level1 | CreateLogStream | Lambda running (AWS internal IP — normal) |
| 23:03:13 | apigateway | AWSService | Invoke | API Gateway invoking Lambda |
| 23:04:54 | 104.102.221.250 | level1/level1 | ListObjects | Attacker using stolen Lambda credentials |
| 23:05:53 | 104.102.221.250 | level1/level1 | ListImages | Attacker enumerating ECR |
| 23:06:17 | 104.102.221.250 | level1/level1 | BatchGetImage | Attacker pulling the container image |
| 23:06:33 | 104.102.221.250 | level1/level1 | GetDownloadUrlForLayer | Attacker downloading image layers |
| 23:09:28 | 104.102.221.250 | level3/d190d14a... | ListBuckets | Attacker using stolen ECS credentials |
Reading the story: same IP 104.102.221.250 doing both browser requests (ANONYMOUS) and authenticated CLI calls (AssumedRole) minutes later = one attacker doing recon then exploitation. The level1 role running from that IP is impossible unless credentials were stolen — Lambda runs on 34.234.236.212, an AWS IP.