
Defender — Objective 3: Use jq

What is jq?

jq is a command-line JSON processor. Think of it as grep/awk/sed but specifically for JSON. It lets you extract fields, filter records, reshape data, and aggregate — all from the terminal, no scripting needed.

Install: brew install jq (macOS) | apt install jq (Linux)

CloudTrail logs are nested JSON. Without jq you'd be scrolling through thousands of lines manually. With jq you can write a one-liner to extract exactly what you need.
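For a first taste, here is jq pulling a single field out of a toy JSON record (the record is invented for illustration):

```shell
# Extract one field from a JSON object; -r strips the JSON quotes
echo '{"user":"alice","action":"login"}' | jq -r '.action'
# → login
```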


Step 1 — Decompress the Logs

CloudTrail stores logs as .json.gz (gzip-compressed JSON). First, decompress everything:

find . -type f -exec gunzip {} \;
| Part | What it does |
| --- | --- |
| `find .` | Search starting from the current directory, recursing into all subdirectories |
| `-type f` | Only match files — skip directories |
| `-exec gunzip {} \;` | For each file found, run `gunzip <filename>`. `{}` is replaced by the actual filename; `\;` terminates the `-exec` expression |

This runs gunzip on every file in every subdirectory, decompressing the .json.gz files in place (gunzip warns about, and skips, anything without a .gz suffix). Each .json.gz file becomes a .json file — the same name minus the .gz extension.
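If the directory tree contains anything besides compressed logs, you can restrict find to the compressed files explicitly — a minor variation on the command above, not required for the lab:

```shell
# Same decompression, but only touch files that actually end in .json.gz
find . -type f -name '*.json.gz' -exec gunzip {} \;
```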


Step 2 — Pretty-Print All Events

find . -type f -exec cat {} \; | jq '.'
| Part | What it does |
| --- | --- |
| `find . -type f -exec cat {} \;` | Print the raw contents of every JSON file |
| `\|` | Shell pipe — feed all that output into jq |
| `jq '.'` | The identity filter — parses the JSON and pretty-prints it with indentation and colour |

jq filter syntax — essential reference:

| Filter | What it does |
| --- | --- |
| `.` | Identity — output the full input, pretty-printed |
| `.foo` | Extract field `foo` from the current object |
| `.foo.bar` | Nested field — `bar` inside `foo` |
| `.foo[]` | Iterate over array `foo` — each element becomes a separate output |
| `.Records[]` | Iterate over every record in CloudTrail's `Records` array |
| `\|` | Pipe within jq — pass the output of the left filter into the right filter |
| `select(cond)` | Only pass through elements where the condition is true (like SQL `WHERE`) |
| `[.a, .b]` | Build a new array with specific fields |
| `@tsv` | Format an array as tab-separated values |
| `fromjson` | Parse a JSON string (an escaped JSON value) into actual JSON |
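The rows above combine naturally. Here is `select()` and the jq pipe working together on a toy `Records` array (the data is invented for illustration):

```shell
# Keep only the ListBuckets event, then extract the calling IP
echo '{"Records":[{"eventName":"GetObject","sourceIPAddress":"1.2.3.4"},{"eventName":"ListBuckets","sourceIPAddress":"5.6.7.8"}]}' \
  | jq '.Records[] | select(.eventName == "ListBuckets") | .sourceIPAddress'
# → "5.6.7.8"
```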

Step 3 — Extract Just Event Names

find . -type f -exec cat {} \; | jq '.Records[]|.eventName'

Breaking down .Records[]|.eventName:

| Part | What it does |
| --- | --- |
| `.Records[]` | Iterate over the `Records` array — each CloudTrail event becomes a separate jq input |
| `\|` | Pipe — pass each event to the next filter |
| `.eventName` | From each event, extract just the `eventName` field |

Output:

"GetObject"
"ListBuckets"
"AssumeRole"

Events are out of order because multiple files are processed in arbitrary order.
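Once you have one event name per line, standard Unix tools can aggregate. A frequency count (a sketch using the same pipeline) makes rare, suspicious calls stand out:

```shell
# Count occurrences of each API call; rare events surface at the bottom
find . -type f -exec cat {} \; \
  | jq -r '.Records[].eventName' \
  | sort | uniq -c | sort -rn
```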


Step 4 — Add Timestamps and Sort

find . -type f -exec cat {} \; | jq -cr '.Records[]|[.eventTime, .eventName]|@tsv' | sort

Full breakdown:

| Part | What it does |
| --- | --- |
| `jq -cr` | Two flags: `-c` compact (one result per line, no pretty-printing) plus `-r` raw (output bare strings, not JSON-quoted strings) |
| `.Records[]` | Iterate over all events |
| `[.eventTime, .eventName]` | Build an array with just these two fields from each event |
| `\| @tsv` | Format that two-element array as TSV — elements separated by a tab |
| `\| sort` | Shell sort — since the first column is an ISO 8601 timestamp, alphabetical sort = chronological sort |

Why -r is required for @tsv: Without -r, jq wraps its output in quotes (it's still treating it as a JSON string). With -r, you get raw text that sort can work with.
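You can see the effect of `-r` on a minimal input:

```shell
echo '{"eventName":"GetObject"}' | jq '.eventName'     # → "GetObject"  (JSON string)
echo '{"eventName":"GetObject"}' | jq -r '.eventName'  # → GetObject    (raw text)
```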

Why @tsv instead of JSON? Two reasons:

1. sort operates on plain text lines — TSV gives you one record per line, sortable by the first column.
2. You can paste TSV output directly into Excel — it auto-splits on tabs.


Step 5 — Full Context Per Event

find . -type f -exec cat {} \; | \
jq -cr '.Records[]|[.eventTime, .sourceIPAddress, .userIdentity.arn, .userIdentity.accountId, .userIdentity.type, .eventName]|@tsv' | sort

What each field tells you:

| Field | Why it matters |
| --- | --- |
| `.eventTime` | When the call happened — lets you reconstruct a timeline |
| `.sourceIPAddress` | Which IP made the call — a non-AWS IP on a service role is a red flag |
| `.userIdentity.arn` | The exact identity — tells you what role or user was used |
| `.userIdentity.accountId` | Which account the identity belongs to |
| `.userIdentity.type` | `AssumedRole`, `IAMUser`, `AWSService`, or anonymous |
| `.eventName` | What action was performed |

Nested field access: .userIdentity.arn works because userIdentity is a JSON object. You chain dot-notation to drill into nested fields.
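One practical wrinkle: not every record carries every field (anonymous events, for instance, have a sparse userIdentity). jq emits `null` for a missing field, and the `//` alternative operator substitutes a fallback so your TSV columns stay aligned. A defensive variation, not required for the lab:

```shell
# "-" stands in for a missing ARN so every row keeps the same column count
find . -type f -exec cat {} \; \
  | jq -cr '.Records[] | [.eventTime, (.userIdentity.arn // "-"), .eventName] | @tsv' \
  | sort
```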


Understanding the Three Event Categories

ANONYMOUS_PRINCIPAL events:

These look alarming but are not. ANONYMOUS_PRINCIPAL = the request was made with no AWS credentials at all — an unauthenticated HTTP request. Since the flaws2.cloud website is hosted on public S3 buckets, every browser page load generates GetObject events logged as ANONYMOUS_PRINCIPAL.

ANONYMOUS_PRINCIPAL = someone's browser loading the website. Filter them out — they're web traffic, not API calls.

As an analyst: these are background noise. The attacker was also browsing the site (note the same IP 104.102.221.250 appearing in both ANONYMOUS_PRINCIPAL events and later AssumedRole events — that's the attacker doing recon before exploitation).

AWSService events:

AWS infrastructure making internal API calls. When a Lambda function cold-starts, AWS internally calls sts:AssumeRole to issue it temporary credentials — that appears as AWSService. When API Gateway invokes Lambda, it appears as AWSService. These are internal plumbing events.

Filter these out too. They tell you about normal infrastructure activity, not what any person or attacker did.

AssumedRole events from non-AWS IPs — this is what you hunt:

An ECS task role showing up with sourceIPAddress: 104.102.221.250 (not an AWS IP) means the credentials were stolen from the container and used externally. The role is valid and working — the anomaly is where it's being used from.
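Putting the three categories to work: a `select()` filter can drop the first two and leave only the events worth reading. This sketch assumes anonymous requests are marked with accountId `ANONYMOUS_PRINCIPAL` and internal calls with userIdentity.type `AWSService`, as in the flaws2.cloud logs:

```shell
# Hide web-traffic noise and AWS-internal plumbing; keep real API calls.
find . -type f -exec cat {} \; \
  | jq -cr '.Records[]
      | select(.userIdentity.accountId != "ANONYMOUS_PRINCIPAL"
               and .userIdentity.type != "AWSService")
      | [.eventTime, .sourceIPAddress, .userIdentity.arn, .eventName] | @tsv' \
  | sort
```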


The Full Attack Timeline

| Time (UTC) | IP | Identity | Event | What it means |
| --- | --- | --- | --- | --- |
| 22:31:59 | AWS internal | AWSService | AssumeRole | ECS task starting — legitimate |
| 23:02:56–23:03:18 | 104.102.221.250 | ANONYMOUS_PRINCIPAL | GetObject ×13 | Attacker browsing the site |
| 23:03:12 | 34.234.236.212 | level1/level1 | CreateLogStream | Lambda running (AWS internal IP — normal) |
| 23:03:13 | apigateway | AWSService | Invoke | API Gateway invoking Lambda |
| 23:04:54 | 104.102.221.250 | level1/level1 | ListObjects | Attacker using stolen Lambda credentials |
| 23:05:53 | 104.102.221.250 | level1/level1 | ListImages | Attacker enumerating ECR |
| 23:06:17 | 104.102.221.250 | level1/level1 | BatchGetImage | Attacker pulling the container image |
| 23:06:33 | 104.102.221.250 | level1/level1 | GetDownloadUrlForLayer | Attacker downloading image layers |
| 23:09:28 | 104.102.221.250 | level3/d190d14a... | ListBuckets | Attacker using stolen ECS credentials |

Reading the story: same IP 104.102.221.250 doing both browser requests (ANONYMOUS) and authenticated CLI calls (AssumedRole) minutes later = one attacker doing recon then exploitation. The level1 role running from that IP is impossible unless credentials were stolen — Lambda runs on 34.234.236.212, an AWS IP.
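Once one IP looks suspicious, pivot on it: pull every event that IP generated, in time order. A sketch using grep on the TSV output, with the attacker IP from the timeline:

```shell
# Everything 104.102.221.250 did, chronologically
find . -type f -exec cat {} \; \
  | jq -cr '.Records[] | [.eventTime, .sourceIPAddress, .eventName] | @tsv' \
  | grep '104\.102\.221\.250' \
  | sort
```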