Source: https://cli.nylas.com/guides/extract-email-data-jq

# Extract Email Data from JSON with jq

You have a JSON payload full of email data — an API response, a webhook body, an exported mailbox — and you need the addresses out of it. This is a jq job, and the same 5 recipes cover almost every variant: structured field extraction, regex scanning for addresses buried in text, domain grouping, CSV export, and the empty-input guard that keeps the pipeline from blowing up in cron.

Written by [Qasim Muhammad](https://cli.nylas.com/authors/qasim-muhammad), Staff SRE

Updated June 6, 2026

> **TL;DR:** Structured senders: `jq -r '.[].from[0].email'`. Addresses buried anywhere in a payload: `jq '[.. | strings | scan("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}")] | unique'`. Feed either from [`nylas email list --json`](https://cli.nylas.com/docs/commands/email-list), and always guard the variable with `[ -z "$DATA" ]` before jq touches it.

Command references used in this guide: [`nylas email list`](https://cli.nylas.com/docs/commands/email-list), [`nylas email search`](https://cli.nylas.com/docs/commands/email-search), and [`nylas email read`](https://cli.nylas.com/docs/commands/email-read).

## What does email JSON look like?

Email JSON from the Nylas CLI is an array of message objects, each with 15 fields including `id`, `subject`, `date`, `snippet`, and the address arrays `from`, `to`, and `reply_to`. Addresses are objects, not strings (`{"name": "Alice", "email": "alice@example.com"}`), and that single fact explains most failed first attempts at extraction.

Because `from` is an array of those objects, the sender's address lives at `.from[0].email`, not `.from`. Every recipe below builds on that shape. The fastest way to orient yourself in any unfamiliar payload is to ask jq for the key list first:

```bash
# What fields does each message carry?
nylas email list --limit 5 --json | jq '[.[] | keys] | add | unique'

# What does one address object look like?
nylas email list --limit 1 --json | jq '.[0].from'
```
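To make the shape concrete, here is a minimal sketch against a hand-written sample. The two messages and their values are illustrative assumptions, not real CLI output, and they carry only a few of the fields. It shows why `.from` on its own yields an array while `.from[0].email` yields the address string.

```bash
# Hypothetical two-message sample mimicking the shape described above.
SAMPLE='[
  {"id": "m1", "subject": "Hello", "from": [{"name": "Alice", "email": "alice@example.com"}]},
  {"id": "m2", "subject": "Invoice", "from": [{"name": "Bob", "email": "bob@example.org"}]}
]'

# .from is an array of address objects, not a string
from_type=$(printf '%s' "$SAMPLE" | jq -r '.[0].from | type')

# .from[0].email drills into the first address object
first_sender=$(printf '%s' "$SAMPLE" | jq -r '.[0].from[0].email')

echo "$from_type"     # array
echo "$first_sender"  # alice@example.com
```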

## How do you extract sender addresses?

Pulling every sender is one filter: `.[].from[0].email` with `-r` for raw strings instead of quoted JSON. Add `sort | uniq -c` from coreutils and you have a frequency table — 50 messages reduce to a ranked sender list in well under a second, all locally.

```bash
# All senders, one per line
nylas email list --limit 50 --json | jq -r '.[].from[0].email'

# Ranked: your top 10 senders
nylas email list --limit 50 --json | \
  jq -r '.[].from[0].email' | sort | uniq -c | sort -rn | head -10

# Sender + subject pairs for triage
nylas email list --unread --json | \
  jq -r '.[] | "\(.from[0].email)\t\(.subject)"'
```

The `// empty` fallback belongs in any pipeline that feeds another command. `//` is jq's alternative operator, so `.from[0].email // empty` emits nothing (instead of the string `null`) when a message has no parsed sender, and downstream `xargs` or `grep` never receives a literal "null" line.
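A small sketch of the difference, using a hypothetical sample in which the middle message has an empty `from` array (an assumption about how a missing sender might appear):

```bash
# Hypothetical sample: the second message has no parsed sender.
SAMPLE='[
  {"from": [{"email": "alice@example.com"}]},
  {"from": []},
  {"from": [{"email": "bob@example.org"}]}
]'

# Without the fallback, the missing sender prints as the literal line "null"
without_guard=$(printf '%s' "$SAMPLE" | jq -r '.[].from[0].email')

# With // empty, the missing sender produces no output line at all
with_guard=$(printf '%s' "$SAMPLE" | jq -r '.[].from[0].email // empty')

printf '%s\n' "$without_guard"   # three lines, the middle one is "null"
printf '%s\n' "$with_guard"      # two lines, addresses only
```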

## How do you extract addresses buried in any JSON payload?

When addresses hide in free text (a forwarded body, a webhook field, a log line wrapped in JSON), structured paths can't reach them, and jq's `scan()` regex function takes over. The recursive descent operator `..` visits every value in the document, `strings` keeps only text nodes, and `scan()` returns each regex match. Per the [jq manual](https://jqlang.org/manual/), `scan` will "emit a stream of the non-overlapping substrings of the input that match the regex", exactly the shape you want to collect with `[…]` and pipe into `unique`.

```bash
# Every email address anywhere in a payload, deduplicated
jq '
  [.. | strings | scan("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}")]
  | unique' payload.json

# Same trick against a message body
nylas email read MSG_ID --json | jq '
  [.body | scan("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}")]
  | unique'
```

This is the generic answer to "extract emails from a JSON payload" — it works on documents you've never seen the schema of, at the cost of false positives on address-shaped strings. The pattern approximates the address grammar from [RFC 5322](https://datatracker.ietf.org/doc/html/rfc5322); a fully spec-correct regex runs to thousands of characters and isn't worth it for extraction work. For payloads you control, prefer the structured paths: they're precise, and they skip the cost of walking every value in the document.
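The sketch below runs the recursive scan over a hypothetical webhook-style payload with addresses at different depths. The payload is invented for illustration. The `ascii_downcase` step is an addition to the recipe above: it assumes you want case variants of an address treated as one entry, which holds for domains and for most providers' local parts.

```bash
# Hypothetical payload: addresses appear in a nested string and inside an array.
PAYLOAD='{
  "event": "message.created",
  "data": {
    "note": "Forwarded by Alice@Example.com on behalf of bob@example.org",
    "tags": ["contact: alice@example.com", "no address here"],
    "retries": 3
  }
}'

# Walk every value, keep strings, collect regex matches, normalize case, dedupe.
found=$(printf '%s' "$PAYLOAD" | jq -c '
  [.. | strings
      | scan("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}")
      | ascii_downcase]
  | unique')

echo "$found"   # ["alice@example.com","bob@example.org"]
```

Numbers such as `retries` and strings without an `@` contribute nothing, so the result holds only the two distinct addresses.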

## How do you group senders by domain?

Domain grouping turns an address list into an organizational view — which companies fill your inbox. Splitting on `@` with `split("@")[1]` and feeding `group_by` produces counts per domain in one expression. On a 200-message window this typically collapses to 20-40 domains, a much more readable report than raw addresses.

```bash
nylas email list --limit 200 --json | jq '
  [.[] | .from[0].email // empty | split("@")[1]]
  | group_by(.)
  | map({domain: .[0], messages: length})
  | sort_by(-.messages)'
```

The same expression powers the company-mapping workflow in the [organize-by-company guide](https://cli.nylas.com/guides/organize-emails-by-company), which takes the next step of moving messages into per-company folders.
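To see what the grouping expression returns, here is a minimal sketch on a hypothetical four-message sample (the addresses are placeholders, and one message deliberately lacks a sender):

```bash
# Hypothetical sample: three senders across two domains, one message without a sender.
SAMPLE='[
  {"from": [{"email": "alice@example.com"}]},
  {"from": [{"email": "bob@example.org"}]},
  {"from": []},
  {"from": [{"email": "carol@example.com"}]}
]'

# Same filter as above; -c prints the result on one line.
by_domain=$(printf '%s' "$SAMPLE" | jq -c '
  [.[] | .from[0].email // empty | split("@")[1]]
  | group_by(.)
  | map({domain: .[0], messages: length})
  | sort_by(-.messages)')

echo "$by_domain"
# [{"domain":"example.com","messages":2},{"domain":"example.org","messages":1}]
```

The sender-less message is dropped by `// empty` before `split` runs, so it never reaches `group_by`.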

## How do you export email data to CSV?

jq's `@csv` formatter writes spreadsheet-ready output directly — no Python, no intermediate files. Each row must be an array, so the filter maps every message to an array of the 3-4 columns you want. Excel and Google Sheets both open the result without an import wizard.

```bash
# date, sender, subject — straight to a file
nylas email list --limit 200 --json | jq -r '
  .[] | [(.date | todate), .from[0].email, .subject] | @csv
' > inbox-export.csv

# Unread only, with thread IDs for follow-up tooling
nylas email list --unread --json | jq -r '
  .[] | [.id, .thread_id, .from[0].email, .subject] | @csv
' > unread.csv
```

The `todate` filter converts the epoch-seconds `date` field into a UTC timestamp in the ISO 8601 format profiled in [RFC 3339](https://datatracker.ietf.org/doc/html/rfc3339), which is readable at a glance and sorts correctly as plain text in every spreadsheet tool. Skip it and the column holds raw epoch integers: sortable, but unreadable in review.
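A quick sketch of what one exported row looks like, using a hypothetical message whose subject contains both a comma and double quotes (the values are invented to exercise the quoting):

```bash
# Hypothetical message: the subject has a comma and embedded double quotes.
SAMPLE='[{"date": 1700000000, "from": [{"email": "alice@example.com"}], "subject": "Q4 plan, \"final\" draft"}]'

# todate renders the epoch as a UTC timestamp; @csv quotes strings and doubles inner quotes.
row=$(printf '%s' "$SAMPLE" | jq -r '
  .[] | [(.date | todate), .from[0].email, .subject] | @csv')

echo "$row"
# "2023-11-14T22:13:20Z","alice@example.com","Q4 plan, ""final"" draft"
```

Because `@csv` handles the quoting, commas and quotes inside a subject line stay within their column when the file is opened in a spreadsheet.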

## How do you guard jq against empty input?

Every email-to-jq pipeline shares a failure mode: if the upstream command fails (auth expired, network down, empty result), jq never sees the JSON it expects. Given zero bytes it prints nothing and exits 0, and given an error message instead of JSON it fails with a parse error that says nothing about the real cause. In a cron job either outcome lands in a log nobody reads, after the script already misbehaved. The fix is 1 line: capture output to a variable and test it before jq runs.

```bash
#!/usr/bin/env bash
set -euo pipefail

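# Capture first. The CLI's stderr is left alone so the real error reaches the log,
# and with set -e a non-zero exit from the CLI stops the script on this line.
EMAILS=$(nylas email list --unread --json)

# The CLI exited 0 but printed nothing: stop before jq runs.
[ -z "$EMAILS" ] && { echo "no data returned, check auth" >&2; exit 0; }

COUNT=$(printf '%s' "$EMAILS" | jq length)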
echo "Unread: $COUNT"
```

Two details carry the weight here. Don't suppress stderr on the CLI call — the real error (expired grant, rate limit) should reach the log. And exit `0` on the empty case in cron contexts where "nothing to do" is normal; reserve non-zero exits for genuine failures so your scheduler's alerting stays meaningful. The [reliable automation guide](https://cli.nylas.com/guides/build-reliable-email-automation) builds the full pattern set on top of this guard.
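One possible extension of the guard, sketched under assumptions: the `-z` test catches empty output but not output that is present and malformed, such as an error message printed where JSON was expected. The helper name `is_message_array` and the three stand-in values are hypothetical. The check relies on `jq -e`, which sets the exit status from the filter's result.

```bash
# Hypothetical helper: succeeds only if the input is non-empty and parses as a JSON array.
# jq -e exits 0 when the filter's last output is true, non-zero otherwise or on a parse error.
is_message_array() {
  [ -n "$1" ] && printf '%s' "$1" | jq -e 'type == "array"' >/dev/null 2>&1
}

# Stand-ins for three possible upstream outputs (illustrative values).
GOOD='[{"id": "m1"}]'
EMPTY=''
NOT_JSON='error: grant expired'

for candidate in "$GOOD" "$EMPTY" "$NOT_JSON"; do
  if is_message_array "$candidate"; then
    echo "ok: $(printf '%s' "$candidate" | jq length) message(s)"
  else
    echo "skipped: input is empty or not a JSON array" >&2
  fi
done
```

The helper silences only jq's own diagnostics. The CLI call that produces the data should still run with stderr untouched, as described above.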

## Next steps

- [Back up email to JSON](https://cli.nylas.com/guides/backup-emails-to-json) — produce the export files these recipes consume
- [Organize emails by company](https://cli.nylas.com/guides/organize-emails-by-company) — turn the domain grouping into folder automation
- [Parse inbound email webhooks](https://cli.nylas.com/guides/parse-inbound-email-webhooks) — the same extraction patterns against webhook payloads
- [Email to Slack notifications](https://cli.nylas.com/guides/email-to-slack-notifications) — pipe extracted fields into channel alerts
- [Full command reference](https://cli.nylas.com/docs/commands) — every list, search, and read flag documented
