Guide

Extract Email Data from JSON with jq

You have a JSON payload full of email data — an API response, a webhook body, an exported mailbox — and you need the addresses out of it. This is a jq job, and the same 5 recipes cover almost every variant: structured field extraction, regex scanning for addresses buried in text, domain grouping, CSV export, and the empty-input guard that keeps the pipeline from blowing up in cron.

Written by Qasim Muhammad Staff SRE

Updated June 6, 2026

Verified — CLI 3.1.16 · Gmail · last tested June 6, 2026

Command references used in this guide: nylas email list, nylas email search, and nylas email read.

What does email JSON look like?

Email JSON from the Nylas CLI is an array of message objects, each with 15 fields including id, subject, date, snippet, and the address arrays from, to, and reply_to. Addresses are objects, not strings ({"name": "Alice", "email": "alice@example.com"}), and that single fact explains most failed first attempts at extraction.

Because from is an array of those objects, the sender's address lives at .from[0].email, not .from. Every recipe below builds on that shape. The fastest way to orient yourself in any unfamiliar payload is to ask jq for the key list first:

# What fields does each message carry?
nylas email list --limit 5 --json | jq '[.[] | keys] | add | unique'

# What does one address object look like?
nylas email list --limit 1 --json | jq '.[0].from'

How do you extract sender addresses?

Pulling every sender is one filter: .[].from[0].email with -r for raw strings instead of quoted JSON. Add sort | uniq -c from coreutils and you have a frequency table — 50 messages reduce to a ranked sender list in well under a second, all locally.

# All senders, one per line
nylas email list --limit 50 --json | jq -r '.[].from[0].email'

# Ranked: your top 10 senders
nylas email list --limit 50 --json | \
  jq -r '.[].from[0].email' | sort | uniq -c | sort -rn | head -10

# Sender + subject pairs for triage
nylas email list --unread --json | \
  jq -r '.[] | "\(.from[0].email)\t\(.subject)"'

The // empty fallback operator belongs in any pipeline that feeds another command: .from[0].email // empty emits nothing (instead of the string null) when a message has no parsed sender, so downstream xargs or grep never receives a literal "null" line.

How do you extract addresses buried in any JSON payload?

When addresses hide in free text (a forwarded body, a webhook field, a log line wrapped in JSON), structured paths can't reach them, and jq's scan() regex function takes over. The recursive descent operator .. visits every value in the document, strings keeps only text nodes, and scan() returns each regex match. Per the jq manual, scan will "emit a stream of the non-overlapping substrings of the input that match the regex", exactly the shape you want to collect with […] and pipe into unique.

# Every email address anywhere in a payload, deduplicated
jq '
  [.. | strings | scan("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}")]
  | unique' payload.json

# Same trick against a message body
nylas email read MSG_ID --json | jq '
  [.body | scan("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}")]
  | unique'

This is the generic answer to "extract emails from a JSON payload" — it works on documents you've never seen the schema of, at the cost of false positives on address-shaped strings. The pattern approximates the address grammar from RFC 5322; a fully spec-correct regex runs to thousands of characters and isn't worth it for extraction work. For payloads you control, prefer the structured paths: they're precise, and they skip the cost of walking every value in the document.

How do you group senders by domain?

Domain grouping turns an address list into an organizational view — which companies fill your inbox. Splitting on @ with split("@")[1] and feeding group_by produces counts per domain in one expression. On a 200-message window this typically collapses to 20-40 domains, a much more readable report than raw addresses.

nylas email list --limit 200 --json | jq '
  [.[] | .from[0].email // empty | split("@")[1]]
  | group_by(.)
  | map({domain: .[0], messages: length})
  | sort_by(-.messages)'

The same expression powers the company-mapping workflow in the organize-by-company guide, which takes the next step of moving messages into per-company folders.

How do you export email data to CSV?

jq's @csv formatter writes spreadsheet-ready output directly — no Python, no intermediate files. Each row must be an array, so the filter maps every message to an array of the 3-4 columns you want. Excel and Google Sheets both open the result without an import wizard.

# date, sender, subject — straight to a file
nylas email list --limit 200 --json | jq -r '
  .[] | [(.date | todate), .from[0].email, .subject] | @csv
' > inbox-export.csv

# Unread only, with thread IDs for follow-up tooling
nylas email list --unread --json | jq -r '
  .[] | [.id, .thread_id, .from[0].email, .subject] | @csv
' > unread.csv

The todate filter converts the epoch-seconds date field into the ISO 8601 format profiled in RFC 3339, which sorts correctly in every spreadsheet tool. Skip it and your date column sorts as raw integers — functional, but unreadable in review.

How do you guard jq against empty input?

Every email-to-jq pipeline shares a failure mode: if the upstream command fails (auth expired, network down, empty result), jq receives zero bytes and exits with a parse error that says nothing about the real cause. In a cron job that error lands in a log nobody reads, after the script already misbehaved. The fix is 1 line: capture output to a variable and test it before jq runs.

#!/usr/bin/env bash
set -euo pipefail

EMAILS=$(nylas email list --unread --json)
[ -z "$EMAILS" ] && { echo "no data returned — check auth"; exit 0; }

COUNT=$(echo "$EMAILS" | jq length)
echo "Unread: $COUNT"

Two details carry the weight here. Don't suppress stderr on the CLI call — the real error (expired grant, rate limit) should reach the log. And exit 0 on the empty case in cron contexts where "nothing to do" is normal; reserve non-zero exits for genuine failures so your scheduler's alerting stays meaningful. The reliable automation guide builds the full pattern set on top of this guard.

Next steps

Back up email to JSON — produce the export files these recipes consume
Organize emails by company — turn the domain grouping into folder automation
Parse inbound email webhooks — the same extraction patterns against webhook payloads
Email to Slack notifications — pipe extracted fields into channel alerts
Full command reference — every list, search, and read flag documented