Guide
Load Email into BigQuery (CLI)
Analysts want email in BigQuery so they can join it against revenue, support, and product tables. The usual path is a managed connector with a monthly seat fee. This guide builds the pipeline yourself: export messages as JSON with the CLI, shape each one into a BigQuery row with jq, then load in batch with bq load or stream them through the tabledata.insertAll API.
Written by Pouya Sanooei Software Engineer
Reviewed by Qasim Muhammad
Command references used in this guide: nylas email search, nylas email list, and nylas email read.
How do I export email as JSON for BigQuery?
Export email by running nylas email search "*" --json, which returns an array of message objects with subject, sender, date, and snippet fields. The CLI authenticates through a stored grant and works the same across Gmail, Outlook, and four other providers, so you never write a Graph or Gmail API call. A single search auto-paginates past 200 results when you raise the limit.
The email search command takes a query string plus filters such as --after and --from. Scope each export to a date window so a backfill stays bounded and a daily run only touches the last 24 hours of mail. Writing the raw JSON to disk first keeps the export and the transform as separate, re-runnable steps.
# Export the last 24 hours of mail as a JSON array
nylas email search "*" --after 2026-06-08 --json --limit 500 > raw.json
# Confirm the shape: count the messages exported
jq 'length' raw.jsonHow do I shape email JSON into BigQuery rows with jq?
Shape the export into BigQuery rows by piping it through jq -c to emit newline-delimited JSON, one object per line. BigQuery's NEWLINE_DELIMITED_JSON format expects exactly that, and each key must match a column name in your table schema. Flatten nested arrays such as the sender list down to a single email string.
The message date arrives as a Unix epoch in seconds; multiply by 1000 only if your column is a millisecond timestamp, otherwise load it as an INT64 and convert in SQL. Use // fallbacks so a missing subject becomes an empty string rather than a null that breaks a NOT NULL column. The transform syntax follows the official jq manual.
# Flatten each message into one row per line (NDJSON)
jq -c '.[] | {
id: .id,
thread: .thread_id,
subject: (.subject // ""),
sender: (.from[0].email // ""),
ts: .date,
snippet: (.snippet // "")
}' raw.json > rows.ndjsonHow do I load email rows with bq load?
Load the NDJSON file in one batch with bq load, the BigQuery CLI command for ingesting local files. Point it at dataset.table, set the source format to NDJSON, and either pass an explicit schema or autodetect it. Batch loads are free of insert charges — you pay only for storage and the queries you run later.
A local file load caps at 10 MB through the CLI before you need a Cloud Storage staging path, which covers thousands of email rows per run. The bq tool ships with the Google Cloud SDK; the flags below follow the BigQuery local-load docs. Re-running the load appends by default, so de-duplicate on the message id in your queries.
# Batch-load the NDJSON file into a BigQuery table
bq load \
--source_format=NEWLINE_DELIMITED_JSON \
--autodetect \
mailbox.messages \
rows.ndjsonHow do I stream email into BigQuery in real time?
Stream email by POSTing rows to the tabledata.insertAll API instead of batch-loading a file. Each request carries up to 10 MB and 500 rows, and the data is queryable within seconds rather than after a load job completes. Pair it with a webhook so a new message lands in BigQuery the moment it arrives.
Streaming inserts cost per 200 MB ingested, unlike free batch loads, so reserve this path for low-latency needs. Send an insertId per row and BigQuery de-duplicates best-effort over a few minutes; use the message id for that field. The request body is documented in the tabledata.insertAll reference; quota limits are in the BigQuery quotas page.
# Build an insertAll body from one exported message and stream it
PROJECT="your-project"; DATASET="mailbox"; TABLE="messages"
jq -c '{rows: [.[] | {insertId: .id, json: {
id: .id, subject: (.subject // ""), sender: (.from[0].email // ""), ts: .date
}}]}' raw.json > insert.json
curl -s -X POST \
"https://bigquery.googleapis.com/bigquery/v2/projects/$PROJECT/datasets/$DATASET/tables/$TABLE/insertAll" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d @insert.jsonWhy schedule the export instead of loading once?
Schedule the export because email is append-only data that grows every day, and a one-time load goes stale within hours. A cron job that runs the search, the jq transform, and the load keeps a BigQuery table current without re-importing the whole mailbox. Scoping each run to --after yesterday keeps the daily payload under a few hundred rows.
Run a daily cron for analytics that tolerate a 24-hour lag, and switch the same transform to the streaming path when a dashboard needs sub-minute freshness. The export and load steps are independent, so a failed load can re-run against the saved raw.json without re-querying the mailbox. Store the BigQuery credentials separately from the mailbox grant so you rotate each on its own cadence.
# Daily export-shape-load, suitable for crontab at 06:00
nylas email search "*" --after "$(date -v-1d +%F)" --json --limit 1000 \
| jq -c '.[] | {id, subject: (.subject // ""), sender: (.from[0].email // ""), ts: .date}' \
> rows.ndjson
bq load --source_format=NEWLINE_DELIMITED_JSON --autodetect mailbox.messages rows.ndjsonNext steps
- Load email into Postgres — the same export-shape-load pattern into a relational database
- Export email to CSV — a flat file instead of a warehouse
- Sync email to S3 — archive the raw JSON before loading
- Email to Zendesk tickets — route messages instead of warehousing them
- Email to Jira issues — turn inbound mail into tracked work
- Full command reference — every flag and subcommand documented
- BigQuery bq CLI reference — every bq load flag