Source: https://cli.nylas.com/guides/email-to-bigquery

# Load Email into BigQuery (CLI)

Analysts want email in BigQuery so they can join it against revenue, support, and product tables. The usual path is a managed connector with a monthly seat fee. This guide builds the pipeline yourself: export messages as JSON with the CLI, shape each one into a BigQuery row with jq, then load in batch with bq load or stream them through the tabledata.insertAll API.

Written by [Pouya Sanooei](https://cli.nylas.com/authors/pouya-sanooei) Software Engineer

Reviewed by [Qasim Muhammad](https://cli.nylas.com/authors/qasim-muhammad)

Updated June 9, 2026

> **TL;DR:** Export messages with `nylas email search "*" --json`, reshape each message into a flat BigQuery row with `jq`, and load it with `bq load` for batch backfills. For continuous intake, switch to the streaming `tabledata.insertAll` API — the one choice that decides your per-row cost and freshness is covered below.

Command references used in this guide: [`nylas email search`](https://cli.nylas.com/docs/commands/email-search), [`nylas email list`](https://cli.nylas.com/docs/commands/email-list), and [`nylas email read`](https://cli.nylas.com/docs/commands/email-read).

## How do I export email as JSON for BigQuery?

Export email by running `nylas email search "*" --json`, which returns an array of message objects with subject, sender, date, and snippet fields. The CLI authenticates through a stored grant and works the same across Gmail, Outlook, and four other providers, so you never write a Graph or Gmail API call. A single search auto-paginates past 200 results when you raise the limit.

The `email search` command takes a query string plus filters such as `--after` and `--from`. Scope each export to a date window so a backfill stays bounded and a daily run only touches the last 24 hours of mail. Writing the raw JSON to disk first keeps the export and the transform as separate, re-runnable steps.

```bash
# Export the last 24 hours of mail as a JSON array
nylas email search "*" --after 2026-06-08 --json --limit 500 > raw.json

# Confirm the shape: count the messages exported
jq 'length' raw.json
```

## How do I shape email JSON into BigQuery rows with jq?

Shape the export into BigQuery rows by piping it through `jq -c` to emit newline-delimited JSON, one object per line. BigQuery's `NEWLINE_DELIMITED_JSON` format expects exactly that, and each key must match a column name in your table schema. Flatten nested arrays such as the sender list down to a single email string.

The message `date` arrives as a Unix epoch in seconds; multiply by 1000 only if your column is a millisecond timestamp, otherwise load it as an `INT64` and convert in SQL. Use `//` fallbacks so a missing subject becomes an empty string rather than a null that breaks a NOT NULL column. The transform syntax follows the official [jq manual](https://jqlang.github.io/jq/manual/).

```bash
# Flatten each message into one row per line (NDJSON)
jq -c '.[] | {
  id:      .id,
  thread:  .thread_id,
  subject: (.subject // ""),
  sender:  (.from[0].email // ""),
  ts:      .date,
  snippet: (.snippet // "")
}' raw.json > rows.ndjson
```

## How do I load email rows with bq load?

Load the NDJSON file in one batch with `bq load`, the BigQuery CLI command for ingesting local files. Point it at `dataset.table`, set the source format to NDJSON, and either pass an explicit schema or autodetect it. Batch loads are free of insert charges — you pay only for storage and the queries you run later.

A local file load caps at 10 MB through the CLI before you need a Cloud Storage staging path, which covers thousands of email rows per run. The `bq` tool ships with the Google Cloud SDK; the flags below follow the [BigQuery local-load docs](https://cloud.google.com/bigquery/docs/loading-data-local). Re-running the load appends by default, so de-duplicate on the message `id` in your queries.

```bash
# Batch-load the NDJSON file into a BigQuery table
bq load \
  --source_format=NEWLINE_DELIMITED_JSON \
  --autodetect \
  mailbox.messages \
  rows.ndjson
```

## How do I stream email into BigQuery in real time?

Stream email by POSTing rows to the `tabledata.insertAll` API instead of batch-loading a file. Each request carries up to 10 MB and 500 rows, and the data is queryable within seconds rather than after a load job completes. Pair it with a webhook so a new message lands in BigQuery the moment it arrives.

Streaming inserts cost per 200 MB ingested, unlike free batch loads, so reserve this path for low-latency needs. Send an`insertId` per row and BigQuery de-duplicates best-effort over a few minutes; use the message `id` for that field. The request body is documented in the [tabledata.insertAll reference](https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll); quota limits are in the [BigQuery quotas page](https://cloud.google.com/bigquery/quotas).

```bash
# Build an insertAll body from one exported message and stream it
PROJECT="your-project"; DATASET="mailbox"; TABLE="messages"
jq -c '{rows: [.[] | {insertId: .id, json: {
  id: .id, subject: (.subject // ""), sender: (.from[0].email // ""), ts: .date
}}]}' raw.json > insert.json

curl -s -X POST \
  "https://bigquery.googleapis.com/bigquery/v2/projects/$PROJECT/datasets/$DATASET/tables/$TABLE/insertAll" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d @insert.json
```

## Why schedule the export instead of loading once?

Schedule the export because email is append-only data that grows every day, and a one-time load goes stale within hours. A cron job that runs the search, the jq transform, and the load keeps a BigQuery table current without re-importing the whole mailbox. Scoping each run to `--after` yesterday keeps the daily payload under a few hundred rows.

Run a daily cron for analytics that tolerate a 24-hour lag, and switch the same transform to the streaming path when a dashboard needs sub-minute freshness. The export and load steps are independent, so a failed load can re-run against the saved `raw.json` without re-querying the mailbox. Store the BigQuery credentials separately from the mailbox grant so you rotate each on its own cadence.

```bash
# Daily export-shape-load, suitable for crontab at 06:00
nylas email search "*" --after "$(date -v-1d +%F)" --json --limit 1000 \
  | jq -c '.[] | {id, subject: (.subject // ""), sender: (.from[0].email // ""), ts: .date}' \
  > rows.ndjson
bq load --source_format=NEWLINE_DELIMITED_JSON --autodetect mailbox.messages rows.ndjson
```

## Next steps

- [Load email into Postgres](https://cli.nylas.com/guides/email-to-postgres) — the same export-shape-load pattern into a relational database
- [Export email to CSV](https://cli.nylas.com/guides/email-to-csv-export) — a flat file instead of a warehouse
- [Sync email to S3](https://cli.nylas.com/guides/sync-email-to-s3) — archive the raw JSON before loading
- [Email to Zendesk tickets](https://cli.nylas.com/guides/email-to-zendesk-tickets) — route messages instead of warehousing them
- [Email to Jira issues](https://cli.nylas.com/guides/email-to-jira-issues) — turn inbound mail into tracked work
- [Full command reference](https://cli.nylas.com/docs/commands) — every flag and subcommand documented
- [BigQuery bq CLI reference](https://cloud.google.com/bigquery/docs/reference/bq-cli-reference) — every bq load flag