Source: https://cli.nylas.com/guides/email-to-snowflake

# Load Email into Snowflake (CLI)

Analysts want email in the warehouse next to orders, tickets, and events — not locked in a mailbox. Managed email connectors cost per row and lag by hours. The Nylas CLI exports each message as JSON; Snowflake's VARIANT type stores the whole document and lets you query nested fields with dot notation. This guide stages exported JSON and runs COPY INTO with snowsql or the SQL API, so a daily load keeps the table current without a paid connector.

Written by [Nick Barraclough](https://cli.nylas.com/authors/nick-barraclough) Product Manager

Updated June 9, 2026

> **TL;DR:** Export messages with `nylas email search --json`, stage the file with `PUT`, then run `COPY INTO` a single `VARIANT` column so Snowflake stores each message whole and you query nested fields with dot notation. Drive it from the `snowsql` CLI for a one-shot load or the SQL API for an unattended job. One trick keeps re-runs from duplicating rows – covered in the sync section below.

Command references used in this guide: [`nylas email search`](https://cli.nylas.com/docs/commands/email-search), [`nylas email list`](https://cli.nylas.com/docs/commands/email-list), and [`nylas email read`](https://cli.nylas.com/docs/commands/email-read).

## How do you export email as JSON for Snowflake?

Export email for Snowflake by pulling messages as structured JSON, one document per message, with `nylas email search --json`. The CLI returns a JSON array with fields like `id`, `subject`, `from`, and `date` already typed, so no SMTP or IMAP parsing is needed. It auto-paginates past 200 results, so a single command can export a full month of mail.

Snowflake's `COPY INTO` loads newline-delimited JSON more predictably than one giant array, so split the export into one object per line. The `jq -c` filter does this in a single pass. A 50-message export lands in well under a second, and the resulting NDJSON file stages cleanly because each row is an independent document. Scope the search to a window like `--after 2026-06-01` so each run handles a bounded set rather than the whole mailbox.

```bash
# Export recent messages, one JSON object per line (NDJSON)
nylas email search "*" --after 2026-06-01 --json --limit 500 \
  | jq -c '.[]' > messages.ndjson

wc -l messages.ndjson   # one row per message
```

## What table and stage does Snowflake need?

Snowflake needs one table with a `VARIANT` column plus an internal named stage and a JSON file format. The `VARIANT` type stores a whole semi-structured document and lets you read nested fields with dot notation, so you never declare a column per email field. This is documented in the [Snowflake semi-structured data types reference](https://docs.snowflake.com/en/sql-reference/data-types-semistructured).

Create the objects once with any SQL client. A single `VARIANT` column named `raw` holds each message; a `TIMESTAMP_NTZ` default records load time for auditing. The named stage is internal, so files live inside Snowflake and need no S3 bucket. The file format sets `TYPE = JSON` and `STRIP_OUTER_ARRAY = TRUE`, which lets one statement load either an NDJSON file or a JSON array, within Snowflake's per-row VARIANT size limit.

```sql
CREATE TABLE IF NOT EXISTS email_raw (
  raw       VARIANT,
  loaded_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);

CREATE STAGE IF NOT EXISTS email_stage;

CREATE FILE FORMAT IF NOT EXISTS json_ndjson
  TYPE = JSON
  STRIP_OUTER_ARRAY = TRUE;
```

## How do you stage the file and COPY INTO the table?

Stage the file with `PUT`, then load it with `COPY INTO` reading the JSON file format. `PUT` uploads the local NDJSON to the internal stage and gzip-compresses it in transit by default; `COPY INTO` reads every staged file and writes one row per JSON document into the `VARIANT` column. The full grammar is in the [Snowflake COPY INTO table reference](https://docs.snowflake.com/en/sql-reference/sql/copy-into-table).

Run both statements from `snowsql`, Snowflake's official command-line client, in one `-q` invocation. The `PUT` command is client-side and only works from `snowsql` or a driver, not the web UI, per the [local file system load guide](https://docs.snowflake.com/en/user-guide/data-load-local-file-system). Add `ON_ERROR = CONTINUE` so one malformed message does not abort a 500-row batch; Snowflake reports skipped rows in the load summary.

```bash
snowsql -a "$SF_ACCOUNT" -u "$SF_USER" -d ANALYTICS -s PUBLIC -q "
  PUT file://messages.ndjson @email_stage AUTO_COMPRESS=TRUE OVERWRITE=TRUE;
  COPY INTO email_raw (raw)
    FROM @email_stage
    FILE_FORMAT = (FORMAT_NAME = json_ndjson)
    ON_ERROR = CONTINUE;
"
```

Once loaded, query nested fields directly off the `VARIANT`. Dot notation reaches into the document, and `::` casts the result to a SQL type for filtering or joins. The query below pulls subject and sender for the most recent 20 messages.

```sql
SELECT raw:subject::string          AS subject,
       raw:from[0].email::string    AS sender,
       TO_TIMESTAMP(raw:date::int)  AS received
FROM   email_raw
ORDER  BY received DESC
LIMIT  20;
```

## When should you use the SQL API instead of snowsql?

Use the Snowflake SQL API instead of `snowsql` when the load runs unattended in a container or CI job where installing a CLI is awkward. The SQL API is a REST endpoint that runs statements over HTTPS with a signed JWT, returning results as JSON. Its contract is documented in the [Snowflake SQL API guide](https://docs.snowflake.com/en/developer-guide/sql-api/index).

The SQL API cannot run client-side `PUT`, so stage files through an external stage on S3 or GCS, then issue only the `COPY INTO` over REST. A single POST to the `/api/v2/statements` path on your account host runs the load; the response includes a statement handle you poll for completion. For a load that already pushes JSON to a bucket – see the S3 export guide below – this skips the CLI entirely and keeps the job to one HTTP call.

```bash
curl -s -X POST \
  "https://$SF_ACCOUNT.snowflakecomputing.com/api/v2/statements" \
  -H "Authorization: Bearer $SF_JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "statement": "COPY INTO email_raw (raw) FROM @ext_email_stage FILE_FORMAT = (FORMAT_NAME = json_ndjson) ON_ERROR = CONTINUE",
    "warehouse": "COMPUTE_WH",
    "database": "ANALYTICS",
    "schema": "PUBLIC"
  }'
```

## How do you keep the table in sync without duplicates?

Keep the table in sync by loading into a staging table and merging on the message ID, so re-runs never duplicate rows. Each Nylas message carries a stable `id`, and a `MERGE` statement keyed on `raw:id` inserts new messages while skipping ones already present. Running the export with `--after` scoped to the last day keeps each batch small.

Load fresh JSON into `email_stage_tbl` with `COPY INTO`, then merge into `email_raw`. The `MERGE` matches on the extracted ID and only inserts unmatched rows, so a daily cron that re-imports an overlapping 24-hour window stays idempotent. A 500-message daily load merges in well under a second on an extra-small warehouse, and you pay only for the seconds the warehouse runs.

```sql
MERGE INTO email_raw t
USING (
  SELECT raw, raw:id::string AS msg_id FROM email_stage_tbl
) s
ON t.raw:id::string = s.msg_id
WHEN NOT MATCHED THEN
  INSERT (raw) VALUES (s.raw);
```

## Next steps

- [Load email into BigQuery](https://cli.nylas.com/guides/email-to-bigquery) — the same export pattern into Google's warehouse
- [Load email into Postgres](https://cli.nylas.com/guides/email-to-postgres) — JSONB and typed columns with psql
- [Sync email to S3](https://cli.nylas.com/guides/sync-email-to-s3) — stage JSON in a bucket for an external Snowflake stage
- [Email to GitHub Issues](https://cli.nylas.com/guides/email-to-github-issues) — route messages to a tracker instead of a warehouse
- [Email to monday.com items](https://cli.nylas.com/guides/email-to-monday-items) — the same JSON-to-API pattern into a board
- [snowsql CLI reference](https://docs.snowflake.com/en/user-guide/snowsql) — install and connection options for the client used here
- [Gmail API guides](https://developers.google.com/workspace/gmail/api/guides) and [Microsoft Graph message resource](https://learn.microsoft.com/en-us/graph/api/resources/message) — the provider message models behind the JSON
- [Full command reference](https://cli.nylas.com/docs/commands) — every flag and subcommand documented
