Guide
Load Email into Snowflake (CLI)
Analysts want email in the warehouse next to orders, tickets, and events — not locked in a mailbox. Managed email connectors cost per row and lag by hours. The Nylas CLI exports each message as JSON; Snowflake's VARIANT type stores the whole document and lets you query nested fields with dot notation. This guide stages exported JSON and runs COPY INTO with snowsql or the SQL API, so a daily load keeps the table current without a paid connector.
Written by Nick Barraclough Product Manager
Command references used in this guide: nylas email search, nylas email list, and nylas email read.
How do you export email as JSON for Snowflake?
Export email for Snowflake by pulling messages as structured JSON, one document per message, with nylas email search --json. The CLI returns a JSON array with fields like id, subject, from, and date already typed, so no SMTP or IMAP parsing is needed. It auto-paginates past 200 results, so a single command can export a full month of mail.
Snowflake's COPY INTO loads newline-delimited JSON more predictably than one giant array, so split the export into one object per line. The jq -c filter does this in a single pass. A 50-message export lands in well under a second, and the resulting NDJSON file stages cleanly because each row is an independent document. Scope the search to a window like --after 2026-06-01 so each run handles a bounded set rather than the whole mailbox.
# Export recent messages, one JSON object per line (NDJSON)
nylas email search "*" --after 2026-06-01 --json --limit 500 \
| jq -c '.[]' > messages.ndjson
wc -l messages.ndjson # one row per messageWhat table and stage does Snowflake need?
Snowflake needs one table with a VARIANT column plus an internal named stage and a JSON file format. The VARIANT type stores a whole semi-structured document and lets you read nested fields with dot notation, so you never declare a column per email field. This is documented in the Snowflake semi-structured data types reference.
Create the objects once with any SQL client. A single VARIANT column named raw holds each message; a TIMESTAMP_NTZ default records load time for auditing. The named stage is internal, so files live inside Snowflake and need no S3 bucket. The file format sets TYPE = JSON and STRIP_OUTER_ARRAY = TRUE, which lets one statement load either an NDJSON file or a JSON array, within Snowflake's per-row VARIANT size limit.
CREATE TABLE IF NOT EXISTS email_raw (
raw VARIANT,
loaded_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);
CREATE STAGE IF NOT EXISTS email_stage;
CREATE FILE FORMAT IF NOT EXISTS json_ndjson
TYPE = JSON
STRIP_OUTER_ARRAY = TRUE;How do you stage the file and COPY INTO the table?
Stage the file with PUT, then load it with COPY INTO reading the JSON file format. PUT uploads the local NDJSON to the internal stage and gzip-compresses it in transit by default; COPY INTO reads every staged file and writes one row per JSON document into the VARIANT column. The full grammar is in the Snowflake COPY INTO table reference.
Run both statements from snowsql, Snowflake's official command-line client, in one -q invocation. The PUT command is client-side and only works from snowsql or a driver, not the web UI, per the local file system load guide. Add ON_ERROR = CONTINUE so one malformed message does not abort a 500-row batch; Snowflake reports skipped rows in the load summary.
snowsql -a "$SF_ACCOUNT" -u "$SF_USER" -d ANALYTICS -s PUBLIC -q "
PUT file://messages.ndjson @email_stage AUTO_COMPRESS=TRUE OVERWRITE=TRUE;
COPY INTO email_raw (raw)
FROM @email_stage
FILE_FORMAT = (FORMAT_NAME = json_ndjson)
ON_ERROR = CONTINUE;
"Once loaded, query nested fields directly off the VARIANT. Dot notation reaches into the document, and :: casts the result to a SQL type for filtering or joins. The query below pulls subject and sender for the most recent 20 messages.
SELECT raw:subject::string AS subject,
raw:from[0].email::string AS sender,
TO_TIMESTAMP(raw:date::int) AS received
FROM email_raw
ORDER BY received DESC
LIMIT 20;When should you use the SQL API instead of snowsql?
Use the Snowflake SQL API instead of snowsql when the load runs unattended in a container or CI job where installing a CLI is awkward. The SQL API is a REST endpoint that runs statements over HTTPS with a signed JWT, returning results as JSON. Its contract is documented in the Snowflake SQL API guide.
The SQL API cannot run client-side PUT, so stage files through an external stage on S3 or GCS, then issue only the COPY INTO over REST. A single POST to the /api/v2/statements path on your account host runs the load; the response includes a statement handle you poll for completion. For a load that already pushes JSON to a bucket – see the S3 export guide below – this skips the CLI entirely and keeps the job to one HTTP call.
curl -s -X POST \
"https://$SF_ACCOUNT.snowflakecomputing.com/api/v2/statements" \
-H "Authorization: Bearer $SF_JWT" \
-H "Content-Type: application/json" \
-d '{
"statement": "COPY INTO email_raw (raw) FROM @ext_email_stage FILE_FORMAT = (FORMAT_NAME = json_ndjson) ON_ERROR = CONTINUE",
"warehouse": "COMPUTE_WH",
"database": "ANALYTICS",
"schema": "PUBLIC"
}'How do you keep the table in sync without duplicates?
Keep the table in sync by loading into a staging table and merging on the message ID, so re-runs never duplicate rows. Each Nylas message carries a stable id, and a MERGE statement keyed on raw:id inserts new messages while skipping ones already present. Running the export with --after scoped to the last day keeps each batch small.
Load fresh JSON into email_stage_tbl with COPY INTO, then merge into email_raw. The MERGE matches on the extracted ID and only inserts unmatched rows, so a daily cron that re-imports an overlapping 24-hour window stays idempotent. A 500-message daily load merges in well under a second on an extra-small warehouse, and you pay only for the seconds the warehouse runs.
MERGE INTO email_raw t
USING (
SELECT raw, raw:id::string AS msg_id FROM email_stage_tbl
) s
ON t.raw:id::string = s.msg_id
WHEN NOT MATCHED THEN
INSERT (raw) VALUES (s.raw);Next steps
- Load email into BigQuery — the same export pattern into Google's warehouse
- Load email into Postgres — JSONB and typed columns with psql
- Sync email to S3 — stage JSON in a bucket for an external Snowflake stage
- Email to GitHub Issues — route messages to a tracker instead of a warehouse
- Email to monday.com items — the same JSON-to-API pattern into a board
- snowsql CLI reference — install and connection options for the client used here
- Gmail API guides and Microsoft Graph message resource — the provider message models behind the JSON
- Full command reference — every flag and subcommand documented