Source: https://cli.nylas.com/guides/email-to-elasticsearch

# Index Email into Elasticsearch (CLI)

Mailbox search is keyword-shallow: it can't rank by relevance, highlight matches, or aggregate across years of mail. Elasticsearch does all three. The Nylas CLI gives you each message as JSON; jq shapes it into a document; the _bulk API indexes thousands at once. This guide builds an email-to-Elasticsearch pipeline with a mapping tuned for full-text search, a bulk loader keyed on the message ID so re-runs stay idempotent, and match queries that highlight hits.

Written by [Qasim Muhammad](https://cli.nylas.com/authors/qasim-muhammad) Staff SRE

Updated June 9, 2026

> **TL;DR:** Pull messages with `nylas email search --json`, reshape each one into an index action with `jq`, and POST the stream to the Elasticsearch `_bulk` endpoint. Key each document on the message ID so re-runs update in place instead of duplicating. The CLI handles the inbox across six providers; Elasticsearch handles relevance-ranked full-text search — the payoff a mailbox can't give you, covered in the query section below.

Command references used in this guide: [`nylas email search`](https://cli.nylas.com/docs/commands/email-search), [`nylas email list`](https://cli.nylas.com/docs/commands/email-list), and [`nylas email read`](https://cli.nylas.com/docs/commands/email-read).

## What index mapping should email use?

An email index mapping defines how Elasticsearch tokenizes and stores each field. Use `text` for the subject and body so the analyzer breaks them into searchable tokens, and `keyword` for the sender and thread ID so they match exactly and aggregate. The date field uses `date` for range filters. The message ID is not mapped as a field; the loader uses it as the document `_id`.

The distinction matters for relevance. A `text` field runs through the standard analyzer, which lowercases and splits on word boundaries, so a search for “invoice” matches “Invoice #4012”. A `keyword` field is stored verbatim for exact term filters and sender facets. The [Elasticsearch text field docs](https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html) describe this analysis step. Create the index once with an explicit mapping; an index that auto-detects types from the first document often guesses wrong. Dynamic mapping stores an epoch integer as a `long` rather than a `date`, so date math and formatted range queries stop working.

```bash
# Create the index with an explicit mapping (run once)
curl -s -X PUT "$ES_URL/email" \
  -H "Content-Type: application/json" \
  -d '{
    "mappings": {
      "properties": {
        "subject":   { "type": "text" },
        "body":      { "type": "text" },
        "sender":    { "type": "keyword" },
        "thread_id": { "type": "keyword" },
        "date":      { "type": "date", "format": "epoch_second" }
      }
    }
  }'
```
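
To check the analysis step described above, the `_analyze` API returns the tokens an analyzer produces for a given string. The sketch below assumes `ES_URL` is set and `jq` is installed; `analyze_body` and `show_tokens` are hypothetical helper names used only for illustration.

```bash
# Hypothetical helper: build the request body for the _analyze API.
analyze_body() {
  jq -cn --arg text "$1" '{ analyzer: "standard", text: $text }'
}

# Hypothetical helper: print one token per line (assumes ES_URL is set).
show_tokens() {
  curl -s -X POST "$ES_URL/_analyze" \
    -H "Content-Type: application/json" \
    -d "$(analyze_body "$1")" \
    | jq -r '.tokens[].token'
}

# Usage:
#   show_tokens "Invoice #4012"
```

With the standard analyzer, that call should print `invoice` and `4012`, which is why a lowercase search term still matches the capitalized subject.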

## How do you shape a message into a bulk document?

The Elasticsearch `_bulk` API expects newline-delimited JSON: one action line, then one document line, repeated. Pull messages with `nylas email search --json` and pipe the array through `jq` to emit that two-line pattern per message, setting the document `_id` to the message ID.

Keying on the message ID is what keeps the loader idempotent. The CLI returns a stable ID per message, so re-indexing the same mail replaces the existing document rather than creating a duplicate, which is the behavior the [Elasticsearch bulk API reference](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html) specifies for the `index` action. Because the replacement is wholesale, every run should emit the complete document, not a subset of its fields. The CLI returns the date as a Unix epoch integer, which matches the `epoch_second` format set in the mapping. The `--limit` flag auto-paginates above 200, so one command can pull a year of mail.

```bash
# Pull a day of mail and reshape into _bulk NDJSON
nylas email search "*" --after 2026-06-08 --json --limit 500 \
  | jq -c '.[] | (
      { index: { _index: "email", _id: .id } },
      { subject:   (.subject // ""),
        body:      (.snippet // ""),
        sender:    (.from[0].email // ""),
        thread_id: .thread_id,
        date:      .date }
    )' > bulk.ndjson
```
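
To make the two-line pattern concrete, the same filter can be run against a hand-written message. The sample below is hypothetical: its values are made up, and its field names (`id`, `subject`, `snippet`, `from`, `thread_id`, `date`) are simply the ones the filter above reads, not a documented schema.

```bash
# Hypothetical sample message; values are made up for illustration.
sample='[{"id":"msg_1","subject":"Invoice #4012","snippet":"Payment is due Friday",
          "from":[{"email":"billing@example.com"}],"thread_id":"thr_1","date":1780876800}]'

# Same filter as the loader: one action line, then one document line.
shape_bulk() {
  jq -c '.[] | (
      { index: { _index: "email", _id: .id } },
      { subject:   (.subject // ""),
        body:      (.snippet // ""),
        sender:    (.from[0].email // ""),
        thread_id: .thread_id,
        date:      .date }
    )'
}

printf '%s' "$sample" | shape_bulk
# {"index":{"_index":"email","_id":"msg_1"}}
# {"subject":"Invoice #4012","body":"Payment is due Friday","sender":"billing@example.com","thread_id":"thr_1","date":1780876800}
```

Each message becomes exactly two lines, so a file built this way always has an even line count.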

## How do you bulk-index into Elasticsearch?

You bulk-index by POSTing the NDJSON file to the `_bulk` endpoint with a newline-aware content type. A single request indexes the whole batch in one round trip, which is far faster than one HTTP call per document. A commonly cited starting point is roughly 5–15 MB per request, adjusted by testing against your own cluster; at typical email sizes that is a few thousand messages per request.

Send the file with the `Content-Type: application/x-ndjson` header (the endpoint also accepts `application/json`) and with curl's `--data-binary` option, because plain `-d` strips newlines and destroys the NDJSON framing. The body must be newline-delimited rather than a JSON array, and the final line must end with a newline, or Elasticsearch rejects the request. Parse the response for an `errors` field: a `true` value means at least one document in the batch failed while the rest succeeded, since bulk is not transactional. Pipe the result through `jq` to count failures before moving on.

```bash
# Index the batch and check for partial failures
curl -s -X POST "$ES_URL/_bulk" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary @bulk.ndjson \
  | jq '{ errors, total: (.items | length), failed: ([.items[] | select(.index.error)] | length) }'
```
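
The framing rules above can be checked locally before anything is sent. This is a minimal sketch: `check_bulk_file` and `split_bulk_file` are hypothetical helper names, and the default of 2000 lines per chunk (1000 documents) is an arbitrary starting point, not a recommendation from Elastic.

```bash
# Hypothetical preflight check: succeeds only if the file looks like valid _bulk input.
check_bulk_file() {
  local file="$1" lines
  [ -s "$file" ] || { echo "empty or missing: $file" >&2; return 1; }
  # The last byte must be a newline.
  [ "$(tail -c 1 "$file" | wc -l | tr -d ' ')" = "1" ] \
    || { echo "missing trailing newline: $file" >&2; return 1; }
  # Action and document lines come in pairs, so the count must be even.
  lines="$(wc -l < "$file" | tr -d ' ')"
  [ $((lines % 2)) -eq 0 ] \
    || { echo "odd line count ($lines): $file" >&2; return 1; }
}

# Hypothetical splitter: an even chunk size keeps each action with its document.
split_bulk_file() {
  local file="$1" lines_per_chunk="${2:-2000}"
  [ $((lines_per_chunk % 2)) -eq 0 ] \
    || { echo "chunk size must be even" >&2; return 1; }
  split -l "$lines_per_chunk" "$file" "${file}.part-"
}

# Usage (assumes ES_URL is set):
#   check_bulk_file bulk.ndjson && split_bulk_file bulk.ndjson
#   for part in bulk.ndjson.part-*; do
#     curl -s -X POST "$ES_URL/_bulk" -H "Content-Type: application/x-ndjson" \
#       --data-binary @"$part" | jq '.errors'
#   done
```

Splitting by line count rather than byte size is the simpler option here, since it never separates an action line from its document line.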

## How do you run full-text search on the indexed mail?

Run full-text search with a `match` query against the `_search` endpoint. Unlike a mailbox keyword search, Elasticsearch scores every hit by relevance, so the most pertinent message ranks first, and the highlighter returns the matched fragment. This is the capability the TL;DR promised that no mailbox offers.

A `match` query analyzes the search text the same way the field was analyzed at index time, so “refund requests” matches documents containing either word. Hits are ranked by a BM25 score, which weighs how often each term appears in the document against how rare that term is across the index. The response returns 10 hits by default; the `size` parameter changes that cap. The [match query reference](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html) documents the analyzer behavior. Add a `highlight` block to get the matched fragment back, and combine the text match with a `range` filter on `date` to scope results to a window. Aggregations over the `sender` keyword field then answer questions a mailbox can't, like top senders for any search term.

```bash
# Relevance-ranked search with highlighted matches
curl -s -X GET "$ES_URL/email/_search" \
  -H "Content-Type: application/json" \
  -d '{
    "query": { "match": { "subject": "invoice overdue" } },
    "highlight": { "fields": { "subject": {} } },
    "size": 10
  }' | jq '.hits.hits[] | { score: ._score, subject: ._source.subject, highlight: .highlight.subject }'
```
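
The date window and sender aggregation mentioned above can be combined with the text match in a single `bool` query. The sketch below is one way to build that body; `build_search_body` and the `top_senders` aggregation name are hypothetical, and the `format` parameter is there so the range accepts calendar dates even though the field is stored as `epoch_second`.

```bash
# Hypothetical helper: match query + date window + top-senders aggregation.
# Arguments: search text, start date (inclusive), end date (exclusive), as YYYY-MM-DD.
build_search_body() {
  jq -cn --arg text "$1" --arg after "$2" --arg before "$3" '{
    query: { bool: {
      must:   [ { match: { subject: $text } } ],
      filter: [ { range: { date: { gte: $after, lt: $before, format: "yyyy-MM-dd" } } } ]
    } },
    highlight: { fields: { subject: {} } },
    aggs: { top_senders: { terms: { field: "sender", size: 5 } } },
    size: 10
  }'
}

# Usage (assumes ES_URL is set):
#   curl -s -X GET "$ES_URL/email/_search" -H "Content-Type: application/json" \
#     -d "$(build_search_body "invoice overdue" 2026-06-01 2026-07-01)" \
#     | jq '{ hits: [.hits.hits[] | { score: ._score, subject: ._source.subject }],
#             top_senders: [.aggregations.top_senders.buckets[]
#                           | { sender: .key, count: .doc_count }] }'
```

Putting the range in `filter` rather than `must` keeps it out of the relevance score, so the date window narrows the results without reordering them.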

## How do you keep the index in sync with new mail?

Keep the index current by scheduling the pull-shape-index pipeline and scoping each run to recent mail. Run it daily with `--after` set to yesterday so each pass handles only the last 24 hours of messages, typically a few hundred, instead of re-indexing the whole mailbox every night.

Because every document is keyed on the message ID, an overlapping window is safe: a message that appears in two runs overwrites the same document rather than duplicating it. For near-real-time indexing, drive the same shape-and-index step from a [message webhook](https://cli.nylas.com/guides/parse-inbound-email-webhooks) so a new message lands in Elasticsearch within seconds of arriving. The Elasticsearch credentials stay separate from the mailbox grant the CLI manages, so you can rotate them on independent schedules. A single cron line wires the whole loop (pull, reshape, bulk-index) into a self-maintaining searchable mirror of the inbox.

```bash
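# Daily cron: index yesterday's mail, keyed by message ID.
# A crontab entry must be a single line (no backslash continuations), and % must be escaped as \%.
# The document has the same fields as the initial load, because the index action replaces the whole document.
# Define ES_URL in the crontab itself; `date -d yesterday` is GNU date syntax.
0 6 * * *  nylas email search "*" --after $(date -d yesterday +\%Y-\%m-\%d) --json --limit 1000 | jq -c '.[] | ({ index: { _index: "email", _id: .id } }, { subject: (.subject // ""), body: (.snippet // ""), sender: (.from[0].email // ""), thread_id: .thread_id, date: .date })' | curl -s -X POST "$ES_URL/_bulk" -H "Content-Type: application/x-ndjson" --data-binary @-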
```
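
If the one-liner becomes hard to maintain, the same loop can live in a small script that cron calls. The following is a sketch under stated assumptions, not a hardened job: the file name `sync-email.sh`, the `--run` guard, and the function names are all hypothetical, and it assumes `ES_URL` is exported and that `nylas`, `jq`, and `curl` are on cron's `PATH`.

```bash
#!/usr/bin/env bash
# sync-email.sh: hypothetical wrapper for the daily job.
set -euo pipefail

# Yesterday as YYYY-MM-DD; tries GNU date first, then BSD/macOS date.
yesterday() {
  date -d yesterday +%Y-%m-%d 2>/dev/null || date -v-1d +%Y-%m-%d
}

# Same document shape as the initial load, so a re-index never drops fields.
shape_bulk() {
  jq -c '.[] | (
      { index: { _index: "email", _id: .id } },
      { subject: (.subject // ""), body: (.snippet // ""),
        sender: (.from[0].email // ""), thread_id: .thread_id, date: .date }
    )'
}

main() {
  batch="$(mktemp)"
  trap 'rm -f "$batch"' EXIT
  nylas email search "*" --after "$(yesterday)" --json --limit 1000 \
    | shape_bulk > "$batch"
  # An empty request body is an error for _bulk, so skip days with no mail.
  [ -s "$batch" ] || { echo "no new mail"; return 0; }
  # jq -e exits non-zero when the response reports partial failures.
  curl -s -X POST "${ES_URL:?ES_URL is not set}/_bulk" \
    -H "Content-Type: application/x-ndjson" --data-binary @"$batch" \
    | jq -e '.errors == false' > /dev/null
}

# Run only when invoked with --run, for example from a crontab entry such as:
#   0 6 * * *  /path/to/sync-email.sh --run
if [ "${1:-}" = "--run" ]; then main; fi
```

A script also avoids the `\%` escaping that crontab requires, and its non-zero exit on partial failures gives cron something to report.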

## Next steps

- [Load email into Postgres](https://cli.nylas.com/guides/email-to-postgres) — a relational store with JSONB and upserts
- [Load email into Snowflake](https://cli.nylas.com/guides/email-to-snowflake) — the warehouse pattern with VARIANT columns
- [Sync email to S3](https://cli.nylas.com/guides/sync-email-to-s3) — archive raw JSON to object storage first
- [Load email into SQLite](https://cli.nylas.com/guides/email-to-sqlite) — a local file database for smaller sets
- [Load email into MySQL](https://cli.nylas.com/guides/email-to-mysql) — the same loader against MySQL
- [Full command reference](https://cli.nylas.com/docs/commands) — every flag and subcommand documented
- [Elasticsearch bulk API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html) — NDJSON format and action semantics
- [Create index API](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html) — mapping options at index creation
- [Match query reference](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html) — full-text query analysis and scoring
