
Index Email into Elasticsearch (CLI)

Mailbox search is keyword-shallow: it can't rank by relevance, highlight matches, or aggregate across years of mail. Elasticsearch does all three. The Nylas CLI gives you each message as JSON; jq shapes it into a document; the _bulk API indexes thousands at once. This guide builds an email-to-Elasticsearch pipeline with a mapping tuned for full-text search, a bulk loader keyed on the message ID so re-runs stay idempotent, and match queries that highlight hits.

Written by Qasim Muhammad, Staff SRE

Verified: CLI 3.1.17 · Gmail, Outlook · last tested June 9, 2026

Command references used in this guide: nylas email search, nylas email list, and nylas email read.

What index mapping should email use?

An email index mapping defines how Elasticsearch tokenizes and stores each field. Use text for the subject and body so the analyzer breaks them into searchable tokens, and keyword for the sender and thread ID so they match exactly and support aggregations. The message ID is not a field at all; it becomes each document's _id. The date field uses the date type so range filters work.

The distinction matters for relevance. A text field runs through the standard analyzer, which lowercases and splits on word boundaries, so a search for “invoice” matches “Invoice #4012”. A keyword field is stored verbatim for exact term filters and sender facets. The Elasticsearch text field docs describe this analysis step. Create the index once with an explicit mapping. An index that infers types from the first document often guesses wrong. An epoch-seconds date, for example, is mapped as a plain long, so date-math range filters such as now-7d fail. The sender becomes a text field that can only be aggregated through a .keyword subfield.

# Create the index with an explicit mapping (run once)
curl -s -X PUT "$ES_URL/email" \
  -H "Content-Type: application/json" \
  -d '{
    "mappings": {
      "properties": {
        "subject":   { "type": "text" },
        "body":      { "type": "text" },
        "sender":    { "type": "keyword" },
        "thread_id": { "type": "keyword" },
        "date":      { "type": "date", "format": "epoch_second" }
      }
    }
  }'
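Creating an index that already exists fails with a resource_already_exists_exception, so a setup script that may run more than once can check first. The sketch below assumes ES_URL is set and that the mapping body above is saved to a file. ensure_index and mapping.json are hypothetical names, not part of the Nylas CLI or Elasticsearch.

```shell
# Hypothetical helper: create an index with an explicit mapping only if it
# does not exist yet. Assumes ES_URL is set, e.g. http://localhost:9200.
ensure_index() {
  local index=$1          # index name, e.g. email
  local mapping_file=$2   # JSON file holding the {"mappings": ...} body shown above
  local status
  # HEAD /<index> answers 200 if the index exists and 404 if it does not;
  # -o /dev/null discards the headers so -w prints only the status code.
  status=$(curl -s -o /dev/null -w '%{http_code}' --head "$ES_URL/$index")
  if [ "$status" = "200" ]; then
    echo "exists"
    return 0
  fi
  # Create the index; --data-binary sends the mapping file byte-for-byte.
  curl -s -X PUT "$ES_URL/$index" \
    -H "Content-Type: application/json" \
    --data-binary @"$mapping_file"
}

# Usage: ensure_index email mapping.json
```

A second run prints "exists" instead of returning an error, so the setup step can sit at the top of the same script as the loader.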

How do you shape a message into a bulk document?

The Elasticsearch _bulk API expects newline-delimited JSON: one action line, then one document line, repeated. Pull messages with nylas email search --json and pipe the array through jq to emit that two-line pattern per message, setting the document _id to the message ID.

Keying on the message ID is what keeps the loader idempotent. The CLI returns a stable ID per message, so re-indexing the same mail overwrites the existing document instead of creating a duplicate. That is how the Elasticsearch bulk API reference defines the index action. The CLI returns the date as a Unix epoch integer, which matches the epoch_second format set in the mapping. The --limit flag auto-paginates above 200, so one command can pull a year of mail.

# Pull a day of mail and reshape into _bulk NDJSON
# (body holds the message snippet, a short preview, not the full text)
nylas email search "*" --after 2026-06-08 --json --limit 500 \
  | jq -c '.[] | (
      { index: { _index: "email", _id: .id } },
      { subject:   (.subject // ""),
        body:      (.snippet // ""),
        sender:    (.from[0].email // ""),
        thread_id: .thread_id,
        date:      .date }
    )' > bulk.ndjson
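The _bulk endpoint reads the file strictly as alternating action and document lines. A quick check before sending catches a jq filter that emits the wrong shape, or an empty search result. This sketch assumes the file came from jq -c as above; that compact output starts each action line with {"index":. check_bulk_pairs is a hypothetical helper name.

```shell
# Hypothetical pre-flight check for a _bulk NDJSON file.
# Returns 0 when the file is non-empty, every odd line is an index action,
# and lines come in pairs; otherwise prints the problem and returns 1.
check_bulk_pairs() {
  awk '
    BEGIN { bad = 0 }
    # Odd lines must be action lines like {"index":{"_index":"email","_id":...}}.
    NR % 2 == 1 && $0 !~ /^[{]"index":/ {
      printf "line %d: expected an index action\n", NR
      bad = 1
    }
    END {
      # An empty file usually means the search returned nothing.
      if (NR == 0) { print "empty file: nothing to index"; bad = 1 }
      # An odd total means the last action has no document after it.
      if (NR % 2 != 0) { print "odd line count: last action has no document"; bad = 1 }
      exit bad
    }
  ' "$1"
}

# Usage: check_bulk_pairs bulk.ndjson && echo "ready to index"
```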

How do you bulk-index into Elasticsearch?

You bulk-index by POSTing the NDJSON file to the _bulk endpoint with a newline-aware content type. A single request indexes the whole batch in one round trip, which is far faster than one HTTP call per document. Elastic recommends bulk batches of roughly 5–15 MB; at typical email sizes that is a few thousand messages per request.

Send the file with the Content-Type: application/x-ndjson header, as the bulk API reference specifies. Use --data-binary rather than -d, because curl's -d strips newlines from a file. That collapses the NDJSON into a single line, which the bulk endpoint rejects. The final line must also end with a newline, or Elasticsearch rejects the request. Parse the response for an errors field. A true value means at least one document in the batch failed, though the others may have succeeded, because bulk is not transactional. Pipe the result through jq to count failures before moving on.

# Index the batch and check for partial failures
curl -s -X POST "$ES_URL/_bulk" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary @bulk.ndjson \
  | jq '{ errors, total: (.items | length),
          failed: ([.items[]? | select(.index.error)] | length) }'
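A pull that returns more mail than fits comfortably in one request can be sent in batches. Splitting on an even line count keeps each action line with its document. Padding a missing final newline satisfies the bulk format. This is one possible sketch and assumes ES_URL is set. The helper names and the 2000-pair batch size in the usage lines are illustrative; tune them toward the 5–15 MB guideline. It checks the errors flag with grep so it runs without jq, which makes it coarser than a jq failure count.

```shell
# Hypothetical helpers for sending a large _bulk file in batches.
# Assumes ES_URL is set; tune the pairs-per-batch value toward 5-15 MB.

# Append a newline if the file does not already end with one.
# $(tail -c 1 ...) is empty when the last byte is a newline, because
# command substitution strips trailing newlines.
ensure_trailing_newline() {
  if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
    printf '\n' >> "$1"
  fi
}

# Split FILE into chunks of PAIRS action/document pairs named PREFIXaa,
# PREFIXab, ... An even line count per chunk means no pair is torn apart.
split_bulk() {
  split -l "$(( $2 * 2 ))" "$1" "$3"
}

# POST each chunk and report whether its response says errors:false.
# Returns non-zero if any chunk had failures or no usable response.
post_batches() {
  local chunk response failed=0
  for chunk in "$1"*; do
    [ -e "$chunk" ] || continue   # the glob matched nothing
    response=$(curl -s -X POST "$ES_URL/_bulk" \
      -H "Content-Type: application/x-ndjson" \
      --data-binary @"$chunk")
    if printf '%s' "$response" | grep -q '"errors": *false'; then
      echo "$chunk: ok"
    else
      echo "$chunk: failures, inspect the response items"
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   ensure_trailing_newline bulk.ndjson
#   split_bulk bulk.ndjson 2000 batch_
#   post_batches batch_
```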

How do you run full-text search on the indexed mail?

Run full-text search with a match query against the _search endpoint. Unlike a mailbox keyword search, Elasticsearch scores every hit by relevance, so the most pertinent message ranks first, and the highlighter returns the matched fragment.

A match query analyzes the search text the same way the field was analyzed at index time. So “refund requests” matches documents containing either word, ranked by a BM25 score. That score weighs how often each term appears in a message against how rare the term is across the index. The size parameter sets how many hits come back and defaults to 10. The match query reference documents the analyzer behavior. Add a highlight block to get the matched fragment back, and combine the text match with a range filter on date to scope results to a window. Aggregations over the sender keyword field then answer questions a mailbox can't, like top senders for any search term.

# Relevance-ranked search with highlighted matches
curl -s -X GET "$ES_URL/email/_search" \
  -H "Content-Type: application/json" \
  -d '{
    "query": { "match": { "subject": "invoice overdue" } },
    "highlight": { "fields": { "subject": {} } },
    "size": 10
  }' | jq '.hits.hits[] | { score: ._score, subject: ._source.subject, highlight: .highlight.subject }'
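The combination described above (a relevance match, a date window, highlighting, and a top-senders aggregation) fits in a single bool query. This sketch builds that request body and assumes the mapping from earlier. search_body and day_to_epoch are hypothetical helpers. The query matches on body and puts the date range in filter context, which narrows results without affecting scores. It passes the window start as epoch seconds to match the epoch_second field format. The search text is pasted into the JSON unescaped, so it must not contain double quotes; jq -n --arg is the safer way to build bodies from arbitrary input.

```shell
# Hypothetical helpers for a scoped, aggregated search against the email index.

# Convert a UTC calendar day (YYYY-MM-DD) to epoch seconds, matching the
# epoch_second format of the date field. GNU date first, then BSD/macOS date.
day_to_epoch() {
  date -u -d "$1 00:00:00" +%s 2>/dev/null ||
    date -u -j -f '%Y-%m-%d %H:%M:%S' "$1 00:00:00" +%s
}

# Print a _search body: a relevance-scored match on body, a date window in
# filter context (no effect on scoring), highlighted body fragments, and the
# top five senders. Assumes the search text contains no double quotes.
search_body() {
  local text=$1 since_epoch=$2
  cat <<EOF
{
  "query": {
    "bool": {
      "must":   { "match": { "body": "$text" } },
      "filter": { "range": { "date": { "gte": $since_epoch, "format": "epoch_second" } } }
    }
  },
  "highlight": { "fields": { "body": {} } },
  "aggs": { "top_senders": { "terms": { "field": "sender", "size": 5 } } },
  "size": 10
}
EOF
}

# Usage:
#   search_body "refund requests" "$(day_to_epoch 2026-06-01)" \
#     | curl -s -X GET "$ES_URL/email/_search" \
#         -H "Content-Type: application/json" --data-binary @- \
#     | jq '{ hits: [.hits.hits[] | { score: ._score, highlight: .highlight.body }],
#             top_senders: .aggregations.top_senders.buckets }'
```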

How do you keep the index in sync with new mail?

Keep the index current by scheduling the pull-shape-index pipeline and scoping each run to recent mail. Run it daily with --after set to yesterday so each pass handles only the last 24 hours of messages, typically a few hundred, instead of re-indexing the whole mailbox every night.

Because every document is keyed on the message ID, an overlapping window is safe: a message that appears in two runs overwrites the same document rather than duplicating it. For near-real-time indexing, drive the same shape-and-index step from a message webhook so a new message lands in Elasticsearch within seconds of arriving. The Elasticsearch credentials stay separate from the mailbox grant the CLI manages, so rotate them on independent schedules. A single crontab entry wires the whole loop (pull, reshape, bulk-index) into a self-maintaining searchable mirror of the inbox.

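# Daily cron: index yesterday's mail, keyed by message ID.
# A crontab entry must fit on one line, and cron starts with a minimal
# environment, so set ES_URL here (http://localhost:9200 is a placeholder).
ES_URL=http://localhost:9200
0 6 * * * nylas email search "*" --after "$(date -d yesterday +\%Y-\%m-\%d)" --json --limit 1000 | jq -c '.[] | ({ index: { _index: "email", _id: .id } }, { subject: (.subject // ""), body: (.snippet // ""), sender: (.from[0].email // ""), thread_id: .thread_id, date: .date })' | curl -s -X POST "$ES_URL/_bulk" -H "Content-Type: application/x-ndjson" --data-binary @-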
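The cron entry depends on GNU date -d, which BSD and macOS date do not accept. Moving the pipeline into a script that cron calls avoids that, and it also avoids crontab's one-line rule and \% escaping. Below is a minimal sketch of a portable date helper for such a script; yesterday is a hypothetical name.

```shell
# Hypothetical helper for a cron wrapper script: print yesterday's date as
# YYYY-MM-DD, the form passed to --after in this guide.
yesterday() {
  # GNU/Linux date understands -d; BSD/macOS date uses -v-1d instead.
  date -d yesterday +%Y-%m-%d 2>/dev/null || date -v-1d +%Y-%m-%d
}

# Usage inside the script:
#   nylas email search "*" --after "$(yesterday)" --json --limit 1000 | ...
```

With the pipeline in a script, the crontab entry shrinks to a schedule plus a path, and the script can also run the pre-flight checks before posting.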

Next steps