Guide

Monitor Email Integration Health from Terminal

An email integration degrades quietly: a grant expires, a webhook endpoint starts returning 500s, an error rate creeps from 0.1% to 4%. Users notice before you do unless something is checking. This guide builds a monitoring stack from the terminal: scheduled health checks with nylas doctor, error-rate tracking from local audit logs, webhook endpoint tests, and an alert path for when a check fails — all in cron, no monitoring service required.

Written by Qasim Muhammad Staff SRE

Updated June 6, 2026

Verified — CLI 3.1.16 · Gmail · last tested June 5, 2026

Command references used in this guide: nylas doctor, nylas audit logs show, nylas webhook test, and nylas webhook list.

What does email integration health monitoring cover?

Email integration monitoring covers four surfaces: credential validity (API keys and OAuth grants), API reachability (network path and latency), webhook delivery (is your endpoint receiving events), and command-level error rates (what fraction of calls fail). Each surface fails independently, so a single "is it up" ping misses three of the four. The split mirrors the four golden signals in the Google SRE book's monitoring chapter (latency, traffic, errors, saturation), scoped down to what one integration needs.

The grant surface deserves the most attention because it fails on its own schedule. A user changes their password, an admin revokes an app, a provider forces re-consent, and that one mailbox silently stops syncing while the other 99 work fine. Catching it within 15 minutes instead of 3 days is the difference between a proactive email to one user and a support escalation. For what the platform guarantees on its side of this line, see the SLA guide.

How do you run a scheduled health check?

The nylas doctor --json command runs 5 diagnostic checks (configuration, secret store, API credentials, network, grants) and finishes in a few seconds, which makes it cheap enough to schedule every 15 minutes. Each check reports a status field, so a one-line jq filter turns the output into a pass/fail gate for cron.

#!/usr/bin/env bash
# health-check.sh — alert if any doctor check fails
# (no set -e: a failing check must reach the alert, not kill the script)
set -uo pipefail

UNHEALTHY=$(nylas doctor --json | jq '[.checks[] | select(.status != "ok")]') \
  || UNHEALTHY='[{"name":"doctor","status":"error","message":"doctor failed to run"}]'

if [ "$(echo "$UNHEALTHY" | jq 'length')" -gt 0 ]; then
  nylas email send \
    --to oncall@example.com \
    --subject "Email integration check failed on $(hostname)" \
    --body "$UNHEALTHY" \
    --yes
fi

# Cron runs with a minimal PATH — include the directory nylas lives in
PATH=/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin

# Run every 15 minutes
*/15 * * * * /usr/local/bin/health-check.sh >> /var/log/email-health.log 2>&1

A 15-minute interval means a failed grant is caught within 96 checks a day at negligible cost: each run is one local command and zero message sends while healthy. Tighten to 5 minutes for customer-facing integrations; that's still only 288 checks a day.

How do you track error rates from audit logs?

The audit log turns command history into a local dataset your error-rate math can run on: each entry carries the command, its status, and its Nylas request ID. Enable it once with nylas audit init --enable; from then on, nylas audit logs show --status error filters to failures, and --since scopes the window. A weekly review takes about 2 minutes and surfaces degradation trends that single health checks can't see.

# One-time setup
nylas audit init --enable

# All failures this month
nylas audit logs show --status error --since 2026-06-01

# Failures in email commands specifically
nylas audit logs show --command email --status error

# Trace one failure end-to-end with its request ID
nylas audit logs show --request-id req_abc123

The signal to watch is the trend, not the absolute count. Five failures in a week of 2,000 commands is a 0.25% error rate and probably network noise; the same five failures all hitting one grant is a dying mailbox connection. The --command filter splits the two cases in one query. Request IDs link any entry to the platform's own logs, as covered in the audit activity guide.

How do you test webhook endpoints from the terminal?

Webhook delivery fails on your side of the contract: an expired TLS certificate, a deploy that broke the handler, a firewall rule. The nylas webhook test send command pushes a test event to any URL on demand, so you can verify the endpoint end-to-end in seconds instead of waiting for a real event to fail. nylas webhook list shows the registered destinations and their status.

# What's registered, and is it active?
nylas webhook list

# Push a test event at the endpoint right now
nylas webhook test send https://api.example.com/webhooks/nylas

# Preview the payload shape your handler must parse
nylas webhook test payload message.created

Run the test send after every deploy that touches the handler, and monthly otherwise: certificates expire on their own clock. If deliveries did fail, remember the platform retries non-200 responses up to 3 times with exponential backoff per the Nylas webhooks documentation, so a fast fix often loses nothing; the webhook events reference documents the retry contract.

How do you alert when the email channel itself is the problem?

Alerting through the channel you're monitoring has an obvious failure mode: if sends are broken, the alert about broken sends doesn't arrive. The fix is channel diversity. Send the alert through a second path, such as a Slack message or a different connected mailbox, so the failure of one path can't suppress its own alarm.

Slack works well as the second path because it shares none of the email pipeline's failure modes. The CLI ships a nylas slack command group for exactly this kind of cross-channel workflow, and the email-to-Slack guide covers the setup in about 10 minutes. The doctor-gated alert script above then needs one change: a Slack send next to the email send, or in place of it.

Which check runs on which schedule?

A complete monitoring stack for one integration is 4 scheduled jobs and roughly 25 lines of shell. The table below is the full schedule, in order of how fast each failure type bites.

Check	Command	Frequency
Credentials, grants, network	`nylas doctor --json`	Every 15 min
Command error rate	`nylas audit logs show --status error`	Weekly review
Webhook endpoint	`nylas webhook test send URL`	After deploys + monthly
Registered destinations	`nylas webhook list`	Monthly

This stack monitors your side of the integration. The platform's side is published at status.nylas.com, and when a check fails there rather than here, the playbook switches from fixing to queueing — covered in the outage handling guide.

Next steps

Monitor agent account health — probe connector and per-account grant_status for AI agent accounts
Handle email API outages gracefully — what to do when the check fails on the platform side
Build reliable email automation — make the jobs being monitored fail less in the first place
Email to Slack notifications — the second alert channel setup
Email webhook events reference — payload shapes and retry behavior for the endpoints you're testing
AI agent audit dashboard — per-agent command counts, error rates, and session replay from audit logs
Full command reference — every flag and subcommand documented