Guide

Monitor Agent Account Health

An agent account that silently goes invalid takes its agent down with it — the next send fails, the next inbox read returns nothing. Catching that before the agent does means probing health on a schedule, not by hand. This guide builds a health monitor from the CLI: read connector and account status as JSON, flag any account whose grant_status isn't valid, alert a human when one breaks, and track the healthy count over time. The whole probe is a cron job and a few lines of jq.

Written by Caleb Geene Director, Site Reliability Engineering

VerifiedCLI 3.1.16 · Nylas managed · last tested June 7, 2026

What does monitoring agent account health cover?

Monitoring agent account health means continuously answering one question: can every agent still send and receive? Two things can break that — the managed connector behind all accounts, or an individual account's grant going invalid. The CLI surfaces both as structured data, so a health monitor is a scheduled probe that reads status, compares it to the expected valid state, and raises an alert on any drift.

The reason to automate this is that a broken grant fails silently. An agent doesn't announce that its account went invalid; it just starts failing sends, and you find out when a customer doesn't get a reply. Probing health on a schedule turns a silent failure into a page — the difference between catching a broken account in minutes and discovering it from a complaint.

Health loop: probe status on a schedule, evaluate grant_status, alert when an account is not validProbe (cron)agent status --jsonEvaluategrant_status == valid?valid → quietinvalid → alert

How do I check connector and account health?

The nylas agent status --json command is the single health endpoint. It returns connector_configured (the managed connector behind every account), account_count, and an accounts array where each entry carries a grant_status. A healthy account reads valid:

nylas agent status --json \
  | jq '{connector: .connector_configured, accounts: .account_count}'

Check the connector first, because if connector_configured is false, every account is affected and per-account checks are noise. Treat that as the top-level alarm. When the connector is healthy, drill into individual accounts. This pairs with the broader email integration health monitoring patterns for the rest of your Nylas setup.

How do I find unhealthy accounts across the fleet?

With a fleet of accounts, you want the exceptions, not the full list. Both nylas agent status --json and nylas agent account list --json expose grant_status per account. Filter with jq for anything that isn't valid and you get just the broken ones:

# Print each account that isn't valid, with its grant_status
nylas agent account list --json \
  | jq -r '.[] | select(.grant_status != "valid") | "\(.email): \(.grant_status)"'

An empty result means the fleet is healthy — the quiet success a monitor should produce most of the time. A non-empty result is your work queue: each line is an account email and its bad grant_status. Feed that list into an alert or a remediation step. The per-account fix — rotating a credential or recreating the grant — is covered in the agent account lifecycle guide.

How do I run the health check on a schedule?

A health check that runs by hand isn't monitoring. Wrap the probe in a script and schedule it with cron — every 5 minutes is a reasonable cadence for catching a broken grant fast without hammering the API. The script also runs nylas doctor to confirm the CLI's own credentials are valid before it trusts any status output:

#!/usr/bin/env bash
set -euo pipefail

# Confirm the CLI itself is authenticated before trusting status
nylas doctor >/dev/null

# Top-level: is the managed connector up?
if [ "$(nylas agent status --json | jq -r '.connector_configured')" != "true" ]; then
  echo "ALERT: agent connector is not configured" >&2
  exit 1
fi

# Per-account: anything not valid?
bad=$(nylas agent account list --json \
  | jq -r '.[] | select(.grant_status != "valid") | .email')

if [ -n "$bad" ]; then
  echo "ALERT: unhealthy agent accounts:" >&2
  echo "$bad" >&2
  exit 2
fi

echo "ok: all agent accounts valid"

The script returns distinct exit codes — 0 healthy, 1 connector down, 2 account broken — so cron, a CI job, or a wrapper can treat a connector outage differently from a single bad grant. A 5-minute schedule means a broken account is detected within 5 minutes of failing, not whenever someone next looks.

# crontab: run the health check every 5 minutes
*/5 * * * * /usr/local/bin/agent-health-check.sh || /usr/local/bin/notify.sh

How do I alert when an account goes invalid?

Detection is only useful if it reaches a human. When the probe finds a bad account, send the alert from a healthy agent account with nylas email send — the same CLI doing the monitoring delivers the page. Good alerts are actionable and rare: name the broken account, say what changed, and don't fire when everything is fine.

bad=$(nylas agent account list --json \
  | jq -r '.[] | select(.grant_status != "valid") | "\(.email): \(.grant_status)"')

if [ -n "$bad" ]; then
  nylas email send \
    --to oncall@yourcompany.com \
    --subject "Agent account health: $(echo "$bad" | wc -l | tr -d ' ') account(s) invalid" \
    --body "$bad"
fi

Sending only when $bad is non-empty is what keeps the alert meaningful — a monitor that emails "all healthy" every 5 minutes trains people to ignore it. The Google SRE practice of alerting on symptoms, not noise, is documented in Monitoring Distributed Systems: page on what's broken and actionable, stay quiet otherwise.

How do I track health over time?

A point-in-time check tells you the fleet is healthy now; a trend tells you it's degrading. Append a timestamped count of valid versus total accounts to a log on every run, and you have the raw data for a dashboard or a weekly review. The valid ratio is the one number that summarizes fleet health:

status=$(nylas agent status --json)
total=$(echo "$status" | jq '.account_count')
valid=$(echo "$status" | jq '[.accounts[] | select(.grant_status == "valid")] | length')
echo "$(date -u +%FT%TZ) valid=$valid total=$total" >> agent-health.log

Over weeks, this log shows whether the valid ratio holds at 100% or drifts down as grants expire — the kind of slow degradation a single check never reveals. Plot it, or just grep for any line where valid dropped below total. For agent-action error rates alongside account health, see auditing AI agent activity.

Which check runs on which schedule?

Not every check belongs on the same cadence. The connector probe is cheap and high-value, so it runs often; the trend log is a once-a-run append; a deeper reconciliation against your tenant list can run nightly. Match frequency to how fast each signal changes:

CheckCommandCadence
Connector upagent status --jsonEvery 5 minutes
Account grant_statusagent account list --jsonEvery 5 minutes
Valid-ratio trendagent status --json → logEvery run (append)
Tenant reconciliationagent account list --jsonNightly

Keep the 5-minute checks lightweight so they never become the thing that breaks. The nightly reconciliation — diffing provisioned accounts against your tenant source of truth — belongs with the provisioning-at-scale workflow, which already lists the fleet as JSON.

Next steps