
Build an AI Email Triage Agent

Email triage gets expensive when every unread message competes for focus. An AI agent can scan your inbox, classify each message by urgency, draft replies for the important ones, and clear the noise out of your unread queue. This guide walks through building one in Python using Nylas CLI as the email backend and any LLM (OpenAI, Anthropic, or local Ollama) as the brain. Because the CLI handles provider differences, the same script works across all major email providers.

Written by Qasim Muhammad, Staff SRE

Reviewed by Hazik

Verified: CLI 3.1.1 · Gmail, Outlook · last tested April 11, 2026

What is an AI email agent?

An AI email agent is a workflow that can read mailbox context, classify messages, draft replies, and take bounded actions through explicit tools. The agent should not receive a raw mailbox password or an unrestricted SMTP relay. It should call narrow commands such as nylas email list --json, nylas email read, and draft or send commands only after the workflow decides that action is allowed.

A good AI email agent has four boundaries: a message limit, a provider grant, a write-action approval step, and logs that record the message ID without storing full private email bodies. Because those boundaries live in the workflow rather than in any one mailbox API, the same pattern works for Gmail, Outlook, Exchange, Yahoo, iCloud, and IMAP.
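Two of those boundaries, the message limit and the ID-only log, fit in a few lines of plain Python. This is a minimal sketch, not part of Nylas CLI: MAX_MESSAGES, cap_batch, and audit_line are hypothetical names, and each message dict is assumed to carry an id field like the --json output used later in this guide.

```python
import json
from datetime import datetime, timezone

# Hypothetical cap on how many messages a single run may touch.
MAX_MESSAGES = 20


def cap_batch(messages, limit=MAX_MESSAGES):
    """Return at most `limit` messages so one run cannot sweep the whole inbox."""
    return messages[:limit]


def audit_line(message, category, action):
    """Build a JSON log line that records the message ID, never the body."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "message_id": message["id"],
        "category": category,
        "action": action,
    }
    return json.dumps(record)


if __name__ == "__main__":
    inbox = [{"id": f"msg-{n}", "snippet": "private text"} for n in range(50)]
    batch = cap_batch(inbox)
    print(len(batch))
    print(audit_line(batch[0], "NOISE", "mark_read"))
```

The log line deliberately copies only the fields it names, so a snippet or body in the message dict can never leak into the log by accident.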

How do you build an AI email agent for Gmail?

To build an AI email agent for Gmail, connect the Gmail grant once, fetch unread messages as JSON, classify each subject and snippet, then create drafts or send replies only after a review step. The Gmail-specific hard parts (OAuth, token refresh, MIME parsing, labels, and quota handling) stay behind the CLI command surface.
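The review step can be as small as a yes/no prompt in front of any write action. A minimal sketch, assuming hypothetical approve and gated helpers with an injectable ask function (so the gate can be tested without a terminal). The actual send command is left to the caller, since its flags depend on your CLI version.

```python
def approve(action, summary, ask=input):
    """Ask a human to confirm a write action; anything but 'y' or 'yes' means no."""
    answer = ask(f"{action}: {summary}\nProceed? [y/N] ")
    return answer.strip().lower() in ("y", "yes")


def gated(action, summary, run, ask=input):
    """Call run() only if the reviewer approves; return whether it ran."""
    if not approve(action, summary, ask=ask):
        print(f"skipped {action}")
        return False
    run()
    return True


if __name__ == "__main__":
    sent = []
    # Simulated reviewer who declines; in real use, keep the default ask=input.
    gated("send reply", "Re: Q3 budget -> alice@example.com",
          run=lambda: sent.append("reply"), ask=lambda prompt: "n")
    print(sent)  # still empty: nothing was sent
```

Defaulting to "no" on any unexpected answer (including just pressing Enter) keeps the agent from sending on a typo.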

Why do you need an email triage agent?

Email triage needs automation because communication now consumes a large share of the workday. Microsoft's 2023 Work Trend Index reported that 64% of employees did not have enough time and energy to finish their work, and that the average Microsoft 365 user spent 57% of app time communicating. That is enough repetitive sorting to justify a bounded agent.

An AI triage agent fixes this. It reads your unread messages, classifies each one into a priority bucket, drafts replies for anything that needs a response, and marks the noise as read. You review the drafts, hit send on the ones that look right, and move on.

The architecture is simple: Nylas CLI fetches the email, Python orchestrates the logic, and an LLM handles classification and drafting.

How does the triage agent work?

The triage agent has 3 moving parts: Nylas CLI for mailbox I/O, Python for orchestration, and an LLM for classification and drafting. Running the loop every 15 minutes keeps the unread queue small without giving the model direct mailbox credentials or unrestricted send authority.

┌──────────────┐     ┌──────────────┐     ┌─────────────┐
│  Nylas CLI   │────▶│ Python Agent │────▶│    LLM      │
│ (email I/O)  │◀────│ (orchestrator│◀────│ (classifier │
│              │     │  + actions)  │     │  + drafter) │
└──────────────┘     └──────────────┘     └─────────────┘

Flow:
1. nylas email list --unread --json  →  fetch unread messages
2. Python sends each email to LLM   →  classify as URGENT/ACTION/FYI/NOISE
3. LLM drafts replies for URGENT/ACTION emails
4. nylas email drafts create         →  save each reply as a draft for review
   (optional: nylas email send --yes sends auto-replies without review)
5. nylas email mark read             →  clear NOISE from unread
6. Repeat on cron every 15 minutes
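Before wiring in the LLM, it can help to look at the flow as data: each category maps to at most one CLI command. The sketch below only builds a plan and runs nothing, which makes a dry run easy to inspect. plan_actions is a hypothetical helper; the mark-read command shape mirrors the one used in Step 4 of this guide.

```python
def plan_actions(classified):
    """Map (email_id, category) pairs to planned actions without running anything."""
    plan = []
    for email_id, category in classified:
        if category == "NOISE":
            # Same command Step 4 uses to clear noise from the unread queue.
            plan.append(("mark_read", ["nylas", "email", "mark", "read", email_id]))
        elif category in ("URGENT", "ACTION"):
            # The draft body comes from the LLM later, so only the intent is recorded.
            plan.append(("draft_reply", email_id))
        else:
            # FYI (or anything unexpected) stays unread.
            plan.append(("leave_unread", email_id))
    return plan


if __name__ == "__main__":
    for step in plan_actions([("m1", "NOISE"), ("m2", "URGENT"), ("m3", "FYI")]):
        print(step)
```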

Prerequisites

The triage agent requires three components: the Nylas CLI for email access, Python 3.10 or later for the orchestration script, and an LLM for classification and drafting. Apart from the LLM client package, the script uses only the Python standard library (subprocess, json, sys, datetime), so installing dependencies is a single pip command.

  • Nylas CLI installed and authenticated (nylas auth whoami should show your account)
  • Python 3.10+ with the openai or anthropic package installed
  • An LLM API key from OpenAI or Anthropic, or a local Ollama instance running Llama 3.1
  • A connected email account (any provider works)

# Install Nylas CLI
brew install nylas/nylas-cli/nylas

# Authenticate
nylas auth login

# Install Python dependency (pick one)
pip install openai       # for OpenAI
pip install anthropic    # for Anthropic
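A quick preflight check catches the most common setup gaps before the first run: a missing nylas binary, an older Python, or no API key. This is a hypothetical helper, not part of Nylas CLI. It only checks that nylas is on PATH (not that nylas auth whoami succeeds), and it assumes OpenAI's OPENAI_API_KEY; swap in ANTHROPIC_API_KEY if you use Anthropic.

```python
import os
import shutil
import sys


def preflight(env=None, which=shutil.which, version=sys.version_info):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    if version < (3, 10):
        problems.append("Python 3.10+ required")
    if which("nylas") is None:
        problems.append("nylas not found on PATH")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for issue in preflight():
        print(f"preflight: {issue}")
```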

Step 1: Fetch unread emails

The first step pulls unread messages from your inbox into Python as structured data. The CLI command nylas email list --unread --json queries the connected email provider and returns a JSON array of message objects, each containing the sender, subject, snippet, date, and message ID. Fetching 100 unread emails returns in approximately 1.2 seconds because the CLI uses server-side filtering rather than downloading every message and filtering locally. The Python wrapper below calls this command via subprocess.run and parses the output:

import subprocess
import json

def fetch_unread_emails(limit=20):
    """Fetch unread emails via Nylas CLI."""
    result = subprocess.run(
        ["nylas", "email", "list", "--unread", "--limit", str(limit), "--json"],
        capture_output=True,
        text=True
    )
    if result.returncode != 0:
        print(f"Error fetching emails: {result.stderr}")
        return []
    try:
        return json.loads(result.stdout)
    except json.JSONDecodeError:
        print(f"Failed to parse CLI output: {result.stdout[:200]}")
        return []

Each email in the returned array includes id, subject, from, snippet, date, and folders. The snippet is the first ~200 characters of the body, which is usually enough for classification without fetching the full message.
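Because from is a list and fields can be missing or null, it helps to normalize each message once before classification. A minimal sketch, assuming the field names listed above and from entries shaped like {"name": ..., "email": ...} (the same shape the code in this guide indexes with email["from"][0]["email"]). normalize is a hypothetical helper.

```python
def normalize(message):
    """Flatten one CLI message object into the fields the classifier needs."""
    senders = message.get("from") or []
    sender = senders[0].get("email", "unknown") if senders else "unknown"
    return {
        "id": message["id"],
        "sender": sender,
        # `or` also covers keys that are present but null in the JSON.
        "subject": message.get("subject") or "(no subject)",
        # Snippets are already ~200 characters; the slice is only a safeguard.
        "snippet": (message.get("snippet") or "")[:200],
    }


if __name__ == "__main__":
    sample = {"id": "abc123", "subject": None,
              "from": [{"name": "Ops", "email": "ops@example.com"}],
              "snippet": "Database latency is spiking"}
    print(normalize(sample))
```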

Step 2: Classify emails with an LLM

Classification is the core of the triage agent. Each email's subject line and snippet (the first ~200 characters of the body) are sent to the LLM with a structured prompt that forces it to return exactly one category label. Four categories work well in practice, covering the full spectrum from drop-everything to ignore-forever:

  • URGENT — needs a response within 1 hour (e.g., production incidents, time-sensitive requests from your manager)
  • ACTION — needs a response today (e.g., code review requests, meeting follow-ups)
  • FYI — read later, no response needed (e.g., team updates, shared documents)
  • NOISE — mark read so it drops out of your unread queue (e.g., newsletters, automated alerts, marketing)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from env

CLASSIFY_PROMPT = """Classify this email into exactly one category.
Return ONLY the category name, nothing else.

Categories:
- URGENT: needs response within 1 hour (production issues, exec requests, time-sensitive)
- ACTION: needs response today (code reviews, meeting follow-ups, direct questions)
- FYI: read later, no response needed (team updates, shared docs, status reports)
- NOISE: mark read immediately (newsletters, marketing, automated notifications, noreply@)

Email:
From: {sender}
Subject: {subject}
Preview: {snippet}
"""

def classify_email(email):
    """Classify a single email using the LLM."""
    sender = email["from"][0]["email"] if email.get("from") else "unknown"
    prompt = CLASSIFY_PROMPT.format(
        sender=sender,
        subject=email.get("subject", "(no subject)"),
        snippet=email.get("snippet", ""),
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=10,
        temperature=0,
    )
    category = response.choices[0].message.content.strip().upper()
    # Guard against unexpected LLM output
    if category not in ("URGENT", "ACTION", "FYI", "NOISE"):
        print(f"Unexpected classification '{category}' for: {email.get('subject')}")
        category = "FYI"  # safe default
    return category

Setting temperature=0 and max_tokens=10 keeps classification fast and close to deterministic (temperature=0 minimizes run-to-run variation but does not fully eliminate it). GPT-4o-mini handles this at ~$0.15 per 1M input tokens, so classifying 100 emails costs about $0.002.
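The cost figure is straightforward arithmetic. The sketch below assumes about 150 input tokens per classification prompt (the template plus a short subject and snippet; measure your own) and uses the $0.15 per 1M input-token price quoted above. The capped 10 output tokens per email add a little on top and are left out.

```python
def classification_cost(n_emails, tokens_per_email=150, usd_per_million=0.15):
    """Estimate input-token cost in USD for classifying n_emails."""
    total_tokens = n_emails * tokens_per_email
    return total_tokens * usd_per_million / 1_000_000


if __name__ == "__main__":
    # 100 emails x 150 tokens = 15,000 tokens; 15,000 / 1M x $0.15 = $0.00225
    print(f"${classification_cost(100):.5f}")
```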

For Anthropic, swap the API call:

import anthropic

# A separate name keeps the OpenAI `client` used by draft_reply intact
anthropic_client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

def classify_email_anthropic(email):
    """Classify using Claude."""
    sender = email["from"][0]["email"] if email.get("from") else "unknown"
    prompt = CLASSIFY_PROMPT.format(
        sender=sender,
        subject=email.get("subject", "(no subject)"),
        snippet=email.get("snippet", ""),
    )
    response = anthropic_client.messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=10,
        temperature=0,  # match the OpenAI classifier's settings
        messages=[{"role": "user", "content": prompt}],
    )
    category = response.content[0].text.strip().upper()
    if category not in ("URGENT", "ACTION", "FYI", "NOISE"):
        category = "FYI"  # safe default, same as the OpenAI version
    return category
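Both classifiers fall back to FYI whenever the reply is not an exact label, so a harmless variation such as "Urgent." or "Category: NOISE" gets downgraded. A slightly more tolerant parser strips punctuation and picks the first known label before falling back. parse_category is a hypothetical helper, not part of either SDK.

```python
import re

VALID = ("URGENT", "ACTION", "FYI", "NOISE")


def parse_category(raw, default="FYI"):
    """Extract the first valid category label from an LLM reply."""
    for word in re.findall(r"[A-Za-z]+", raw or ""):
        if word.upper() in VALID:
            return word.upper()
    return default


if __name__ == "__main__":
    print(parse_category("Urgent."))
    print(parse_category("Category: NOISE"))
```

To use it, replace the .strip().upper() line and the validity check with a single call such as parse_category(response.choices[0].message.content). The first label wins, so a reply that mentions several labels resolves to whichever comes first; the max_tokens=10 cap keeps such replies rare.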

Step 3: Draft replies for urgent emails

After classification, the agent generates a reply draft for every email tagged URGENT or ACTION. The drafting prompt asks the LLM for a reply under three sentences, giving you a starting point to edit before sending. Drafts use temperature=0.7 rather than 0 because temperature=0 tends to produce stiff, formulaic phrasing, while a little sampling variety reads more naturally. The original email's sender, subject, and snippet are included in the prompt so the LLM has enough context to write a relevant response:

DRAFT_PROMPT = """Write a short, professional reply to this email.
Keep it under 3 sentences. Be direct. Don't start with "I hope this email finds you well."

Original email:
From: {sender}
Subject: {subject}
Body preview: {snippet}

Reply:"""

def draft_reply(email):
    """Generate a draft reply for an email."""
    sender = email["from"][0]["email"] if email.get("from") else "unknown"
    prompt = DRAFT_PROMPT.format(
        sender=sender,
        subject=email.get("subject", "(no subject)"),
        snippet=email.get("snippet", ""),
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=200,
        temperature=0.7,
    )
    return response.choices[0].message.content.strip()

Step 4 saves each generated draft to your email account without sending it. You review each one, edit as needed, and send manually. This keeps a human in the loop for all outgoing communication while still saving the time spent writing from scratch.

Step 4: Take action on classified emails

This step maps each category to an action: nylas email mark read clears NOISE messages from the unread queue, nylas email drafts create saves a reply draft for URGENT and ACTION messages, and FYI messages are left unread for you to scan later. Processing a 100-email batch with classification and draft generation takes approximately 15-30 seconds end-to-end, depending on LLM response latency. The process_email function below routes each message to the correct action based on its category:

def mark_email_read(email_id):
    """Mark an email as read so it falls out of the --unread queue."""
    result = subprocess.run(
        ["nylas", "email", "mark", "read", email_id],
        capture_output=True,
        text=True
    )
    if result.returncode != 0:
        print(f"Failed to mark {email_id} as read: {result.stderr}")
    return result.returncode == 0

def create_draft(to, subject, body):
    """Create a draft reply via Nylas CLI."""
    result = subprocess.run(
        ["nylas", "email", "drafts", "create", "--to", to, "--subject", subject,
         "--body", body, "--json"],
        capture_output=True,
        text=True
    )
    if result.returncode != 0:
        print(f"Failed to create draft: {result.stderr}")
    return result.returncode == 0

def process_email(email):
    """Classify and act on a single email."""
    category = classify_email(email)
    # `or` also covers a subject key that is present but null in the JSON
    subject = email.get("subject") or "(no subject)"
    sender = email["from"][0]["email"] if email.get("from") else "unknown"
    email_id = email["id"]

    print(f"  [{category}] {subject} (from {sender})")

    if category == "NOISE":
        if mark_email_read(email_id):
            print("    -> marked read")

    elif category in ("URGENT", "ACTION"):
        reply = draft_reply(email)
        # Case-insensitive check so an Outlook-style "RE:" is not prefixed twice
        reply_subject = subject if subject.lower().startswith("re:") else f"Re: {subject}"
        if create_draft(sender, reply_subject, reply):
            print("    -> draft created")

    # FYI emails: leave unread for you to read later
    return category
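The reply-subject logic can also live in a small helper that copes with a null or blank subject and with the uppercase RE: prefix that some clients, such as Outlook, write. A minimal sketch; reply_subject is a hypothetical name.

```python
def reply_subject(subject):
    """Return a reply subject, adding 'Re: ' unless a reply prefix is already there."""
    subject = (subject or "").strip() or "(no subject)"
    if subject.lower().startswith("re:"):
        return subject
    return f"Re: {subject}"


if __name__ == "__main__":
    for s in ("Q3 budget", "RE: Q3 budget", None):
        print(reply_subject(s))
```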

Step 5: Run the triage agent on a schedule

The triage agent works best as a background process that runs on a fixed interval. A typical business inbox sees roughly 30 new messages per 15-minute cycle, which is more than the default limit=20 in fetch_unread_emails, so raise the limit to match your volume or the remainder waits for the next run. The cron entry below runs the script every 15 minutes and appends output to a log file so you can review what it classified and acted on:

# Edit crontab
crontab -e

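# Add these lines (runs every 15 minutes). Cron starts jobs with a minimal
# environment, so put the directory that holds nylas (see: which nylas) on PATH
# and make OPENAI_API_KEY available to the job. A user crontab usually cannot
# write to /var/log, so the log goes to your home directory instead.
PATH=/usr/local/bin:/opt/homebrew/bin:/usr/bin:/bin
*/15 * * * * /usr/bin/python3 /path/to/triage.py >> $HOME/email-triage.log 2>&1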

For a simpler setup during development, use a bash loop:

# Run every 10 minutes in a terminal
while true; do
  python3 triage.py
  echo "--- sleeping 10 minutes ---"
  sleep 600
done
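If a cron run ever takes longer than the interval (a slow LLM, a large backlog), cron starts a second copy that can classify and draft the same messages in parallel. A common guard is a non-blocking file lock. This sketch uses fcntl.flock, so it is Unix-only; the lock path and the single_instance name are assumptions.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

LOCK_PATH = os.path.join(tempfile.gettempdir(), "email-triage.lock")


@contextmanager
def single_instance(path=LOCK_PATH):
    """Yield True if this run holds the lock, False if another run already does."""
    with open(path, "w") as handle:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            acquired = False
        try:
            yield acquired
        finally:
            if acquired:
                fcntl.flock(handle, fcntl.LOCK_UN)


if __name__ == "__main__":
    with single_instance() as acquired:
        if not acquired:
            print("previous run still active; skipping")
        else:
            print("lock held; run triage here")  # e.g. call main() from triage.py
```

The lock is released when the file is closed, so a run that crashes does not leave a stale lock behind.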

Re-runs are mostly safe because the script only processes unread emails via the --unread flag. A NOISE email marked read in one cycle drops out of the next batch automatically. URGENT and ACTION emails, however, stay unread after drafting, so every run creates another draft for them. To make the whole pipeline idempotent, call mark_email_read (mark_read in the full script) after a successful create_draft in Step 4.
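A sketch of that change, written against injected helpers so it stays self-contained: classify, draft, create_draft, and mark_read stand in for the functions from Steps 2 through 4 (in triage.py you would pass classify, draft_reply, create_draft, and mark_read). process_idempotent is a hypothetical name. The original is marked read only after the draft is saved, so a failed draft is retried on the next run. The trade-off is that URGENT and ACTION mail no longer shows as unread, so you find it through its drafts instead.

```python
def process_idempotent(email, classify, draft, create_draft, mark_read):
    """Classify one email and act on it so that re-runs never repeat work."""
    category = classify(email)
    senders = email.get("from") or []
    sender = senders[0]["email"] if senders else "unknown"
    subject = email.get("subject") or "(no subject)"

    if category == "NOISE":
        mark_read(email["id"])
    elif category in ("URGENT", "ACTION"):
        reply_subj = subject if subject.lower().startswith("re:") else f"Re: {subject}"
        # Mark read only after the draft is saved, so failures get retried.
        if create_draft(sender, reply_subj, draft(email)):
            mark_read(email["id"])  # drops out of the next --unread batch
    # FYI stays unread on purpose.
    return category
```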

Full script

Here's the complete triage agent. Save it as triage.py:

#!/usr/bin/env python3
"""AI email triage agent. Classifies unread emails and takes action."""

import subprocess
import json
import sys
from datetime import datetime

from openai import OpenAI

client = OpenAI()  # set OPENAI_API_KEY in your environment

CLASSIFY_PROMPT = """Classify this email into exactly one category.
Return ONLY the category name, nothing else.

Categories:
- URGENT: needs response within 1 hour (production issues, exec requests, time-sensitive)
- ACTION: needs response today (code reviews, meeting follow-ups, direct questions)
- FYI: read later, no response needed (team updates, shared docs, status reports)
- NOISE: mark read immediately (newsletters, marketing, automated notifications, noreply@)

Email:
From: {sender}
Subject: {subject}
Preview: {snippet}
"""

DRAFT_PROMPT = """Write a short, professional reply to this email.
Keep it under 3 sentences. Be direct.

Original email:
From: {sender}
Subject: {subject}
Body preview: {snippet}

Reply:"""

VALID_CATEGORIES = {"URGENT", "ACTION", "FYI", "NOISE"}

def fetch_unread(limit=20):
    result = subprocess.run(
        ["nylas", "email", "list", "--unread", "--limit", str(limit), "--json"],
        capture_output=True, text=True
    )
    if result.returncode != 0:
        print(f"Error: {result.stderr}", file=sys.stderr)
        return []
    try:
        return json.loads(result.stdout)
    except json.JSONDecodeError:
        print(f"Bad JSON from CLI: {result.stdout[:200]}", file=sys.stderr)
        return []

def classify(email):
    sender = email["from"][0]["email"] if email.get("from") else "unknown"
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": CLASSIFY_PROMPT.format(
            sender=sender,
            subject=email.get("subject", ""),
            snippet=email.get("snippet", ""),
        )}],
        max_tokens=10, temperature=0,
    )
    cat = resp.choices[0].message.content.strip().upper()
    return cat if cat in VALID_CATEGORIES else "FYI"

def draft_reply(email):
    sender = email["from"][0]["email"] if email.get("from") else "unknown"
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": DRAFT_PROMPT.format(
            sender=sender,
            subject=email.get("subject", ""),
            snippet=email.get("snippet", ""),
        )}],
        max_tokens=200, temperature=0.7,
    )
    return resp.choices[0].message.content.strip()

def mark_read(email_id):
    r = subprocess.run(["nylas", "email", "mark", "read", email_id],
                       capture_output=True, text=True)
    return r.returncode == 0

def create_draft(to, subject, body):
    r = subprocess.run(
        ["nylas", "email", "drafts", "create", "--to", to, "--subject", subject,
         "--body", body, "--json"],
        capture_output=True, text=True
    )
    return r.returncode == 0

def main():
    print(f"\n[{datetime.now().strftime('%Y-%m-%d %H:%M')}] Fetching unread emails...")
    emails = fetch_unread()
    if not emails:
        print("No unread emails.")
        return

    counts = {"URGENT": 0, "ACTION": 0, "FYI": 0, "NOISE": 0}

    for email in emails:
        subject = email.get("subject") or "(no subject)"
        sender = email["from"][0]["email"] if email.get("from") else "unknown"
        email_id = email["id"]

        category = classify(email)
        counts[category] += 1
        print(f"  [{category}] {subject}")

        if category == "NOISE":
            if mark_read(email_id):
                print(f"    -> marked read")
        elif category in ("URGENT", "ACTION"):
            reply = draft_reply(email)
            reply_subj = subject if subject.lower().startswith("re:") else f"Re: {subject}"
            if create_draft(sender, reply_subj, reply):
                print(f"    -> draft created for {sender}")

    print(f"\nSummary: {counts['URGENT']} urgent, {counts['ACTION']} action, "
          f"{counts['FYI']} fyi, {counts['NOISE']} noise "
          f"(total: {len(emails)})")

if __name__ == "__main__":
    main()

Can I use MCP instead of subprocess?

If you don't want to write a Python script, you can let Claude Code or Cursor be your triage agent interactively. Install the Nylas MCP server:

# One command gives your assistant email access
nylas mcp install --assistant claude-code
# or: cursor, windsurf, vscode

Then prompt your assistant directly:

# In Claude Code or Cursor, just ask:
"Read my unread emails. Classify each one as urgent, action, FYI,
or noise. Draft replies for the urgent ones. Mark the noise as read."

The MCP path is better for ad-hoc triage (when you want a human in the loop for every decision). The Python script is better for automated, scheduled triage that runs while you sleep. See Give AI Agents Email Access via MCP for full MCP setup.

How do I run this with a local LLM?

If email privacy matters (and it should), point the agent at a local Ollama instance instead of a cloud API. Classification and drafting then run on your machine, so email content is not sent to a cloud LLM provider:

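# Download Llama 3.1 (the Ollama server must be running, e.g. via the
# desktop app or `ollama serve`)
ollama pull llama3.1

# In your script, point the OpenAI client at Ollama's OpenAI-compatible API
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # the client requires a value; Ollama ignores it
)
# Then change model="gpt-4o-mini" and model="gpt-4o" to model="llama3.1"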

Llama 3.1 8B handles classification well. For higher-quality draft replies, use the 70B model if your hardware supports it (48GB+ VRAM).
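To switch between the cloud API and Ollama without editing code, the client settings and model names can come from an environment variable. A minimal sketch: TRIAGE_BACKEND is a hypothetical variable, the llama3.1 tag assumes you pulled that model, and the returned dict is meant to be unpacked into OpenAI(**settings["client"]), with the model names passed to each chat.completions.create call.

```python
import os


def llm_settings(env=None):
    """Pick client settings and model names for the chosen backend."""
    env = os.environ if env is None else env
    if env.get("TRIAGE_BACKEND", "openai") == "ollama":
        return {
            "client": {"base_url": "http://localhost:11434/v1", "api_key": "ollama"},
            "classify_model": "llama3.1",
            "draft_model": "llama3.1",
        }
    # Default: OpenAI; an empty kwargs dict lets the client read OPENAI_API_KEY.
    return {"client": {}, "classify_model": "gpt-4o-mini", "draft_model": "gpt-4o"}


if __name__ == "__main__":
    print(llm_settings({"TRIAGE_BACKEND": "ollama"})["classify_model"])
```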

FAQ

How accurate is AI email classification?

With GPT-4o or Claude 3.5 Sonnet, classification accuracy on the four categories typically exceeds 90% after prompt tuning. The main failure mode is misclassifying marketing emails as FYI instead of NOISE. Adding sender-based rules (e.g., always classify noreply@ addresses as NOISE before hitting the LLM) pushes accuracy above 95%.
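A sender-based rule runs before the LLM call and skips it entirely for obvious automated mail. A minimal sketch: the address patterns are examples to adapt, not a complete list, and prefilter and classify_with_rules are hypothetical helpers.

```python
import re

# Example local-part patterns for automated senders; extend for your own inbox.
NOISE_SENDERS = re.compile(r"^(no-?reply|donotreply|notifications?|newsletter)@", re.IGNORECASE)


def prefilter(sender):
    """Return 'NOISE' for known automated senders, else None (let the LLM decide)."""
    if sender and NOISE_SENDERS.match(sender):
        return "NOISE"
    return None


def classify_with_rules(email, llm_classify):
    """Apply sender rules first and fall back to the LLM classifier."""
    senders = email.get("from") or []
    sender = senders[0].get("email", "") if senders else ""
    return prefilter(sender) or llm_classify(email)
```

Rules that match never reach the LLM, which also saves tokens on exactly the mail the model is most likely to misjudge.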

Can I run this with a local LLM for privacy?

Yes. Point the OpenAI client at a local Ollama endpoint at http://localhost:11434/v1 and change the model names to a local model such as llama3.1. Llama 3.1 8B handles classification well; draft quality improves with larger models. With a local model, email content is not sent to a cloud LLM provider.

How do I automate this on a schedule?

Add a cron entry: */15 * * * * /usr/bin/python3 /path/to/triage.py. The script processes only unread emails, so NOISE messages marked read drop out of the next batch automatically. To make URGENT/ACTION drafts idempotent too, mark those originals as read after creating the draft.

Does this work with multiple email accounts?

Yes. Nylas CLI supports multiple grants. Set NYLAS_GRANT_ID before running the script, or loop over grants from nylas auth list --json. Each grant can be a different provider.
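One way to loop over grants is to run triage.py once per grant with NYLAS_GRANT_ID set in the child process's environment. A sketch, assuming you already have the grant IDs (for example, copied from nylas auth list --json, whose exact JSON shape is not shown here, so parsing it is left out). run_per_grant is a hypothetical helper.

```python
import os
import subprocess
import sys


def run_per_grant(grant_ids, script="triage.py"):
    """Run the triage script once per grant; return {grant_id: exit_code}."""
    results = {}
    for grant_id in grant_ids:
        # Copy the parent environment and override only the grant.
        env = {**os.environ, "NYLAS_GRANT_ID": grant_id}
        proc = subprocess.run([sys.executable, script], env=env)
        results[grant_id] = proc.returncode
    return results


if __name__ == "__main__":
    # Pass grant IDs on the command line, e.g. python run_all.py GRANT_A GRANT_B
    print(run_per_grant(sys.argv[1:]))
```

Running each grant in its own process keeps one account's failure (an expired grant, a provider outage) from stopping triage for the others.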

Next steps