Guide
Build a LlamaIndex Email Agent (CLI Tool)
LlamaIndex started as a data framework for RAG and grew into a full agent runtime with function tools and workflows. Giving one of its agents email usually means a provider SDK and OAuth. The lighter path is to wrap the Nylas CLI as a FunctionTool: each call is one subprocess returning JSON, and the same tool reaches Gmail, Outlook, and four more providers. This guide builds a LlamaIndex agent around that tool and keeps sends behind a human.
Written by Pouya Sanooei Software Engineer
Command references used in this guide: nylas email list, nylas email search, and nylas email drafts create.
How do you give a LlamaIndex agent email?
You give a LlamaIndex agent email by wrapping a CLI call in a FunctionTool and passing it to a FunctionAgent. A FunctionTool is built from a typed Python function; LlamaIndex reads the signature and docstring to expose it to the model. Inside, you run a CLI command and return its stdout — and because nylas email list --json emits structured data, the agent gets clean JSON to reason over.
Authenticate the CLI once with nylas auth login; the stored grant is reused on every subprocess call, so the tool never touches credentials. This mirrors how the CLI plugs into other Python agent frameworks — the subprocess boundary keeps provider details out of agent code. LlamaIndex's tool and agent APIs are documented in the LlamaIndex docs.
Why wrap the CLI instead of a provider SDK?
Wrapping the CLI turns six integrations into one. A direct build would mean the Gmail API, Microsoft Graph, and IMAP, each with its own OAuth app and pagination. The CLI gives one command surface across all six providers and refreshes OAuth tokens — which expire every 3,600 seconds on most providers — with no token code in your agent.
It also keeps the tool small and inspectable, which matters when you later add retrieval. LlamaIndex is built for RAG, so a common pattern is to pull messages with nylas email search --json, index them, and let the agent answer questions over the result. The thin wrapper returns JSON you can validate before it ever reaches an index or the model.
import subprocess
from llama_index.core.tools import FunctionTool
def list_unread(limit: int = 10) -> str:
"""List unread emails as JSON across the connected mailbox."""
out = subprocess.run(
["nylas", "email", "list", "--unread", "--json", "--limit", str(limit)],
capture_output=True, text=True, check=True,
)
return out.stdout
def search_email(query: str) -> str:
"""Search the mailbox with a provider-agnostic query, return JSON."""
out = subprocess.run(
["nylas", "email", "search", query, "--json", "--limit", "20"],
capture_output=True, text=True, check=True,
)
return out.stdout
list_tool = FunctionTool.from_defaults(fn=list_unread)
search_tool = FunctionTool.from_defaults(fn=search_email)How do you build the agent?
Build the agent by passing the tools and an LLM to a FunctionAgent, then run it on a request. The agent decides which tool to call, runs the subprocess, reads the JSON, and loops until it can answer. A triage request like “summarize my unread mail and flag anything urgent” resolves in a few tool calls, with the model reading structured output at each step.
Keep the toolset narrow — one function per action. Two read tools plus a draft tool are enough for most inbox agents. LlamaIndex handles the reasoning loop; your job is to give it clean, single-purpose tools and clear docstrings so it calls the right one.
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.anthropic import Anthropic
agent = FunctionAgent(
tools=[list_tool, search_tool],
llm=Anthropic(model="claude-sonnet-4-6"),
system_prompt="You triage email. Read first, never send without approval.",
)
response = await agent.run("Summarize my unread mail and flag anything urgent.")
print(response)How do you keep sends safe?
Keep outbound actions behind a human. Rather than a send tool, give the agent a draft tool that runs nylas email drafts create, which composes a message without sending it and returns a draft ID. A person reviews and sends, so a misclassification can't reach a customer. This one rule prevents the most damaging failure mode of an autonomous email agent.
The reason is that the agent reads untrusted content, and a prompt injection in an email body can try to steer it. Guardrails that sit outside the model's reasoning — a review checkpoint, or connector-level rules — hold even when injected text tries to talk past them. For deterministic enforcement at the connector, see stopping a rogue agent at the connector layer.
Next steps
- RAG over email — index and retrieve mail for grounded answers
- LangGraph email agent — the same pattern in a stateful graph
- Human-in-the-loop email agent — draft-and-approve guardrails
- Build an AI email triage agent — classification and routing end to end
- Pydantic AI email agent — typed tools wrapping the CLI
- Evaluate an email AI agent — test sets, metrics, guardrails
- Full command reference — every flag and subcommand documented