Source: https://cli.nylas.com/guides/parse-email-attachments

# Parse and Download Email Attachments

An invoice lands in your inbox every Monday. You want a script that pulls the PDF and saves it to a folder. In Python, that's 40 lines of MIME parsing. In Node.js, it's a two-step Gmail API dance with base64url decoding. With the CLI, it's one command. This guide compares all three approaches.

Written by [Hazik](https://cli.nylas.com/authors/hazik), Director of Product Management

Updated May 21, 2026

> **TL;DR:** Python's `imaplib` + `email` module needs ~40 lines to walk a MIME tree and decode base64 payloads. Node.js with the Gmail API takes ~25 lines but requires a second API call per attachment. The CLI does it in 2 commands: `nylas email attachments list` then `nylas email attachments download`, across 6 providers with zero MIME parsing.

> **Disclosure:** Nylas CLI is built by Nylas, Inc. This comparison reflects our testing and product understanding as of May 21, 2026.

Command references used in this guide: [`nylas email attachments list`](https://cli.nylas.com/docs/commands/email-attachments-list) to inspect attachment metadata, [`nylas email list`](https://cli.nylas.com/docs/commands/email-list) and [`nylas email search`](https://cli.nylas.com/docs/commands/email-search) to find messages, and [`nylas email read`](https://cli.nylas.com/docs/commands/email-read) to view full message content.

## How do the 3 attachment methods compare?

Parsing email attachments means navigating MIME structures, decoding binary payloads, and handling provider-specific edge cases. Python's `imaplib` requires roughly 40 lines for a working extractor. Node.js with the Gmail API cuts that to about 25 lines but adds a second HTTP call per file. A CLI approach skips MIME entirely and downloads files in 2 commands. The table below breaks down setup time, complexity, and provider coverage for each method.

| Method | Setup time | Lines of code | Providers | Handles inline? | MIME parsing? |
| --- | --- | --- | --- | --- | --- |
| **Python imaplib** | ~10 min | ~40 | Any IMAP server | Yes (manual) | Yes |
| **Node.js Gmail API** | ~15 min | ~25 | Gmail only | Yes | No (API abstracts it) |
| **CLI** | ~2 min | 2 commands | 6 (Gmail, Outlook, Exchange, Yahoo, iCloud, IMAP) | Yes | No |

The right choice depends on your constraints. A Python data-pipeline script that processes one provider's inbox can use `imaplib` directly. A Gmail-specific integration that already has OAuth credentials can lean on the Gmail API. A multi-provider script, CI job, or AI agent that needs attachments from any mailbox saves the most time with the CLI.

## How do I parse attachments with Python imaplib?

Python's [`imaplib`](https://docs.python.org/3/library/imaplib.html) connects to an IMAP server and fetches raw email bytes. The [`email`](https://docs.python.org/3/library/email.html) module then parses those bytes into a MIME tree. Attachments hide inside `multipart/mixed` containers, sometimes nested 3 levels deep inside `multipart/alternative` wrappers. [RFC 2046](https://datatracker.ietf.org/doc/html/rfc2046) lets multipart bodies nest to arbitrary depth, so you can't assume a flat structure. The code below walks the full tree in about 40 lines.

Python's `email` package gained the `email.policy` API provisionally in version 3.3, and it became stable in 3.6, but most tutorials still use the legacy parser. The example below uses the modern `email.policy.default` policy, which handles character encoding and header decoding automatically. The `get_payload(decode=True)` call strips the base64 or quoted-printable transfer encoding and returns raw bytes, saving you from calling `base64.b64decode()` yourself.

```python
import imaplib
import email
from email.policy import default as default_policy
from pathlib import Path

# Connect to Gmail IMAP (port 993, SSL)
imap = imaplib.IMAP4_SSL("imap.gmail.com")
imap.login("you@gmail.com", "your-app-password")
imap.select("INBOX")

# Search the mailbox; "ALL" matches every message, with or without attachments
status, msg_ids = imap.search(None, "ALL")
latest_id = msg_ids[0].split()[-1]  # grab the newest message (assumes a non-empty inbox)

# Fetch the full RFC 822 message
status, data = imap.fetch(latest_id, "(RFC822)")
raw_email = data[0][1]
msg = email.message_from_bytes(raw_email, policy=default_policy)

# Walk the MIME tree and save attachments
output_dir = Path("./downloads")
output_dir.mkdir(exist_ok=True)
saved = 0

for part in msg.walk():
    disposition = part.get_content_disposition()
    if disposition != "attachment":
        continue

    filename = part.get_filename()
    if not filename:
        continue

    payload = part.get_payload(decode=True)
    if payload is None:
        continue

    # Keep only the final path component so a sender-supplied name cannot escape output_dir
    dest = output_dir / Path(filename).name
    dest.write_bytes(payload)
    saved += 1
    print(f"Saved {filename} ({len(payload):,} bytes)")

imap.logout()
print(f"Done: {saved} attachment(s) saved to {output_dir}")
```

The `msg.walk()` generator yields every MIME part in depth-first order. Checking `get_content_disposition()` filters to parts explicitly marked as attachments. This misses inline images (`Content-Disposition: inline`), which we'll cover in a later section. One gotcha: non-ASCII filenames can arrive as [RFC 2231](https://datatracker.ietf.org/doc/html/rfc2231) extended parameters like `filename*=UTF-8''...` or, from less strict senders, as RFC 2047 encoded words like `=?UTF-8?B?...?=`. The modern `email.policy` decodes both forms automatically; with the legacy API you decode encoded words yourself using `email.header.decode_header()`.
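To try the walk-and-decode logic without an IMAP server, the sketch below builds a message in memory, serializes it, and parses it back with `email.policy.default`. The sample addresses, the filename, and the helper names `build_sample_message` and `list_attachments` are illustrative assumptions, not part of the guide's script.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default as default_policy


def build_sample_message() -> bytes:
    """Build a multipart message with one attachment that has a non-ASCII name."""
    msg = EmailMessage()
    msg["Subject"] = "Invoice"
    msg["From"] = "billing@example.com"
    msg["To"] = "you@example.com"
    msg.set_content("Invoice attached.")
    msg.add_attachment(
        b"%PDF-1.4 sample bytes",
        maintype="application",
        subtype="pdf",
        filename="facture-été.pdf",  # serialized as an RFC 2231 parameter
    )
    return msg.as_bytes()


def list_attachments(raw: bytes) -> list[tuple[str, bytes]]:
    """Return (filename, decoded payload) for every part marked as an attachment."""
    msg = message_from_bytes(raw, policy=default_policy)
    found = []
    for part in msg.walk():
        if part.get_content_disposition() != "attachment":
            continue
        filename = part.get_filename()
        payload = part.get_payload(decode=True)  # undoes the base64 transfer encoding
        if filename and payload is not None:
            found.append((filename, payload))
    return found


attachments = list_attachments(build_sample_message())
for name, data in attachments:
    print(name, len(data))
```

The filename comes back as readable text and the payload as the original bytes, so nothing downstream needs to know how either was encoded on the wire.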

## How do I download attachments with the Node.js Gmail API?

The [Gmail API](https://developers.google.com/gmail/api/reference/rest/v1/users.messages.attachments/get) exposes attachments through a two-step process: first fetch the message to get `attachmentId` values from `payload.parts`, then call `messages.attachments.get` for each file's binary content. Gmail returns attachment bodies as base64url-encoded strings (RFC 4648 Section 5), not standard base64. Node.js had no dedicated `"base64url"` encoding until `Buffer.from(data, "base64url")` arrived in Node 15.7.0 (January 2021, later backported to 14.18.0). About 25 lines cover the full flow.
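For comparison, here is a minimal Python sketch of the same decoding step, showing how the base64url alphabet differs from standard base64. The helper name `decode_base64url` is an assumption for illustration; it also restores `=` padding in case a sender stripped it.

```python
import base64


def decode_base64url(data: str) -> bytes:
    """Decode a base64url string, restoring any stripped '=' padding first."""
    padding = "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(data + padding)


# Bytes chosen so that the standard alphabet needs both '+' and '/'.
raw = b"\xfb\xff\xfe"
standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")

print(standard)  # +//+
print(url_safe)  # -__-
print(decode_base64url(url_safe) == raw)  # True
```

The two encodings differ only in the last two alphabet characters (`+` and `/` become `-` and `_`), which is why a standard decoder can produce corrupt files without raising an obvious error.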

The example below uses Google's official `googleapis` npm package. You'll need an OAuth 2.0 credential file from the [Google Cloud Console](https://console.cloud.google.com/apis/credentials) with the `gmail.readonly` scope. According to Google's [attachment size documentation](https://developers.google.com/gmail/api/guides/uploads), files under 25 MB are returned inline. Files over 25 MB are stored as Google Drive links and require the Drive API to download.

```javascript
const { google } = require("googleapis");
const fs = require("fs");
const path = require("path");

// Assumes you have an authenticated OAuth2 client
const gmail = google.gmail({ version: "v1", auth: oAuth2Client });

async function downloadAttachments(messageId) {
  // Step 1: Get the message with full payload
  const { data: message } = await gmail.users.messages.get({
    userId: "me",
    id: messageId,
    format: "full",
  });

  const parts = message.payload.parts || [];
  const outputDir = "./downloads";
  fs.mkdirSync(outputDir, { recursive: true });

  for (const part of parts) {
    if (!part.body.attachmentId) continue;

    // Step 2: Fetch binary content per attachment
    const { data: attachment } = await gmail.users.messages.attachments.get({
      userId: "me",
      messageId,
      id: part.body.attachmentId,
    });

    // Gmail uses base64url encoding, not standard base64
    const buffer = Buffer.from(attachment.data, "base64url");
    // Use only the base name so a sender-supplied filename cannot escape outputDir
    const filePath = path.join(outputDir, path.basename(part.filename));
    fs.writeFileSync(filePath, buffer);
    console.log(`Saved ${part.filename} (${buffer.length} bytes)`);
  }
}

// Usage: pass a message ID from gmail.users.messages.list
downloadAttachments("18f1a2b3c4d5e6f7").catch(console.error);
```

Two things to watch. First, `payload.parts` can be nested. A message with both an HTML body and attachments wraps them in `multipart/mixed`, which contains a `multipart/alternative` part (for the text and HTML bodies) alongside the attachment parts. The loop above only inspects the top level, so you may need to recurse into nested `parts` arrays. Second, the `attachmentId` belongs to its message, not to a folder. If a user moves the message between folders (Inbox to Archive), the ID stays valid, but Google's docs note that draft message IDs change when a draft is sent.
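The recursion can be sketched in a few lines. The example below is written in Python over a plain dictionary shaped like a `messages.get` response; the sample payload, its IDs, and the helper name `iter_attachment_parts` are hypothetical.

```python
from typing import Any, Iterator


def iter_attachment_parts(part: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield every part, at any depth, whose body carries an attachmentId."""
    if part.get("body", {}).get("attachmentId"):
        yield part
    for child in part.get("parts", []):
        yield from iter_attachment_parts(child)


# Hypothetical payload: multipart/mixed wrapping multipart/related plus a PDF.
sample_payload = {
    "mimeType": "multipart/mixed",
    "filename": "",
    "body": {"size": 0},
    "parts": [
        {
            "mimeType": "multipart/related",
            "filename": "",
            "body": {"size": 0},
            "parts": [
                {"mimeType": "text/html", "filename": "",
                 "body": {"size": 120, "data": "PGh0bWw-"}},
                {"mimeType": "image/png", "filename": "logo.png",
                 "body": {"attachmentId": "ATT_LOGO", "size": 12288}},
            ],
        },
        {"mimeType": "application/pdf", "filename": "invoice-may.pdf",
         "body": {"attachmentId": "ATT_PDF", "size": 290816}},
    ],
}

names = [p["filename"] for p in iter_attachment_parts(sample_payload)]
print(names)  # ['logo.png', 'invoice-may.pdf']
```

A loop over only the top-level `parts` would find `invoice-may.pdf` and miss `logo.png`, which sits one level down.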

## How do I download attachments with the CLI?

Nylas CLI abstracts MIME parsing, base64 decoding, and provider differences behind two commands. The `nylas email attachments list` command shows every attachment on a message with its ID, filename, MIME type, and size. The `nylas email attachments download` command fetches the binary content and writes it to disk. No IMAP connection string, no OAuth credential files, no base64url decoding. The CLI works across all 6 supported providers without changing any flags.

The workflow below starts by finding a message with attachments, listing its files, then downloading one. Each step takes about 2 seconds because the CLI communicates with the Nylas API over HTTPS rather than maintaining a persistent IMAP session. For headless environments like CI/CD runners, authenticate once with [`nylas auth config --api-key`](https://cli.nylas.com/docs/commands/auth-config) and every subsequent command uses the stored credential.

```bash
# Step 1: Find messages that have attachments
nylas email search "*" --has-attachment --json --limit 5

# Step 2: List attachments on a specific message
nylas email attachments list msg_abc123

# Example output:
# ID              FILENAME         TYPE              SIZE
# att_x1y2z3      invoice-may.pdf  application/pdf   284 KB
# att_a4b5c6      logo.png         image/png         12 KB

# Step 3: Download a specific attachment
nylas email attachments download att_x1y2z3 msg_abc123 --output ./downloads/

# Step 4: Batch download — list + download all in a loop
nylas email attachments list msg_abc123 --json \
  | jq -r '.[].id' \
  | while IFS= read -r att_id; do
      nylas email attachments download "$att_id" msg_abc123 \
        --output ./downloads/
    done
```

The `--json` flag on `attachments list` returns structured output that pipes cleanly into `jq`. Each object includes `id`, `filename`, `content_type`, and `size`. The `--output` flag on `download` accepts a directory path and preserves the original filename. If you omit `--output`, the file saves to the current directory with the original filename. For automation scripts, combine the search-and-download steps into a single shell script that runs on a cron schedule.
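If you would rather drive the batch step from Python than from `jq`, a rough sketch could look like the following. It only builds the argument lists and does not run them. The sample JSON is an assumption based on the fields named above (`id`, `filename`, `content_type`, `size`), and `build_download_commands` is a hypothetical helper.

```python
import json
import shlex
from typing import Optional


def build_download_commands(
    list_json: str,
    message_id: str,
    output_dir: str,
    content_type: Optional[str] = None,
) -> list[list[str]]:
    """Turn `attachments list --json` output into one download command per file."""
    commands = []
    for att in json.loads(list_json):
        if content_type is not None and att.get("content_type") != content_type:
            continue
        commands.append([
            "nylas", "email", "attachments", "download",
            att["id"], message_id, "--output", output_dir,
        ])
    return commands


# Assumed sample output; real output may carry additional fields.
sample_list_json = json.dumps([
    {"id": "att_x1y2z3", "filename": "invoice-may.pdf",
     "content_type": "application/pdf", "size": 290816},
    {"id": "att_a4b5c6", "filename": "logo.png",
     "content_type": "image/png", "size": 12288},
])

commands = build_download_commands(sample_list_json, "msg_abc123", "./downloads/")
for cmd in commands:
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Keeping arguments as a list rather than a shell string avoids quoting problems with unusual IDs or paths.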

## What provider-specific gotchas should I know?

Each email provider implements attachments differently at the protocol and API level, creating edge cases that break generic extraction code. Gmail, Outlook, Yahoo, and IMAP servers each have at least one behavior that surprises developers on first encounter. Knowing these before you write extraction logic saves hours of debugging. The 4 most common gotchas affect file size limits, attachment IDs, protocol support, and rate limiting.

**Gmail:** Attachments under 25 MB are embedded in the MIME body and accessible through the API's `attachmentId` field. Files over 25 MB are automatically converted to Google Drive links. According to [Google's attachment size documentation](https://support.google.com/mail/answer/6584), the 25 MB limit applies to the pre-encoding size. After base64 encoding (which adds ~33% overhead), a 25 MB file becomes roughly 33 MB on the wire. The Gmail API returns a `404` error if you request an `attachmentId` on a Drive-linked file.
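The overhead figure is easy to check. Base64 emits 4 output characters for every 3 input bytes, so the encoded size is the input size rounded up to a multiple of 3, times 4/3. The sketch below uses decimal megabytes and ignores MIME line breaks, which add a little more.

```python
import base64


def base64_encoded_size(raw_bytes: int) -> int:
    """Length of the base64 text for `raw_bytes` bytes of input, without line breaks."""
    return 4 * ((raw_bytes + 2) // 3)


raw_size = 25 * 1_000_000  # 25 MB, using decimal megabytes
encoded_size = base64_encoded_size(raw_size)

print(encoded_size)                       # 33333336
print(round(encoded_size / raw_size, 3))  # 1.333

# Sanity check against the standard library on a small input.
sample = b"x" * 1000
print(len(base64.b64encode(sample)) == base64_encoded_size(len(sample)))  # True
```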

**Outlook / Microsoft Graph:** The Graph API uses a `$value` endpoint (`/messages/{id}/attachments/{att-id}/$value`) to return raw binary content. According to [Microsoft's attachment documentation](https://learn.microsoft.com/en-us/graph/api/attachment-get), the maximum attachment size for Exchange Online is 150 MB as of 2024. One notable behavior: attachment IDs in Graph are stable across moves within the same mailbox, but forwarding a message to another user generates new IDs.

**IMAP (generic):** The `BODYSTRUCTURE` data item of the `FETCH` command lets you discover attachment metadata without downloading the full message. [RFC 3501 Section 6.4.5](https://datatracker.ietf.org/doc/html/rfc3501#section-6.4.5) defines the data item and Section 7.4.2 defines the response format, but not all servers implement it identically. Some servers omit the `Content-Disposition` header, so you fall back to checking `Content-Type` for a `name` parameter instead. Yahoo Mail's IMAP server returns `BODYSTRUCTURE` correctly but imposes aggressive rate limiting on `FETCH` commands for large attachments. Expect throttling after about 50 MB of downloads per session.
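When you parse the full message with Python's `email` package, the `name` fallback is built in: `get_filename()` reads the `filename` parameter of `Content-Disposition` and, if that is missing, the `name` parameter of `Content-Type`. The sketch below shows this on a hand-written message that has no `Content-Disposition` header at all; the message bytes and the helper name `find_named_parts` are illustrative assumptions.

```python
from email import message_from_bytes
from email.policy import default as default_policy

# Hand-written message: the PDF part carries only Content-Type with a name parameter.
RAW = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="b1"\r\n'
    b"\r\n"
    b"--b1\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"See attached.\r\n"
    b"--b1\r\n"
    b'Content-Type: application/pdf; name="report.pdf"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"JVBERi0xLjQ=\r\n"
    b"--b1--\r\n"
)


def find_named_parts(raw: bytes) -> list[tuple]:
    """Return (disposition, filename, payload) for every leaf part that has a name."""
    msg = message_from_bytes(raw, policy=default_policy)
    results = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        filename = part.get_filename()  # falls back to Content-Type name=
        if filename:
            results.append(
                (part.get_content_disposition(), filename, part.get_payload(decode=True))
            )
    return results


named_parts = find_named_parts(RAW)
print(named_parts)  # [(None, 'report.pdf', b'%PDF-1.4')]
```

A filter that requires `get_content_disposition() == "attachment"` would skip this part, because the disposition is `None`.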

## What is the difference between inline and regular attachments?

Email MIME defines two Content-Disposition values for binary parts: `attachment` and `inline`. According to [RFC 2183](https://datatracker.ietf.org/doc/html/rfc2183), a part marked `attachment` should be displayed separately from the message body, while `inline` means the part should be rendered in context. In practice, inline parts are almost always images embedded in HTML email bodies using CID (Content-ID) references. About 78% of marketing emails contain at least one inline image, according to a 2023 Litmus analysis of 5 billion emails.

The distinction matters for extraction scripts. If you only check for `Content-Disposition: attachment`, you'll miss inline images. If you download everything with a `Content-Type` starting with `image/`, you'll pull tracking pixels and logo footers along with real attachments. The Python `imaplib` approach gives you full control through `get_content_disposition()`. The Gmail API includes inline parts in `payload.parts` with `body.attachmentId` set. The CLI's `attachments list` shows both types and includes a `content_disposition` field in JSON output so you can filter with `jq`.

```bash
# List all attachments including inline images
nylas email attachments list msg_abc123 --json

# Filter to only regular (non-inline) attachments
nylas email attachments list msg_abc123 --json \
  | jq '[.[] | select(.content_disposition == "attachment")]'

# Filter to only inline images (CID-referenced)
nylas email attachments list msg_abc123 --json \
  | jq '[.[] | select(.content_disposition == "inline")]'
```

One edge case: some email clients set `Content-Disposition: inline` on PDF and Word documents, not just images. Outlook's mobile client is a known offender. If your script needs to catch every user-attached file regardless of disposition, check for a `filename` field on the part instead of relying solely on the disposition header.
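A filename-based check might look like the sketch below, which builds a message where both a logo and a PDF are marked `inline`. The sample content, the `Content-ID` value, and the helper name `files_with_names` are assumptions for illustration. Recording whether a part has a `Content-ID` gives you one signal for separating body images from user-attached files.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default as default_policy


def build_inline_sample() -> bytes:
    """Build a message whose logo and PDF are both marked Content-Disposition: inline."""
    msg = EmailMessage()
    msg["Subject"] = "Report"
    msg.set_content("Report attached.")
    msg.add_attachment(
        b"\x89PNG sample", maintype="image", subtype="png",
        disposition="inline", filename="logo.png", cid="<logo@example.com>",
    )
    msg.add_attachment(
        b"%PDF-1.4 sample", maintype="application", subtype="pdf",
        disposition="inline", filename="report.pdf",
    )
    return msg.as_bytes()


def files_with_names(raw: bytes) -> list[dict]:
    """List every part that has a filename, whatever its disposition says."""
    msg = message_from_bytes(raw, policy=default_policy)
    files = []
    for part in msg.walk():
        filename = part.get_filename()
        if not filename:
            continue
        files.append({
            "filename": filename,
            "disposition": part.get_content_disposition(),
            "has_cid": part["Content-ID"] is not None,
        })
    return files


files = files_with_names(build_inline_sample())
for entry in files:
    print(entry)
```

Here a strict `disposition == "attachment"` filter would return nothing, while the filename check finds both files and the `has_cid` flag marks the logo as a likely body image.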

## How do I script a complete attachment extraction workflow?

A production extraction script combines search, filtering, and download into a single loop. The workflow below finds the 10 most recent messages with attachments, filters to PDF files only, downloads each one, and logs the results. It runs in under 30 seconds for 10 messages because the CLI handles pagination and connection management internally. This pattern works identically on Gmail, Outlook, Exchange, Yahoo, iCloud, and IMAP accounts.

The script uses `nylas email search` with the `--has-attachment` flag to narrow results to messages that actually contain files, avoiding wasted API calls on plain-text emails. The `--json` output pipes into `jq` to extract message IDs, then each message's attachments are listed and filtered by MIME type before downloading. For large-scale extraction (100+ messages), add a 200ms sleep between downloads to stay within API rate limits of 5 requests per second.
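The spacing rule is simple arithmetic: 5 requests per second means at least 1/5 s = 200 ms between calls. If you drive the downloads from Python rather than bash, one way to enforce that is a small throttle like the sketch below. The `Throttle` class is hypothetical, and the clock and sleep functions are injectable only so the behavior can be checked without waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Space successive calls at least `min_interval` seconds apart."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the delay applied."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


# 5 requests per second -> 0.2 s between downloads.
throttle = Throttle(min_interval=1 / 5)
first_delay = throttle.wait()  # the first call never waits
print(first_delay)  # 0.0
```

Call `throttle.wait()` immediately before each download so the pause adapts to however long the previous download took.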

```bash
#!/bin/bash
# Download all PDF attachments from recent emails
set -euo pipefail

OUTPUT_DIR="./extracted-pdfs"
mkdir -p "$OUTPUT_DIR"
count=0

# Find messages with attachments
msg_ids=$(nylas email search "*" --has-attachment --json --limit 10 \
  | jq -r '.[].id')

for msg_id in $msg_ids; do
  # List attachments, filter to PDFs
  pdf_atts=$(nylas email attachments list "$msg_id" --json \
    | jq -r '.[] | select(.content_type == "application/pdf") | .id')

  for att_id in $pdf_atts; do
    nylas email attachments download "$att_id" "$msg_id" \
      --output "$OUTPUT_DIR/"
    count=$((count + 1))
  done
done

echo "Downloaded $count PDF attachment(s) to $OUTPUT_DIR"
```

Adapt this script by changing the `jq` filter. Replace `"application/pdf"` with `"text/csv"` for spreadsheets, `"image/png"` for screenshots, or remove the `select()` entirely to download all file types. For a Python wrapper that adds error handling and retry logic, see the [Python email without SMTP guide](https://cli.nylas.com/guides/send-email-python-without-smtp) for the subprocess pattern.
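As a rough idea of what such a wrapper involves, the sketch below retries a command with exponential backoff when it exits non-zero. It is a minimal illustration, not the linked guide's code. The function name `run_with_retry`, the default attempt count, and the backoff values are assumptions, and the runner is injectable so the retry logic can be exercised without the CLI installed.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retry(
    cmd: Sequence[str],
    attempts: int = 3,
    backoff: float = 1.0,
    runner: Callable = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
):
    """Run `cmd`, retrying on a non-zero exit with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return runner(list(cmd), check=True, capture_output=True, text=True)
        except subprocess.CalledProcessError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last failure
            sleep(backoff * 2 ** attempt)  # 1 s, 2 s, 4 s, ...


# Example call (not executed here), using the download command from this guide:
# run_with_retry(["nylas", "email", "attachments", "download",
#                 "att_x1y2z3", "msg_abc123", "--output", "./extracted-pdfs/"])
```

Retrying every failure is a blunt instrument. In a real script you would likely inspect the exit code or stderr and retry only errors that look transient.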

## Next steps

- [Send email with attachments from CLI](https://cli.nylas.com/guides/send-email-with-attachments-cli) -- the reverse workflow: attach files and send from the terminal
- [Download email attachments in PowerShell](https://cli.nylas.com/guides/download-attachments-powershell) -- Windows-specific patterns with Defender scanning and SharePoint auto-save
- [Send email from Python: SMTP, API, and CLI](https://cli.nylas.com/guides/send-email-python) -- the companion comparison guide for sending rather than receiving
- [Parse inbound email webhooks](https://cli.nylas.com/guides/parse-inbound-email-webhooks) -- process attachments in real time as they arrive
- [Getting started with Nylas CLI](https://cli.nylas.com/guides/getting-started) -- install and authenticate in under 2 minutes
- [nylas email attachments list command reference](https://cli.nylas.com/docs/commands/email-attachments-list) -- every flag for listing attachment metadata
- [Full command reference](https://cli.nylas.com/docs/commands) -- every flag and subcommand documented
