Guide

Parse and Download Email Attachments

An invoice lands in your inbox every Monday. You want a script that pulls the PDF and saves it to a folder. In Python, that's 40 lines of MIME parsing. In Node.js, it's a two-step Gmail API dance with base64url decoding. With the Nylas CLI, it's two commands: one to list a message's attachments and one to download them. This guide compares all three approaches.

Written by Hazik, Director of Product Management

Verified: CLI 3.1.11 · Gmail, Outlook · last tested May 21, 2026

Command references used in this guide: nylas email attachments list to inspect attachment metadata, nylas email list and nylas email search to find messages, and nylas email read to view full message content.

How do the 3 attachment methods compare?

Parsing email attachments means navigating MIME structures, decoding binary payloads, and handling provider-specific edge cases. Python's imaplib requires roughly 40 lines for a working extractor. Node.js with the Gmail API cuts that to about 25 lines but adds a second HTTP call per file. A CLI approach skips MIME entirely and downloads files in 2 commands. The table below breaks down setup time, complexity, and provider coverage for each method.

| Method | Setup time | Lines of code | Providers | Handles inline? | MIME parsing? |
| --- | --- | --- | --- | --- | --- |
| Python imaplib | ~10 min | ~40 | Any IMAP server | Yes (manual) | Yes |
| Node.js Gmail API | ~15 min | ~25 | Gmail only | Yes | No (API abstracts it) |
| CLI | ~2 min | 2 commands | 6 (Gmail, Outlook, Exchange, Yahoo, iCloud, IMAP) | Yes | No |

The right choice depends on your constraints. A Python data-pipeline script that processes one provider's inbox can use imaplib directly. A Gmail-specific integration that already has OAuth credentials can lean on the Gmail API. A multi-provider script, CI job, or AI agent that needs attachments from any mailbox saves the most time with the CLI.

How do I parse attachments with Python imaplib?

Python's imaplib connects to an IMAP server and fetches raw email bytes. The email module then parses those bytes into a MIME tree. Attachments sit inside multipart/mixed containers, sometimes 3 levels deep once multipart/alternative and multipart/related wrappers are involved. RFC 2046, which defines the multipart types, allows arbitrary nesting, so you can't assume a flat structure. The code below walks the full tree in about 40 lines.

Python's email module gained the email.policy API in version 3.3, and that API has been stable (no longer provisional) since 3.6, but most tutorials still use the legacy parser. The example below uses the modern email.policy.default policy, which handles character encoding and header decoding automatically. The get_payload(decode=True) call strips the base64 or quoted-printable transfer encoding and returns raw bytes, saving you from calling base64.b64decode() yourself.

import imaplib
import email
from email.policy import default as default_policy
from pathlib import Path

# Connect to Gmail IMAP (port 993, SSL)
imap = imaplib.IMAP4_SSL("imap.gmail.com")
imap.login("you@gmail.com", "your-app-password")
imap.select("INBOX")

# Search every message; standard IMAP SEARCH has no "has attachment" criterion
status, msg_ids = imap.search(None, "ALL")
latest_id = msg_ids[0].split()[-1]  # grab the newest message

# Fetch the full RFC 822 message
status, data = imap.fetch(latest_id, "(RFC822)")
raw_email = data[0][1]
msg = email.message_from_bytes(raw_email, policy=default_policy)

# Walk the MIME tree and save attachments
output_dir = Path("./downloads")
output_dir.mkdir(exist_ok=True)
saved = 0

for part in msg.walk():
    disposition = part.get_content_disposition()
    if disposition != "attachment":
        continue

    filename = part.get_filename()
    if not filename:
        continue

    payload = part.get_payload(decode=True)
    if payload is None:
        continue

    # Keep only the base name so a crafted filename cannot escape output_dir
    dest = output_dir / Path(filename).name
    dest.write_bytes(payload)
    saved += 1
    print(f"Saved {filename} ({len(payload):,} bytes)")

imap.logout()
print(f"Done: {saved} attachment(s) saved to {output_dir}")

The msg.walk() generator yields every MIME part in depth-first order. Checking get_content_disposition() filters to parts explicitly marked as attachments. This misses inline images (Content-Disposition: inline), which we'll cover in a later section. One gotcha: filenames can arrive encoded, either as RFC 2047 encoded-words like =?UTF-8?B?...?= or as RFC 2231 parameters like filename*=UTF-8''.... The modern email.policy decodes these automatically, but the legacy API doesn't.
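To try the walk above without an IMAP account, the sketch below builds a nested message in memory and extracts from it. The helper names (build_sample_message, extract_attachments), the addresses, and the sample filename are made up for illustration; only the standard-library email calls are real. It assumes Python 3.9 or later.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default as default_policy


def build_sample_message() -> bytes:
    """Build multipart/mixed -> multipart/alternative + one PDF attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Invoice"
    msg["From"] = "billing@example.com"   # hypothetical addresses
    msg["To"] = "you@example.com"
    msg.set_content("Plain-text body")
    msg.add_alternative("<p>HTML body</p>", subtype="html")
    # The non-ASCII filename is serialized as an RFC 2231 parameter
    msg.add_attachment(
        b"%PDF-1.4 fake",
        maintype="application",
        subtype="pdf",
        filename="facture-été.pdf",
    )
    return msg.as_bytes()


def extract_attachments(raw: bytes) -> list[tuple[str, bytes]]:
    """Return (filename, payload) for every part marked as an attachment."""
    msg = message_from_bytes(raw, policy=default_policy)
    found = []
    for part in msg.walk():
        if part.get_content_disposition() != "attachment":
            continue
        name = part.get_filename()                # decoded by the policy
        payload = part.get_payload(decode=True)   # transfer encoding removed
        if name and payload is not None:
            found.append((name, payload))
    return found


for name, payload in extract_attachments(build_sample_message()):
    print(name, len(payload))
```

Because the sample is built and parsed locally, it is a convenient fixture for unit-testing extraction logic before pointing it at a live mailbox.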

How do I download attachments with the Node.js Gmail API?

The Gmail API exposes attachments through a two-step process: first fetch the message to get attachmentId values from payload.parts, then call messages.attachments.get for each file's binary content. Gmail returns attachment bodies as base64url-encoded strings (RFC 4648 Section 5), not standard base64. Node.js had no dedicated "base64url" encoding until Buffer.from(data, "base64url") arrived in Node 15.7.0 (January 2021). About 25 lines cover the full flow.

The example below uses Google's official googleapis npm package. You'll need an OAuth 2.0 credential file from the Google Cloud Console with the gmail.readonly scope. According to Google's attachment size documentation, files under 25 MB are returned inline. Files over 25 MB are stored as Google Drive links and require the Drive API to download.

const { google } = require("googleapis");
const fs = require("fs");
const path = require("path");

// Assumes you have an authenticated OAuth2 client
const gmail = google.gmail({ version: "v1", auth: oAuth2Client });

async function downloadAttachments(messageId) {
  // Step 1: Get the message with full payload
  const { data: message } = await gmail.users.messages.get({
    userId: "me",
    id: messageId,
    format: "full",
  });

  const parts = message.payload.parts || [];
  const outputDir = "./downloads";
  fs.mkdirSync(outputDir, { recursive: true });

  for (const part of parts) {
    if (!part.body.attachmentId) continue;

    // Step 2: Fetch binary content per attachment
    const { data: attachment } = await gmail.users.messages.attachments.get({
      userId: "me",
      messageId,
      id: part.body.attachmentId,
    });

    // Gmail uses base64url encoding, not standard base64
    const buffer = Buffer.from(attachment.data, "base64url");
    // basename() keeps a crafted filename from escaping outputDir
    const filePath = path.join(outputDir, path.basename(part.filename));
    fs.writeFileSync(filePath, buffer);
    console.log(`Saved ${part.filename} (${buffer.length} bytes)`);
  }
}

// Usage: pass a message ID from gmail.users.messages.list
downloadAttachments("18f1a2b3c4d5e6f7").catch(console.error);

Two things to watch. First, payload.parts can be nested. A message with both an HTML body and attachments wraps them in multipart/mixed, which contains multipart/alternative (for the text and HTML bodies) alongside the attachment parts. The loop above only inspects the top level, so you may need to recurse into nested parts arrays. Second, the attachmentId is scoped to its message, not to a folder. If a user moves the message between folders (Inbox to Archive), the ID stays valid, but Google's docs note that draft message IDs change when a draft is sent.
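The recursion into nested parts, and the base64url decoding step, can be sketched in Python on the JSON that messages.get returns. The function names and the sample payload are hypothetical; the field names (parts, filename, body.attachmentId, mimeType) follow the Gmail API message resource.

```python
import base64
from typing import Iterator


def iter_attachment_parts(part: dict) -> Iterator[dict]:
    """Yield every part, at any depth, that carries an attachmentId."""
    body = part.get("body", {})
    if part.get("filename") and body.get("attachmentId"):
        yield part
    for child in part.get("parts", []):
        yield from iter_attachment_parts(child)


def decode_base64url(data: str) -> bytes:
    """Decode base64url text, restoring any padding that was stripped."""
    padding = "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(data + padding)


# Hypothetical payload: the PDF sits one level below the top-level parts
sample_payload = {
    "mimeType": "multipart/mixed",
    "parts": [
        {
            "mimeType": "multipart/related",
            "parts": [
                {"mimeType": "text/html", "filename": "", "body": {"size": 120}},
                {
                    "mimeType": "application/pdf",
                    "filename": "invoice-may.pdf",
                    "body": {"attachmentId": "ATT_1", "size": 284000},
                },
            ],
        },
    ],
}

for part in iter_attachment_parts(sample_payload):
    print(part["filename"], part["body"]["attachmentId"])
```

A flat loop over the top-level parts would find nothing in this sample, which is the failure mode the recursion guards against.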

How do I download attachments with the CLI?

Nylas CLI abstracts MIME parsing, base64 decoding, and provider differences behind two commands. The nylas email attachments list command shows every attachment on a message with its ID, filename, MIME type, and size. The nylas email attachments download command fetches the binary content and writes it to disk. No IMAP connection string, no OAuth credential files, no base64url decoding. The CLI works across all 6 supported providers without changing any flags.

The workflow below starts by finding a message with attachments, listing its files, then downloading one. Each step takes about 2 seconds because the CLI communicates with the Nylas API over HTTPS rather than maintaining a persistent IMAP session. For headless environments like CI/CD runners, authenticate once with nylas auth config --api-key and every subsequent command uses the stored credential.

# Step 1: Find messages that have attachments
nylas email search "*" --has-attachment --json --limit 5

# Step 2: List attachments on a specific message
nylas email attachments list msg_abc123

# Example output:
# ID              FILENAME         TYPE              SIZE
# att_x1y2z3      invoice-may.pdf  application/pdf   284 KB
# att_a4b5c6      logo.png         image/png         12 KB

# Step 3: Download a specific attachment
nylas email attachments download att_x1y2z3 msg_abc123 --output ./downloads/

# Step 4: Batch download — list + download all in a loop
nylas email attachments list msg_abc123 --json \
  | jq -r '.[].id' \
  | while read -r att_id; do
      nylas email attachments download "$att_id" msg_abc123 \
        --output ./downloads/
    done

The --json flag on attachments list returns structured output that pipes cleanly into jq. Each object includes id, filename, content_type, and size. The --output flag on download accepts a directory path and preserves the original filename. If you omit --output, the file saves to the current directory with the original filename. For automation scripts, combine the search-and-download steps into a single shell script that runs on a cron schedule.
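If you would rather drive the loop from Python than from jq, the sketch below turns the list output into download commands. It assumes the JSON is an array of objects with the id and content_type fields described above; plan_downloads and the sample data are hypothetical, and the function only builds argument lists without running anything.

```python
import json
from typing import Optional


def plan_downloads(
    list_json: str,
    message_id: str,
    output_dir: str,
    content_type: Optional[str] = None,
) -> list[list[str]]:
    """Build one `nylas email attachments download` argv per attachment."""
    commands = []
    for att in json.loads(list_json):
        # Optional filter, for example "application/pdf"
        if content_type and att.get("content_type") != content_type:
            continue
        commands.append([
            "nylas", "email", "attachments", "download",
            att["id"], message_id, "--output", output_dir,
        ])
    return commands


# Hypothetical output of: nylas email attachments list msg_abc123 --json
sample = json.dumps([
    {"id": "att_x1y2z3", "filename": "invoice-may.pdf",
     "content_type": "application/pdf", "size": 290816},
    {"id": "att_a4b5c6", "filename": "logo.png",
     "content_type": "image/png", "size": 12288},
])

for argv in plan_downloads(sample, "msg_abc123", "./downloads/", "application/pdf"):
    print(" ".join(argv))
```

Each returned list can be passed straight to subprocess.run, which avoids shell quoting issues with unusual IDs or paths.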

What provider-specific gotchas should I know?

Each email provider implements attachments differently at the protocol and API level, creating edge cases that break generic extraction code. Gmail, Outlook, Yahoo, and IMAP servers each have at least one behavior that surprises developers on first encounter. Knowing these before you write extraction logic saves hours of debugging. The 4 most common gotchas affect file size limits, attachment IDs, protocol support, and rate limiting.

Gmail: Attachments under 25 MB are embedded in the MIME body and accessible through the API's attachmentId field. Files over 25 MB are automatically converted to Google Drive links. According to Google's attachment size documentation, the 25 MB limit applies to the pre-encoding size. After base64 encoding (which adds ~33% overhead), a 25 MB file becomes roughly 33 MB on the wire. The Gmail API returns a 404 error if you request an attachmentId on a Drive-linked file.
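The overhead figure follows from base64 mapping every 3 input bytes to 4 output characters. A quick check of the arithmetic, assuming "25 MB" means 25 × 1024 × 1024 bytes and ignoring the line breaks MIME adds every 76 characters:

```python
import base64


def base64_encoded_size(raw_bytes: int) -> int:
    """Length of the base64 text for raw_bytes of input (no line breaks)."""
    return 4 * ((raw_bytes + 2) // 3)   # round up to whole 3-byte groups


limit = 25 * 1024 * 1024
encoded = base64_encoded_size(limit)
print(f"{limit:,} raw bytes -> {encoded:,} encoded bytes")
print(f"overhead: {encoded / limit - 1:.1%}")

# Sanity check against the real encoder on a small input
assert base64_encoded_size(10) == len(base64.b64encode(b"x" * 10))
```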

Outlook / Microsoft Graph: The Graph API uses a $value endpoint (/messages/{id}/attachments/{att-id}/$value) to return raw binary content. According to Microsoft's attachment documentation, the maximum attachment size for Exchange Online is 150 MB as of 2024. One notable behavior: attachment IDs in Graph are stable across moves within the same mailbox, but forwarding a message to another user generates new IDs.

IMAP (generic): The BODYSTRUCTURE data item of the FETCH command lets you discover attachment metadata without downloading the full message. RFC 3501 defines FETCH in Section 6.4.5 and the response format in Section 7.4.2, but not all servers implement it identically. Some servers omit the Content-Disposition information, so you fall back to checking Content-Type for a name parameter instead. Yahoo Mail's IMAP server returns BODYSTRUCTURE correctly but imposes aggressive rate limiting on FETCH commands for large attachments. Expect throttling after about 50 MB of downloads per session.
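The same fallback applies when you parse a full message rather than BODYSTRUCTURE. In Python, get_filename() already falls back to the name parameter on Content-Type when Content-Disposition has no filename, so filtering on the filename catches parts that a disposition check skips. The raw message and the named_parts helper below are made up for illustration.

```python
from email import message_from_bytes
from email.policy import default as default_policy

# Hypothetical message: the PDF part has no Content-Disposition header at all
RAW = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="b1"\r\n'
    b"\r\n"
    b"--b1\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"\r\n"
    b"See attached.\r\n"
    b"--b1\r\n"
    b'Content-Type: application/pdf; name="report.pdf"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"JVBERi0xLjQ=\r\n"
    b"--b1--\r\n"
)


def named_parts(raw: bytes) -> list[tuple[str, object, bytes]]:
    """Return (name, disposition, payload) for every leaf part with a name."""
    msg = message_from_bytes(raw, policy=default_policy)
    results = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        name = part.get_filename()  # falls back to Content-Type name=
        if name:
            results.append(
                (name, part.get_content_disposition(), part.get_payload(decode=True))
            )
    return results


for name, disposition, payload in named_parts(RAW):
    print(name, disposition, payload)
```

A loop that only accepts disposition == "attachment" would skip this PDF, because the disposition comes back as None.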

What is the difference between inline and regular attachments?

Email MIME defines two Content-Disposition values for binary parts: attachment and inline. According to RFC 2183, a part marked attachment should be displayed separately from the message body, while inline means the part should be rendered in context. In practice, inline parts are almost always images embedded in HTML email bodies using CID (Content-ID) references. About 78% of marketing emails contain at least one inline image, according to a 2023 Litmus analysis of 5 billion emails.

The distinction matters for extraction scripts. If you only check for Content-Disposition: attachment, you'll miss inline images. If you download everything with a Content-Type starting with image/, you'll pull tracking pixels and logo footers along with real attachments. The Python imaplib approach gives you full control through get_content_disposition(). The Gmail API includes inline parts in payload.parts with body.attachmentId set. The CLI's attachments list shows both types and includes a content_disposition field in JSON output so you can filter with jq.

# List all attachments including inline images
nylas email attachments list msg_abc123 --json

# Filter to only regular (non-inline) attachments
nylas email attachments list msg_abc123 --json \
  | jq '[.[] | select(.content_disposition == "attachment")]'

# Filter to only inline images (CID-referenced)
nylas email attachments list msg_abc123 --json \
  | jq '[.[] | select(.content_disposition == "inline")]'

One edge case: some email clients set Content-Disposition: inline on PDF and Word documents, not just images. Outlook's mobile client is a known offender. If your script needs to catch every user-attached file regardless of disposition, check for a filename field on the part instead of relying solely on the disposition header.
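One way to apply that rule is to keep every part that has a filename and drop only inline images. The sketch below filters attachment metadata shaped like the CLI's JSON output (filename, content_type, content_disposition); the user_attached_files name and the sample records are hypothetical, and the heuristic is an assumption that will still misclassify a photo the sender deliberately pasted inline.

```python
def user_attached_files(attachments: list[dict]) -> list[dict]:
    """Keep named files, skipping inline images such as logos and pixels."""
    kept = []
    for att in attachments:
        if not att.get("filename"):
            continue  # unnamed parts are rarely user-attached files
        is_inline = att.get("content_disposition") == "inline"
        is_image = att.get("content_type", "").startswith("image/")
        if is_inline and is_image:
            continue  # most likely a CID-embedded image
        kept.append(att)
    return kept


# Hypothetical metadata for one message
sample = [
    {"filename": "contract.pdf", "content_type": "application/pdf",
     "content_disposition": "inline"},       # inline PDF from a mobile client
    {"filename": "logo.png", "content_type": "image/png",
     "content_disposition": "inline"},       # embedded logo
    {"filename": "data.csv", "content_type": "text/csv",
     "content_disposition": "attachment"},   # regular attachment
]

print([att["filename"] for att in user_attached_files(sample)])
```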

How do I script a complete attachment extraction workflow?

A production extraction script combines search, filtering, and download into a single loop. The workflow below finds the 10 most recent messages with attachments, filters to PDF files only, downloads each one, and logs the results. It runs in under 30 seconds for 10 messages because the CLI handles pagination and connection management internally. This pattern works identically on Gmail, Outlook, Exchange, Yahoo, iCloud, and IMAP accounts.

The script uses nylas email search with the --has-attachment flag to narrow results to messages that actually contain files, avoiding wasted API calls on plain-text emails. The --json output pipes into jq to extract message IDs, then each message's attachments are listed and filtered by MIME type before downloading. For large-scale extraction (100+ messages), add a 200ms sleep between downloads to stay within API rate limits of 5 requests per second.

#!/bin/bash
# Download all PDF attachments from recent emails
set -euo pipefail

OUTPUT_DIR="./extracted-pdfs"
mkdir -p "$OUTPUT_DIR"
count=0

# Find messages with attachments
msg_ids=$(nylas email search "*" --has-attachment --json --limit 10 \
  | jq -r '.[].id')

for msg_id in $msg_ids; do
  # List attachments, filter to PDFs
  pdf_atts=$(nylas email attachments list "$msg_id" --json \
    | jq -r '.[] | select(.content_type == "application/pdf") | .id')

  for att_id in $pdf_atts; do
    nylas email attachments download "$att_id" "$msg_id" \
      --output "$OUTPUT_DIR/"
    count=$((count + 1))
  done
done

echo "Downloaded $count PDF attachment(s) to $OUTPUT_DIR"

Adapt this script by changing the jq filter. Replace "application/pdf" with "text/csv" for spreadsheets, "image/png" for screenshots, or remove the select() entirely to download all file types. For a Python wrapper that adds error handling and retry logic, see the Python email without SMTP guide for the subprocess pattern.
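As a rough idea of what such a wrapper might look like, the sketch below retries a failed CLI call with a doubling pause. The function name, the retry count, and the backoff values are assumptions, not the referenced guide's implementation; the runner and sleep parameters exist only so the logic can be exercised without the CLI installed.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retry(
    argv: Sequence[str],
    attempts: int = 3,
    pause: float = 0.2,
    runner: Callable = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run a command, retrying on a non-zero exit; return its stdout."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            result = runner(list(argv), check=True, capture_output=True, text=True)
            return result.stdout
        except subprocess.CalledProcessError as err:
            last_error = err
            if attempt < attempts:
                # Pause 0.2 s, then 0.4 s, and so on before trying again
                sleep(pause * 2 ** (attempt - 1))
    raise last_error


# Example call (requires the CLI to be installed and authenticated):
# run_with_retry(["nylas", "email", "attachments", "download",
#                 "att_x1y2z3", "msg_abc123", "--output", "./extracted-pdfs/"])
```

The default 0.2-second pause matches the 200ms spacing suggested above, so the same helper can double as a simple throttle between downloads.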

Next steps