
Organize Emails by Company and Domain

Every business email address contains a company domain. Group your inbox by domain and you instantly know which companies you communicate with most, how many people you know at each, and when you last interacted — without touching a CRM.

Why domains are the key to CRM organization

Every business email reveals the sender's company. The part after the @ sign is a company identifier more reliable than display names, signatures, or subject lines: jane@acme.com means the sender works at Acme, no parsing required.

Free email addresses (Gmail, Yahoo, Outlook.com) indicate personal contacts. Corporate domains map directly to accounts and companies in CRM systems like Salesforce, HubSpot, or Pipedrive. If you group your inbox by domain, you get a de-facto company directory without any manual data entry.

The Nylas CLI's --json output gives you structured sender data — name, email, timestamps — that you can pipe through standard Unix tools. The rest of this guide shows you how.
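To make the field names concrete, here is a minimal sketch of the extraction step in Python. The message shape (a from list plus a date) is inferred from the jq pipelines below, and the sample message itself is hypothetical:

```python
# One message from the --json output, reduced to the fields this guide uses.
# The record here is illustrative; the real output may carry more fields.
msg = {
    "from": [{"name": "Jane Doe", "email": "jane@acme.com"}],
    "date": "2024-05-01T09:30:00Z",
}

sender = msg["from"][0]["email"]          # primary sender's address
domain = sender.split("@", 1)[1].lower()  # everything after the @, lowercased
print(domain)                             # acme.com
```

Everything else in this guide is a variation on this one move: pull the address, split on @, group.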

Extract unique domains from your inbox

Start by pulling a list of every unique sender domain from your recent email:

nylas email list --json --limit 500 \
  | jq '[.[] | .from[0].email | split("@")[1] | ascii_downcase] | unique'

Here is what each step does:

  • .from[0].email — extracts the primary sender's email address from each message
  • split("@")[1] — takes everything after the @ sign, giving you the domain
  • ascii_downcase — lowercases the domain so Acme.com and acme.com count as one
  • unique — deduplicates the list (jq's unique also sorts its output, so no separate sort is needed)

This gives you a quick snapshot: every company that has emailed you in the last 500 messages. On a typical business inbox, expect 50-200 unique domains.

Group emails by domain

A flat list of domains is useful, but you want counts, sender names, and date ranges. This jq pipeline groups messages by domain and aggregates the metadata:

nylas email list --json --limit 500 | jq '
  [.[] | {
    email: .from[0].email,
    domain: (.from[0].email | split("@")[1] | ascii_downcase),
    name: .from[0].name,
    date: .date
  }]
  | group_by(.domain)
  | map({
      domain: .[0].domain,
      email_count: length,
      unique_senders: ([.[].email] | unique | length),
      senders: ([.[].name] | unique),
      last_seen: (map(.date) | sort | last),
      first_seen: (map(.date) | sort | first)
    })
  | sort_by(-.email_count)'

The output is an array of objects sorted by email count descending. Each entry tells you:

  • domain — the company domain
  • email_count — how many messages you have received from that domain
  • unique_senders — how many distinct people at that company have emailed you
  • senders — their display names
  • last_seen / first_seen — the date range of your correspondence
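One entry of the output might look like this (all values are hypothetical):

```json
{
  "domain": "acme.com",
  "email_count": 14,
  "unique_senders": 3,
  "senders": ["Jane Doe", "Raj Patel", "Sam Lee"],
  "last_seen": "2024-05-01T09:30:00Z",
  "first_seen": "2024-01-12T14:02:00Z"
}
```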

Filter out freemail domains

Consumer email domains like Gmail and Yahoo will dominate the list if you leave them in. Define a blocklist and filter them out:

FREEMAIL="gmail.com|yahoo.com|outlook.com|hotmail.com|icloud.com|aol.com|protonmail.com|live.com|mail.com"

nylas email list --json --limit 500 | jq --arg freemail "$FREEMAIL" '
  [.[] | {
    email: .from[0].email,
    domain: (.from[0].email | split("@")[1] | ascii_downcase)
  }]
  | group_by(.domain)
  | map({domain: .[0].domain, count: length})
  | [.[] | select(.domain | test("^(" + $freemail + ")$") | not)]
  | sort_by(-.count)'

The --arg freemail flag passes the alternation into jq as a variable. Wrapping it in ^( )$ anchors the regex so only exact domains are dropped; without the anchors, test would also match substrings, so a legitimate domain like webmail.company.com (which contains mail.com) would disappear. What remains is your business contact landscape.

You can extend the blocklist with regional providers like gmx.de, yandex.ru, or qq.com depending on your audience.
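If you want exact matches rather than substring matches, anchoring the pattern is the key detail. A quick sanity check of the same idea in Python's re module (the domains here are illustrative):

```python
import re

# Pipe-separated alternation, same shape as the shell FREEMAIL variable.
FREEMAIL = r"gmail\.com|yahoo\.com|outlook\.com|hotmail\.com|mail\.com"

# Anchored: the whole domain must equal one of the alternatives.
pattern = re.compile(r"^(?:" + FREEMAIL + r")$")

print(bool(pattern.match("gmail.com")))            # True  -> filtered out
print(bool(pattern.match("webmail.company.com")))  # False -> kept
```

Without the ^ and $ anchors, the second domain would match too, because it contains mail.com as a substring.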

Normalize company domains

Large companies own many domains. Google employees email from google.com, but YouTube and DeepMind employees use their own domains. If you want a single "Google" row, you need a domain alias map.

Create a CSV file where the first column is the canonical domain and the rest are aliases:

# domain-aliases.csv
google.com,youtube.com,waze.com,deepmind.com
microsoft.com,linkedin.com,github.com,xbox.com
meta.com,facebook.com,instagram.com,whatsapp.com
salesforce.com,slack.com,tableau.com,mulesoft.com
amazon.com,aws.com,twitch.tv,whole-foods.com

Then use an awk pre-processing step to normalize domains before grouping:

# Build a lookup table from the alias file, then normalize domains
nylas email list --json --limit 500 \
  | jq -r '.[] | .from[0].email | split("@")[1] | ascii_downcase' \
  | awk -F, '
    NR == FNR {
      # Read alias file: map each alias to its canonical domain
      canonical = $1
      for (i = 2; i <= NF; i++) aliases[$i] = canonical
      next
    }
    {
      domain = $0
      if (domain in aliases) domain = aliases[domain]
      count[domain]++
    }
    END {
      for (d in count) printf "%d\t%s\n", count[d], d
    }
  ' domain-aliases.csv - \
  | sort -rn

This gives you a tab-separated list of canonical domains and their combined counts. The awk script reads the alias file first (via NR == FNR), builds a lookup table, then normalizes each domain from stdin before counting.
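The same normalize-then-count logic is easy to express in Python if you would rather not maintain the awk script. This sketch reads the alias CSV shown above (skipping comment lines) and counts whatever list of domains you feed it:

```python
import csv
from collections import Counter


def load_aliases(path: str) -> dict[str, str]:
    """Map each alias domain to its canonical domain (first CSV column)."""
    aliases: dict[str, str] = {}
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row or row[0].startswith("#"):
                continue  # skip blank lines and comments
            canonical = row[0]
            for alias in row[1:]:
                aliases[alias] = canonical
    return aliases


def count_canonical(domains: list[str], aliases: dict[str, str]) -> Counter:
    """Replace each alias with its canonical domain, then count."""
    return Counter(aliases.get(d, d) for d in domains)
```

With the alias file above, count_canonical(["youtube.com", "google.com"], aliases) would fold both under google.com.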

Export to CSV

Pipe the grouped jq output into a CSV file that you can open in Excel, Google Sheets, or import into your CRM:

nylas email list --json --limit 500 | jq -r '
  [.[] | {
    email: .from[0].email,
    domain: (.from[0].email | split("@")[1] | ascii_downcase),
    name: .from[0].name,
    date: .date
  }]
  | group_by(.domain)
  | map({
      domain: .[0].domain,
      email_count: length,
      unique_senders: ([.[].email] | unique | length),
      last_seen: (map(.date) | sort | last)
    })
  | sort_by(-.email_count)
  | ["domain","email_count","unique_senders","last_seen"],
    (.[] | [.domain, .email_count, .unique_senders, .last_seen])
  | @csv' > companies.csv

echo "Wrote $(wc -l < companies.csv) rows to companies.csv"

The @csv jq filter handles quoting and escaping automatically. The first array becomes the header row.

Python version

If you prefer Python for more complex transformations — or want to integrate with pandas, a database, or an API — here is a complete script:

#!/usr/bin/env python3
"""Group inbox emails by sender domain and export to CSV."""

import csv
import json
import subprocess
import sys
from collections import defaultdict
from datetime import datetime

FREEMAIL = {
    "gmail.com", "yahoo.com", "outlook.com", "hotmail.com",
    "icloud.com", "aol.com", "protonmail.com", "live.com", "mail.com",
}

DOMAIN_ALIASES = {
    "youtube.com": "google.com", "waze.com": "google.com",
    "deepmind.com": "google.com", "linkedin.com": "microsoft.com",
    "github.com": "microsoft.com", "facebook.com": "meta.com",
    "instagram.com": "meta.com", "whatsapp.com": "meta.com",
    "slack.com": "salesforce.com", "tableau.com": "salesforce.com",
}


def fetch_emails(limit: int = 500) -> list[dict]:
    result = subprocess.run(
        ["nylas", "email", "list", "--json", "--limit", str(limit)],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def normalize_domain(domain: str) -> str:
    return DOMAIN_ALIASES.get(domain, domain)


def group_by_domain(emails: list[dict]) -> list[dict]:
    groups: dict[str, dict] = defaultdict(lambda: {
        "emails": set(), "names": set(), "dates": [],
    })

    for msg in emails:
        sender = (msg.get("from") or [{}])[0]
        email_addr = sender.get("email", "")
        if not email_addr or "@" not in email_addr:
            continue

        domain = normalize_domain(email_addr.split("@")[1].lower())
        if domain in FREEMAIL:
            continue

        groups[domain]["emails"].add(email_addr)
        if sender.get("name"):
            groups[domain]["names"].add(sender["name"])
        if msg.get("date"):
            groups[domain]["dates"].append(msg["date"])

    rows = []
    for domain, data in groups.items():
        dates = sorted(data["dates"])
        rows.append({
            "domain": domain,
            "email_count": len(data["dates"]),
            "unique_senders": len(data["emails"]),
            "senders": "; ".join(sorted(data["names"])),
            "first_seen": dates[0] if dates else "",
            "last_seen": dates[-1] if dates else "",
        })

    return sorted(rows, key=lambda r: r["email_count"], reverse=True)


def export_csv(rows: list[dict], path: str = "companies.csv") -> None:
    fields = ["domain", "email_count", "unique_senders", "senders", "first_seen", "last_seen"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    print(f"Wrote {len(rows)} companies to {path}")


if __name__ == "__main__":
    limit = int(sys.argv[1]) if len(sys.argv) > 1 else 500
    emails = fetch_emails(limit)
    rows = group_by_domain(emails)
    export_csv(rows)

    # Print top 10 to terminal
    print("\nTop 10 companies by email volume:")
    for row in rows[:10]:
        print(f"  {row['domain']:30s}  {row['email_count']:>4d} emails  {row['unique_senders']:>3d} people")

Run it with python3 organize_by_company.py or pass a custom limit: python3 organize_by_company.py 1000.

TypeScript version

For TypeScript/Node environments — useful if you are building this into a larger workflow or API:

import { execFileSync } from "child_process";
import { writeFileSync } from "fs";

interface DomainGroup {
  domain: string;
  emailCount: number;
  uniqueSenders: number;
  senders: string[];
  firstSeen: string;
  lastSeen: string;
}

interface EmailMessage {
  from: { email: string; name: string }[];
  date: string;
}

const FREEMAIL = new Set([
  "gmail.com", "yahoo.com", "outlook.com", "hotmail.com",
  "icloud.com", "aol.com", "protonmail.com", "live.com", "mail.com",
]);

const DOMAIN_ALIASES: Record<string, string> = {
  "youtube.com": "google.com", "waze.com": "google.com",
  "deepmind.com": "google.com", "linkedin.com": "microsoft.com",
  "github.com": "microsoft.com", "facebook.com": "meta.com",
  "instagram.com": "meta.com", "whatsapp.com": "meta.com",
  "slack.com": "salesforce.com", "tableau.com": "salesforce.com",
};

function normalizeDomain(domain: string): string {
  return DOMAIN_ALIASES[domain] ?? domain;
}

function fetchEmails(limit = 500): EmailMessage[] {
  const output = execFileSync("nylas", ["email", "list", "--json", "--limit", String(limit)], {
    encoding: "utf-8",
    maxBuffer: 50 * 1024 * 1024,
  });
  return JSON.parse(output);
}

function groupByDomain(emails: EmailMessage[]): DomainGroup[] {
  const groups = new Map<string, { emails: Set<string>; names: Set<string>; dates: string[] }>();

  for (const msg of emails) {
    const sender = msg.from?.[0];
    if (!sender?.email || !sender.email.includes("@")) continue;

    const rawDomain = sender.email.split("@")[1].toLowerCase();
    const domain = normalizeDomain(rawDomain);
    if (FREEMAIL.has(domain)) continue;

    if (!groups.has(domain)) {
      groups.set(domain, { emails: new Set(), names: new Set(), dates: [] });
    }
    const group = groups.get(domain)!;
    group.emails.add(sender.email);
    if (sender.name) group.names.add(sender.name);
    if (msg.date) group.dates.push(msg.date);
  }

  const rows: DomainGroup[] = [];
  for (const [domain, data] of groups) {
    const dates = [...data.dates].sort();
    rows.push({
      domain,
      emailCount: data.dates.length,
      uniqueSenders: data.emails.size,
      senders: [...data.names].sort(),
      firstSeen: dates[0] ?? "",
      lastSeen: dates.at(-1) ?? "",
    });
  }

  return rows.sort((a, b) => b.emailCount - a.emailCount);
}

function exportCsv(rows: DomainGroup[], path = "companies.csv"): void {
  const header = "domain,email_count,unique_senders,first_seen,last_seen";
  const lines = rows.map((r) =>
    [r.domain, r.emailCount, r.uniqueSenders, r.firstSeen, r.lastSeen]
      .map((v) => `"${String(v).replace(/"/g, '""')}"`)
      .join(",")
  );
  writeFileSync(path, [header, ...lines].join("\n") + "\n");
  console.log(`Wrote ${rows.length} companies to ${path}`);
}

// Main
const emails = fetchEmails(Number(process.argv[2]) || 500);
const companies = groupByDomain(emails);
exportCsv(companies);

console.log("\nTop 10 companies by email volume:");
for (const c of companies.slice(0, 10)) {
  console.log(`  ${c.domain.padEnd(30)}  ${String(c.emailCount).padStart(4)} emails  ${String(c.uniqueSenders).padStart(3)} people`);
}

Run with npx tsx organize_by_company.ts or compile and run with tsc && node organize_by_company.js.

Next steps