
Group Inbox by Corporate Email Domain

The part after the @ sign is the most reliable company identifier in email. It doesn't change when someone updates their display name, it doesn't get truncated like signatures, and it's present on every single message. Group your inbox by domain and you instantly see which companies you communicate with most.

Written by Qasim Muhammad, Staff SRE

Reviewed by Hazik

Updated May 16, 2026

Verified — CLI 3.1.1 · Gmail, Outlook · last tested April 11, 2026

Why domains beat every other company identifier

Display names are unreliable. “Sarah C” could be anyone. Signatures change across devices. But sarah@acme.com unambiguously identifies Acme Corp as the employer. Filter out 12 common freemail domains such as Gmail, Yahoo, Outlook, and iCloud, and the remaining sender domains become a cleaner company directory.

Extract unique domains from your inbox

Extracting sender domains from email headers turns an unstructured inbox into a structured dataset of companies. The Nylas CLI outputs full message metadata as JSON, and jq’s split("@")[1] isolates the domain portion of each sender address. A typical 500-message business inbox yields 50 to 200 unique domains, with the top 10-15 domains accounting for 60-70% of total message volume.

Fetch the last 500 messages and pipe through jq to build a domain frequency table. The group_by function clusters identical domains so length returns the message count for each one:

extract-domains.sh

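# List every unique sender domain (jq's unique already returns a sorted array)
# "// empty" skips messages with no sender address; ascii_downcase merges Acme.com and acme.com
nylas email list --json --limit 500 \
  | jq '[.[] | .from[0].email // empty | ascii_downcase | split("@")[1] // empty] | unique'

# Count messages per domain
nylas email list --json --limit 500 | jq '
  [.[] | .from[0].email // empty | ascii_downcase | split("@")[1] // empty] |
  group_by(.) | map({domain: .[0], count: length}) |
  sort_by(-.count)'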

The first command returns a sorted array of unique domain strings. The second produces objects with domain and count fields, sorted by descending count, so the highest-volume senders appear first.
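
For readers who prefer Python, the same frequency table can be sketched with collections.Counter. This is a minimal sketch, not part of the CLI: it assumes each message is a dict shaped like the JSON the jq examples read (a from list whose first entry has an email field), and the sample messages are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def sender_domain(message: dict) -> str | None:
    """Return the lowercased sender domain, or None if it cannot be determined."""
    senders = message.get("from") or []
    if not senders:
        return None
    address = senders[0].get("email") or ""
    # rpartition splits on the last "@", so the domain is always the final part
    _local, sep, domain = address.rpartition("@")
    if not sep or not domain:
        return None
    return domain.lower()


def domain_counts(messages: list[dict]) -> list[tuple[str, int]]:
    """Count messages per sender domain, highest volume first."""
    domains = (sender_domain(m) for m in messages)
    return Counter(d for d in domains if d).most_common()


# Hypothetical sample data, assumed to mirror the CLI's JSON shape
sample = [
    {"from": [{"email": "sarah@acme.com"}]},
    {"from": [{"email": "Bob@ACME.com"}]},
    {"from": [{"email": "li@example.org"}]},
    {"from": []},
]
print(domain_counts(sample))  # [('acme.com', 2), ('example.org', 1)]
```

Lowercasing before counting matters because domain names are case-insensitive; without it, Acme.com and acme.com would be counted as two companies.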

Filter out freemail providers

Freemail providers like Gmail, Yahoo, Outlook, and iCloud can overwhelm raw domain counts and hide corporate senders. A blocklist of 12 common freemail domains is enough for most business inboxes, reducing noise so corporate domains surface clearly before you enrich contacts or build account lists.

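Define an anchored, pipe-separated regex pattern of freemail domains and pass it to jq’s test() filter. The ^( and )$ anchors and the escaped dots make the match exact; an unanchored pattern such as mail.com would also match corporate domains like mail.company.com. The select(.domain | test($bl) | not) expression drops any message whose sender domain matches the blocklist:

filter-freemail.sh

# Define freemail blocklist (covers 95%+ of consumer email)
# Anchored and dot-escaped so only exact domains match
FREEMAIL='^(gmail\.com|yahoo\.com|outlook\.com|hotmail\.com|icloud\.com|aol\.com|protonmail\.com|live\.com|mail\.com|gmx\.de|yandex\.ru|qq\.com)$'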

nylas email list --json --limit 500 | jq --arg bl "$FREEMAIL" '
  [.[] | {
    email: .from[0].email,
    domain: (.from[0].email | split("@")[1]),
    name: .from[0].name,
    date: .date
  }] |
  [.[] | select(.domain | test($bl) | not)] |
  group_by(.domain) |
  map({
    domain: .[0].domain,
    email_count: length,
    unique_senders: ([.[].email] | unique | length),
    senders: ([.[].name] | unique),
    last_seen: (map(.date) | sort | last),
    first_seen: (map(.date) | sort | first)
  }) |
  sort_by(-.email_count)'

The blocklist already includes gmx.de, yandex.ru, and qq.com for international coverage. Add regional providers relevant to your audience geography. The output includes unique_senders, first_seen, and last_seen fields for each domain, making it easy to gauge relationship depth at a glance.
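
Blocklist matching should be exact. The sketch below contrasts a loose, unanchored regex with plain set membership; the function name is_freemail and the sample domains are illustrative assumptions, not part of any tool used in this guide.

```python
import re

FREEMAIL = frozenset({
    "gmail.com", "yahoo.com", "outlook.com", "hotmail.com",
    "icloud.com", "aol.com", "protonmail.com", "live.com",
    "mail.com", "gmx.de", "yandex.ru", "qq.com",
})


def is_freemail(domain: str) -> bool:
    """Exact, case-insensitive match against the blocklist."""
    return domain.strip().lower() in FREEMAIL


# An unanchored pattern with unescaped dots matches substrings too
loose = re.compile("gmail.com|mail.com")

for domain in ("gmail.com", "mail.company.com", "acme.com"):
    print(domain, bool(loose.search(domain)), is_freemail(domain))
# gmail.com True True
# mail.company.com True False
# acme.com False False
```

The middle row is the failure case: the loose pattern treats mail.company.com as freemail because the text mail.com appears inside it, while the exact lookup keeps it.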

Map subsidiaries to parent companies

Large companies own many email domains. Google employees send from google.com, but YouTube employees use youtube.com and DeepMind uses deepmind.com. The 10 largest tech companies own an average of 12 email domains each. Without normalization, these show up as separate companies.

A domain alias CSV maps each subsidiary domain to its canonical parent. The format is one row per parent company, with the canonical domain in the first column and all subsidiary domains following. The Python script reads this file and rewrites every subsidiary match to its parent before grouping:

domain-aliases.csv

# domain-aliases.csv — canonical domain, followed by known subsidiaries
google.com,youtube.com,waze.com,deepmind.com,fitbit.com
microsoft.com,linkedin.com,github.com,xbox.com,nuance.com
meta.com,facebook.com,instagram.com,whatsapp.com,oculus.com
salesforce.com,slack.com,tableau.com,mulesoft.com,heroku.com
amazon.com,aws.com,twitch.tv,ring.com,imdb.com
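
To make the file format concrete, here is a small sketch of how rows in this layout invert into a subsidiary-to-parent lookup. The helper name parse_aliases is hypothetical, and it reads from an in-memory string so it runs without the file on disk. It assumes lines starting with # are comments and should be skipped.

```python
import csv
import io


def parse_aliases(text: str) -> dict[str, str]:
    """Map each subsidiary domain to the canonical domain in column one."""
    aliases: dict[str, str] = {}
    for row in csv.reader(io.StringIO(text)):
        cells = [c.strip().lower() for c in row if c.strip()]
        # Skip blank rows, rows without subsidiaries, and "#" comment rows
        if len(cells) < 2 or cells[0].startswith("#"):
            continue
        canonical, *subsidiaries = cells
        for sub in subsidiaries:
            aliases[sub] = canonical
    return aliases


sample = """# canonical domain, followed by known subsidiaries
google.com,youtube.com,waze.com
microsoft.com,linkedin.com,github.com
"""
aliases = parse_aliases(sample)
print(aliases.get("youtube.com", "youtube.com"))  # google.com
print(aliases.get("acme.com", "acme.com"))        # acme.com
```

Using dict.get with the raw domain as the default means unknown domains pass through unchanged, so the alias file only needs to list the exceptions.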

group_by_company.py

#!/usr/bin/env python3
"""Group inbox emails by sender domain with subsidiary normalization."""

import csv
import json
import subprocess
from collections import defaultdict

FREEMAIL = {
    "gmail.com", "yahoo.com", "outlook.com", "hotmail.com",
    "icloud.com", "aol.com", "protonmail.com", "live.com",
    "mail.com", "gmx.de", "yandex.ru", "qq.com",
}

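def load_aliases(path: str = "domain-aliases.csv") -> dict[str, str]:
    """Load subsidiary-to-parent domain mapping."""
    aliases = {}
    try:
        with open(path, newline="") as f:
            for row in csv.reader(f):
                # Skip blank rows, single-column rows, and "#" comment rows
                if len(row) < 2 or row[0].lstrip().startswith("#"):
                    continue
                canonical = row[0].strip().lower()
                for alias in row[1:]:
                    if alias.strip():
                        aliases[alias.strip().lower()] = canonical
    except FileNotFoundError:
        pass  # No alias file: fall back to raw domains
    return aliases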

def fetch_emails(limit: int = 500) -> list[dict]:
    result = subprocess.run(
        ["nylas", "email", "list", "--json", "--limit", str(limit)],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

def group_by_domain(emails: list[dict], aliases: dict[str, str]) -> list[dict]:
    groups: dict[str, dict] = defaultdict(lambda: {
        "count": 0, "senders": set(), "names": set(), "dates": [],
    })
    for msg in emails:
        # "from" may be missing, null, or an empty list
        sender = (msg.get("from") or [{}])[0]
        addr = (sender.get("email") or "").lower()
        if "@" not in addr:
            continue
        raw_domain = addr.rsplit("@", 1)[1]
        if not raw_domain:
            continue
        domain = aliases.get(raw_domain, raw_domain)
        if domain in FREEMAIL:
            continue
        group = groups[domain]
        group["count"] += 1  # count every message, even one without a date
        group["senders"].add(addr)
        if sender.get("name"):
            group["names"].add(sender["name"])
        if msg.get("date"):
            group["dates"].append(msg["date"])

    rows = []
    for domain, data in groups.items():
        dates = sorted(data["dates"])
        rows.append({
            "domain": domain,
            "email_count": data["count"],
            "unique_senders": len(data["senders"]),
            "senders": sorted(data["names"]),
            "first_seen": dates[0] if dates else "",
            "last_seen": dates[-1] if dates else "",
        })
    return sorted(rows, key=lambda r: r["email_count"], reverse=True)

aliases = load_aliases()
emails = fetch_emails()
companies = group_by_domain(emails, aliases)

# Export CSV (csv.writer quotes any field that contains a comma or quote)
with open("companies.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["domain", "email_count", "unique_senders",
                     "first_seen", "last_seen"])
    for c in companies:
        writer.writerow([c["domain"], c["email_count"], c["unique_senders"],
                         c["first_seen"], c["last_seen"]])

print(f"Grouped into {len(companies)} companies")
for c in companies[:10]:
    print(f"  {c['domain']:30s}  {c['email_count']:>4d} emails  "
          f"{c['unique_senders']:>3d} people")

Export to CSV for CRM import

A CSV export lets you import domain-grouped sender data into CRM platforms like Salesforce, HubSpot, or Pipedrive for lead scoring and account mapping. Most CRMs accept CSV files up to 10 MB, which comfortably holds tens of thousands of domain rows. The jq @csv filter handles field quoting and comma escaping automatically, producing RFC 4180-compliant output that imports without manual cleanup.

The pipeline below groups messages by sender domain, counts emails and unique senders per domain, and writes the result to companies.csv. The header row appears first because jq emits the static array before iterating over grouped data:

export-csv.sh

nylas email list --json --limit 500 | jq -r '
  [.[] | {
    email: .from[0].email,
    domain: (.from[0].email | split("@")[1]),
    name: .from[0].name,
    date: .date
  }] |
  group_by(.domain) |
  map({
    domain: .[0].domain,
    email_count: length,
    unique_senders: ([.[].email] | unique | length),
    last_seen: (map(.date) | sort | last)
  }) |
  sort_by(-.email_count) |
  ["domain","email_count","unique_senders","last_seen"],
  (.[] | [.domain, .email_count, .unique_senders, .last_seen])
  | @csv' > companies.csv

# wc -l counts the header line too, so subtract one
echo "Exported $(($(wc -l < companies.csv) - 1)) domain rows"
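
Quoting only becomes visible once a field contains a comma or a double quote. The sketch below shows the same escaping rules using Python's csv module rather than jq. The senders column and both sample rows are hypothetical, added only to trigger the quoting.

```python
import csv
import io

# Hypothetical rows; the "senders" column exists here only to show quoting
rows = [
    {"domain": "acme.com", "email_count": 42, "senders": "Chen, Sarah"},
    {"domain": "example.org", "email_count": 7, "senders": 'Bob "Bobby" Lee'},
]


def to_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["domain", "email_count", "senders"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


output = to_csv(rows)
print(output)

# Round trip: the reader restores the original field values
parsed = list(csv.DictReader(io.StringIO(output)))
print(parsed[1]["senders"])  # Bob "Bobby" Lee
```

The comma-bearing field is wrapped in double quotes and embedded quotes are doubled, which is the convention RFC 4180 describes. One difference to expect: Python quotes only the fields that need it, while jq's @csv wraps every string value in quotes.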

Detect corporate vs. personal domains

A static freemail blocklist catches the most common consumer domains, but it misses smaller free providers and personal vanity domains. MX record inspection offers a programmatic alternative. Corporate domains typically point their MX records at business email providers like Google Workspace or Microsoft 365, while enterprise-grade domains often route through security gateways like Mimecast or Proofpoint. According to a 2024 Validity report, over 80% of Fortune 500 companies use at least one third-party email security gateway, making MX inspection a reliable corporate signal.

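The shell function below queries MX records with dig and classifies each domain into four categories: corporate-hosted, enterprise-security, self-hosted, or no-mx. Pipe the unique domain list from the extraction step to classify every sender domain in a single pass. Apply the freemail blocklist as well: consumer domains such as gmail.com and outlook.com publish MX records on the same Google and Microsoft infrastructure as business tenants, so MX inspection alone cannot tell them apart.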

detect-corporate.sh

# Check if a domain is corporate based on MX records
is_corporate() {
  local domain="$1"
  local mx
  mx=$(dig +short MX "$domain" 2>/dev/null)
  if [ -z "$mx" ]; then
    echo "no-mx"
  # -E enables alternation; plain grep treats | as a literal character
  elif echo "$mx" | grep -Eqi 'google|outlook|microsoft'; then
    echo "corporate-hosted"
  elif echo "$mx" | grep -Eqi 'mimecast|proofpoint|barracuda'; then
    echo "enterprise-security"
  else
    echo "self-hosted"
  fi
}

# Check top domains from your inbox
nylas email list --json --limit 500 | jq -r '
  [.[] | .from[0].email | split("@")[1]] | unique | .[]
' | while IFS= read -r domain; do
  type=$(is_corporate "$domain")
  echo "$domain: $type"
done
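
The classification rules can be exercised without any DNS traffic by separating the lookup from the decision. The sketch below mirrors the four categories as a pure function over a list of MX hostnames. The function name classify_mx is hypothetical and the hostnames are illustrative examples of common MX naming patterns, not results from a live query.

```python
import re

# Same keyword groups as the shell function, checked in the same order
HOSTED = re.compile(r"google|outlook|microsoft", re.IGNORECASE)
GATEWAY = re.compile(r"mimecast|proofpoint|barracuda", re.IGNORECASE)


def classify_mx(mx_hosts: list[str]) -> str:
    """Classify a domain from the hostnames in its MX records."""
    if not mx_hosts:
        return "no-mx"
    joined = " ".join(mx_hosts)
    if HOSTED.search(joined):
        return "corporate-hosted"
    if GATEWAY.search(joined):
        return "enterprise-security"
    return "self-hosted"


# Illustrative hostnames (assumed examples, not live lookups)
samples = {
    "workspace tenant": ["aspmx.l.google.com."],
    "gateway customer": ["us-smtp-inbound-1.mimecast.com."],
    "own mail server": ["mail.acme.example."],
    "no mail": [],
    "gmail.com itself": ["gmail-smtp-in.l.google.com."],
}
for label, hosts in samples.items():
    print(f"{label}: {classify_mx(hosts)}")
```

The last sample is the caveat in practice: a consumer domain hosted on Google infrastructure lands in corporate-hosted, which is why MX classification works best on a list that has already passed through the freemail blocklist.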

Analyze communication over time per company

Tracking email volume per domain over time reveals relationship momentum. A sales team that sees a prospect’s domain jump from 3 emails in January to 15 in February can prioritize that account. Conversely, a 50% month-over-month drop from an existing customer’s domain is an early churn signal worth flagging. Research from Gong.io found that deals involving 4 or more unique contacts at the buyer’s domain close at 2.8x the rate of single-threaded deals.

The jq pipeline below splits each message’s ISO 8601 date into a YYYY-MM month key, groups by domain and month, and filters to domains that appear in at least 2 months. The result is a time-series array per domain, sorted by total volume:

time-analysis.sh

nylas email list --json --limit 1000 | jq '
  [.[] | {
    domain: (.from[0].email | split("@")[1]),
    month: (.date | split("T")[0] | split("-")[:2] | join("-"))
  }] |
  group_by(.domain) |
  map({
    domain: .[0].domain,
    by_month: (group_by(.month) | map({month: .[0].month, count: length}))
  }) |
  [.[] | select(.by_month | length >= 2)] |
  sort_by(-(.by_month | map(.count) | add))'
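
Once the per-month counts exist, the momentum signals described above reduce to a percentage change between consecutive entries. The following is a minimal sketch under stated assumptions: the input matches the by_month shape produced by the pipeline, the function name and the 50% threshold constant are illustrative, and months with zero messages are absent from the data, so consecutive entries are not guaranteed to be adjacent calendar months.

```python
DROP_THRESHOLD_PCT = -50.0  # assumed cutoff for an early churn signal


def month_over_month(by_month: list[dict]) -> list[dict]:
    """Return the percentage change in volume between consecutive entries."""
    ordered = sorted(by_month, key=lambda m: m["month"])  # YYYY-MM sorts lexically
    changes = []
    for prev, cur in zip(ordered, ordered[1:]):
        # prev["count"] is at least 1 because empty months are never emitted
        pct = (cur["count"] - prev["count"]) / prev["count"] * 100
        changes.append({
            "month": cur["month"],
            "count": cur["count"],
            "change_pct": round(pct, 1),
            "churn_risk": pct <= DROP_THRESHOLD_PCT,
        })
    return changes


# Hypothetical series for one domain
series = [
    {"month": "2026-01", "count": 3},
    {"month": "2026-02", "count": 15},
    {"month": "2026-03", "count": 6},
]
for row in month_over_month(series):
    print(row)
# {'month': '2026-02', 'count': 15, 'change_pct': 400.0, 'churn_risk': False}
# {'month': '2026-03', 'count': 6, 'change_pct': -60.0, 'churn_risk': True}
```

The jump from 3 to 15 shows up as a 400% increase, and the fall from 15 to 6 crosses the assumed 50% threshold and is flagged.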

Next steps

Parse email signatures for enrichment — extract job titles and phone numbers for each contact at these companies
Visualize communication patterns — score relationship strength and detect single-threaded risk
Reconstruct org charts — infer reporting lines within the companies you’ve identified
Full command reference — every flag and subcommand documented
Public Suffix List (publicsuffix.org) — the source of truth for separating registrable domains from eTLDs
RFC 5322 §3.4.1 -- Address Specification — canonical grammar for the local-part / domain split
IANA Special-Use Domain Names — non-routable names (test, invalid, localhost) you should always exclude