Guide

Build a Contact Hierarchy from Email

CC patterns, meeting invites, and shared domains reveal who works with whom and who reports to whom. Extract org structure from your inbox without manually entering a single record into a CRM.

Extract all contacts

Your address book and your inbox contain overlapping but different sets of people. Contacts give you structured names, companies, and phone numbers. Email headers give you every person who has ever sent to or been CC’d on a message — including people you never explicitly added as contacts. The first step is to extract both and merge them into a single list.

# Get structured contacts
nylas contacts list --json --limit 200 > contacts.json

# Get unique senders/recipients from email
nylas email list --json --limit 500 | jq '[
  .[] |
  (.from[0] // empty),
  (.to[]? // empty),
  (.cc[]? // empty)
] | unique_by(.email) | sort_by(.email)' > email_contacts.json

Now merge the two sources. Contacts from your address book are the primary record because they have richer fields (company, phone, job title). Email-derived contacts fill the gaps — people you communicate with but never added to your contacts.

# Merge: contacts.json is primary, email_contacts.json fills gaps
jq -s '
  (.[0] | map({email: .emails[0].email, name: (.given_name + " " + .surname), source: "contacts"}) // []) as $contacts |
  (.[1] | map(. + {source: "email"}) // []) as $email |
  ($contacts + $email)
  | group_by(.email)
  # Right-hand side of * wins, so put the address-book record (.[0]) last
  | map((.[1] // {}) * .[0])
  | sort_by(.email)
' contacts.json email_contacts.json > all_contacts.json

The result is a deduplicated list where every person has at least a name and email address. People who appear in both sources get the richer contact record, with email-derived fields filling in anything missing.
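The same merge can be sketched in Python for readers who would rather not do it in jq. This is a minimal sketch, not the guide's canonical method: the field names (emails, given_name, surname, name, email) mirror the jq filters above, while the function name merge_contacts and the lowercasing of addresses are assumptions added here for illustration.

```python
def merge_contacts(contacts: list[dict], email_contacts: list[dict]) -> list[dict]:
    """Merge two contact lists keyed by email address; address-book fields win."""
    merged: dict[str, dict] = {}

    # Load email-derived records first so address-book records can overwrite them.
    for entry in email_contacts:
        addr = (entry.get("email") or "").lower()  # lowercasing is an assumption
        if addr:
            merged[addr] = {"email": addr, "name": entry.get("name") or "", "source": "email"}

    for c in contacts:
        emails = c.get("emails") or []
        addr = (emails[0].get("email") or "").lower() if emails else ""
        if not addr:
            continue
        name = f"{c.get('given_name') or ''} {c.get('surname') or ''}".strip()
        previous = merged.get(addr, {})
        merged[addr] = {
            "email": addr,
            # Keep the email-derived name only when the address book has none.
            "name": name or previous.get("name", ""),
            "source": "contacts",
        }

    return sorted(merged.values(), key=lambda p: p["email"])
```

Calling merge_contacts(json.load(open("contacts.json")), json.load(open("email_contacts.json"))) would produce roughly the same list as the jq merge, with one difference: an empty address-book name falls back to the name seen in email headers.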

Pair people with companies

Every business email address contains the company’s domain after the @. Group contacts by domain and you immediately see which companies you know people at, how many contacts you have at each, and who those people are. Filter out freemail providers like gmail.com and outlook.com to focus on business domains.

# Group contacts by company domain, skipping records without a usable address
jq '
  [.[] | select((.email | type) == "string")
       | select(.email | contains("@"))
       | . + {domain: (.email | split("@")[1])}]
  | group_by(.domain)
  | map({
      company_domain: .[0].domain,
      people: [.[] | {name: .name, email: .email}]
    })
  | [.[] | select(.company_domain | IN("gmail.com","yahoo.com","outlook.com","hotmail.com","icloud.com") | not)]
  | sort_by(-(.people | length))' all_contacts.json

This gives you a company-centric view of your contacts. Each entry shows the domain and the list of people you know there. The freemail filter removes personal addresses so you see only business relationships. Sort by headcount to put the companies where you have the deepest relationships at the top.
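For comparison, here is a hedged Python sketch of the same grouping. It assumes the all_contacts.json shape produced earlier (objects with name and email); the names FREEMAIL and group_by_company are hypothetical, and the alphabetical tie-break is a choice made here, not something the jq filter does.

```python
from collections import defaultdict

FREEMAIL = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com", "icloud.com"}


def group_by_company(people: list[dict]) -> list[dict]:
    """Group contacts by email domain, dropping freemail and malformed addresses."""
    by_domain: dict[str, list[dict]] = defaultdict(list)
    for p in people:
        email = p.get("email") or ""
        _, sep, domain = email.rpartition("@")
        # sep is empty when the address has no "@" at all.
        if not sep or not domain or domain in FREEMAIL:
            continue
        by_domain[domain].append({"name": p.get("name") or "", "email": email})

    # Companies with the most contacts first; ties broken alphabetically.
    ordered = sorted(by_domain.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [{"company_domain": d, "people": members} for d, members in ordered]
```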

Infer roles from CC patterns

CC behavior is one of the strongest signals for hierarchy in email. Three patterns are particularly reliable:

  • Manager signal: If person A always CCs person B on emails to external recipients, B is likely A’s manager or a stakeholder who needs visibility.
  • Stakeholder signal: If person B is CC’d on emails from multiple different senders in the same domain, B is likely a manager or executive who oversees that group.
  • Volume asymmetry: People who receive more CCs than they send trend more senior — they are kept informed rather than doing the informing.

Extract CC pairs from your email and count how frequently each person is CC’d:

nylas email list --json --limit 500 | jq '
  [.[] | select(.cc != null and (.cc | length) > 0) |
    {sender: .from[0].email, cc_list: [.cc[].email]}
  ] | [.[] | .cc_list[] as $cc | {sender: .sender, cc: $cc}]
  | group_by(.cc)
  | map({cc_recipient: .[0].cc, cc_count: length, cc_from: ([.[].sender] | unique)})
  | sort_by(-.cc_count)'

The output ranks people by how often they are CC’d. Someone who receives CCs from five different senders at the same company is probably a manager or director; someone who receives CCs from only one person is more likely that person’s direct manager. Treat both as leads to verify rather than conclusions. Cross-reference the cc_from array with the company domain grouping from the previous step to see these patterns within each organization.
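The jq filter counts CCs received but does not restrict senders to the recipient's own domain or measure the sent/received asymmetry described in the list above. The sketch below shows one way to compute all three per address. It assumes messages shaped like the CLI output used in this guide (from and cc as lists of objects with an email key); the function name cc_signals and the output keys are hypothetical.

```python
from collections import defaultdict


def cc_signals(messages: list[dict]) -> dict[str, dict]:
    """Per address: CCs received, CCs sent, and same-domain senders who CC them."""
    received: dict[str, int] = defaultdict(int)
    sent: dict[str, int] = defaultdict(int)
    same_domain_senders: dict[str, set] = defaultdict(set)

    for msg in messages:
        senders = msg.get("from") or []
        sender = (senders[0].get("email") or "") if senders else ""
        for entry in msg.get("cc") or []:
            cc = entry.get("email") or ""
            if not cc or not sender or cc == sender:
                continue
            received[cc] += 1
            sent[sender] += 1
            # Only same-domain senders count toward the stakeholder signal.
            if cc.rpartition("@")[2] == sender.rpartition("@")[2]:
                same_domain_senders[cc].add(sender)

    return {
        addr: {
            "cc_received": received.get(addr, 0),
            "cc_sent": sent.get(addr, 0),
            "same_domain_senders": sorted(same_domain_senders.get(addr, set())),
        }
        for addr in set(received) | set(sent)
    }
```

An address with a high cc_received, a low cc_sent, and several entries in same_domain_senders matches both the stakeholder and volume-asymmetry patterns.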

Infer hierarchy from calendar

Calendar data adds a second dimension to hierarchy inference. Meeting organizers, attendee lists, and recurrence patterns each carry distinct signals about seniority and reporting relationships.

nylas calendar events list --json --limit 100 | jq '
  [.[] | {
    organizer: .organizer.email,
    attendees: [.participants[]?.email],
    title: .title,
    recurring: (.recurrence != null)
  }]'

Three patterns to look for in the output:

  • Meeting organizers trend senior. The person who creates the meeting typically sets the agenda. If one person organizes most meetings with external contacts, they are likely the account owner or team lead.
  • Recurring one-on-ones signal manager-report pairs. A weekly or biweekly 1:1 between two people at the same company is one of the strongest signals that one reports to the other. The organizer is usually the manager.
  • Skip-level meetings. If an attendee who has a recurring 1:1 with person A also has a separate recurring meeting with A’s manager (the person who organizes A’s own 1:1), that suggests a skip-level: the attendee reports to A, who reports to that manager.

Combine calendar signals with CC patterns for higher confidence. Someone who both receives frequent CCs and organizes meetings with external participants is very likely in a leadership role.
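The recurring one-on-one pattern is the easiest of the three to check mechanically. The sketch below takes events in the shape emitted by the jq filter above (organizer, attendees, recurring) and returns candidate manager-report pairs. The function name and output keys are hypothetical, and treating the organizer as the manager is the heuristic from the list above, not a fact the data guarantees.

```python
def recurring_one_on_ones(events: list[dict]) -> list[dict]:
    """Find recurring two-person meetings between people on the same domain."""
    pairs: dict[tuple, dict] = {}
    for ev in events:
        if not ev.get("recurring"):
            continue
        organizer = ev.get("organizer") or ""
        if not organizer:
            continue
        # The organizer may or may not be listed among the attendees.
        people = {a for a in ev.get("attendees") or [] if a} | {organizer}
        if len(people) != 2:
            continue
        (other,) = people - {organizer}
        if other.rpartition("@")[2] != organizer.rpartition("@")[2]:
            continue
        pairs[(organizer, other)] = {"likely_manager": organizer, "likely_report": other}
    return [pairs[key] for key in sorted(pairs)]
```

Each returned pair is a hypothesis to confirm against CC data or a known org chart, since peers and mentors also hold recurring 1:1s.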

Build the hierarchy tree

With CC frequency and calendar organizer data scored, you can build a per-company hierarchy tree. The structure below groups people by company domain and assigns a likely_senior or team_member classification based on the signals collected in the previous steps.

{
  "acme.com": {
    "likely_senior": ["vp@acme.com", "director@acme.com"],
    "team_members": ["engineer@acme.com", "analyst@acme.com"],
    "signals": {
      "vp@acme.com": {"cc_received": 12, "meetings_organized": 8},
      "director@acme.com": {"cc_received": 9, "meetings_organized": 5},
      "engineer@acme.com": {"cc_received": 1, "meetings_organized": 0},
      "analyst@acme.com": {"cc_received": 0, "meetings_organized": 1}
    }
  }
}

The threshold for “likely senior” depends on your data volume. A reasonable starting point: anyone with a CC count above the 75th percentile for their company domain, or who organizes 3 or more meetings, gets classified as senior. The scripts below apply exactly this rule and, as a simplification, count every meeting a person organizes rather than only those with external participants. Tune these thresholds after spot-checking against companies where you already know the org chart.
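To make the threshold concrete, the sketch below applies a 75th-percentile rule to the acme.com signals from the JSON above. It uses the standard library's statistics.quantiles with the inclusive method, which is one of several percentile definitions and is an assumption of this sketch. The meeting cutoff is passed in as min_meetings (a hypothetical parameter name) so you can set it to whichever cutoff you settle on.

```python
from statistics import quantiles

ACME_SIGNALS = {
    "vp@acme.com": {"cc_received": 12, "meetings_organized": 8},
    "director@acme.com": {"cc_received": 9, "meetings_organized": 5},
    "engineer@acme.com": {"cc_received": 1, "meetings_organized": 0},
    "analyst@acme.com": {"cc_received": 0, "meetings_organized": 1},
}


def classify(signals: dict[str, dict], min_meetings: int) -> dict[str, list[str]]:
    """Split one company's people into likely_senior and team_members."""
    cc_values = [s["cc_received"] for s in signals.values()]
    if len(cc_values) >= 2:
        # Third quartile cut point, interpolated between neighbouring values.
        p75 = quantiles(cc_values, n=4, method="inclusive")[2]
    else:
        # Too few people for a percentile; use the single value (or 0).
        p75 = cc_values[0] if cc_values else 0

    result: dict[str, list[str]] = {"likely_senior": [], "team_members": []}
    for addr, s in signals.items():
        senior = s["cc_received"] > p75 or s["meetings_organized"] >= min_meetings
        result["likely_senior" if senior else "team_members"].append(addr)
    return result


print(classify(ACME_SIGNALS, min_meetings=3))
```

For this data the CC cutoff works out to 9.75, so the VP qualifies on CC count alone while the director qualifies through meetings organized. The grouping matches the likely_senior and team_members lists shown in the JSON.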

Python version

The Python script below wraps the full pipeline — extract contacts and emails, analyze CC patterns, pull calendar data, and output a structured hierarchy as JSON. It uses dataclasses for clean types and subprocess to call the CLI.

#!/usr/bin/env python3
"""Build a contact hierarchy from email and calendar data."""

import json
import subprocess
from collections import defaultdict
from dataclasses import dataclass, field, asdict


@dataclass
class Person:
    name: str
    email: str
    domain: str
    cc_received: int = 0
    meetings_organized: int = 0
    inferred_role: str = "team_member"


@dataclass
class Company:
    domain: str
    people: list[Person] = field(default_factory=list)
    likely_senior: list[str] = field(default_factory=list)
    team_members: list[str] = field(default_factory=list)


def run_cli(args: list[str]) -> list[dict]:
    """Run a CLI command and return parsed JSON output."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def extract_contacts() -> dict[str, Person]:
    contacts = run_cli(["nylas", "contacts", "list", "--json", "--limit", "200"])
    emails = run_cli(["nylas", "email", "list", "--json", "--limit", "500"])

    people: dict[str, Person] = {}
    for c in contacts:
        # "emails" may be missing, null, or an empty list.
        addr = (c.get("emails") or [{}])[0].get("email") or ""
        if "@" in addr:
            name = f"{c.get('given_name') or ''} {c.get('surname') or ''}".strip()
            people[addr] = Person(name=name, email=addr, domain=addr.rsplit("@", 1)[1])

    for msg in emails:
        for field_name in ("from", "to", "cc"):
            for entry in msg.get(field_name, []) or []:
                addr = entry.get("email") or ""
                # Skip malformed addresses so the domain split cannot fail.
                if "@" in addr and addr not in people:
                    domain = addr.rsplit("@", 1)[1]
                    people[addr] = Person(
                        name=entry.get("name") or "", email=addr, domain=domain
                    )
    return people


def analyze_cc_patterns(people: dict[str, Person]) -> None:
    emails = run_cli(["nylas", "email", "list", "--json", "--limit", "500"])
    for msg in emails:
        for cc_entry in msg.get("cc", []) or []:
            addr = cc_entry.get("email", "")
            if addr in people:
                people[addr].cc_received += 1


def analyze_calendar(people: dict[str, Person]) -> None:
    events = run_cli(["nylas", "calendar", "events", "list", "--json", "--limit", "100"])
    for event in events:
        organizer = (event.get("organizer") or {}).get("email", "")
        if organizer in people:
            people[organizer].meetings_organized += 1


def build_hierarchy(people: dict[str, Person]) -> dict[str, Company]:
    by_domain: dict[str, list[Person]] = defaultdict(list)
    freemail = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com", "icloud.com"}

    for p in people.values():
        if p.domain not in freemail:
            by_domain[p.domain].append(p)

    companies: dict[str, Company] = {}
    for domain, members in by_domain.items():
        cc_values = [m.cc_received for m in members]
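        # Index with (n - 1) * 0.75 so groups of 2 to 4 people do not pick their own maximum as the cutoff.
        threshold = sorted(cc_values)[int((len(cc_values) - 1) * 0.75)] if cc_values else 0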

        company = Company(domain=domain, people=members)
        for m in members:
            if m.cc_received > threshold or m.meetings_organized >= 3:
                m.inferred_role = "likely_senior"
                company.likely_senior.append(m.email)
            else:
                company.team_members.append(m.email)
        companies[domain] = company

    return companies


def main() -> None:
    people = extract_contacts()
    analyze_cc_patterns(people)
    analyze_calendar(people)
    hierarchy = build_hierarchy(people)
    print(json.dumps({d: asdict(c) for d, c in hierarchy.items()}, indent=2))


if __name__ == "__main__":
    main()

TypeScript version

The TypeScript version uses typed interfaces for the hierarchy tree and calls the CLI via a child process. The same logic applies — extract, analyze CC patterns, analyze calendar, classify.

#!/usr/bin/env -S npx ts-node
/**
 * Build a contact hierarchy from email and calendar data.
 */

import { execFileSync } from "child_process";

interface HierarchySignals {
  cc_received: number;
  meetings_organized: number;
}

interface PersonNode {
  name: string;
  email: string;
  domain: string;
  inferred_role: "likely_senior" | "team_member";
  signals: HierarchySignals;
}

interface CompanyHierarchy {
  domain: string;
  likely_senior: string[];
  team_members: string[];
  people: PersonNode[];
}

const FREEMAIL = new Set(["gmail.com", "yahoo.com", "outlook.com", "hotmail.com", "icloud.com"]);

function runCli(args: string[]): unknown[] {
  const output = execFileSync("nylas", args, { encoding: "utf-8", maxBuffer: 10 * 1024 * 1024 });
  return JSON.parse(output);
}

function extractContacts(): Map<string, PersonNode> {
  const contacts = runCli(["contacts", "list", "--json", "--limit", "200"]) as Record<string, unknown>[];
  const emails = runCli(["email", "list", "--json", "--limit", "500"]) as Record<string, unknown>[];
  const people = new Map<string, PersonNode>();

  for (const c of contacts) {
    const addr = ((c.emails as { email: string }[]) ?? [])[0]?.email;
    if (addr) {
      people.set(addr, {
        name: `${c.given_name ?? ""} ${c.surname ?? ""}`.trim(),
        email: addr,
        domain: addr.split("@")[1],
        inferred_role: "team_member",
        signals: { cc_received: 0, meetings_organized: 0 },
      });
    }
  }

  for (const msg of emails) {
    for (const field of ["from", "to", "cc"] as const) {
      for (const entry of (msg[field] as { email: string; name?: string }[]) ?? []) {
        if (entry.email && !people.has(entry.email)) {
          people.set(entry.email, {
            name: entry.name ?? "",
            email: entry.email,
            domain: entry.email.split("@")[1],
            inferred_role: "team_member",
            signals: { cc_received: 0, meetings_organized: 0 },
          });
        }
      }
    }
  }
  return people;
}

function analyzeCcPatterns(people: Map<string, PersonNode>): void {
  const emails = runCli(["email", "list", "--json", "--limit", "500"]) as Record<string, unknown>[];
  for (const msg of emails) {
    for (const cc of (msg.cc as { email: string }[]) ?? []) {
      const person = people.get(cc.email);
      if (person) person.signals.cc_received++;
    }
  }
}

function analyzeCalendar(people: Map<string, PersonNode>): void {
  const events = runCli(["calendar", "events", "list", "--json", "--limit", "100"]) as Record<string, unknown>[];
  for (const event of events) {
    const organizer = (event.organizer as { email: string })?.email;
    const person = organizer ? people.get(organizer) : undefined;
    if (person) person.signals.meetings_organized++;
  }
}

function buildHierarchy(people: Map<string, PersonNode>): Record<string, CompanyHierarchy> {
  const byDomain = new Map<string, PersonNode[]>();
  for (const p of people.values()) {
    if (!FREEMAIL.has(p.domain)) {
      const list = byDomain.get(p.domain) ?? [];
      list.push(p);
      byDomain.set(p.domain, list);
    }
  }

  const result: Record<string, CompanyHierarchy> = {};
  for (const [domain, members] of byDomain) {
    const ccValues = members.map((m) => m.signals.cc_received).sort((a, b) => a - b);
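    // Index with (n - 1) * 0.75 so groups of 2 to 4 people do not pick their own maximum as the cutoff.
    const threshold = ccValues[Math.floor((ccValues.length - 1) * 0.75)] ?? 0;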

    const company: CompanyHierarchy = { domain, likely_senior: [], team_members: [], people: members };
    for (const m of members) {
      if (m.signals.cc_received > threshold || m.signals.meetings_organized >= 3) {
        m.inferred_role = "likely_senior";
        company.likely_senior.push(m.email);
      } else {
        company.team_members.push(m.email);
      }
    }
    result[domain] = company;
  }
  return result;
}

const people = extractContacts();
analyzeCcPatterns(people);
analyzeCalendar(people);
console.log(JSON.stringify(buildHierarchy(people), null, 2));

Validate and iterate

Hierarchy inference from email is probabilistic, not deterministic. Start by spot-checking the output against companies where you already know the org chart. If you know that Alice is the VP of Engineering at Acme and your tool classifies her as team_member, your CC threshold is too high or you need more email data from that company.

Tuning tips:

  • Lower the CC threshold if too many known seniors are being classified as team members. For active email accounts, a fixed cutoff of 3 or more received CCs is a reasonable alternative to the per-domain percentile.
  • Increase the meeting organizer weight if your calendar data is richer than your email data. Some organizations have strict email cultures but schedule everything via calendar.
  • Add more email history by increasing the --limit flag. Five hundred emails is a reasonable starting point; for larger organizations, a thousand or more gives the CC counts enough volume to separate people.
  • Combine with signature parsing from Guide 3: Enrich Contacts from Email to validate inferred roles against actual job titles extracted from email signatures.
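The scripts above use an either/or rule, so there is no literal "weight" to turn up. One way to make the signals tunable is to fold them into a single score, sketched below. Everything here is hypothetical: the function names, the linear formula, and the default weights of 1.0 are illustrative starting points rather than values taken from this guide.

```python
def seniority_score(
    cc_received: int,
    meetings_organized: int,
    cc_weight: float = 1.0,
    meeting_weight: float = 1.0,
) -> float:
    """Linear combination of the two signals; higher suggests more senior."""
    return cc_weight * cc_received + meeting_weight * meetings_organized


def rank_by_score(signals: dict[str, dict], **weights: float) -> list[str]:
    """Return addresses ordered from highest to lowest score, ties alphabetical."""
    return sorted(
        signals,
        key=lambda addr: (-seniority_score(**signals[addr], **weights), addr),
    )
```

For a calendar-heavy organization you might call rank_by_score(signals, meeting_weight=2.0) and compare the ordering against a company whose org chart you already know.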

Next steps