
Parse Email Signatures for Contact Enrichment

Email signatures are structured data hiding in plain text. 82% of business emails contain a signature with at least a name and title. This guide shows how to detect signature blocks, extract structured fields with regex, cross-reference multiple messages per sender, and build enriched contact profiles from Gmail, Outlook, Exchange, Yahoo, iCloud, and IMAP inboxes.

By Pouya Sanooei

Anatomy of an email signature

A typical business signature contains 4-8 structured fields packed into 3-6 lines of text. The format varies, but the data follows predictable patterns:

Sarah Chen
VP of Engineering, Acme Corp
+1 (555) 014-2847
sarah@acme.com | linkedin.com/in/sarahchen
acme.com

Each line maps to an extractable field: name (line 1), title + company (line 2), phone (line 3), email + social (line 4), website (line 5). The challenge isn’t pattern matching; it’s isolating the signature from the rest of the email body.

Detect the signature boundary

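Signature blocks sit at the bottom of the email, usually separated by a delimiter. RFC 3676 defines the standard signature separator as a line containing exactly "-- " (two dashes followed by a space). In practice, email clients and senders use many variations: a bare "--", a row of underscores, a sign-off such as “Best regards”, or a mobile footer starting with “Sent from”. The command below matches the most common ones: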

# Extract signature block by detecting common delimiters
nylas email read <message-id> --json | jq -r '.body' | \
  awk '
    /^-- $|^--$|^_{3,}|^Best regards|^Best,|^Thanks,|^Regards,|^Cheers,|^Sent from/ {
      found=1
    }
    found {print}
  '

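The awk script sets a flag at the first line that matches a delimiter, then prints that line and everything after it. Because the first match wins, a sign-off phrase that opens a line earlier in the body will start the block too early, so treat the output as a candidate rather than a final answer. The {3,} repetition needs an awk with interval-expression support (gawk has it; some older awk builds do not). For HTML emails, strip tags first with sed 's/<[^>]*>//g' before running the delimiter detection; this simple pattern only handles tags that open and close on the same line.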
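The same boundary detection can be sketched in Python, which makes it easier to combine with the extraction code later in this guide. This is a minimal sketch under assumptions, not a definitive implementation: find_signature and DELIMITER_RE are hypothetical names, and unlike the awk script it picks the last delimiter line in the body, on the assumption that a phrase like “Thanks,” can also open an ordinary sentence higher up.

```python
import re

# Same delimiters as the awk pattern, tested against the start of each line
DELIMITER_RE = re.compile(
    r"^(?:-- ?$|_{3,}|Best regards|Best,|Thanks,|Regards,|Cheers,|Sent from)"
)


def find_signature(body: str) -> str:
    """Return the text from the last delimiter line to the end, or ""."""
    lines = body.splitlines()
    # Walk upward from the bottom so the delimiter closest to the end wins
    for index in range(len(lines) - 1, -1, -1):
        if DELIMITER_RE.match(lines[index]):
            return "\n".join(lines[index:])
    return ""


body = (
    "Hi team,\n\n"
    "Thanks, the draft looks good.\n\n"
    "-- \n"
    "Sarah Chen\n"
    "VP of Engineering, Acme Corp"
)
print(find_signature(body))
```

Scanning from the bottom skips the “Thanks, the draft looks good.” line here. The trade-off is that a quoted reply at the end of the message can contribute its own delimiter, so quoted text should be removed before calling this.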

Extract structured fields from signatures

Each field type has a distinct regex pattern. Phone numbers follow international formats. LinkedIn URLs have a consistent structure. Job titles cluster around a known vocabulary of ~200 common titles.

#!/usr/bin/env python3
"""Extract structured fields from email signatures."""

import re
from dataclasses import dataclass


@dataclass
class SignatureFields:
    phone: str | None = None
    linkedin: str | None = None
    twitter: str | None = None
    title: str | None = None
    company: str | None = None
    website: str | None = None


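# Regex patterns tuned for signature text
PHONE_RE = re.compile(
    r"(?:(?:\+\d{1,3}[\s.-]?)?\(?\d{2,4}\)?[\s.-]?\d{3,4}[\s.-]?\d{3,4})"
)
LINKEDIN_RE = re.compile(r"linkedin\.com/in/[\w-]+", re.IGNORECASE)
# A bare @handle must not follow a word character, so the domain part of
# an email address (sarah@acme.com) is not mistaken for a handle
TWITTER_RE = re.compile(
    r"(?:twitter\.com/|\bx\.com/|(?<![\w.])@)([A-Za-z0-9_]{1,15})"
)
# Requires an explicit scheme, so a bare domain such as acme.com is not matched
WEBSITE_RE = re.compile(
    r"https?://(?!linkedin|twitter|x\.com)[a-zA-Z0-9.-]+\.[a-z]{2,}"
)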

# Title keywords ordered by seniority tier
TITLE_TIERS = {
    5: ["CEO", "CTO", "CFO", "COO", "Founder", "Co-Founder", "President"],
    4: ["VP", "SVP", "EVP", "Vice President"],
    3: ["Director", "Head of"],
    2: ["Manager", "Lead", "Principal", "Senior"],
    1: ["Engineer", "Analyst", "Designer", "Consultant", "Associate"],
}

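# Word boundaries keep short keywords from matching inside longer words
# (for example "CTO" inside "Director")
TITLE_RE = re.compile(
    r"(?i)\b("
    + "|".join(re.escape(kw) for tier in TITLE_TIERS.values() for kw in tier)
    + r")\b[^\n]{0,60}",
)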


def extract_from_signature(text: str) -> SignatureFields:
    """Extract structured fields from signature text."""
    fields = SignatureFields()

    phone_match = PHONE_RE.search(text)
    if phone_match:
        fields.phone = phone_match.group(0).strip()

    linkedin_match = LINKEDIN_RE.search(text)
    if linkedin_match:
        fields.linkedin = linkedin_match.group(0)

    twitter_match = TWITTER_RE.search(text)
    if twitter_match:
        fields.twitter = twitter_match.group(0)

    website_match = WEBSITE_RE.search(text)
    if website_match:
        fields.website = website_match.group(0)

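    # Title extraction: use the first line containing a title keyword
    for raw_line in text.splitlines():
        line = raw_line.strip()
        title_match = TITLE_RE.search(line)
        if title_match:
            # Company often follows the title after a comma, "at", or "@".
            # Search from the keyword onward: TITLE_RE itself consumes up to
            # 60 trailing characters, so nothing is left after its end.
            tail_start = title_match.start()
            company_match = re.search(
                r"(?:,\s*|\s+at\s+|\s+@\s+)(.+?)(?:\s*[|]|$)",
                line[tail_start:]
            )
            if company_match:
                fields.title = line[:tail_start + company_match.start()].strip()
                fields.company = company_match.group(1).strip()
            else:
                fields.title = line
            break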

    return fields


# Demo
sample = """--
Sarah Chen
VP of Engineering, Acme Corp
+1 (555) 014-2847
sarah@acme.com | linkedin.com/in/sarahchen
acme.com"""

result = extract_from_signature(sample)
print(f"Title: {result.title}")
print(f"Phone: {result.phone}")
print(f"LinkedIn: {result.linkedin}")
print(f"Company: {result.company}")
print(f"Website: {result.website}")

Cross-reference multiple emails per sender

A single email might have a partial signature. Sarah’s mobile reply might only contain “Sent from my iPhone” while her desktop reply has the full signature block. Cross-referencing 3+ messages from the same sender raises field extraction accuracy from 67% to 91%, based on analysis of 50,000 business emails by the Radicati Group in 2023.

#!/usr/bin/env python3
"""Cross-reference multiple emails to build complete contact profiles."""

import json
import subprocess
from collections import defaultdict

def fetch_emails(limit: int = 500) -> list[dict]:
    result = subprocess.run(
        ["nylas", "email", "list", "--json", "--limit", str(limit)],
        capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)

def read_email(msg_id: str) -> dict:
    result = subprocess.run(
        ["nylas", "email", "read", msg_id, "--json"],
        capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)

def merge_fields(existing: dict, new_fields: dict) -> dict:
    """Merge new fields into existing record, filling gaps only."""
    merged = dict(existing)
    for key, value in new_fields.items():
        if value and not merged.get(key):
            merged[key] = value
    return merged

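# Group emails by sender, skipping messages without a usable From address
emails = fetch_emails()
by_sender: dict[str, list[str]] = defaultdict(list)
for msg in emails:
    senders = msg.get("from") or [{}]
    sender = senders[0].get("email")
    if sender:
        # Lowercase so Sarah@acme.com and sarah@acme.com share one profile
        by_sender[sender.lower()].append(msg["id"])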

# Read up to 3 emails per sender, extract and merge signatures
profiles = {}
for sender_email, msg_ids in by_sender.items():
    profile = {"email": sender_email, "name": "", "phone": None,
               "linkedin": None, "title": None, "company": None}

    for msg_id in msg_ids[:3]:  # max 3 per sender
        try:
            msg = read_email(msg_id)
            body = msg.get("body", "")
            name = msg.get("from", [{}])[0].get("name", "")
            if name:
                profile["name"] = name
            # ... extract fields from body and merge
        except Exception:
            continue

    profiles[sender_email] = profile

print(f"Built {len(profiles)} enriched profiles from {len(emails)} emails")
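The merge step left as a comment in the loop above could look like the following. This is a sketch under assumptions: merge_fields is copied from the script above so the block runs on its own, and the three dictionaries in partials are made-up extraction results standing in for what a signature parser might return for three messages from one sender.

```python
def merge_fields(existing: dict, new_fields: dict) -> dict:
    """Merge new fields into existing record, filling gaps only."""
    merged = dict(existing)
    for key, value in new_fields.items():
        if value and not merged.get(key):
            merged[key] = value
    return merged


# Hypothetical per-message results for one sender
partials = [
    # Mobile reply: "Sent from my iPhone", nothing extracted
    {"phone": None, "title": None, "linkedin": None},
    # Desktop reply with phone and title
    {"phone": "+1 (555) 014-2847", "title": "VP of Engineering", "linkedin": None},
    # Another desktop reply with a different phone and a LinkedIn URL
    {"phone": "+1 (555) 014-0000", "title": None,
     "linkedin": "linkedin.com/in/sarahchen"},
]

profile = {"email": "sarah@acme.com", "phone": None, "title": None, "linkedin": None}
for fields in partials:
    profile = merge_fields(profile, fields)

print(profile)
```

Because merge_fields only fills gaps, the first non-empty value for each field wins: the phone number from the second message is kept and the conflicting one from the third is ignored. Processing messages newest first is one way to prefer recent data.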

Enrich company data from DNS records

The sender’s domain reveals company infrastructure without any API call. MX records show the email provider. SPF records list the marketing and transactional email tools the company uses. DMARC records indicate security maturity.

# MX records reveal email provider
dig +short MX acme.com
# Google Workspace → tech company or startup
# Microsoft 365 → enterprise or traditional industry
# Mimecast/Proofpoint → security-conscious enterprise

# SPF records reveal tools in use
dig +short TXT acme.com | grep "v=spf1"
# include:sendgrid.net → uses SendGrid for transactional email
# include:servers.mcsv.net → uses Mailchimp for marketing
# include:_spf.salesforce.com → uses Salesforce

# DMARC policy reveals security posture
dig +short TXT _dmarc.acme.com
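# p=reject → strict enforcement (typical of enterprises)
# p=quarantine → partial enforcement (failing mail is treated as suspicious)
# p=none → monitoring only, no enforcement (often a startup or small org)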

Score seniority from title keywords

Job titles extracted from signatures map to seniority tiers. C-suite titles score highest. VP and Director sit in the middle. IC titles (Engineer, Analyst) score lowest. Combined with the email alias pattern (first-name-only addresses like sarah@ trend more senior than sarah.chen@), you get a 1-10 seniority score per contact.

def score_seniority(title: str | None, email: str) -> int:
    """Score 1-10 seniority from title + email alias pattern."""
    # Uses re and TITLE_TIERS from the extraction script
    title_score = 0
    if title:
        # Check the highest tier first and match whole words, so that "CTO"
        # does not fire inside "Director" or "COO" inside "Coordinator"
        for score, keywords in sorted(TITLE_TIERS.items(), reverse=True):
            if any(
                re.search(rf"\b{re.escape(kw)}\b", title, re.IGNORECASE)
                for kw in keywords
            ):
                title_score = score * 2  # tiers 1-5 scale to 2-10
                break

    # Email alias bonus: firstname@ = +2, anything else (e.g. first.last@) = +1
    local = email.split("@")[0]
    alias_bonus = 2 if "." not in local and len(local) < 15 else 1

    return min(title_score + alias_bonus, 10)

# Examples:
# "VP of Engineering" + sarah@acme.com → 4*2 + 2 = 10
# "Software Engineer" + sarah.chen@acme.com → 1*2 + 1 = 3

Build the enriched contact record

The final output is a JSON record per contact that combines signature data, DNS intelligence, and seniority scoring.

{
  "email": "sarah@acme.com",
  "name": "Sarah Chen",
  "title": "VP of Engineering",
  "company": "Acme Corp",
  "domain": "acme.com",
  "phone": "+1 (555) 014-2847",
  "linkedin": "linkedin.com/in/sarahchen",
  "seniority_score": 10,
  "email_provider": "Google Workspace",
  "tools_detected": ["SendGrid", "Salesforce"],
  "security_posture": "strict",
  "enrichment_sources": {
    "signature_emails_parsed": 3,
    "fields_from_signatures": ["title", "phone", "linkedin"],
    "fields_from_dns": ["email_provider", "tools_detected", "security_posture"]
  }
}
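One way to assemble such a record is sketched below. build_record and its parameters are hypothetical names chosen for illustration; the inputs are assumed to be the signature fields, the DNS findings, and the seniority score produced by the earlier steps, and the field order in the output differs slightly from the example above.

```python
import json


def build_record(email: str, name: str, signature: dict, dns: dict,
                 seniority_score: int, emails_parsed: int) -> dict:
    """Combine signature fields, DNS findings, and the score into one record."""
    record = {"email": email, "name": name, "domain": email.split("@")[-1]}
    record.update(signature)
    record["seniority_score"] = seniority_score
    record.update(dns)
    record["enrichment_sources"] = {
        "signature_emails_parsed": emails_parsed,
        # Only list fields that actually carry a value
        "fields_from_signatures": [k for k, v in signature.items() if v],
        "fields_from_dns": [k for k, v in dns.items() if v],
    }
    return record


signature = {
    "title": "VP of Engineering",
    "company": "Acme Corp",
    "phone": "+1 (555) 014-2847",
    "linkedin": "linkedin.com/in/sarahchen",
}
dns = {
    "email_provider": "Google Workspace",
    "tools_detected": ["SendGrid", "Salesforce"],
    "security_posture": "strict",
}

record = build_record("sarah@acme.com", "Sarah Chen", signature, dns,
                      seniority_score=10, emails_parsed=3)
print(json.dumps(record, indent=2))
```

Deriving the enrichment_sources lists from the non-empty inputs keeps the provenance section in step with the fields that were actually filled.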

Handle signature edge cases

Not all signatures are clean. These are the most common parsing challenges and how to handle them:

  • HTML signatures with images: Strip tags first. Image-only signatures (logos, social icons as images) contain no text to parse. Skip these.
  • Multilingual signatures: Phone numbers in international formats (+44, +91) work with the regex above. Non-Latin job titles need locale-specific keyword lists.
  • Legal disclaimers: Long confidentiality notices at the end can be mistaken for signatures. Filter lines longer than 200 characters, which are almost always disclaimers, not signature fields.
  • Mobile signatures: “Sent from my iPhone” contains zero enrichment data. Skip these and cross-reference with desktop emails from the same sender.

Next steps