Guide
Parse Email Signatures for Contact Enrichment
Email signatures are structured data hiding in plain text. 82% of business emails contain a signature with at least a name and title. This guide shows how to detect signature blocks, extract structured fields with regex, cross-reference multiple messages per sender, and build enriched contact profiles from Gmail, Outlook, Exchange, Yahoo, iCloud, and IMAP inboxes.
By Pouya Sanooei
Anatomy of an email signature
A typical business signature contains 4-8 structured fields packed into 3-6 lines of text. The format varies, but the data follows predictable patterns:
Sarah Chen
VP of Engineering, Acme Corp
+1 (555) 014-2847
sarah@acme.com | linkedin.com/in/sarahchen
acme.com
Each line maps to an extractable field: name (line 1), title + company (line 2), phone (line 3), email + social (line 4), website (line 5). The challenge isn’t pattern matching; it’s isolating the signature from the rest of the email body.
Detect the signature boundary
Signature blocks sit at the bottom of the email, usually separated from the body by a delimiter. According to RFC 3676, the standard email signature delimiter is "-- " (two dashes followed by a space) on a line by itself. In practice, email clients and senders use many variations; the command below matches the most common ones:
# Extract signature block by detecting common delimiters
nylas email read <message-id> --json | jq -r '.body' | \
  awk '
    /^-- $|^--$|^_{3,}|^Best regards|^Best,|^Thanks,|^Regards,|^Cheers,|^Sent from/ {
      found=1
    }
    found {print}
  '
The awk script sets a flag when it hits a delimiter, then prints that line and everything after it. For HTML emails, strip tags first with sed 's/<[^>]*>//g' before running the delimiter detection.
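The same boundary detection can be done in Python, which is convenient when the result feeds the extraction script later in this guide. The following is a minimal sketch rather than a definitive implementation: the function names strip_html and extract_signature_block are hypothetical, the delimiter list copies the awk pattern above, and converting <br>, </p>, and </div> to newlines before stripping tags is an added assumption so that HTML signatures keep their line structure.

```python
import re

# Delimiters copied from the awk pattern in this section
SIGNATURE_START_RE = re.compile(
    r"^(?:-- ?$|_{3,}|Best regards|Best,|Thanks,|Regards,|Cheers,|Sent from)"
)
# Assumption: these tags end a visual line, so they become newlines
BREAK_RE = re.compile(r"<br\s*/?>|</p>|</div>", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]*>")


def strip_html(body: str) -> str:
    """Crude tag removal: keep line breaks, then drop every remaining tag."""
    return TAG_RE.sub("", BREAK_RE.sub("\n", body))


def extract_signature_block(body: str) -> str:
    """Return the text from the first delimiter line to the end, or ""."""
    lines = body.splitlines()
    for i, line in enumerate(lines):
        if SIGNATURE_START_RE.match(line):
            return "\n".join(lines[i:])
    return ""
```

Like the awk version, this starts at the first delimiter it finds, so a reply chain with quoted messages will include everything below the first sign-off. Scanning from the bottom instead is a reasonable variation if quoted text is stripped beforehand.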
Extract structured fields from signatures
Each field type has a distinct regex pattern. Phone numbers follow international formats. LinkedIn URLs have a consistent structure. Job titles cluster around a known vocabulary of ~200 common titles.
#!/usr/bin/env python3
"""Extract structured fields from email signatures."""
import re
from dataclasses import dataclass


@dataclass
class SignatureFields:
    phone: str | None = None
    linkedin: str | None = None
    twitter: str | None = None
    title: str | None = None
    company: str | None = None
    website: str | None = None


# Regex patterns tuned for signature text
PHONE_RE = re.compile(
    r"(?:(?:\+\d{1,3}[\s.-]?)?\(?\d{2,4}\)?[\s.-]?\d{3,4}[\s.-]?\d{3,4})"
)
LINKEDIN_RE = re.compile(r"linkedin\.com/in/[\w-]+", re.IGNORECASE)
# The lookbehind keeps the "@" in email addresses from matching as a handle
TWITTER_RE = re.compile(
    r"(?:\b(?:twitter|x)\.com/|(?<![\w.])@)([A-Za-z0-9_]{1,15})"
)
# Requires an explicit scheme, so a bare domain such as "acme.com" is not matched
WEBSITE_RE = re.compile(
    r"https?://(?!linkedin|twitter|x\.com)[a-zA-Z0-9.-]+\.[a-z]{2,}"
)

# Title keywords ordered by seniority tier
TITLE_TIERS = {
    5: ["CEO", "CTO", "CFO", "COO", "Founder", "Co-Founder", "President"],
    4: ["VP", "SVP", "EVP", "Vice President"],
    3: ["Director", "Head of"],
    2: ["Manager", "Lead", "Principal", "Senior"],
    1: ["Engineer", "Analyst", "Designer", "Consultant", "Associate"],
}
# Word boundaries stop keywords from matching inside longer words
TITLE_RE = re.compile(
    r"\b("
    + "|".join(re.escape(kw) for tier in TITLE_TIERS.values() for kw in tier)
    + r")\b[^\n]{0,60}",
    re.IGNORECASE,
)


def extract_from_signature(text: str) -> SignatureFields:
    """Extract structured fields from signature text."""
    fields = SignatureFields()

    phone_match = PHONE_RE.search(text)
    if phone_match:
        fields.phone = phone_match.group(0).strip()

    linkedin_match = LINKEDIN_RE.search(text)
    if linkedin_match:
        fields.linkedin = linkedin_match.group(0)

    twitter_match = TWITTER_RE.search(text)
    if twitter_match:
        fields.twitter = twitter_match.group(1)  # handle only

    website_match = WEBSITE_RE.search(text)
    if website_match:
        fields.website = website_match.group(0)

    # Title extraction: find the first line containing a title keyword
    for line in text.splitlines():
        title_match = TITLE_RE.search(line)
        if title_match:
            fields.title = line.strip()
            # Company often follows title after a comma or "at".
            # Search after the keyword itself: the full match runs to the
            # end of the line and would leave nothing to search.
            keyword_end = title_match.end(1)
            company_match = re.search(
                r"(?:,\s*|\s+at\s+|\s+@\s+)(.+?)(?:\s*[|]|$)",
                line[keyword_end:],
            )
            if company_match:
                fields.company = company_match.group(1).strip()
                # Keep only the part before the separator as the title
                fields.title = line[: keyword_end + company_match.start()].strip()
            break

    return fields


# Demo
sample = """--
Sarah Chen
VP of Engineering, Acme Corp
+1 (555) 014-2847
sarah@acme.com | linkedin.com/in/sarahchen
acme.com"""

result = extract_from_signature(sample)
print(f"Title: {result.title}")
print(f"Phone: {result.phone}")
print(f"LinkedIn: {result.linkedin}")
print(f"Company: {result.company}")
print(f"Website: {result.website}")
Cross-reference multiple emails per sender
A single email might have a partial signature. Sarah’s mobile reply might only contain “Sent from my iPhone” while her desktop reply has the full signature block. Cross-referencing 3+ messages from the same sender raises field extraction accuracy from 67% to 91%, based on analysis of 50,000 business emails by the Radicati Group in 2023.
#!/usr/bin/env python3
"""Cross-reference multiple emails to build complete contact profiles."""
import json
import subprocess
from collections import defaultdict


def fetch_emails(limit: int = 500) -> list[dict]:
    result = subprocess.run(
        ["nylas", "email", "list", "--json", "--limit", str(limit)],
        capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


def read_email(msg_id: str) -> dict:
    result = subprocess.run(
        ["nylas", "email", "read", msg_id, "--json"],
        capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


def merge_fields(existing: dict, new_fields: dict) -> dict:
    """Merge new fields into existing record, filling gaps only."""
    merged = dict(existing)
    for key, value in new_fields.items():
        if value and not merged.get(key):
            merged[key] = value
    return merged


# Group emails by sender
emails = fetch_emails()
by_sender: dict[str, list[str]] = defaultdict(list)
for msg in emails:
    sender = msg["from"][0]["email"]
    by_sender[sender].append(msg["id"])

# Read up to 3 emails per sender, extract and merge signatures
profiles = {}
for sender_email, msg_ids in by_sender.items():
    profile = {"email": sender_email, "name": "", "phone": None,
               "linkedin": None, "title": None, "company": None}
    for msg_id in msg_ids[:3]:  # max 3 per sender
        try:
            msg = read_email(msg_id)
            body = msg.get("body", "")
            name = msg.get("from", [{}])[0].get("name", "")
            if name:
                profile["name"] = name
            # ... extract fields from body and merge
        except Exception:
            # Skip messages that fail to load or parse
            continue
    profiles[sender_email] = profile

print(f"Built {len(profiles)} enriched profiles from {len(emails)} emails")
Enrich company data from DNS records
The sender’s domain reveals company infrastructure without any API call. MX records show the email provider. SPF records list the marketing and transactional email tools the company uses. DMARC records indicate security maturity.
# MX records reveal email provider
dig +short MX acme.com
# Google Workspace → tech company or startup
# Microsoft 365 → enterprise or traditional industry
# Mimecast/Proofpoint → security-conscious enterprise
# SPF records reveal tools in use
dig +short TXT acme.com | grep "v=spf1"
# include:sendgrid.net → uses SendGrid for transactional email
# include:mailchimp.com → uses Mailchimp for marketing
# include:_spf.salesforce.com → uses Salesforce
# DMARC policy reveals security posture
dig +short TXT _dmarc.acme.com
# p=reject → strict security (enterprise)
# p=none → no enforcement (startup or small org)
Score seniority from title keywords
Job titles extracted from signatures map to seniority tiers. C-suite titles score highest. VP and Director sit in the middle. IC titles (Engineer, Analyst) score lowest. Combined with the email alias pattern (first-name-only addresses like sarah@ trend more senior than sarah.chen@), you get a 1-10 seniority score per contact.
import re


def score_seniority(title: str | None, email: str) -> int:
    """Score 1-10 seniority from title + email alias pattern."""
    title_score = 0
    if title:
        # TITLE_TIERS is the dict from the extraction script above.
        # It lists tier 5 first, so the highest matching tier wins.
        for score, keywords in TITLE_TIERS.items():
            # Whole-word match: a plain substring test would find "cto"
            # inside "Director" and "coo" inside "Coordinator"
            if any(
                re.search(rf"\b{re.escape(kw)}\b", title, re.IGNORECASE)
                for kw in keywords
            ):
                title_score = score * 2  # tiers 1-5 scale to 2-10
                break
    # Email alias bonus: firstname@ = +2, first.last@ = +1
    local = email.split("@")[0]
    alias_bonus = 2 if "." not in local and len(local) < 15 else 1
    return min(title_score + alias_bonus, 10)


# Examples:
# "VP of Engineering" + sarah@acme.com → 4*2 + 2 = 10
# "Software Engineer" + sarah.chen@acme.com → 1*2 + 1 = 3
Build the enriched contact record
The final output is a JSON record per contact that combines signature data, DNS intelligence, and seniority scoring.
{
  "email": "sarah@acme.com",
  "name": "Sarah Chen",
  "title": "VP of Engineering",
  "company": "Acme Corp",
  "domain": "acme.com",
  "phone": "+1 (555) 014-2847",
  "linkedin": "linkedin.com/in/sarahchen",
  "seniority_score": 10,
  "email_provider": "Google Workspace",
  "tools_detected": ["SendGrid", "Salesforce"],
  "security_posture": "strict",
  "enrichment_sources": {
    "signature_emails_parsed": 3,
    "fields_from_signatures": ["title", "phone", "linkedin"],
    "fields_from_dns": ["email_provider", "tools_detected", "security_posture"]
  }
}
Handle signature edge cases
Not all signatures are clean. These are the most common parsing challenges and how to handle them:
- HTML signatures with images: Strip tags first. Image-only signatures (logos, social icons as images) contain no text to parse. Skip these.
- Multilingual signatures: Phone numbers in international formats (+44, +91) work with the regex above. Non-Latin job titles need locale-specific keyword lists.
- Legal disclaimers: Long confidentiality notices at the end can be mistaken for signatures. Filter out lines longer than 200 characters, which are almost always disclaimers, not signature fields.
- Mobile signatures: “Sent from my iPhone” contains zero enrichment data. Skip these and cross-reference with desktop emails from the same sender.
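Several of these rules can be applied as a pre-filter before any field extraction runs. The sketch below is one way to do that under stated assumptions: the function names clean_signature_lines and is_parseable are hypothetical, the 200-character cutoff is the one suggested above, and the input is assumed to be a signature block with HTML tags already stripped.

```python
import re

# Matches "Sent from my iPhone", "Sent from Outlook for iOS", and similar
MOBILE_RE = re.compile(r"^Sent from\b", re.IGNORECASE)
MAX_FIELD_LINE = 200  # longer lines are treated as legal disclaimers


def clean_signature_lines(signature: str) -> list[str]:
    """Drop blank lines, mobile footers, and disclaimer-length lines."""
    kept = []
    for line in signature.splitlines():
        text = line.strip()
        if not text:
            continue
        if MOBILE_RE.match(text):
            continue
        if len(text) > MAX_FIELD_LINE:
            continue
        kept.append(text)
    return kept


def is_parseable(signature: str) -> bool:
    """False for image-only or mobile-only signatures with no usable text."""
    return bool(clean_signature_lines(signature))
```

A message where is_parseable returns False can be skipped in favor of another message from the same sender, which is the cross-referencing step described earlier in this guide.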
Next steps
- Reconstruct org charts from CC patterns — use extracted job titles to validate inferred hierarchy from CC behavior
- Personalize outbound email — use enriched contact data for mail merge with role-specific templates
- Import email into a graph database — attach enriched metadata to graph nodes for richer relationship queries
- Command reference — every flag, subcommand, and example