Deduplicate Contacts from the Terminal

Duplicate contacts pile up when the same person lands in your address book from email, an import, and a connected account. This guide pulls every record as JSON, groups them by lowercased email with jq to surface the duplicates, and removes the extras with one delete command per ID.

Written by Prem Keshari, Senior SRE

Reviewed by Qasim Muhammad

Updated June 9, 2026

Verified — CLI 3.1.17 · Google, Outlook · last tested June 9, 2026

What counts as a duplicate contact?

A duplicate contact is two or more address-book records that point to the same person, most reliably detected by a shared email address. The same human routinely lands in your contacts three ways: auto-added from an inbound email, pulled from a CSV import, and synced from a connected account. Each carries a different provider ID.

Email is the stable key because names drift — “Bob Smith”, “Robert Smith”, and “bob@acme.com” can all be one person. Microsoft Graph and Google People both let a single contact hold several addresses, so a strict match deduplicates on the lowercased primary email. The CLI returns contacts as normalized JSON, which means one jq pipeline handles every provider rather than a parser per backend. A 2,000-row export with 12% duplicates is common after two or three imports.
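To make the matching key concrete, here is a minimal sketch run against three invented records. The ids, names, and addresses are hypothetical; only the id and emails[].email field names come from the commands used later in this guide.

```shell
# Hypothetical sample records; ids and addresses are invented for illustration.
sample='[
  {"id": "a1", "given_name": "Bob",    "emails": [{"email": "Bob@Acme.com"}]},
  {"id": "b2", "given_name": "Robert", "emails": [{"email": "bob@acme.com"}]},
  {"id": "c3", "given_name": "Alice",  "emails": [{"email": "alice@acme.com"}]}
]'

# Print "<lowercased primary email> <id>" for each record.
# a1 and b2 differ in name and case but end up with the same key.
keys=$(printf '%s' "$sample" |
  jq -r '.[] | "\(.emails[0].email | ascii_downcase) \(.id)"')
printf '%s\n' "$keys"
```

Two of the three lines start with the same key, which is the condition the grouping step looks for.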

How do I pull all my contacts as JSON?

You export the full address book to a file with one command, then run every later step against that snapshot offline. The nylas contacts list command auto-paginates past its default page of 50 once you raise the limit, so a single invocation pulls thousands of records. Working from a saved file keeps the dedupe deterministic and avoids re-hitting the provider API on each pass.

# Pull every contact as JSON into a snapshot file
nylas contacts list --json --limit 5000 > contacts.json

# How many records did you get?
jq 'length' contacts.json

The --limit 5000 flag raises the cap from the default 50; the tool pages automatically once the value exceeds 200. Each record carries an id and an emails array, the two fields the next step keys on. Exporting 5,000 records to disk takes a couple of seconds and one paginated pass over the provider API.
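Before grouping, it can help to confirm the snapshot has the shape the later steps expect. The sketch below uses a hypothetical sample-contacts.json as a stand-in so it runs without the CLI; point the same jq filters at contacts.json for a real check.

```shell
# Hypothetical stand-in for contacts.json; the records are invented.
cat > sample-contacts.json <<'EOF'
[
  {"id": "a1", "emails": [{"email": "Bob@Acme.com"}]},
  {"id": "b2", "emails": []},
  {"id": "c3"}
]
EOF

# The top level should be an array of records.
shape=$(jq -r 'type' sample-contacts.json)

# Count records with and without at least one address.
# A missing emails field is null, and jq treats the length of null as 0.
with_email=$(jq '[.[] | select((.emails | length) > 0)] | length' sample-contacts.json)
without_email=$(jq '[.[] | select((.emails | length) == 0)] | length' sample-contacts.json)

echo "shape=$shape with_email=$with_email without_email=$without_email"
```

Records counted under without_email will never appear in a duplicate cluster, so a large number there means email matching alone will miss part of the book.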

How do I find duplicate contacts with jq?

You find duplicates by grouping every record on its lowercased primary email and keeping only the groups with more than one entry. The jq tool reads the snapshot, normalizes each address with ascii_downcase, and reports each cluster of IDs that share an address. This surfaces the work before you touch anything.

# Group by lowercased email, show only addresses with 2+ records
jq -r '
  map(select((.emails | length) > 0))
  | group_by(.emails[0].email | ascii_downcase)
  | map(select(length > 1))
  | .[]
  | "\(.[0].emails[0].email)\t\(map(.id) | join(", "))"
' contacts.json

Each output line is one email address, a tab, and then the comma-separated IDs that share it. Records with no email are filtered out first, since they cannot be matched on the stable key. On a 2,000-contact book this pass finishes in under 1 second and typically prints 20 to 40 clusters, which is the exact set you need to resolve before deleting.
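A short summary of the clusters gives you a number to check the later delete list against. The sketch below is one way to do it, assuming the same id and emails[].email fields. It also shows a null-safe variant of the key, since ascii_downcase raises an error on a null address. The sample file and its records are hypothetical.

```shell
# Hypothetical sample data, including a null address and an empty emails array.
cat > sample-contacts.json <<'EOF'
[
  {"id": "a1", "emails": [{"email": "Bob@Acme.com"}]},
  {"id": "b2", "emails": [{"email": "bob@acme.com"}]},
  {"id": "c3", "emails": [{"email": "BOB@ACME.COM"}]},
  {"id": "d4", "emails": [{"email": "alice@acme.com"}]},
  {"id": "e5", "emails": [{"email": null}]},
  {"id": "f6", "emails": []}
]
EOF

summary=$(jq -r '
  # Null-safe key: a missing or null address becomes "" and is dropped.
  map(. + {key: ((.emails[0].email? // "") | ascii_downcase)})
  | map(select(.key != ""))
  | group_by(.key)
  | map(select(length > 1))
  # extras = records beyond the first in every cluster
  | "clusters=\(length) extras=\(map(length - 1) | add // 0)"
' sample-contacts.json)
echo "$summary"
```

The extras figure is the number of IDs a keep-one-per-cluster policy should remove, so it should equal the line count of the delete list built in the next step.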

Which copy do I keep before deleting?

You keep the most complete record and delete the rest. Pick the copy with the most populated fields — a name, a phone number, a job title — rather than the oldest or the first one returned. Deleting blindly risks dropping the one record that held a phone number the others lacked, and a delete against a provider-native ID is permanent.

# For each duplicate group, keep the record with the most filled fields;
# list the OTHER ids (the ones safe to delete) to a file
jq -r '
  def filled: [.given_name, .surname, (.phone_numbers // []), (.emails // [])]
    | map(select(. != null and . != "" and . != [])) | length;
  map(select((.emails | length) > 0))
  | group_by(.emails[0].email | ascii_downcase)
  | map(select(length > 1))
  | map(sort_by(filled) | reverse | .[1:])
  | flatten | .[].id
' contacts.json > delete-ids.txt

wc -l delete-ids.txt

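The filled helper scores each record on four fields: given name, surname, phone numbers, and emails. Sorting descending and slicing off the top keeper leaves only the IDs to remove. Other fields, such as a job title, do not affect the score unless you add them to the helper's list. When two records tie, the choice between them is positional rather than meaningful, so check tied groups by hand. Review delete-ids.txt before the next step; 30 seconds of reading prevents an irreversible mistake.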
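A bare list of IDs is hard to review, so it can help to print the keeper next to the records marked for deletion. This sketch reuses the filled helper from the command above against a hypothetical sample file. The records, and the shape of the phone_numbers entries, are assumptions for illustration.

```shell
# Hypothetical sample data; the phone_numbers entry shape is an assumption.
cat > sample-contacts.json <<'EOF'
[
  {"id": "a1", "given_name": "Bob", "surname": "Smith",
   "phone_numbers": [{"number": "555-0100"}], "emails": [{"email": "bob@acme.com"}]},
  {"id": "b2", "given_name": "Bob", "emails": [{"email": "Bob@Acme.com"}]},
  {"id": "c3", "given_name": "Alice", "emails": [{"email": "alice@acme.com"}]}
]
EOF

review=$(jq -r '
  def filled: [.given_name, .surname, (.phone_numbers // []), (.emails // [])]
    | map(select(. != null and . != "" and . != [])) | length;
  map(select((.emails | length) > 0))
  | group_by(.emails[0].email | ascii_downcase)
  | map(select(length > 1))
  | .[]
  # Rank each cluster from most to least complete.
  | (sort_by(filled) | reverse) as $ranked
  | "KEEP\t\($ranked[0].id)\tscore=\($ranked[0] | filled)",
    ($ranked[1:][] | "DELETE\t\(.id)\tscore=\(filled)")
' sample-contacts.json)
printf '%s\n' "$review"
```

Each cluster prints one KEEP line followed by its DELETE lines. A DELETE line whose score equals the KEEP score is a tie and deserves a closer look.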

How do I delete the duplicate contacts?

You delete each extra ID with nylas contacts delete, looping over the reviewed file. The --force flag skips the per-contact confirmation prompt so the loop runs unattended. Each delete is forwarded straight to Google or Microsoft, so the change is live the moment the command returns — there is no local sync state to reconcile.

# Spot-check one ID before the loop
head -n1 delete-ids.txt | xargs -I{} nylas contacts show {}

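# Delete every reviewed duplicate ID, skipping blank lines.
# stdin is redirected so the delete command cannot consume the ID list.
while IFS= read -r id; do
  [ -n "$id" ] || continue
  nylas contacts delete "$id" --force < /dev/null
done < delete-ids.txt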

# Confirm the count dropped
nylas contacts list --json --limit 5000 | jq 'length'

Run contacts show on the first ID to confirm you targeted the right record before the loop removes 40 contacts in one pass. The final jq 'length' should be lower than your starting count by exactly the number of lines in the delete file — that arithmetic is your proof the run did what you intended.
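The two checks around the loop can be sketched as small shell helpers. Both dry_run and verify_drop are hypothetical names introduced here, not part of the CLI, and the counts in the example are made up. The first prints the delete commands without running them; the second confirms that the drop in the count matches the number of IDs.

```shell
# Hypothetical helper: print the delete commands instead of running them.
dry_run() {
  while IFS= read -r id; do
    [ -n "$id" ] || continue            # skip blank lines
    printf 'nylas contacts delete %s --force\n' "$id"
  done < "$1"
}

# Hypothetical helper: succeed only if before - after equals the number of IDs.
verify_drop() {
  before=$1; after=$2; ids_file=$3
  expected=$(grep -c . "$ids_file" || true)   # count non-empty lines
  [ $((before - after)) -eq "$expected" ]
}

# Example with made-up numbers: 3 IDs, count went from 2000 to 1997.
printf 'id-1\nid-2\nid-3\n' > sample-delete-ids.txt
dry_run sample-delete-ids.txt
verify_drop 2000 1997 sample-delete-ids.txt && echo "count check passed"
```

In a real run, before would be the jq 'length' of the snapshot and after the count from the final list command. A mismatch suggests that some deletes failed or that contacts changed between the snapshot and the loop.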

Next steps

Manage contacts from the terminal — create, update, search, and show contacts
Sync contacts across Gmail and Outlook — merge two address books on email before you dedupe
Enrich contacts from email — fill the gaps the surviving record still has
Export contacts to CSV — back up the address book before a bulk delete
Save email as PDF — another JSON-to-output pipeline driven from the shell
Manage contact groups — organize cleaned-up contacts into groups from the CLI
Command reference — every flag, subcommand, and example
jq manual — the grammar behind group_by and ascii_downcase
Google People API: connections.list — how Google returns multiple addresses per contact
Microsoft Graph: list contacts — the Outlook side, where one contact can hold several emails
RFC 5322 — Internet Message Format — why email addresses are case-insensitive in the domain part