Debugging Invisible Characters in Email
Invisible characters are non-printing Unicode code points that exist in text but have no visible glyph. In email, they cause subject line mismatches, broken filters, encoding errors, attachment filename corruption, and even security vulnerabilities. This guide shows how to detect and fix them using the Nylas CLI and standard Unix tools.
When strings match visually but fail programmatically
You write a script that filters emails by subject line. It works for most messages but silently skips one. You copy the subject from the email, paste it into your script, and it still does not match. You stare at two identical-looking strings. They are not identical.
This is the hallmark of invisible characters. The email client renders them as nothing, but your code sees the extra bytes and the string comparison fails. The same problem can affect:
- Subject line filters and search queries
- From/To address matching
- Attachment filenames (especially when downloaded to disk)
- Email body parsing with regex or string operations
- Calendar event titles created from email content
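The failure mode described above can be reproduced in a few lines. This is a minimal sketch, assuming a POSIX shell; the variable names are illustrative, and od is used instead of xxd only because it is available on almost every system.

```shell
# Two subjects that render the same way in most terminals
clean='Weekly Report'
pasted=$(printf 'Weekly\302\240Report')   # U+00A0 (bytes c2 a0) instead of a regular space

if [ "$clean" = "$pasted" ]; then
  echo "match"
else
  echo "no match"
fi

# Byte dumps make the difference visible: look for c2 a0 where 20 should be
printf '%s' "$clean"  | od -An -tx1
printf '%s' "$pasted" | od -An -tx1
```

The comparison prints "no match" even though both values look like Weekly Report on screen.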
The usual suspects
| Character | Unicode | Hex bytes (UTF-8) | Common source |
|---|---|---|---|
| UTF-8 BOM | U+FEFF | EF BB BF | Windows text editors, Excel CSV export |
| Non-breaking space | U+00A0 | C2 A0 | Copy-paste from web pages, macOS Option+Space |
| Zero-width space | U+200B | E2 80 8B | Rich text editors, HTML copy-paste |
| Zero-width joiner | U+200D | E2 80 8D | Emoji sequences, Arabic/Hindi text |
| Zero-width non-joiner | U+200C | E2 80 8C | Persian/Arabic text, HTML editors |
| Soft hyphen | U+00AD | C2 AD | Word processors, hyphenation engines |
| Right-to-left override | U+202E | E2 80 AE | Malicious filenames, bidirectional text |
| Word joiner | U+2060 | E2 81 A0 | Typesetting software |
| Smart quotes (left) | U+201C | E2 80 9C | Microsoft Office, macOS auto-correct |
| Em dash | U+2014 | E2 80 94 | Microsoft Office, macOS auto-correct |
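The invisible rows of the table above can double as a lookup list. The sketch below assumes valid UTF-8 input and a POSIX shell; find_invisibles is a hypothetical helper name, not a Nylas CLI command. It dumps the string as hex bytes and reports every table entry whose byte sequence occurs in it.

```shell
# find_invisibles is a hypothetical helper, not part of any CLI.
# Prints the name of each known sequence found in the string.
# Returns 0 if at least one was found, 1 otherwise.
find_invisibles() {
  # One line of space-separated hex bytes, padded with a space on both sides
  hex=" $(printf '%s' "$1" | od -An -v -tx1 | tr -s ' \n' ' ') "
  status=1
  for entry in \
    'ef bb bf:U+FEFF UTF-8 BOM' \
    'c2 a0:U+00A0 non-breaking space' \
    'e2 80 8b:U+200B zero-width space' \
    'e2 80 8d:U+200D zero-width joiner' \
    'e2 80 8c:U+200C zero-width non-joiner' \
    'c2 ad:U+00AD soft hyphen' \
    'e2 80 ae:U+202E right-to-left override' \
    'e2 81 a0:U+2060 word joiner'
  do
    bytes=${entry%%:*}   # text before the first colon
    name=${entry#*:}     # text after the first colon
    case "$hex" in
      *" $bytes "*) echo "$name"; status=0 ;;
    esac
  done
  return "$status"
}

find_invisibles "$(printf 'Weekly\342\200\213 Report')"
```

Matching on space-delimited hex bytes rather than a continuous hex string avoids false hits that straddle byte boundaries.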
Step 1: Extract raw email data
Start by getting the email content as structured JSON. The --json flag gives you access to every field without the email client's rendering layer hiding characters.
# Get the full email as JSON
nylas email read msg_abc123 --json
# Extract just the subject and pipe to hex viewer
nylas email read msg_abc123 --json | jq -r '.subject' | xxd
# Check the From field
nylas email read msg_abc123 --json | jq -r '.from[0].name' | xxd
# Check attachment filenames
nylas email read msg_abc123 --json | jq -r '.attachments[].filename' | xxd
Step 2: Inspect bytes with xxd and hexdump
xxd and hexdump show every byte in the string, including invisible ones. Here is what to look for:
# A clean subject line looks like this:
echo "Weekly Report" | xxd
# 00000000: 5765 656b 6c79 2052 6570 6f72 740a Weekly Report.
# A subject with a hidden UTF-8 BOM at the start:
printf '\xef\xbb\xbfWeekly Report' | xxd
# 00000000: efbb bf57 6565 6b6c 7920 5265 706f 7274 ...Weekly Report
# ^^^^^^
# UTF-8 BOM -- invisible but breaks string comparison
# A subject with a non-breaking space instead of regular space:
printf 'Weekly\xc2\xa0Report' | xxd
# 00000000: 5765 656b 6c79 c2a0 5265 706f 7274 Weekly..Report
# ^^^^
# Non-breaking space (U+00A0) instead of 0x20
# A subject with a zero-width space:
printf 'Weekly\xe2\x80\x8b Report' | xxd
# 00000000: 5765 656b 6c79 e280 8b20 5265 706f 7274 Weekly... Report
# ^^^^^^^^
# Zero-width space -- completely invisible
Step 3: Automate invisible character detection
You can build a quick detection script that scans email subjects for common invisible characters:
# Scan recent emails for invisible characters in subjects
# (grep -P needs GNU grep built with PCRE and a UTF-8 locale; macOS/BSD grep lacks -P)
nylas email list --json --limit 50 | jq -r '.[].subject' | while IFS= read -r subject; do
  # Check for common invisible characters
  if printf '%s\n' "$subject" | grep -qP '[\x{00A0}\x{200B}\x{200C}\x{200D}\x{FEFF}\x{00AD}\x{2060}\x{202E}]'; then
    echo "FOUND: $subject"
    printf '%s\n' "$subject" | xxd | head -5
    echo "---"
  fi
done
# Or check a single email's subject for non-ASCII bytes
# (-q keeps grep from echoing the subject; the C locale makes it match raw bytes)
if nylas email read msg_abc123 --json | jq -r '.subject' | LC_ALL=C grep -qP '[^\x20-\x7E]'; then
  echo "Contains non-ASCII"
else
  echo "Clean ASCII"
fi
# Check what encoding the email claims to use
nylas email read msg_abc123 --json | jq '.headers' | grep -i content-type
Step 4: Strip invisible characters
Once you have identified the problem, here are the fixes:
# Strip UTF-8 BOM from a string (\xHH escapes need GNU sed)
printf '%s\n' "$SUBJECT" | sed 's/^\xEF\xBB\xBF//'
# Replace non-breaking spaces with regular spaces
printf '%s\n' "$SUBJECT" | sed 's/\xC2\xA0/ /g'
# Remove all zero-width characters
printf '%s\n' "$SUBJECT" | perl -CSD -pe 's/[\x{200B}\x{200C}\x{200D}\x{FEFF}\x{2060}]//g'
# Nuclear option: keep only printable ASCII, Latin letters with diacritics
# (U+00C0-U+024F), and Cyrillic (U+0400-U+04FF). \n is kept so line breaks survive.
printf '%s\n' "$SUBJECT" | perl -CSD -pe 's/[^\n\x20-\x7E\x{00C0}-\x{024F}\x{0400}-\x{04FF}]//g'
# Convert encoding if the body bytes really are in a non-UTF-8 charset (here ISO-8859-1)
nylas email read msg_abc123 --json | jq -r '.body' | iconv -f ISO-8859-1 -t UTF-8
# Check whether a downloaded attachment starts with a BOM
file attachment.csv
# The description mentions "(with BOM)"; exact wording varies by file version
# Fix it (GNU sed; BSD/macOS sed needs -i ''):
sed -i '1s/^\xEF\xBB\xBF//' attachment.csv
Invisible characters in attachment filenames
Attachment filenames are especially prone to invisible character issues. The filename in the email header might contain right-to-left override characters that make report.pdf appear as fdp.troper in some contexts -- a known attack vector.
# List all attachment filenames from recent emails
nylas email list --json --limit 20 | \
  jq -r '.[].attachments[]?.filename // empty' | \
  while IFS= read -r filename; do
    # Check each filename for bidirectional control characters
    # (overrides, embeddings, isolates, and directional marks)
    if printf '%s\n' "$filename" | grep -qP '[\x{200E}\x{200F}\x{202A}-\x{202E}\x{2066}-\x{2069}]'; then
      echo "WARNING: Bidirectional control character in filename: $filename"
      printf '%s\n' "$filename" | xxd
    fi
  done
# Sanitize an attachment filename before using it on disk.
# perl works on code points; a byte-level tr -d '\200-\237' would corrupt UTF-8.
safe_name=$(nylas email read msg_abc123 --json | \
  jq -r '.attachments[0].filename' | \
  perl -CSD -pe 's/[\x{0000}-\x{0009}\x{000B}-\x{001F}\x{007F}-\x{009F}\x{200B}-\x{200F}\x{202A}-\x{202E}\x{2060}\x{2066}-\x{2069}\x{FEFF}]//g; s{/}{_}g')
printf '%s\n' "$safe_name"
Understanding email encoding headers
Email uses several headers to declare encoding. When these are wrong or missing, invisible character issues multiply:
# Check the Content-Type and charset of an email
nylas email read msg_abc123 --json | jq '{
  content_type: .headers["content-type"],
  transfer_encoding: .headers["content-transfer-encoding"],
  subject_raw: .subject
}'
# Common charset declarations:
# Content-Type: text/plain; charset="UTF-8" -- modern, correct
# Content-Type: text/plain; charset="ISO-8859-1" -- Western European
# Content-Type: text/plain; charset="windows-1252" -- Windows Western
# Content-Type: text/plain; charset="us-ascii" -- sometimes lies about non-ASCII content
A common encoding bug: an email declares charset=us-ascii or charset=ISO-8859-1 but actually contains UTF-8 text. A reader that trusts the declaration decodes each byte of a multi-byte UTF-8 sequence as a separate character, producing mojibake (garbled characters) or invisible corruption.
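The mismatch between declared and actual charset can be reproduced locally. The sketch below assumes iconv and od are installed; it takes the UTF-8 bytes for "café" and decodes them as if they were ISO-8859-1, which is what a reader that trusts a wrong charset declaration would do.

```shell
# "café" encoded as UTF-8: the é is the two bytes c3 a9
printf 'caf\303\251' | od -An -tx1

# Treat those bytes as ISO-8859-1 and re-encode as UTF-8.
# Each byte of the é becomes its own character: c3 -> c3 83, a9 -> c2 a9
printf 'caf\303\251' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1

# Reversing the conversion recovers the original bytes
printf 'caf\303\251' | iconv -f ISO-8859-1 -t UTF-8 | iconv -f UTF-8 -t ISO-8859-1 | od -An -tx1
```

This is also why a conversion with iconv should only be applied when the bytes really are in the source charset: running it on text that is already UTF-8 creates the corruption instead of fixing it.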
Sanitize email content before feeding to LLMs
If you are piping email content to an LLM (via Nylas CLI's MCP server or subprocess), invisible characters can confuse the model or trigger unexpected behavior:
# Clean email body before sending to an LLM
nylas email read msg_abc123 --json | jq -r '.body' | \
  perl -CSD -pe '
    s/[\x{200B}-\x{200D}\x{FEFF}\x{2060}]//g;  # Remove zero-width chars
    s/[\x{202A}-\x{202E}]//g;                  # Remove bidi overrides
    s/\x{00A0}/ /g;                            # Non-breaking space to space
    s/[\x{2018}\x{2019}]/'"'"'/g;              # Smart quotes to ASCII
    s/[\x{201C}\x{201D}]/"/g;                  # Smart double quotes
    s/\x{2014}/--/g;                           # Em dash to double hyphen
    s/\x{2013}/-/g;                            # En dash to hyphen
  '
Preventing invisible character issues
- Always use --json when processing email programmatically. The JSON output preserves the exact bytes without terminal rendering.
- Validate encoding before processing. Check the Content-Type charset header and convert if needed with iconv.
- Sanitize user input before composing emails. Strip zero-width characters and normalize Unicode (NFC form) before passing to nylas email send.
- Use binary-safe comparisons in your scripts. Compare bytes, not rendered glyphs.
- Log hex representations when debugging string mismatches. If two strings look the same but are not equal, xxd will show why.
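The sanitizing advice in the list above can be sketched without GNU-only sed escapes. This assumes a POSIX shell and valid UTF-8 input; sanitize_text is a hypothetical helper name. The byte sequences are generated with printf octal escapes and matched in the C locale, so sed treats them as plain bytes. It does not perform NFC normalization, which needs a Unicode-aware tool.

```shell
# sanitize_text is a hypothetical helper, not a Nylas CLI command.
# Removes the BOM and zero-width characters and turns NBSP into a plain space.
sanitize_text() {
  bom=$(printf '\357\273\277')    # U+FEFF
  nbsp=$(printf '\302\240')       # U+00A0
  zwsp=$(printf '\342\200\213')   # U+200B
  zwnj=$(printf '\342\200\214')   # U+200C
  zwj=$(printf '\342\200\215')    # U+200D
  wj=$(printf '\342\201\240')     # U+2060
  printf '%s\n' "$1" | LC_ALL=C sed \
    -e "s/$bom//g" -e "s/$zwsp//g" -e "s/$zwnj//g" \
    -e "s/$zwj//g" -e "s/$wj//g" -e "s/$nbsp/ /g"
}

sanitize_text "$(printf '\357\273\277Weekly\302\240Re\342\200\213port')"
```

Removing U+200D and U+200C is safe for subject matching but breaks emoji sequences and some scripts, so a variant that keeps them may be a better fit for text that is shown to people.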
Frequently asked questions
Why does copy-pasting from Gmail add invisible characters?
Gmail's web interface renders email as HTML. When you copy text, the browser can carry over formatting characters such as non-breaking spaces (from &nbsp; entities), zero-width characters (from markup that controls line breaking), and smart quotes (from automatic substitution). Most of these look like ordinary spaces, ordinary quotes, or nothing at all when pasted into a terminal or text editor, but they exist as different bytes in the string.
Can invisible characters break email delivery?
In address headers (From, To), yes. If a recipient address contains a zero-width space, it no longer names the real mailbox, and the SMTP server will typically reject it as malformed or bounce it as an unknown recipient. In the Subject and body, invisible characters are usually harmless for delivery but can break downstream processing.
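A cheap guard before sending is to check that an address contains nothing but visible ASCII. This is a sketch under that assumption, with a hypothetical helper name; it is not an address validator, and it would also reject legitimate internationalized addresses.

```shell
# is_visible_ascii is a hypothetical helper, not a Nylas CLI command.
# Succeeds only if every byte is a visible ASCII character (0x21-0x7E),
# so spaces, control characters, and any multi-byte UTF-8 sequence fail.
is_visible_ascii() {
  ! printf '%s\n' "$1" | LC_ALL=C grep -q '[^!-~]'
}

addr=$(printf 'user\342\200\213@example.com')   # zero-width space before the @
if is_visible_ascii "$addr"; then
  echo "address looks clean"
else
  echo "address contains hidden or non-ASCII bytes"
fi
```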
How do I tell if a character is invisible vs just a rendering issue?
Pipe the string through xxd. If you see bytes between the visible characters that are not 20 (regular space) or 0a (newline), those are the invisible characters. Cross-reference the hex bytes against the Unicode table above.
Are invisible characters a security risk in email?
Yes. Right-to-left override (U+202E) can disguise malicious filenames: report[RLO]fdp.exe renders as reportexe.pdf. Zero-width characters can bypass content filters and spam detection. Homoglyph attacks use visually similar characters from different Unicode blocks to spoof addresses. Always sanitize filenames and validate sender addresses at the byte level.
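The filename example above can be checked at the byte level. The sketch below assumes a POSIX shell and valid UTF-8; both function names are hypothetical. One function looks for bidirectional control characters, the other reads the extension from the stored bytes rather than from how the name renders.

```shell
# Both function names are hypothetical helpers for illustration.

# Succeeds if the name contains a bidirectional control character
# (U+202A-U+202E, U+200E, U+200F, U+2066-U+2069), matched as UTF-8 bytes.
has_bidi_control() {
  for seq in '\342\200\252' '\342\200\253' '\342\200\254' '\342\200\255' \
             '\342\200\256' '\342\200\216' '\342\200\217' \
             '\342\201\246' '\342\201\247' '\342\201\250' '\342\201\251'
  do
    ch=$(printf "$seq")
    case "$1" in
      *"$ch"*) return 0 ;;
    esac
  done
  return 1
}

# The extension as stored in the bytes, regardless of how the name renders
stored_extension() {
  printf '%s\n' "${1##*.}"
}

name=$(printf 'report\342\200\256fdp.exe')   # report[RLO]fdp.exe
if has_bidi_control "$name"; then
  echo "WARNING: bidi control in filename, stored extension is .$(stored_extension "$name")"
fi
```

For this name the stored extension is exe, even though a bidi-aware display shows the name ending in .pdf.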
Does Nylas CLI normalize Unicode automatically?
Nylas CLI passes through the exact bytes from the email provider. It does not normalize or strip invisible characters, which is the correct behavior -- you want to see what is actually in the email. Use the techniques in this guide to sanitize when needed.
Next steps
- Secure email handling from the CLI -- GPG encryption, sender verification, and safe attachment handling
- Send email from the terminal -- compose and send email with proper encoding
- E2E email testing with Playwright -- verify email content in automated tests
- Full command reference -- every flag, subcommand, and example