Debugging Invisible Characters in Email

Invisible characters are non-printing Unicode code points that exist in text but have no visible glyph. In email, they cause subject line mismatches, broken filters, encoding errors, attachment filename corruption, and even security vulnerabilities. This guide shows how to detect and fix them using the Nylas CLI and standard Unix tools.

When strings match visually but fail programmatically

You write a script that filters emails by subject line. It works for most messages but silently skips one. You copy the subject from the email, paste it into your script, and it still does not match. You stare at two identical-looking strings. They are not identical.

This is the hallmark of invisible characters. The email client renders them as nothing, but your code sees the extra bytes and the string comparison fails. The same problem can affect:

  • Subject line filters and search queries
  • From/To address matching
  • Attachment filenames (especially when downloaded to disk)
  • Email body parsing with regex or string operations
  • Calendar event titles created from email content
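
The mismatch is easy to reproduce locally. The following is a minimal sketch, assuming a POSIX-style shell with od available; show_mismatch is a hypothetical helper name, not part of any CLI. It compares two strings byte for byte and dumps both as hex when they differ.

```shell
# show_mismatch is a hypothetical helper (an assumption for illustration).
# It compares two strings exactly and prints a hex dump of each when they differ.
show_mismatch() {
  if [ "$1" = "$2" ]; then
    echo "identical"
    return 0
  fi
  echo "different"
  printf '%s' "$1" | od -An -tx1
  printf '%s' "$2" | od -An -tx1
  return 1
}

typed='Weekly Report'                     # regular space (0x20)
pasted=$(printf 'Weekly\xc2\xa0Report')   # non-breaking space (0xC2 0xA0)
show_mismatch "$typed" "$pasted" || true
```

The second hex dump contains c2 a0 where the first has 20, which is the whole difference between the two strings.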

The usual suspects

Character                Unicode   Hex bytes (UTF-8)   Common source
UTF-8 BOM                U+FEFF    EF BB BF            Windows text editors, Excel CSV export
Non-breaking space       U+00A0    C2 A0               Copy-paste from web pages, macOS Option+Space
Zero-width space         U+200B    E2 80 8B            Rich text editors, HTML copy-paste
Zero-width joiner        U+200D    E2 80 8D            Emoji sequences, Arabic/Hindi text
Zero-width non-joiner    U+200C    E2 80 8C            Persian/Arabic text, HTML editors
Soft hyphen              U+00AD    C2 AD               Word processors, hyphenation engines
Right-to-left override   U+202E    E2 80 AE            Malicious filenames, bidirectional text
Word joiner              U+2060    E2 81 A0            Typesetting software
Smart quotes (left)      U+201C    E2 80 9C            Microsoft Office, macOS auto-correct
Em dash                  U+2014    E2 80 94            Microsoft Office, macOS auto-correct

Step 1: Extract raw email data

Start by getting the email content as structured JSON. The --json flag gives you access to every field without the email client's rendering layer hiding characters.

# Get the full email as JSON
nylas email read msg_abc123 --json

# Extract just the subject and pipe to hex viewer
nylas email read msg_abc123 --json | jq -r '.subject' | xxd

# Check the From field
nylas email read msg_abc123 --json | jq -r '.from[0].name' | xxd

# Check attachment filenames
nylas email read msg_abc123 --json | jq -r '.attachments[].filename' | xxd

Step 2: Inspect bytes with xxd and hexdump

xxd and hexdump show every byte in the string, including invisible ones. Here is what to look for:

# A clean subject line looks like this:
echo "Weekly Report" | xxd
# 00000000: 5765 656b 6c79 2052 6570 6f72 740a       Weekly Report.

# A subject with a hidden UTF-8 BOM at the start:
printf '\xef\xbb\xbfWeekly Report' | xxd
# 00000000: efbb bf57 6565 6b6c 7920 5265 706f 7274  ...Weekly Report
#           ^^^^^^
#           UTF-8 BOM -- invisible but breaks string comparison

# A subject with a non-breaking space instead of regular space:
printf 'Weekly\xc2\xa0Report' | xxd
# 00000000: 5765 656b 6c79 c2a0 5265 706f 7274       Weekly..Report
#                         ^^^^
#                         Non-breaking space (U+00A0) instead of 0x20

# A subject with a zero-width space:
printf 'Weekly\xe2\x80\x8b Report' | xxd
# 00000000: 5765 656b 6c79 e280 8b20 5265 706f 7274  Weekly... Report
#                         ^^^^^^^^
#                         Zero-width space -- completely invisible
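
To map the bytes in an xxd dump back to a code point, the UTF-8 bit layout can be decoded by hand. The following is a rough sketch using shell arithmetic; utf8_to_codepoint is a hypothetical helper name, and it assumes you pass the hex bytes of exactly one well-formed sequence (it does not validate lead or continuation bytes).

```shell
# utf8_to_codepoint is a hypothetical helper (an assumption for illustration).
# Arguments: the hex bytes of ONE UTF-8 sequence, as printed by xxd.
utf8_to_codepoint() {
  case $# in
    1) cp=$(( 0x$1 )) ;;                                    # 0xxxxxxx
    2) cp=$(( ((0x$1 & 0x1F) << 6) | (0x$2 & 0x3F) )) ;;    # 110xxxxx 10xxxxxx
    3) cp=$(( ((0x$1 & 0x0F) << 12) | ((0x$2 & 0x3F) << 6) | (0x$3 & 0x3F) )) ;;
    4) cp=$(( ((0x$1 & 0x07) << 18) | ((0x$2 & 0x3F) << 12) | ((0x$3 & 0x3F) << 6) | (0x$4 & 0x3F) )) ;;
    *) echo "expected 1 to 4 hex bytes" >&2; return 1 ;;
  esac
  printf 'U+%04X\n' "$cp"
}

utf8_to_codepoint e2 80 8b   # U+200B, zero-width space
utf8_to_codepoint c2 a0      # U+00A0, non-breaking space
utf8_to_codepoint ef bb bf   # U+FEFF, byte order mark
```

The lead byte's high bits say how many bytes follow; each continuation byte contributes its low six bits.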

Step 3: Automate invisible character detection

You can build a quick detection script that scans email subjects for common invisible characters:

# Scan recent emails for invisible characters in subjects
nylas email list --json --limit 50 | jq -r '.[].subject' | while IFS= read -r subject; do
  # Check for common invisible characters (grep -P needs GNU grep with PCRE support and a UTF-8 locale)
  if printf '%s\n' "$subject" | grep -qP '[\x{200B}\x{200C}\x{200D}\x{FEFF}\x{00AD}\x{2060}\x{202E}]'; then
    echo "FOUND: $subject"
    printf '%s\n' "$subject" | xxd | head -5
    echo "---"
  fi
done

# Or check a single email's subject for non-ASCII bytes
nylas email read msg_abc123 --json | jq -r '.subject' | \
  LC_ALL=C grep -P '[^\x20-\x7E]' && echo "Contains non-ASCII" || echo "Clean ASCII"

# Check what encoding the email claims to use
nylas email read msg_abc123 --json | jq '.headers' | grep -i content-type

Step 4: Strip invisible characters

Once you have identified the problem, here are the fixes:

# Strip UTF-8 BOM from a string (the \x escapes in sed require GNU sed)
printf '%s\n' "$SUBJECT" | sed 's/^\xEF\xBB\xBF//'

# Replace non-breaking spaces with regular spaces
printf '%s\n' "$SUBJECT" | sed 's/\xC2\xA0/ /g'

# Remove all zero-width characters
echo "$SUBJECT" | perl -CSD -pe 's/[\x{200B}\x{200C}\x{200D}\x{FEFF}\x{2060}]//g'

# Nuclear option: keep only printable ASCII, Latin letters with diacritics, and Cyrillic
# (\n stays in the allowed set so the trailing newline survives)
echo "$SUBJECT" | perl -CSD -pe 's/[^\n\x20-\x7E\x{00C0}-\x{024F}\x{0400}-\x{04FF}]//g'

# Convert a file saved in a legacy charset to UTF-8
# (jq output is already UTF-8, so running it through iconv again would double-encode it)
iconv -f ISO-8859-1 -t UTF-8 attachment.csv > attachment.utf8.csv

# Check a downloaded attachment for a BOM
file attachment.csv
# The description includes "(with BOM)" when the file starts with a UTF-8 BOM
# Fix it (GNU sed, edits the file in place):
sed -i '1s/^\xEF\xBB\xBF//' attachment.csv

Invisible characters in attachment filenames

Attachment filenames are especially prone to invisible character issues. The filename in the email header might contain right-to-left override characters that make report.pdf appear as fdp.troper in some contexts -- a known attack vector.

# List all attachment filenames from recent emails
nylas email list --json --limit 20 | \
  jq -r '.[].attachments[]?.filename // empty' | \
  while IFS= read -r filename; do
    # Check each filename for suspicious characters
    if echo "$filename" | grep -qP '[\x{202E}\x{200F}\x{200E}\x{202B}\x{202A}]'; then
      echo "WARNING: Bidirectional override in filename: $filename"
      echo "$filename" | xxd
    fi
  done

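# Sanitize an attachment filename before using it on disk
# (strips C1 controls, zero-width characters, and bidi controls as code points, not as raw bytes)
nylas email read msg_abc123 --json | \
  jq -r '.attachments[0].filename' | \
  perl -CSD -pe 's/[\x{0080}-\x{009F}\x{200B}-\x{200F}\x{202A}-\x{202E}\x{2060}\x{FEFF}]//g'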

Understanding email encoding headers

Email uses several headers to declare encoding. When these are wrong or missing, invisible character issues multiply:

# Check the Content-Type and charset of an email
nylas email read msg_abc123 --json | jq '{
  content_type: .headers["content-type"],
  transfer_encoding: .headers["content-transfer-encoding"],
  subject_raw: .subject
}'

# Common charset declarations:
# Content-Type: text/plain; charset="UTF-8"        -- modern, correct
# Content-Type: text/plain; charset="ISO-8859-1"   -- Western European
# Content-Type: text/plain; charset="windows-1252" -- Windows Western
# Content-Type: text/plain; charset="us-ascii"     -- sometimes lies about non-ASCII content

A common encoding bug: an email declares charset=us-ascii or charset=ISO-8859-1 but actually contains UTF-8 text. The non-ASCII bytes are then decoded with the wrong charset, producing mojibake (garbled characters such as Ã© in place of é) or silent corruption.

Sanitize email content before feeding to LLMs

If you are piping email content to an LLM (via Nylas CLI's MCP server or subprocess), invisible characters can confuse the model or trigger unexpected behavior:

# Clean email body before sending to an LLM
nylas email read msg_abc123 --json | jq -r '.body' | \
  perl -CSD -pe '
    s/[\x{200B}-\x{200D}\x{FEFF}\x{2060}]//g;  # Remove zero-width chars
    s/[\x{202A}-\x{202E}]//g;                      # Remove bidi overrides
    s/\x{00A0}/ /g;                                  # Non-breaking space to space
    s/[\x{2018}\x{2019}]/'"'"'/g;                   # Smart quotes to ASCII
    s/[\x{201C}\x{201D}]/"/g;                       # Smart double quotes
    s/\x{2014}/--/g;                                  # Em dash to double hyphen
    s/\x{2013}/-/g;                                   # En dash to hyphen
  '

Preventing invisible character issues

  • Always use --json when processing email programmatically. The JSON output preserves the exact bytes without terminal rendering.
  • Validate encoding before processing. Check the Content-Type charset header and convert if needed with iconv.
  • Sanitize user input before composing emails. Strip zero-width characters and normalize Unicode (NFC form) before passing to nylas email send.
  • Use binary-safe comparisons in your scripts. Compare bytes, not rendered glyphs.
  • Log hex representations when debugging string mismatches. If two strings look the same but are not equal, xxd will show why.

Frequently asked questions

Why does copy-pasting from Gmail add invisible characters?

Gmail's web interface renders email as HTML. When you copy text, the browser includes formatting characters like non-breaking spaces (from &nbsp;), zero-width joiners (from CSS word-break handling), and smart quotes (from automatic substitution). These are invisible or easy to overlook when pasted into a terminal or text editor, but they exist as bytes in the string.

Can invisible characters break email delivery?

In headers (From, To, Subject), yes. If a recipient address contains a zero-width space, many SMTP servers will reject it as malformed. In the body, invisible characters are usually harmless for delivery but can break downstream processing.

How do I tell if a character is invisible vs just a rendering issue?

Pipe the string through xxd. If you see bytes between the visible characters that are not 20 (regular space) or 0a (newline), those are the invisible characters. Cross-reference the hex bytes against the Unicode table above.

Are invisible characters a security risk in email?

Yes. Right-to-left override (U+202E) can disguise malicious filenames: report[RLO]fdp.exe renders as reportexe.pdf. Zero-width characters can bypass content filters and spam detection. Homoglyph attacks use visually similar characters from different Unicode blocks to spoof addresses. Always sanitize filenames and validate sender addresses at the byte level.

Does Nylas CLI normalize Unicode automatically?

Nylas CLI passes through the exact bytes from the email provider. It does not normalize or strip invisible characters, which is the correct behavior -- you want to see what is actually in the email. Use the techniques in this guide to sanitize when needed.


Next steps