Guide

Fix Garbled Email Encoding (Mojibake)

Mojibake is a charset mismatch made visible. Trace garbled text like é back to its root cause from raw MIME headers, reverse double-encoded UTF-8 with iconv, and send clean UTF-8 from the terminal.

Written by Pouya Sanooei Software Engineer

Reviewed by Qasim Muhammad

VerifiedCLI 3.1.17 · Gmail, Outlook · last tested June 9, 2026

Command references used in this guide: nylas email read, nylas email search, and nylas email send.

What causes garbled email encoding (mojibake)?

Garbled email encoding happens when text written in one character set is decoded with a different one. The term mojibake comes from the Japanese for “character transformation,” coined when Shift-JIS mail collided with other encodings in the 1980s and 1990s. Four causes cover nearly every case: a wrong declared charset, a missing declaration, double-encoded UTF-8, and undecoded quoted-printable.

Each cause leaves a distinct fingerprint. The letter é (U+00E9) is the 2-byte UTF-8 sequence C3 A9; decode those bytes as ISO-8859-1 and you get é. Windows-1252 makes it worse by assigning printable glyphs to 27 of the 32 code points in the 0x80–0x9F range, so smart quotes become “-style triples instead of clean failures.

SymptomRoot cause
é instead of éUTF-8 bytes decoded as ISO-8859-1 or Windows-1252
é persists even in a UTF-8 viewerDouble-encoded UTF-8 (4 bytes stored for one character)
=C3=A9 in visible textQuoted-printable transfer encoding never decoded
� replacement charactersInvalid byte sequence for the declared charset
Question marks replacing accentsLossy conversion to US-ASCII somewhere in the pipeline

How do I read the declared charset from raw email headers?

The declared charset lives in the Content-Type header, and the transfer encoding in Content-Transfer-Encoding. Both are defined in RFC 2045, published in November 1996. Comparing the declared charset against the actual bytes is the first diagnostic step for any garbled message.

The nylas email read command exposes both views: --headers prints the parsed header block, and --mime dumps the raw RFC822 message, including the per-part headers of multipart bodies. A multipart message can declare a different charset for each part, so a clean top-level header doesn't clear the HTML part. Charset names are case-insensitive, and the IANA Character Sets registry lists over 250 of them, which is why senders still emit oddities like charset="ansi_x3.4-1968" instead of plain US-ASCII.

# Show parsed headers, including Content-Type and its charset parameter
nylas email read msg_abc123 --headers

# Dump the raw RFC822 message and isolate every MIME-part declaration
nylas email read msg_abc123 --mime | grep -iA1 'content-type'

# Typical output for a healthy message:
# Content-Type: text/plain; charset="UTF-8"
# Content-Transfer-Encoding: quoted-printable

# Red flag: charset="us-ascii" on a body that visibly contains accents

Why does é turn into é in email bodies?

The sequence é appears when the 2-byte UTF-8 encoding of é (C3 A9) is decoded as two separate ISO-8859-1 characters: C3 renders as à and A9 renders as ©. If a system then re-encodes that mistake to UTF-8, the character becomes 4 bytes: double-encoded UTF-8. The classic culprit is a database column declared as latin1 storing UTF-8 bytes, with an export job converting to UTF-8 a second time on the way out.

Here is the byte test promised in the TL;DR. Pipe the body through xxd: 2 bytes (C3 A9) means the stored data is fine and only the viewer's charset is wrong. 4 bytes (C3 83 C2 A9) means the data itself was corrupted at write time and needs one reverse conversion with iconv.

# Inspect the actual bytes behind the garbled character
nylas email read msg_abc123 --json | jq -r '.body' | xxd | head -20
# C3 A9        -> clean UTF-8 e-acute; fix the viewer, not the data
# C3 83 C2 A9  -> double-encoded UTF-8; the data itself is corrupted

# Reverse double encoding: decode UTF-8, write the bytes back as Latin-1,
# leaving the original single-encoded UTF-8
nylas email read msg_abc123 --json | jq -r '.body' | iconv -f UTF-8 -t ISO-8859-1

# Verify the repair round-trips: e-acute should now display correctly
nylas email read msg_abc123 --json | jq -r '.body' | iconv -f UTF-8 -t ISO-8859-1 | xxd | head -5

How do I fix quoted-printable artifacts like =C3=A9?

Visible =C3=A9 sequences mean a quoted-printable body was never decoded. RFC 2045 defines quoted-printable as =XX hex escapes with a 76-character line limit; each non-ASCII byte triples in size. Decode the escapes back to bytes, then decode those bytes as UTF-8. A lone equals sign at the end of a line is a soft line break from the same scheme, and stray ones in visible text point at the identical undecoded-transfer-encoding bug.

Subject lines use a related scheme: RFC 2047 encoded-words like =?UTF-8?Q?caf=C3=A9?=, each capped at 75 characters. The CLI's --json output returns provider-decoded fields, so artifacts there point at an upstream sender bug rather than your tooling. Python's bundled quopri module decodes stray escapes in one line.

# Decode a quoted-printable artifact manually
printf 'caf=C3=A9' | python3 -c "import quopri,sys; sys.stdout.buffer.write(quopri.decodestring(sys.stdin.buffer.read()))"
# Output: café

# Check whether the subject still carries undecoded artifacts
nylas email read msg_abc123 --json | jq -r '.subject'

# Find other messages from the same broken sender pipeline ("*" matches any subject)
nylas email search "*" --from sender@example.com --limit 20 --json | jq -r '.[].subject'

How do I prevent garbled text when sending email from the terminal?

Prevention means never hand-writing MIME headers. The nylas email send command submits your text through the Nylas API, which builds the Content-Type and transfer encoding for you, so the declared charset always matches the bytes. Most mojibake ships from scripts that concatenate raw SMTP headers themselves.

Header support for raw UTF-8 only arrived with RFC 6532 in February 2012, and per the Unicode Consortium's UTF/BOM FAQ, a UTF-8 byte order mark is permitted but not required — another classic source of stray EF BB BF bytes in piped input. Sending accented text takes one command, and reading the message back confirms the headers in about 5 seconds.

# Send accented UTF-8 text; the API declares the matching charset
nylas email send --to user@example.com \
  --subject "Café réservation confirmée" \
  --body "Votre table est prête à 19h30." \
  --yes

# Read the sent message back and confirm the declared charset matches
nylas email search "Café réservation" --limit 1 --json | jq -r '.[0].id'
nylas email read <that-id> --headers

# If body text comes from a file, strip a UTF-8 BOM as you send
nylas email send --to user@example.com --subject "Report" \
  --body "$(sed '1s/^\xEF\xBB\xBF//' body.txt)" --yes

Before blaming the provider, work through the chain in order: terminal locale, declared charset, actual bytes, transfer encoding. A terminal set to LANG=C garbles perfectly valid UTF-8 on display, and no amount of header fixing changes that. This workflow was verified on Nylas CLI 3.1.17 against Gmail and Outlook grants on June 9, 2026; other providers' charset handling is described from the RFCs above, so verify locally before automating fixes. The 4-step checklist takes under 2 minutes per message:

  1. Run locale — confirm the terminal uses a UTF-8 locale before trusting what you see.
  2. Run nylas email read <id> --headers and note the declared charset and transfer encoding.
  3. Pipe the body through xxd and apply the 2-byte vs 4-byte test for double encoding.
  4. Only then convert: iconv for charset repair, quopri for stray escapes.

Next steps