Guide
Fix Garbled Email Encoding (Mojibake)
Mojibake is a charset mismatch made visible. Trace garbled text like Ã© back to its root cause from raw MIME headers, reverse double-encoded UTF-8 with iconv, and send clean UTF-8 from the terminal.
Written by Pouya Sanooei, Software Engineer
Reviewed by Qasim Muhammad
Command references used in this guide: nylas email read, nylas email search, and nylas email send.
What causes garbled email encoding (mojibake)?
Garbled email encoding happens when text written in one character set is decoded with a different one. The term mojibake comes from the Japanese for “character transformation,” coined when Shift-JIS mail collided with other encodings in the 1980s and 1990s. Four causes cover nearly every case: a wrong declared charset, a missing declaration, double-encoded UTF-8, and undecoded quoted-printable.
Each cause leaves a distinct fingerprint. The letter é (U+00E9) is the 2-byte UTF-8 sequence C3 A9; decode those bytes as ISO-8859-1 and you get Ã©. Windows-1252 makes it worse by assigning printable glyphs to 27 of the 32 code points in the 0x80–0x9F range, so smart quotes become â€œ-style triples instead of clean failures.
| Symptom | Root cause |
|---|---|
| Ã© instead of é | UTF-8 bytes decoded as ISO-8859-1 or Windows-1252 |
| Ã© persists even in a UTF-8 viewer | Double-encoded UTF-8 (4 bytes stored for one character) |
| =C3=A9 in visible text | Quoted-printable transfer encoding never decoded |
| � replacement characters | Invalid byte sequence for the declared charset |
| Question marks replacing accents | Lossy conversion to US-ASCII somewhere in the pipeline |
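Each fingerprint in the table can be reproduced with Python's built-in codecs, which is a quick way to confirm which row you are looking at. The following is a minimal sketch using only the standard library; the variable and function names are illustrative, and non-ASCII characters are written as escapes so the snippet does not depend on the terminal locale.

```python
# Reproduce the mojibake fingerprints with Python's built-in codecs.
e_acute = "\u00e9".encode("utf-8")             # e-acute: bytes C3 A9

# UTF-8 bytes decoded as ISO-8859-1: two characters, A-tilde + copyright sign.
as_latin1 = e_acute.decode("iso-8859-1")

# A left smart quote (U+201C) is 3 UTF-8 bytes (E2 80 9C). Windows-1252 maps
# all three to printable glyphs, so the result is a triple, not an error.
smart_quote = "\u201c".encode("utf-8").decode("cp1252")


def decodes_as_cp1252(byte_value: int) -> bool:
    """True if Python's cp1252 codec assigns a character to this byte."""
    try:
        bytes([byte_value]).decode("cp1252")
        return True
    except UnicodeDecodeError:
        return False


# Count the assigned code points in the 0x80-0x9F range.
assigned_c1 = sum(decodes_as_cp1252(b) for b in range(0x80, 0xA0))

# A byte sequence that is invalid for the declared charset becomes U+FFFD
# when decoded leniently.
replacement = b"caf\xe9".decode("utf-8", errors="replace")

# A lossy conversion to US-ASCII turns the accent into a question mark.
lossy = "caf\u00e9".encode("ascii", errors="replace").decode("ascii")

# ascii() prints escapes, so the output is the same under any locale.
for label, value in [
    ("latin-1 misread", as_latin1),
    ("cp1252 smart quote", smart_quote),
    ("assigned 0x80-0x9F", assigned_c1),
    ("replacement", replacement),
    ("lossy ascii", lossy),
]:
    print(label, ascii(value))
```

Printing with ascii() is a deliberate choice here: it keeps a misconfigured terminal from adding a second layer of garbling to the diagnosis.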
How do I read the declared charset from raw email headers?
The declared charset lives in the Content-Type header, and the transfer encoding in Content-Transfer-Encoding. Both are defined in RFC 2045, published in November 1996. Comparing the declared charset against the actual bytes is the first diagnostic step for any garbled message.
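The comparison of declared charset against actual bytes can also be scripted. The sketch below uses Python's standard email package on a raw message; check_declared_charsets is a hypothetical helper name, and the sample message is invented for illustration. It assumes you already have the raw RFC822 bytes in hand.

```python
from email import message_from_bytes, policy


def check_declared_charsets(raw_message: bytes) -> list:
    """Return (content type, declared charset, decodes cleanly) per text part."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    report = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and attachments
        # RFC 2045 makes US-ASCII the default when no charset is declared.
        declared = part.get_content_charset() or "us-ascii"
        # decode=True undoes the transfer encoding only; the result is bytes.
        payload = part.get_payload(decode=True)
        try:
            payload.decode(declared)  # strict decode: raises on invalid bytes
            ok = True
        except (UnicodeDecodeError, LookupError):
            ok = False  # bytes do not fit, or the charset name is unknown
        report.append((part.get_content_type(), declared, ok))
    return report


# Invented sample: the body carries UTF-8 bytes but declares us-ascii.
raw = (
    b'Content-Type: text/plain; charset="us-ascii"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"caf=C3=A9\r\n"
)
print(check_declared_charsets(raw))
# [('text/plain', 'us-ascii', False)]
```

A clean strict decode does not prove the declaration is right (ISO-8859-1 accepts every byte), so treat a False as a definite mismatch and a True as "not yet disproven".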
The nylas email read command exposes both views: --headers prints the parsed header block, and --mime dumps the raw RFC822 message, including the per-part headers of multipart bodies. A multipart message can declare a different charset for each part, so a clean top-level header doesn't clear the HTML part. Charset names are case-insensitive, and the IANA Character Sets registry lists over 250 of them, which is why senders still emit oddities like charset="ansi_x3.4-1968" instead of plain US-ASCII.
# Show parsed headers, including Content-Type and its charset parameter
nylas email read msg_abc123 --headers
# Dump the raw RFC822 message and isolate every MIME-part declaration
nylas email read msg_abc123 --mime | grep -iA1 'content-type'
# Typical output for a healthy message:
# Content-Type: text/plain; charset="UTF-8"
# Content-Transfer-Encoding: quoted-printable
# Red flag: charset="us-ascii" on a body that visibly contains accents
Why does é turn into Ã© in email bodies?
The sequence Ã© appears when the 2-byte UTF-8 encoding of é (C3 A9) is decoded as two separate ISO-8859-1 characters: C3 renders as Ã and A9 renders as ©. If a system then re-encodes that mistake to UTF-8, the character becomes 4 bytes: double-encoded UTF-8. The classic culprit is a database column declared as latin1 storing UTF-8 bytes, with an export job converting to UTF-8 a second time on the way out.
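The 2-byte to 4-byte progression is easy to reproduce and reverse in Python. This is a minimal sketch of the mechanism described above; double_encode and repair_double_encoded are hypothetical helper names, and the repair assumes the misread charset was ISO-8859-1 rather than Windows-1252.

```python
def double_encode(text: str) -> bytes:
    """Simulate the bug: UTF-8 bytes misread as Latin-1, then re-encoded."""
    return text.encode("utf-8").decode("iso-8859-1").encode("utf-8")


def repair_double_encoded(data: bytes) -> bytes:
    """Undo one round of double encoding.

    Returns the input unchanged when the reverse conversion fails, which
    is what happens for data that was only encoded once.
    """
    try:
        candidate = data.decode("utf-8").encode("iso-8859-1")
        candidate.decode("utf-8")  # the repaired bytes must be valid UTF-8
    except UnicodeError:
        return data
    return candidate


clean = "caf\u00e9".encode("utf-8")     # "cafe" with e-acute
broken = double_encode("caf\u00e9")

print(clean.hex(" "))                          # 63 61 66 c3 a9
print(broken.hex(" "))                         # 63 61 66 c3 83 c2 a9
print(repair_double_encoded(broken).hex(" "))  # 63 61 66 c3 a9
```

One caveat on the design: text that legitimately contains the characters Ã and © in sequence would also pass the round-trip check, so apply the repair only after the byte test has confirmed double encoding.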
A byte test tells the two cases apart. Pipe the body through xxd: 2 bytes (C3 A9) mean the stored data is fine and only the viewer's charset is wrong; 4 bytes (C3 83 C2 A9) mean the data itself was corrupted at write time and needs one reverse conversion with iconv.
# Inspect the actual bytes behind the garbled character
nylas email read msg_abc123 --json | jq -r '.body' | xxd | head -20
# C3 A9 -> clean UTF-8 e-acute; fix the viewer, not the data
# C3 83 C2 A9 -> double-encoded UTF-8; the data itself is corrupted
# Reverse double encoding: decode UTF-8, write the bytes back as Latin-1,
# leaving the original single-encoded UTF-8
nylas email read msg_abc123 --json | jq -r '.body' | iconv -f UTF-8 -t ISO-8859-1
# Verify the repair: e-acute should now appear as the 2 bytes c3 a9
nylas email read msg_abc123 --json | jq -r '.body' | iconv -f UTF-8 -t ISO-8859-1 | xxd | head -5
How do I fix quoted-printable artifacts like =C3=A9?
Visible =C3=A9 sequences mean a quoted-printable body was never decoded. RFC 2045 defines quoted-printable as =XX hex escapes with a 76-character line limit; each non-ASCII byte triples in size. Decode the escapes back to bytes, then decode those bytes as UTF-8. A lone equals sign at the end of a line is a soft line break from the same scheme, and stray ones in visible text point at the identical undecoded-transfer-encoding bug.
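The two-step decode (escapes to bytes, then bytes to characters) looks like this with Python's standard quopri module. The sample body is invented for illustration; it includes one soft line break so you can see the trailing equals sign disappear.

```python
import quopri

# A quoted-printable body as it travels on the wire. "=C3=AA" is e-circumflex,
# "=C3=A0" is a-grave, "=C3=A9" is e-acute, and the "=" directly before the
# line ending is a soft line break that joins the two physical lines.
wire = b"Votre table est pr=C3=AAte =C3=A0 19h30, au caf=\r\n=C3=A9 du coin."

decoded_bytes = quopri.decodestring(wire)  # step 1: =XX escapes -> raw bytes
text = decoded_bytes.decode("utf-8")       # step 2: raw bytes -> characters

# Each non-ASCII byte costs three characters on the wire.
assert len(b"=C3=A9") == 3 * len("\u00e9".encode("utf-8"))

print(ascii(text))
```

Skipping step 2, or running it with the wrong charset, reintroduces the two-character mojibake even though the escapes themselves were decoded correctly.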
Subject lines use a related scheme: RFC 2047 encoded-words like =?UTF-8?Q?caf=C3=A9?=, each capped at 75 characters. The CLI's --json output returns provider-decoded fields, so artifacts there point at an upstream sender bug rather than your tooling. Python's bundled quopri module decodes stray escapes in one line.
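Encoded-words can be decoded with the standard library as well. The sketch below uses email.header on the example above; has_encoded_word is a hypothetical helper, and its regular expression is a simplified approximation of the RFC 2047 syntax, meant only for spotting leftovers in already-decoded fields.

```python
import re
from email.header import decode_header, make_header

# Simplified pattern: =?charset?B-or-Q?payload?=
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")


def has_encoded_word(value: str) -> bool:
    """True if the string still contains an undecoded RFC 2047 encoded-word."""
    return ENCODED_WORD.search(value) is not None


raw_subject = "=?UTF-8?Q?caf=C3=A9?="

# decode_header returns (bytes, charset) pairs; make_header joins them
# into a Header object whose str() is the decoded text.
subject = str(make_header(decode_header(raw_subject)))

print(ascii(subject))                 # 'caf\xe9'
print(has_encoded_word(raw_subject))  # True
print(has_encoded_word(subject))      # False
```

Running has_encoded_word over a batch of subjects is one way to flag senders whose pipeline leaks encoded-words into fields that should already be decoded.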
# Decode a quoted-printable artifact manually
printf 'caf=C3=A9' | python3 -c "import quopri,sys; sys.stdout.buffer.write(quopri.decodestring(sys.stdin.buffer.read()))"
# Output: café
# Check whether the subject still carries undecoded artifacts
nylas email read msg_abc123 --json | jq -r '.subject'
# Find other messages from the same broken sender pipeline ("*" matches any subject)
nylas email search "*" --from sender@example.com --limit 20 --json | jq -r '.[].subject'
How do I prevent garbled text when sending email from the terminal?
Prevention means never hand-writing MIME headers. The nylas email send command submits your text through the Nylas API, which builds the Content-Type and transfer encoding for you, so the declared charset always matches the bytes. Most mojibake ships from scripts that concatenate raw SMTP headers themselves.
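The same principle (let a library generate the headers) can be illustrated with Python's standard email package for scripts that build messages themselves. This is a sketch of the principle only, not a description of how the Nylas API constructs messages; the addresses and text are placeholders.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["To"] = "user@example.com"
# Non-ASCII subject ("Cafe reservation confirmee" with accents).
msg["Subject"] = "Caf\u00e9 r\u00e9servation confirm\u00e9e"
# set_content picks the charset (utf-8 by default) and a transfer encoding,
# and writes matching Content-Type headers itself.
msg.set_content("Votre table est pr\u00eate \u00e0 19h30.")

wire = msg.as_bytes()
header_block = wire.split(b"\n\n", 1)[0]
declared = msg.get_content_charset()

print(declared)                    # utf-8
print(header_block.decode("ascii"))
```

Because the library derives the declaration from the text it was given, the declared charset and the bytes cannot drift apart, and the non-ASCII subject is serialized as RFC 2047 encoded-words rather than raw bytes.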
Header support for raw UTF-8 only arrived with RFC 6532 in February 2012, and per the Unicode Consortium's UTF/BOM FAQ, a UTF-8 byte order mark is permitted but not required — another classic source of stray EF BB BF bytes in piped input. Sending accented text takes one command, and reading the message back confirms the headers in about 5 seconds.
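A stray byte order mark can also be dropped at decode time. The sketch below relies on Python's utf-8-sig codec, which removes a leading EF BB BF if present and otherwise behaves like plain utf-8; the sample text is invented for illustration.

```python
import codecs

# Bytes as a BOM-writing editor might save them ("Rapport pret" with e-circumflex).
with_bom = codecs.BOM_UTF8 + "Rapport pr\u00eat".encode("utf-8")
without_bom = "Rapport pr\u00eat".encode("utf-8")

# Plain utf-8 keeps the BOM as an invisible U+FEFF at the start of the text.
kept = with_bom.decode("utf-8")

# utf-8-sig strips it, and leaves BOM-less input untouched.
stripped = with_bom.decode("utf-8-sig")
unchanged = without_bom.decode("utf-8-sig")

print(ascii(kept))       # '\ufeffRapport pr\xeat'
print(ascii(stripped))   # 'Rapport pr\xeat'
```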
# Send accented UTF-8 text; the API declares the matching charset
nylas email send --to user@example.com \
--subject "Café réservation confirmée" \
--body "Votre table est prête à 19h30." \
--yes
# Read the sent message back and confirm the declared charset matches
nylas email search "Café réservation" --limit 1 --json | jq -r '.[0].id'
nylas email read <that-id> --headers
# If body text comes from a file, strip a UTF-8 BOM as you send
# (the \x byte escapes below require GNU sed)
nylas email send --to user@example.com --subject "Report" \
--body "$(sed '1s/^\xEF\xBB\xBF//' body.txt)" --yes
Before blaming the provider, work through the chain in order: terminal locale, declared charset, actual bytes, transfer encoding. A terminal set to LANG=C garbles perfectly valid UTF-8 on display, and no amount of header fixing changes that. This workflow was verified on Nylas CLI 3.1.17 against Gmail and Outlook grants on June 9, 2026; other providers' charset handling is described from the RFCs above, so verify locally before automating fixes. The 4-step checklist takes under 2 minutes per message:
- Run locale to confirm the terminal uses a UTF-8 locale before trusting what you see.
- Run nylas email read <id> --headers and note the declared charset and transfer encoding.
- Pipe the body through xxd and apply the 2-byte vs 4-byte test for double encoding.
- Only then convert: iconv for charset repair, quopri for stray escapes.
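The checklist above can be sketched as a small triage function for scripted use. Everything here is an assumption made for illustration: the names terminal_is_utf8 and diagnose, the verdict strings, and the regular expression, which is a heuristic that can misfire on text such as "x=AB".

```python
import re
import sys

# Heuristic for visible quoted-printable escapes such as =C3=A9.
QP_ESCAPE = re.compile(rb"=[0-9A-F]{2}")


def terminal_is_utf8() -> bool:
    """Step 1: is standard output using a UTF-8 encoding?"""
    encoding = (sys.stdout.encoding or "").lower().replace("-", "")
    return encoding == "utf8"


def diagnose(body: bytes, declared_charset: str) -> str:
    """Steps 2-4: compare the declared charset with the actual bytes."""
    charset = declared_charset.lower()
    try:
        text = body.decode(charset)
    except (UnicodeDecodeError, LookupError):
        return "bytes are invalid for the declared charset"
    if QP_ESCAPE.search(body):
        return "undecoded quoted-printable"
    if text.isascii():
        return "clean"
    if charset in ("utf-8", "utf8"):
        # 4-byte test: does the text survive a Latin-1 round trip as UTF-8?
        try:
            text.encode("iso-8859-1").decode("utf-8")
        except UnicodeError:
            return "clean"
        return "double-encoded UTF-8"
    # Non-UTF-8 declaration: valid multi-byte UTF-8 suggests a wrong label.
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return "clean"
    return "UTF-8 bytes under a non-UTF-8 declaration"


print("terminal UTF-8:", terminal_is_utf8())
print(diagnose(b"caf\xc3\xa9", "utf-8"))          # clean
print(diagnose(b"caf\xc3\x83\xc2\xa9", "utf-8"))  # double-encoded UTF-8
print(diagnose(b"caf\xc3\xa9", "iso-8859-1"))     # UTF-8 bytes under a non-UTF-8 declaration
print(diagnose(b"caf=C3=A9", "utf-8"))            # undecoded quoted-printable
```

The function only reports a verdict and never rewrites data, in keeping with the advice to convert only after the diagnosis is settled.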
Next steps
- Debug invisible characters in email — zero-width spaces, BOMs, and byte-level inspection with xxd
- Analyze email headers from the CLI — authentication, routing, and MIME headers explained
- Extract email data with jq — pull subjects, bodies, and headers out of JSON output
- EmailEngine vs Nylas — how self-hosted IMAP middleware compares on message parsing
- Twilio SendGrid vs Nylas — transactional sending stacks and where encoding bugs creep in
- Full command reference — every flag and subcommand documented
- RFC 2045 (MIME Part One) — Content-Type, charset parameters, and quoted-printable
- RFC 2047 (MIME Part Three) — encoded-words for non-ASCII headers
- RFC 6532 (Internationalized Email Headers) — native UTF-8 in header fields
- Unicode UTF & BOM FAQ — byte order mark rules for UTF-8