Source: https://cli.nylas.com/guides/fix-garbled-email-encoding

# Fix Garbled Email Encoding (Mojibake)

Mojibake is a charset mismatch made visible. Trace garbled text like Ã© back to its root cause from raw MIME headers, reverse double-encoded UTF-8 with iconv, and send clean UTF-8 from the terminal.

Written by [Pouya Sanooei](https://cli.nylas.com/authors/pouya-sanooei), Software Engineer

Reviewed by [Qasim Muhammad](https://cli.nylas.com/authors/qasim-muhammad)

Updated June 9, 2026

> **TL;DR:** Garbled email encoding (mojibake) is always a charset mismatch: bytes written in one encoding, decoded in another. Run `nylas email read <id> --headers` to see the declared charset, then `xxd` to see the actual bytes. One byte pattern separates a simple mis-decode from double-encoded UTF-8, and the two need opposite fixes: correct the viewer's charset for the first, repair the stored bytes for the second. The test is below.

Command references used in this guide: [`nylas email read`](https://cli.nylas.com/docs/commands/email-read), [`nylas email search`](https://cli.nylas.com/docs/commands/email-search), and [`nylas email send`](https://cli.nylas.com/docs/commands/email-send).

## What causes garbled email encoding (mojibake)?

Garbled email encoding happens when text written in one character set is decoded with a different one. The term mojibake comes from the Japanese for “character transformation,” coined when Shift-JIS mail collided with other encodings in the 1980s and 1990s. Four causes cover nearly every case: a wrong declared charset, a missing declaration, double-encoded UTF-8, and undecoded quoted-printable.

Each cause leaves a distinct fingerprint. The letter é (U+00E9) is the 2-byte UTF-8 sequence `C3 A9`; decode those bytes as ISO-8859-1 and you get Ã©. Windows-1252 makes it worse by assigning printable glyphs to 27 of the 32 code points in the 0x80–0x9F range, so smart quotes become â€œ-style triples instead of clean failures.

| Symptom | Root cause |
| --- | --- |
| Ã© instead of é | UTF-8 bytes decoded as ISO-8859-1 or Windows-1252 |
| Ã© persists even in a UTF-8 viewer | Double-encoded UTF-8 (4 bytes stored for one character) |
| `=C3=A9` in visible text | Quoted-printable transfer encoding never decoded |
| � replacement characters | Invalid byte sequence for the declared charset |
| Question marks replacing accents | Lossy conversion to US-ASCII somewhere in the pipeline |
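
The first two rows of the table can be reproduced locally, with no mail involved. The sketch below is illustrative only: `hex` is a helper defined here (it is not part of any tool in this guide), and the octal escapes spell out the UTF-8 bytes of é and of a left smart quote.

```bash
# Helper defined for this sketch: print stdin as one lowercase hex string.
hex() { od -An -v -tx1 | tr -d ' \n'; }

# é as clean UTF-8: two bytes.
clean=$(printf '\303\251' | hex)

# Treat those two bytes as ISO-8859-1 and re-encode them as UTF-8.
# This is the double-encoding mistake, and it yields four bytes.
doubled=$(printf '\303\251' | iconv -f ISO-8859-1 -t UTF-8 | hex)

# A left smart quote (E2 80 9C) pushed through Windows-1252 the same way
# turns into three separate characters, nine bytes minus two: seven in total.
quote=$(printf '\342\200\234' | iconv -f WINDOWS-1252 -t UTF-8 | hex)

printf '%s\n' "$clean" "$doubled" "$quote"
# c3a9
# c383c2a9
# c3a2e282acc593
```

`xxd -p` gives the same hex strings if you prefer to stay with the tool used in the rest of this guide.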

## How do I read the declared charset from raw email headers?

The declared charset lives in the `Content-Type` header, and the transfer encoding in `Content-Transfer-Encoding`. Both are defined in [RFC 2045](https://datatracker.ietf.org/doc/html/rfc2045), published in November 1996. Comparing the declared charset against the actual bytes is the first diagnostic step for any garbled message.

The `nylas email read` command exposes both views: `--headers` prints the parsed header block, and `--mime` dumps the raw RFC822 message, including the per-part headers of multipart bodies. A multipart message can declare a different charset for each part, so a clean top-level header doesn't clear the HTML part. Charset names are case-insensitive, and the [IANA Character Sets registry](https://www.iana.org/assignments/character-sets/character-sets.xhtml) lists over 250 of them, which is why senders still emit oddities like `charset="ansi_x3.4-1968"` instead of plain US-ASCII.

```bash
# Show parsed headers, including Content-Type and its charset parameter
nylas email read msg_abc123 --headers

# Dump the raw RFC822 message and isolate every MIME-part declaration
nylas email read msg_abc123 --mime | grep -iA1 'content-type'

# Typical output for a healthy message:
# Content-Type: text/plain; charset="UTF-8"
# Content-Transfer-Encoding: quoted-printable

# Red flag: charset="us-ascii" on a body that visibly contains accents
```
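
To compare the declarations of every part at a glance, the `grep` above can be narrowed to the charset values themselves. This is a minimal sketch, assuming the raw message arrives on stdin; `declared_charsets` is a name invented here, and because it matches any `charset=` token it will also pick up an HTML `<meta>` declaration inside a body.

```bash
# Hypothetical helper: list every distinct charset declared in a raw message.
declared_charsets() {
  grep -io 'charset=[^;[:space:]]*' \
    | sed -e 's/^[^=]*=//' -e 's/"//g' \
    | tr '[:upper:]' '[:lower:]' \
    | sort -u
}

# Two parts that disagree about their charset:
printf 'Content-Type: text/plain; charset="UTF-8"\nContent-Type: text/html; charset=ISO-8859-1\n' \
  | declared_charsets
# iso-8859-1
# utf-8
```

With a real message, `nylas email read msg_abc123 --mime | declared_charsets` would feed it the same way. More than one line of output means the parts declare different charsets, so each part has to be checked against its own bytes.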

## Why does é turn into Ã© in email bodies?

The sequence Ã© appears when the 2-byte UTF-8 encoding of é (`C3 A9`) is decoded as two separate ISO-8859-1 characters: `C3` renders as Ã and `A9` renders as ©. If a system then re-encodes that mistake to UTF-8, the character becomes 4 bytes: double-encoded UTF-8. The classic culprit is a database column declared as latin1 storing UTF-8 bytes, with an export job converting to UTF-8 a second time on the way out.

Here is the byte test promised in the TL;DR. Pipe the body through `xxd`: 2 bytes (`C3 A9`) means the stored data is fine and only the viewer's charset is wrong. 4 bytes (`C3 83 C2 A9`) means the data itself was corrupted at write time and needs one reverse conversion with `iconv`.

```bash
# Inspect the actual bytes behind the garbled character
nylas email read msg_abc123 --json | jq -r '.body' | xxd | head -20
# C3 A9        -> clean UTF-8 e-acute; fix the viewer, not the data
# C3 83 C2 A9  -> double-encoded UTF-8; the data itself is corrupted

# Reverse double encoding: decode UTF-8, write the bytes back as Latin-1,
# leaving the original single-encoded UTF-8.
# If iconv rejects characters such as € or œ, the mis-decode went through
# Windows-1252 rather than Latin-1: use -t WINDOWS-1252 instead.
nylas email read msg_abc123 --json | jq -r '.body' | iconv -f UTF-8 -t ISO-8859-1

# Verify the repair: e-acute should now appear as c3 a9 in the hex dump
nylas email read msg_abc123 --json | jq -r '.body' | iconv -f UTF-8 -t ISO-8859-1 | xxd | head -5
```
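
Reading hex dumps by eye gets tedious across many messages, so the 2-byte vs 4-byte test can be approximated in a small function. Treat this as a heuristic sketch, not a definitive detector: `classify_bytes` is a name invented here, it only looks for the Latin-1 style pattern `C3 8x/9x C2 xx`, and a message that legitimately contains the text Ã© would be flagged too.

```bash
# Hypothetical helper: classify stdin as ascii, non-ascii, or double-encoded.
classify_bytes() {
  local hex
  # One line of space-separated hex bytes, e.g. " c3 83 c2 a9 "
  hex=$(od -An -v -tx1 | tr -s ' \n' ' ')
  if printf '%s' "$hex" | grep -Eq 'c3 [89][0-9a-f] c2 [89ab][0-9a-f]'; then
    echo "double-encoded"   # a UTF-8 lead byte was itself re-encoded
  elif printf '%s' "$hex" | grep -Eq '(^| )[89a-f][0-9a-f]( |$)'; then
    echo "non-ascii"        # high bytes present, no double-encoding pattern
  else
    echo "ascii"
  fi
}

printf '\303\203\302\251' | classify_bytes
# double-encoded
printf '\303\203\302\251' | iconv -f UTF-8 -t ISO-8859-1 | classify_bytes
# non-ascii
```

The second call shows the repair working: after one reverse conversion the pattern is gone and only the clean two-byte sequence remains.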

## How do I fix quoted-printable artifacts like =C3=A9?

Visible `=C3=A9` sequences mean a quoted-printable body was never decoded. RFC 2045 defines quoted-printable as `=XX` hex escapes with a 76-character line limit; each non-ASCII byte triples in size. Decode the escapes back to bytes, then decode those bytes as UTF-8. A lone equals sign at the end of a line is a soft line break from the same scheme, and stray ones in visible text point at the identical undecoded-transfer-encoding bug.

Subject lines use a related scheme: [RFC 2047](https://datatracker.ietf.org/doc/html/rfc2047) encoded-words like `=?UTF-8?Q?caf=C3=A9?=`, each capped at 75 characters. The CLI's `--json` output returns provider-decoded fields, so artifacts there point at an upstream sender bug rather than your tooling. Python's bundled `quopri` module decodes stray escapes in one line.

```bash
# Decode a quoted-printable artifact manually
printf 'caf=C3=A9' | python3 -c "import quopri,sys; sys.stdout.buffer.write(quopri.decodestring(sys.stdin.buffer.read()))"
# Output: café

# Check whether the subject still carries undecoded artifacts
nylas email read msg_abc123 --json | jq -r '.subject'

# Find other messages from the same broken sender pipeline ("*" matches any subject)
nylas email search "*" --from sender@example.com --limit 20 --json | jq -r '.[].subject'
```
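
Both decoding steps, escapes to bytes and then bytes to UTF-8, can be chained so that a wrong charset shows up as an error instead of more garbage. A minimal sketch, assuming `python3` and `iconv` are on the PATH; `qp_to_utf8` is a name invented here, and the `iconv -f UTF-8 -t UTF-8` pass is used purely as a validity check.

```bash
# Hypothetical helper: decode quoted-printable from stdin, then confirm the
# resulting bytes are valid UTF-8 (iconv exits non-zero if they are not).
qp_to_utf8() {
  python3 -c 'import quopri, sys; sys.stdout.buffer.write(quopri.decodestring(sys.stdin.buffer.read()))' \
    | iconv -f UTF-8 -t UTF-8
}

# Escaped bytes plus a soft line break (the lone "=" at the end of line one)
printf 'caf=C3=A9 au =\nlait\n' | qp_to_utf8
# café au lait
```

If the function fails on a body that decodes without complaint in `quopri` alone, the escapes were fine and the bytes are simply not UTF-8, for example `=E9` from a Latin-1 sender. That sends you back to the declared charset rather than the transfer encoding.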

## How do I prevent garbled text when sending email from the terminal?

Prevention means never hand-writing MIME headers. The `nylas email send` command submits your text through the Nylas API, which builds the `Content-Type` and transfer encoding for you, so the declared charset always matches the bytes. Most mojibake ships from scripts that concatenate raw SMTP headers themselves.

Raw UTF-8 in header fields only became permissible with [RFC 6532](https://datatracker.ietf.org/doc/html/rfc6532) in February 2012; before that, non-ASCII header text had to travel as RFC 2047 encoded-words. Body files carry a separate hazard: per the [Unicode Consortium's UTF/BOM FAQ](https://www.unicode.org/faq/utf_bom.html), a UTF-8 byte order mark is permitted but not required, and editors that write one are a classic source of stray `EF BB BF` bytes in piped input. Sending accented text takes one command, and reading the message back confirms the headers in about 5 seconds.

```bash
# Send accented UTF-8 text; the API declares the matching charset
nylas email send --to user@example.com \
  --subject "Café réservation confirmée" \
  --body "Votre table est prête à 19h30." \
  --yes

# Read the sent message back and confirm the declared charset matches
nylas email search "Café réservation" --limit 1 --json | jq -r '.[0].id'
nylas email read <that-id> --headers

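# If body text comes from a file, strip a UTF-8 BOM as you send
# (printf emits the BOM bytes, so this also works with BSD/macOS sed, which lacks \x escapes)
nylas email send --to user@example.com --subject "Report" \
  --body "$(sed "1s/^$(printf '\357\273\277')//" body.txt)" --yes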
```
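
When the body comes from a file you did not write yourself, it can help to validate it before it ever reaches the send command. The sketch below is one way to do that under stated assumptions: `clean_utf8_body` is a name invented here, and it treats any `iconv` failure as "not UTF-8, do not send".

```bash
# Hypothetical helper: print the file without a leading BOM, and fail
# (non-zero exit from iconv) if the remaining bytes are not valid UTF-8.
clean_utf8_body() {
  local bom
  bom=$(printf '\357\273\277')           # the three BOM bytes EF BB BF
  LC_ALL=C sed "1s/^${bom}//" "$1" | iconv -f UTF-8 -t UTF-8
}

# Demo with a temporary file that starts with a BOM
demo=$(mktemp)
printf '\357\273\277Votre table est pr\303\252te.\n' > "$demo"
clean_utf8_body "$demo" | od -An -v -tx1 | head -1   # dump no longer starts with ef bb bf
rm -f "$demo"
```

A guarded send could then look like `body=$(clean_utf8_body body.txt) && nylas email send --to user@example.com --subject "Report" --body "$body" --yes`, so a Latin-1 file stops the pipeline instead of shipping mojibake.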

Before blaming the provider, work through the chain in order: terminal locale, declared charset, actual bytes, transfer encoding. A terminal set to `LANG=C` garbles perfectly valid UTF-8 on display, and no amount of header fixing changes that. This workflow was verified on Nylas CLI 3.1.17 against Gmail and Outlook grants on June 9, 2026; other providers' charset handling is described from the RFCs above, so verify locally before automating fixes. The 4-step checklist takes under 2 minutes per message:

1. Run `locale` — confirm the terminal uses a UTF-8 locale before trusting what you see.
2. Run `nylas email read <id> --headers` and note the declared charset and transfer encoding.
3. Pipe the body through `xxd` and apply the 2-byte vs 4-byte test for double encoding.
4. Only then convert: `iconv` for charset repair, `quopri` for stray escapes.
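
Step 1 of the checklist is easy to script, which is useful at the top of any automation built on the later steps. A minimal sketch, assuming a system that ships the POSIX `locale` utility; `terminal_is_utf8` is a name invented here.

```bash
# Hypothetical helper: succeed only when the active locale's character map is UTF-8.
terminal_is_utf8() {
  [ "$(locale charmap 2>/dev/null)" = "UTF-8" ]
}

if terminal_is_utf8; then
  echo "locale is UTF-8: continue with step 2"
else
  echo "locale is not UTF-8: fix LANG/LC_ALL before inspecting any message"
fi
```

On glibc systems, `LANG=C locale charmap` typically reports `ANSI_X3.4-1968`, the same alias for US-ASCII that shows up in odd `charset=` declarations.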

---

## Next steps

- [Debug invisible characters in email](https://cli.nylas.com/guides/debugging-invisible-characters-email) — zero-width spaces, BOMs, and byte-level inspection with xxd
- [Analyze email headers from the CLI](https://cli.nylas.com/guides/analyze-email-headers-cli) — authentication, routing, and MIME headers explained
- [Extract email data with jq](https://cli.nylas.com/guides/extract-email-data-jq) — pull subjects, bodies, and headers out of JSON output
- [EmailEngine vs Nylas](https://cli.nylas.com/guides/emailengine-vs-nylas) — how self-hosted IMAP middleware compares on message parsing
- [Twilio SendGrid vs Nylas](https://cli.nylas.com/guides/twilio-vs-nylas) — transactional sending stacks and where encoding bugs creep in
- [Full command reference](https://cli.nylas.com/docs/commands) — every flag and subcommand documented
- [RFC 2045 (MIME Part One)](https://datatracker.ietf.org/doc/html/rfc2045) — Content-Type, charset parameters, and quoted-printable
- [RFC 2047 (MIME Part Three)](https://datatracker.ietf.org/doc/html/rfc2047) — encoded-words for non-ASCII headers
- [RFC 6532 (Internationalized Email Headers)](https://datatracker.ietf.org/doc/html/rfc6532) — native UTF-8 in header fields
- [Unicode UTF & BOM FAQ](https://www.unicode.org/faq/utf_bom.html) — byte order mark rules for UTF-8
