Fix “Invalid UTF-8 Byte Sequence” Errors in CSV Imports

By CSV Editor Team · Last updated: 2026-03-16

An “invalid UTF-8 byte sequence” error means the file contains bytes that cannot be decoded as UTF-8 text. In practice, that usually happens because the CSV was exported in another encoding such as Windows-1252, ISO-8859-1, or Shift-JIS, and was then treated as UTF-8 later in the workflow. The safe fix is to identify the original encoding, reopen or re-export the file correctly, and test the repaired CSV before import.

Quick answer

  1. Keep the original CSV untouched as a backup.
  2. Confirm the problem is encoding, not a wrong delimiter or broken quotes.
  3. Reopen the source with the correct encoding or export a fresh UTF-8 version from the source system.
  4. Verify sample rows with accents, symbols, and non-English text.
  5. Test import the repaired UTF-8 file before replacing the old one.

What this error usually looks like

  • Import tools reject the file with wording like “invalid byte sequence in UTF-8”.
  • Characters show up as �, Ã©, â€™, or similar mojibake artifacts.
  • Only certain rows fail because one bad character breaks validation.
  • The file opens in one app but fails in another stricter importer.
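The strict failure mode is easy to reproduce with Python's built-in decoder. In this sketch, Windows-1252 stands in as the assumed source encoding: the byte 0xE9 is “é” there, but on its own it is an incomplete multi-byte sequence in UTF-8.

```python
# Bytes as a legacy Windows-1252 export might produce them: 0xE9 = "é".
raw = b"Jos\xe9 Alvarez"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    # This is the kind of error strict importers surface:
    # "'utf-8' codec can't decode byte 0xe9 ... invalid continuation byte"
    print(err)

# The same bytes decode cleanly with the correct source encoding.
print(raw.decode("cp1252"))  # José Alvarez
```

The same bytes are not “corrupt” in any absolute sense; they are only invalid when interpreted as UTF-8.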

Invalid UTF-8 vs ordinary garbled characters

Some files merely look wrong because the viewer picked the wrong encoding. Others actually contain byte sequences that are not valid UTF-8 at all. The second case is what triggers strict importer errors. That is why a file may display somewhat normally in a spreadsheet but still fail in a backend validation step.
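The two cases can be seen directly in a few lines of Python (a sketch; Windows-1252 stands in for the viewer's wrongly guessed encoding):

```python
# Case 1: valid UTF-8 bytes viewed with the wrong encoding — garbled text,
# but every byte still decodes, so no error is raised.
smart_quote = "’".encode("utf-8")      # b'\xe2\x80\x99'
print(smart_quote.decode("cp1252"))    # â€™  (mojibake, not an error)

# Case 2: Windows-1252 bytes fed to a strict UTF-8 decoder — a hard failure.
accented = "é".encode("cp1252")        # b'\xe9'
try:
    accented.decode("utf-8")
except UnicodeDecodeError:
    print("invalid byte sequence")     # what strict validation reports
```

Case 1 files open everywhere and merely look wrong; case 2 files are the ones strict importers reject outright.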

If your file mostly displays junk text but still opens, also review why CSV shows garbled characters and how UTF-8 encoding works in CSV.

Common causes

  • A legacy system exported the file as “ANSI” (usually Windows-1252), ISO-8859-1, or another non-UTF-8 encoding.
  • A spreadsheet saved the file with regional defaults instead of UTF-8.
  • Multiple export steps re-saved the file in inconsistent encodings.
  • One or more characters were already corrupted upstream and no longer map back cleanly.
  • A BOM or mixed-encoding workflow confused the destination importer.
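Whether a BOM is present can be checked by inspecting the first bytes of the file. A small sketch using only the standard library (the function name is illustrative):

```python
import codecs

def sniff_bom(path):
    """Return the encoding implied by a leading BOM, or None if absent."""
    with open(path, "rb") as f:
        head = f.read(4)
    candidates = [
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF32_LE, "utf-32-le"),  # test 32-bit before 16-bit:
        (codecs.BOM_UTF32_BE, "utf-32-be"),  # the UTF-32 LE BOM begins with
        (codecs.BOM_UTF16_LE, "utf-16-le"),  # the UTF-16 LE BOM bytes
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in candidates:
        if head.startswith(bom):
            return name
    return None
```

A UTF-16 export with a BOM is a common surprise here: it is perfectly valid Unicode, but a UTF-8-only importer will still reject it.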

Step-by-step: safe repair workflow

  1. Back up the original file. If later edits make the data worse, you need the untouched source.
  2. Check structure first. If every value lands in a single column or rows shift out of alignment, solve delimiter or quote problems separately before touching the encoding.
  3. Find known-good sample text. Use names, cities, apostrophes, currency symbols, or product titles you can recognize easily.
  4. Reopen using the suspected source encoding. If the file came from Excel on Windows, a legacy ERP, or an old desktop app, try the encoding that system usually exports.
  5. Export a fresh UTF-8 CSV. Keep the same delimiter unless the destination explicitly requires another one.
  6. Verify parsed output. Confirm rows, columns, quoting, and special characters all survive the conversion.
  7. Run a small test import. Do not replace the production file until the destination accepts the repaired version cleanly.
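Steps 4 and 5 can be sketched as a single re-encode pass. This assumes the source encoding turned out to be Windows-1252; the function and argument names are illustrative:

```python
def reencode_csv(src_path, dst_path, source_encoding="cp1252"):
    """Re-save a CSV as UTF-8, decoding with the suspected source encoding.

    Strict error handling is deliberate: if the encoding guess is wrong,
    this fails loudly instead of silently writing corrupted text.
    """
    with open(src_path, "r", encoding=source_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)  # delimiter and quoting pass through unchanged
```

Passing `newline=""` keeps the original line endings intact, so the only thing that changes between source and destination is the byte encoding.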

Example of the problem

A healthy UTF-8 CSV row might contain values like these:

name,city,note
José Álvarez,München,"Paid in €"

If the encoding is mishandled, the same row can turn into damaged text or trigger byte-sequence validation errors during import. Fixing the source encoding is safer than manually replacing visible junk characters.
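This is also why re-decoding beats find-and-replace: when valid UTF-8 bytes were merely read as Windows-1252 and nothing was re-saved afterwards, the damage is reversible at the byte level. A sketch:

```python
original = "José, Paid in €"

# Misreading the UTF-8 bytes as Windows-1252 produces classic mojibake.
damaged = original.encode("utf-8").decode("cp1252")
print(damaged)    # JosÃ©, Paid in â‚¬

# Reversing the mistake recovers the text exactly — no guesswork needed.
restored = damaged.encode("cp1252").decode("utf-8")
print(restored)   # José, Paid in €
```

This round trip only works while every intermediate byte is still defined in the wrong encoding and no lossy save cycle has happened in between, which is exactly why the backup in step 1 matters.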

Mistakes to avoid

  • Running find-and-replace on visible garbage text without fixing the real encoding.
  • Changing delimiter, quoting, and encoding all at once, which makes diagnosis harder.
  • Saving repeatedly in spreadsheet tools that auto-convert values like dates and IDs.
  • Assuming BOM always solves the problem when the destination never requested it.
  • Overwriting the only source copy before validation passes.

Quick QA checklist

  • Rows and columns still align correctly
  • Accented letters and punctuation display correctly
  • Quoted fields still parse as one cell
  • Delimiter matches the destination system
  • Test import succeeds without UTF-8 errors

FAQ

Can invalid UTF-8 mean my data is permanently lost?

Not always. If the original export still exists, reopening it with the correct source encoding often restores the text. Permanent loss is more likely only after repeated bad save cycles.

Should I export UTF-8 with BOM or without BOM?

Default to UTF-8 unless the importer specifically asks for BOM. Some spreadsheet-heavy workflows prefer BOM, but many web apps and APIs work best with plain UTF-8.
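In Python terms, the difference is just the encoding name chosen at write time (a sketch):

```python
# "utf-8-sig" prepends the three-byte BOM EF BB BF; plain "utf-8" does not.
print("id,name".encode("utf-8-sig"))  # b'\xef\xbb\xbfid,name'
print("id,name".encode("utf-8"))      # b'id,name'
```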

Does an invalid UTF-8 error always affect the whole file?

No. Sometimes one broken character in one row is enough to make a strict importer reject the entire CSV.
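Locating the offending rows narrows the repair considerably. A sketch that reports 1-based line numbers whose bytes fail strict UTF-8 decoding (the function name is illustrative):

```python
def find_bad_lines(path):
    """Return 1-based line numbers that are not valid UTF-8."""
    bad = []
    with open(path, "rb") as f:  # read raw bytes so nothing decodes early
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(lineno)
    return bad
```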

Related guides

Canonical: https://csveditoronline.com/docs/fix-invalid-byte-sequence-utf8-csv