How to Fix Japanese Mojibake (文字化け) in PDFs
A complete diagnostic and repair guide for garbled Japanese characters in PDF files, covering font embedding, encoding issues, and OCR rescue.
Step-by-Step Guide
1. Diagnose what kind of mojibake you have
Open the PDF and try to copy the garbled text into a text editor. If it pastes as recognizable Japanese, the text layer is intact and the problem is on the display side: update your PDF reader, or make sure a Japanese font is installed for it to fall back on. If it pastes as boxes, question marks, or random Latin letters, the PDF itself has a font embedding or encoding problem and needs to be repaired.
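The copy-paste check can be roughed out in code. The sketch below is only an illustration: `diagnose_paste`, its three labels, and the 50% threshold are assumptions made for this example, not part of any tool named in this guide.

```python
def _is_japanese(ch: str) -> bool:
    """Return True for kana, common kanji, and Japanese punctuation."""
    cp = ord(ch)
    return (
        0x3000 <= cp <= 0x303F      # CJK symbols and punctuation (、。「」)
        or 0x3040 <= cp <= 0x309F   # hiragana
        or 0x30A0 <= cp <= 0x30FF   # katakana
        or 0x3400 <= cp <= 0x4DBF   # CJK unified ideographs, extension A
        or 0x4E00 <= cp <= 0x9FFF   # CJK unified ideographs (most kanji)
        or 0xFF66 <= cp <= 0xFF9F   # half-width katakana
    )


def diagnose_paste(text: str, threshold: float = 0.5) -> str:
    """Classify text pasted from a PDF that displays as garbled.

    The threshold is an arbitrary assumption; tune it for documents
    that legitimately mix Japanese with a lot of Latin text.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return "no-text"          # nothing was copied: likely a scan, try OCR
    japanese = sum(_is_japanese(c) for c in chars)
    if japanese / len(chars) >= threshold:
        return "viewer-problem"   # text layer is fine, the display side is not
    return "file-problem"         # boxes, question marks, or Latin gibberish


print(diagnose_paste("日本語のテキストです。"))
print(diagnose_paste("ÆüËܸì"))
```

Paste a sentence or two rather than a single word, so that stray punctuation does not dominate the ratio.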
2. Try re-exporting the PDF through a converter
Run the PDF through PDFMint's PDF to Word converter. Because the converter rebuilds the text layer from scratch using a CJK-aware pipeline, it often fixes the broken ToUnicode mappings that cause copy-paste mojibake. Once the Word file looks correct, export it to PDF again with fonts embedded to get a clean, searchable PDF.
3. If text is missing entirely, run OCR
If the PDF is actually a scan with no text layer, no amount of re-encoding will recover the characters, because they are simply not in the file. Run PDFMint's OCR tool with Japanese (jpn) selected as the language to extract the characters from the page images. Modern OCR engines can exceed 95% accuracy on clean printed Japanese, although results vary with scan quality.
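For readers who prefer a local, scriptable route, the same step can be sketched with the open-source Tesseract command-line tool, whose language packs use the same `jpn` code. This is an assumption-laden stand-in, not a description of how PDFMint's OCR works: it assumes `tesseract` and its Japanese language data are installed, and that the scanned page has already been saved as an image such as PNG or TIFF. The helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(image: Path, output_base: Path,
                      languages=("jpn",), dpi: int = 300) -> list[str]:
    """Build a Tesseract invocation that writes a searchable PDF.

    Tesseract appends ".pdf" to output_base itself, so pass the base
    name without an extension.
    """
    return [
        "tesseract", str(image), str(output_base),
        "-l", "+".join(languages),   # e.g. "jpn" or "jpn+eng"
        "--dpi", str(dpi),           # resolution hint if the image lacks one
        "pdf",                       # output config: image plus hidden text layer
    ]


def run_ocr(image: Path, output_base: Path, languages=("jpn",)) -> Path:
    """Run Tesseract and return the path of the PDF it should produce."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    subprocess.run(build_ocr_command(image, output_base, languages), check=True)
    return Path(str(output_base) + ".pdf")


# Show the command without running it.
print(" ".join(build_ocr_command(Path("page.png"), Path("page_ocr"))))
```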
Tips
- When creating a Japanese PDF from Word or LibreOffice, always tick "Embed fonts in the file" before exporting. Non-embedded MS Mincho or Noto Sans CJK references break on machines that lack the font.
- If you need to send a PDF with rare kanji or older Japanese characters, use Source Han Serif / Noto Serif CJK as the source font. It has some of the broadest Unicode coverage among free CJK font families.
- Vertical (tategaki) Japanese text is especially fragile in PDF exports. Test the result by copy-pasting a sentence back into a text editor before sending the file.
Frequently Asked Questions
Why does my PDF show Japanese as boxes (□) or tofu?
Tofu boxes mean your PDF reader does not have access to the font referenced by the file. This usually happens when the original document was exported without embedding the Japanese font — typical with old MS Office for Mac exports and some Linux tools. The fix is to re-export from the source with "Embed all fonts" enabled, or to round-trip the PDF through a converter that rebuilds the text layer.
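Whether a font is embedded can be read off the PDF's font dictionaries. The sketch below assumes those dictionaries have already been loaded by a PDF library into dict-like objects whose keys are PDF names with a leading slash; plain Python dicts stand in for them here, and `font_is_embedded` is a hypothetical helper. A real library may also require indirect references to be resolved first.

```python
from collections.abc import Mapping

# A font program is stored under one of these keys in the font descriptor.
FONT_FILE_KEYS = ("/FontFile", "/FontFile2", "/FontFile3")


def font_is_embedded(font: Mapping) -> bool:
    """Return True if the font dictionary carries its own glyph data."""
    subtype = font.get("/Subtype")
    if subtype == "/Type3":
        return True  # Type 3 glyphs are drawn by procedures inside the PDF
    if subtype == "/Type0":
        # Composite (CID) fonts, the usual case for Japanese, keep the
        # descriptor on the descendant font.
        descendants = font.get("/DescendantFonts", [])
        return bool(descendants) and all(font_is_embedded(d) for d in descendants)
    descriptor = font.get("/FontDescriptor", {})
    return any(key in descriptor for key in FONT_FILE_KEYS)


# Illustrative dictionaries, not taken from a real file.
embedded = {
    "/Subtype": "/Type0",
    "/BaseFont": "/ABCDEF+NotoSansCJKjp-Regular",
    "/DescendantFonts": [
        {"/Subtype": "/CIDFontType0", "/FontDescriptor": {"/FontFile3": b"..."}}
    ],
}
referenced_only = {
    "/Subtype": "/TrueType",
    "/BaseFont": "/MS-Mincho",
    "/FontDescriptor": {"/FontName": "/MS-Mincho"},
}

print(font_is_embedded(embedded), font_is_embedded(referenced_only))
```

A font that is only referenced, like the second dictionary, is the case that turns into tofu on a machine without that font installed.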
Why does copy-paste give me random Latin characters instead of kanji?
The PDF is using a custom font subset whose ToUnicode CMap is broken or missing. The glyphs render correctly because the visual shapes are embedded, but the underlying character codes do not map to real Unicode codepoints. Running the file through PDF to Word or OCR rebuilds the text layer from the visible glyphs, recovering the original characters.
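To make the role of the ToUnicode CMap concrete, here is a toy sketch. `SAMPLE_CMAP` is a hand-written fragment in the `beginbfchar` format, not taken from a real file, and the parser is deliberately minimal: it ignores `bfrange` sections and everything else a real CMap contains. Real extractors also guess differently when the map is missing, which is where the stray Latin letters come from; this sketch simply emits the replacement character.

```python
import re

# Each entry maps a glyph code used in the page content to UTF-16BE text.
SAMPLE_CMAP = """
3 beginbfchar
<0001> <65E5>
<0002> <672C>
<0003> <8A9E>
endbfchar
"""

BFCHAR_ENTRY = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def parse_bfchar(cmap_text: str) -> dict[int, str]:
    """Collect code -> Unicode string pairs from bfchar sections."""
    mapping: dict[int, str] = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for code_hex, uni_hex in BFCHAR_ENTRY.findall(section):
            mapping[int(code_hex, 16)] = bytes.fromhex(uni_hex).decode("utf-16-be")
    return mapping


def decode_codes(codes: list[int], to_unicode: dict[int, str]) -> str:
    """Turn glyph codes into text, marking unmapped codes with U+FFFD."""
    return "".join(to_unicode.get(code, "\ufffd") for code in codes)


codes_on_page = [1, 2, 3]  # what the content stream stores for the three glyphs
print(decode_codes(codes_on_page, parse_bfchar(SAMPLE_CMAP)))  # intact map
print(decode_codes(codes_on_page, {}))                         # missing map
```

The glyph shapes are the same in both cases; only the code-to-Unicode table decides whether copy-paste returns real characters.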
Will OCR make a Japanese PDF searchable?
Yes. OCR generates a hidden text layer underneath the existing image, which makes the PDF searchable in any reader and allows copy-paste to return real Japanese characters. Choose Japanese (jpn) as the OCR language, or jpn+eng for documents that mix Japanese and English. Higher-resolution scans (300 DPI or more) yield substantially better accuracy than 150 DPI scans, especially for densely printed kanji.
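If a scan's DPI is unknown, it can be estimated from the image's pixel width and the physical page width. The sketch below is simple arithmetic; the function names and the 1% tolerance are assumptions made for this example.

```python
MM_PER_INCH = 25.4


def effective_dpi(pixel_width: int, page_width_mm: float) -> float:
    """Pixels per inch implied by an image width and a physical page width."""
    return pixel_width / (page_width_mm / MM_PER_INCH)


def meets_ocr_target(pixel_width: int, page_width_mm: float,
                     target_dpi: float = 300.0, tolerance: float = 0.01) -> bool:
    """Check a scan against a target DPI, allowing for rounding in the scanner."""
    return effective_dpi(pixel_width, page_width_mm) >= target_dpi * (1 - tolerance)


A4_WIDTH_MM = 210.0
for width_px in (1240, 2480):
    dpi = effective_dpi(width_px, A4_WIDTH_MM)
    print(width_px, round(dpi), meets_ocr_target(width_px, A4_WIDTH_MM))
```

An A4 page scanned 2480 pixels wide works out to roughly 300 DPI, while 1240 pixels is roughly 150 DPI and is worth rescanning before OCR if the original is available.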