3 hours ago · Tech

Symptom

cat displays text, but grep can't find it:

    % cat foo.txt
    The world is overcome--aye! even here!
    By such as fix their faith on Unity.
    % grep fix foo.txt
    %

Cause

The file is UTF-16, not UTF-8/ASCII. For example, in UTF-16LE, fix is encoded as:

    66 00 69 00 78 00

not as the contiguous ASCII bytes, which are also the UTF-8 bytes for fix:

    66 69 78

So grep fix does not match. cat may still look readable because NUL bytes are often not visibly rendered.

Detect encoding

    % file foo.txt
    foo.txt: Little-endian UTF-16 Unicode text

If the file has no byte-order mark (BOM), file may fail to identify it:

    % file foo.txt
    foo.txt: data

For a file expected to be text, data is a clue to inspect the bytes:

    % xxd -g 1 -l 64 foo.txt

UTF-16LE has alternating character/NUL bytes:

    54 00 68 00 65 00 20 00

UTF-16BE has the reverse pattern:

    00 54 00 68 00 65 00 20

Solution

Convert to UTF-8 before grepping:

    % iconv -f UTF-16LE -t UTF-8 foo.txt | grep fix
    By such as fix their faith on Unity.

Use UTF-16BE…
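The detect-then-convert steps above can be rolled into one small shell function. This is a minimal sketch, not a definitive tool: the names grep_utf16 and UTF16_DEFAULT are hypothetical, it only looks for a UTF-16 BOM in the first two bytes, and it assumes that iconv, when given plain UTF-16 as the source encoding, reads the BOM to choose the byte order (as glibc and GNU libiconv do). When there is no BOM it cannot tell the byte order, so it falls back to UTF-16LE unless told otherwise.

```shell
# grep_utf16 (hypothetical helper): convert a UTF-16 file to UTF-8, then grep it.
# Usage: grep_utf16 PATTERN FILE
grep_utf16() {
    pattern=$1
    file=$2

    # Dump the first two bytes as hex: "fffe" is a UTF-16LE BOM,
    # "feff" is a UTF-16BE BOM.
    bom=$(od -An -tx1 -N2 "$file" | tr -d ' \n')

    case $bom in
        fffe|feff)
            # BOM present: let iconv pick the byte order from it.
            enc=UTF-16 ;;
        *)
            # Assumption: no BOM, so the byte order must be supplied.
            # UTF16_DEFAULT is a hypothetical override; default is LE.
            enc=${UTF16_DEFAULT:-UTF-16LE} ;;
    esac

    iconv -f "$enc" -t UTF-8 "$file" | grep -- "$pattern"
}
```

With the function defined, grep_utf16 fix foo.txt stands in for the iconv pipeline above. For a BOM-less big-endian file, set UTF16_DEFAULT=UTF-16BE first.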
