Question
Asked By – grieve
How can I tell if a file is binary (non-text) in Python?
I am searching through a large set of files in Python, and keep getting matches in binary files. This makes the output look incredibly messy.
I know I could use grep -I, but I am doing more with the data than what grep allows for.
In the past, I would have just searched for characters greater than 0x7f, but utf8 and the like make that impossible on modern systems. Ideally, the solution would be fast.
Answer
You can also use the mimetypes module:
import mimetypes
...
# guess_type() looks only at the file name and returns a (type, encoding) tuple
mime, encoding = mimetypes.guess_type(file)
It’s fairly easy to compile a list of binary MIME types. For example, Apache is distributed with a mime.types file that you could parse into two lists, binary and text, and then check whether the guessed MIME type is in your text list or your binary list.
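A minimal sketch of that check, assuming a small hand-picked set of textual types instead of a parsed Apache mime.types file. The names looks_like_text and TEXTUAL_APPLICATION_TYPES are made up for illustration. Keep in mind that mimetypes guesses from the file name only and never reads the contents, so files with a missing or misleading extension come back as unknown or wrong.

```python
import mimetypes

# Assumption: a hand-picked set of non-"text/" types to treat as text.
# In practice this could be built from Apache's mime.types file instead.
TEXTUAL_APPLICATION_TYPES = {
    "application/json",
    "application/xml",
}


def looks_like_text(path):
    """Classify a file by its guessed MIME type.

    Returns True for text, False for binary, and None when the
    type cannot be guessed from the file name.
    """
    mime, encoding = mimetypes.guess_type(path)
    if encoding is not None:
        # A content encoding such as gzip means the bytes on disk are compressed.
        return False
    if mime is None:
        return None
    return mime.startswith("text/") or mime in TEXTUAL_APPLICATION_TYPES
```

For example, looks_like_text("notes.txt") gives True, looks_like_text("photo.png") gives False, and a name without an extension such as "README" gives None, so the caller still needs a fallback for the unknown case.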
Answered By – Gavin M. Roy
This answer is collected from stackoverflow and reviewed by FixPython community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0