Fix Python – How can I detect if a file is binary (non-text) in Python?

Question

Asked By – grieve

How can I tell if a file is binary (non-text) in Python?

I am searching through a large set of files in Python, and keep getting matches in binary files. This makes the output look incredibly messy.

I know I could use grep -I, but I am doing more with the data than what grep allows for.

In the past, I would have just searched for characters greater than 0x7f, but UTF-8 and similar encodings make that impossible on modern systems. Ideally, the solution would be fast.


Answer

You can also use the mimetypes module:

import mimetypes
...
# guess_type() returns a (type, encoding) tuple, so unpack it
mime, encoding = mimetypes.guess_type(file)

It’s fairly easy to compile a list of binary MIME types. For example, Apache ships with a mime.types file that you could parse into two lists, binary and text, and then check which of the two the guessed type falls in.
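As a rough sketch of that text/binary check, something like the following could work. The function name looks_like_text and the TEXTUAL_APPLICATION_TYPES set are hypothetical, made up for illustration, and the set is deliberately incomplete. Keep in mind that guess_type() goes by the file name only and never reads the file's contents, so a misnamed file will be misclassified.

```python
import mimetypes

# Hypothetical allow-list: MIME types outside "text/" that are still textual.
# This is an assumption for illustration; extend it to suit your own files.
TEXTUAL_APPLICATION_TYPES = {
    "application/json",
    "application/xml",
    "application/javascript",
}


def looks_like_text(path):
    """Guess from the file name alone whether `path` names a text file.

    Returns True or False, or None when the extension is not recognised.
    """
    mime, encoding = mimetypes.guess_type(path)
    if encoding is not None:
        # A content encoding such as gzip means the bytes on disk are compressed.
        return False
    if mime is None:
        # Unknown extension: the caller has to decide what to do.
        return None
    return mime.startswith("text/") or mime in TEXTUAL_APPLICATION_TYPES


if __name__ == "__main__":
    for name in ("notes.txt", "photo.png", "archive.tar.gz", "README"):
        print(name, looks_like_text(name))
```

If you do have an Apache-style mime.types file on hand, mimetypes.read_mime_types(filename) loads it into a dictionary mapping extensions to types, which you could split into your text and binary lists.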

This question is answered By – Gavin M. Roy

This answer is collected from Stack Overflow and reviewed by FixPython community admins, and is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.