Question
Asked By – toobee
I am retrieving Twitter data with a Python tool and dumping it to disk in JSON format. I noticed some unintended escaping: the entire data string for a tweet is enclosed in double quotes, and all double quotes of the actual JSON formatting are escaped with a backslash.
They look like this:
"{\"created_at\":\"Fri Aug 08 11:04:40 +0000 2014\",\"id\":497699913925292032,
How do I avoid that? It should be:
{"created_at":"Fri Aug 08 11:04:40 +0000 2014" .....
My file-out code looks like this:
with io.open('data'+self.timestamp+'.txt', 'a', encoding='utf-8') as f:
    f.write(unicode(json.dumps(data, ensure_ascii=False)))
    f.write(unicode('\n'))
The unintended escaping causes problems when reading in the JSON file in a later processing step.
Answer
You are double encoding your JSON strings. data is already a JSON string, and doesn't need to be encoded again:
>>> import json
>>> not_encoded = {"created_at":"Fri Aug 08 11:04:40 +0000 2014"}
>>> encoded_data = json.dumps(not_encoded)
>>> print encoded_data
{"created_at": "Fri Aug 08 11:04:40 +0000 2014"}
>>> double_encode = json.dumps(encoded_data)
>>> print double_encode
"{\"created_at\": \"Fri Aug 08 11:04:40 +0000 2014\"}"
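The same effect explains the problem in the later processing step: a double-encoded line decodes to a string on the first pass, and only a second pass yields the dictionary. The following is a minimal Python 3 sketch of that round trip (the session above is Python 2); the variable names are illustrative and not taken from the original code.

```python
import json

record = {"created_at": "Fri Aug 08 11:04:40 +0000 2014"}

# Encoding twice reproduces the quoted, backslash-escaped form from the question.
encoded_once = json.dumps(record)
encoded_twice = json.dumps(encoded_once)

# The first decode only undoes the outer layer and returns a str.
first_pass = json.loads(encoded_twice)

# A second decode is needed to get the original dict back.
second_pass = json.loads(first_pass)
```

Files that were already written with the double encoding can therefore still be read by decoding each line twice.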
Just write the already-encoded strings directly to your file:
with open('data{}.txt'.format(self.timestamp), 'a') as f:
    f.write(data + '\n')
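If the tool sometimes hands over decoded objects and sometimes raw JSON text, one option is to encode only when needed. The sketch below is Python 3 and rests on an assumption: any str passed in is taken to be JSON text already, as in the question. The helper names to_json_line, append_record, and read_records are hypothetical and not part of any library.

```python
import io
import json


def to_json_line(data):
    """Return data as one line of JSON text, encoding it only if needed."""
    if isinstance(data, str):
        # Assumption: a str is already JSON text, so it is passed through as-is.
        return data.rstrip("\n") + "\n"
    return json.dumps(data, ensure_ascii=False) + "\n"


def append_record(path, data):
    """Append one record to a line-delimited JSON file."""
    with io.open(path, "a", encoding="utf-8") as f:
        f.write(to_json_line(data))


def read_records(path):
    """Read the file back, decoding each non-empty line exactly once."""
    with io.open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

With this approach each line in the file is a single JSON document, so the later processing step needs only one json.loads call per line.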
This question is answered By – Martijn Pieters
This answer is collected from stackoverflow and reviewed by FixPython community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0