Asked By – Soviut
I’m looking for a Python module that can do simple fuzzy string comparisons. Specifically, I’d like a percentage of how similar the strings are. I know this is potentially subjective, so I was hoping to find a library that can do positional comparisons as well as longest similar string matches, among other things.
Basically, I’m hoping to find something that is simple enough to yield a single percentage while still configurable enough that I can specify what type of comparison(s) to do.
Solution for: Good Python modules for fuzzy string comparison? [closed]
Levenshtein Python extension and C library.
The Levenshtein Python C extension module contains functions for fast computation of:
– Levenshtein (edit) distance, and edit operations
– string similarity
– approximate median strings, and generally string averaging
– string sequence and set similarity
It supports both normal and Unicode strings.
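For readers unfamiliar with the term, Levenshtein (edit) distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The following is a minimal pure-Python sketch of that idea, not the library's C implementation. The names `edit_distance` and `similarity_percent` are hypothetical helpers, and the percentage normalization (dividing by the longer string's length) is an assumption chosen for illustration; it is not necessarily how the library's own ratio is computed.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory

    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def similarity_percent(a: str, b: str) -> float:
    """Hypothetical normalization: 100 means identical, 0 means every
    position of the longer string had to be edited."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are treated as identical
    return 100.0 * (1 - edit_distance(a, b) / longest)


print(edit_distance("kitten", "sitting"))  # 3
print(edit_distance("flaw", "lawn"))       # 2
print(round(similarity_percent("kitten", "sitting"), 1))
```

This runs in time proportional to the product of the two string lengths, which is fine for short strings; for large volumes of comparisons the C extension is the better fit.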
$ pip install python-levenshtein
...
$ python
>>> import Levenshtein
>>> help(Levenshtein.ratio)
ratio(...)
    Compute similarity of two strings.

    ratio(string1, string2)

    The similarity is a number between 0 and 1, it's usually equal or
    somewhat higher than difflib.SequenceMatcher.ratio(), because it's
    based on real minimal edit distance.

    Examples:
    >>> ratio('Hello world!', 'Holly grail!')
    0.58333333333333337
    >>> ratio('Brian', 'Jesus')
    0.0

>>> help(Levenshtein.distance)
distance(...)
    Compute absolute Levenshtein distance of two strings.

    distance(string1, string2)

    Examples (it's hard to spell Levenshtein correctly):
    >>> distance('Levenshtein', 'Lenvinsten')
    4
    >>> distance('Levenshtein', 'Levensthein')
    2
    >>> distance('Levenshtein', 'Levenshten')
    1
    >>> distance('Levenshtein', 'Levenshtein')
    0
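The ratio documentation compares itself to `difflib.SequenceMatcher.ratio()`, which ships with the standard library, so it may be a useful baseline when installing a C extension is not an option. Below is a small sketch that turns it into a single percentage and also extracts the longest common block, which touches on the "longest similar string" part of the question. The helper names `difflib_percent` and `longest_common_block` are made up for this example.

```python
from difflib import SequenceMatcher


def difflib_percent(a: str, b: str) -> float:
    """Similarity as a percentage, based on SequenceMatcher.ratio()."""
    # The first argument is the "junk" filter; None disables it.
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def longest_common_block(a: str, b: str) -> str:
    """Longest contiguous run of characters that appears in both strings."""
    matcher = SequenceMatcher(None, a, b)
    # Explicit bounds keep this working on Python versions that
    # do not provide default arguments for find_longest_match.
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


print(difflib_percent("Brian", "Jesus"))               # 0.0
print(difflib_percent("Levenshtein", "Levenshtein"))   # 100.0
print(longest_common_block("apple pie", "pineapple"))  # apple
```

Because the two approaches score strings differently, it is probably worth trying both on a sample of real data before settling on a threshold.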