Fix Python – Good Python modules for fuzzy string comparison? [closed]

Question

Asked By – Soviut

I’m looking for a Python module that can do simple fuzzy string comparisons. Specifically, I’d like a percentage of how similar the strings are. I know this is potentially subjective so I was hoping to find a library that can do positional comparisons as well as longest similar string matches, among other things.

Basically, I’m hoping to find something that is simple enough to yield a single percentage while still configurable enough that I can specify what type of comparison(s) to do.

Now we will see solution for issue: Good Python modules for fuzzy string comparison? [closed]


Answer

Levenshtein Python extension and C library.

https://github.com/ztane/python-Levenshtein/

The Levenshtein Python C extension module contains functions for fast
computation of
– Levenshtein (edit) distance, and edit operations
– string similarity
– approximate median strings, and generally string averaging
– string sequence and set similarity
It supports both normal and Unicode strings.

$ pip install python-levenshtein
...
$ python
>>> import Levenshtein
>>> help(Levenshtein.ratio)
ratio(...)
    Compute similarity of two strings.

    ratio(string1, string2)

    The similarity is a number between 0 and 1, it's usually equal or
    somewhat higher than difflib.SequenceMatcher.ratio(), becuase it's
    based on real minimal edit distance.

    Examples:
    >>> ratio('Hello world!', 'Holly grail!')
    0.58333333333333337
    >>> ratio('Brian', 'Jesus')
    0.0

>>> help(Levenshtein.distance)
distance(...)
    Compute absolute Levenshtein distance of two strings.

    distance(string1, string2)

    Examples (it's hard to spell Levenshtein correctly):
    >>> distance('Levenshtein', 'Lenvinsten')
    4
    >>> distance('Levenshtein', 'Levensthein')
    2
    >>> distance('Levenshtein', 'Levenshten')
    1
    >>> distance('Levenshtein', 'Levenshtein')
    0

This question is answered By – Pete Skomoroch

This answer is collected from stackoverflow and reviewed by FixPython community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0