openclean.function.similarity.base module
Base classes for similarity functions and similarity constraints.
- class openclean.function.similarity.base.SimilarityConstraint(func: openclean.function.similarity.base.SimilarityFunction, pred: Callable)
Bases:
object
Function that validates a constraint, e.g., a threshold predicate, on the similarity between two values (scalar or tuples).
This class is a simple wrapper around a similarity function and a predicate that is evaluated on the similarity score for a given pair of values.
- is_satisfied(val_1: Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]], val_2: Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]) -> bool
Test if a given pair of values satisfies the similarity constraint.
Returns True if the similarity between val_1 and val_2 satisfies the constraint (e.g., a given threshold).
- Parameters
val_1 (scalar or tuple) –
val_2 (scalar or tuple) –
- Return type
bool
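The wrapper behaviour described above can be illustrated with a minimal sketch. It does not import openclean: ConstraintSketch and ExactMatch are hypothetical stand-ins that only mirror the documented interface (a similarity function with a sim method, plus a predicate applied to the resulting score), so the library's actual implementation may differ.

```python
from typing import Callable


class ExactMatch:
    """Hypothetical similarity function: 1.0 for equal values, 0.0 otherwise."""

    def sim(self, val_1, val_2) -> float:
        return 1.0 if val_1 == val_2 else 0.0


class ConstraintSketch:
    """Stand-in for the documented wrapper (assumption, not openclean code).

    Holds a similarity function and a predicate that is evaluated on the
    similarity score for a pair of values.
    """

    def __init__(self, func, pred: Callable[[float], bool]):
        self.func = func
        self.pred = pred

    def is_satisfied(self, val_1, val_2) -> bool:
        # Compute the score first, then let the predicate decide.
        return self.pred(self.func.sim(val_1, val_2))


# A threshold predicate: the score must be at least 0.5.
at_least_half = ConstraintSketch(func=ExactMatch(), pred=lambda score: score >= 0.5)

same = at_least_half.is_satisfied("Brooklyn", "Brooklyn")
different = at_least_half.is_satisfied(("a", 1), ("a", 2))
```

Because the predicate is an arbitrary callable, the same wrapper could express an upper bound or a range check instead of a lower threshold.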
- class openclean.function.similarity.base.SimilarityFunction
Bases:
object
Mixin class for functions that compute the similarity between two values (scalar or tuples). Primarily useful for string similarity.
Similarity results are float values in the interval [0, 1], where 0 is the minimal similarity between two values and 1 is the maximal similarity.
- abstract sim(val_1: Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]], val_2: Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]) -> float
Compute the similarity between two values.
The result is in the interval [0, 1], where 0 is the minimal similarity between two values and 1 is the maximal similarity.
- Parameters
val_1 (scalar or tuple) –
val_2 (scalar or tuple) –
- Return type
float
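As a rough illustration of the contract above, the following sketch implements sim for string-like values. It is self-contained and does not import openclean: the SimilarityFunction class defined here is a local stand-in for the documented abstract class, and RatioSimilarity is a hypothetical name. The score comes from difflib.SequenceMatcher.ratio() in the Python standard library, which always returns a float in [0, 1].

```python
from abc import ABCMeta, abstractmethod
from difflib import SequenceMatcher


class SimilarityFunction(metaclass=ABCMeta):
    """Local stand-in for the documented abstract class (assumption)."""

    @abstractmethod
    def sim(self, val_1, val_2) -> float:
        """Return a similarity score in the interval [0, 1]."""
        raise NotImplementedError()


class RatioSimilarity(SimilarityFunction):
    """Hypothetical implementation based on difflib's matching ratio."""

    def sim(self, val_1, val_2) -> float:
        # Convert both values to strings so scalars and tuples are accepted.
        return SequenceMatcher(None, str(val_1), str(val_2)).ratio()


f = RatioSimilarity()
identical = f.sim("abc", "abc")   # maximal similarity
disjoint = f.sim("abc", "xyz")    # no characters in common
partial = f.sim("kitten", "sitting")
```

Converting tuples with str() is a shortcut taken for brevity; a real implementation would more likely compare tuple elements position by position.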