openclean.profiling.classifier.base module

Generic classifier that can be used as a profiling function.

class openclean.profiling.classifier.base.Classifier(classifier: openclean.function.value.classifier.ValueClassifier, normalizer: Optional[Union[Callable, openclean.function.value.base.ValueFunction]] = None, features: Optional[openclean.profiling.classifier.base.ResultFeatures] = None, labels: Optional[Union[List[str], Tuple[str, str]]] = None)

Bases: openclean.profiling.base.DataStreamProfiler

The classifier wraps a ValueClassifier with functionality that allows it to be used as a profiling function.

close() → Dict

Convert the total and distinct counts for class labels into the requested format. The result is a dictionary. Its elements depend on the features that were requested at object construction and on whether a normalizer was given.

Return type

dict
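The following is a minimal sketch of how the requested features and an optional normalizer could shape the result dictionary. It does not use openclean itself: `format_counts` and `fractions` are hypothetical helpers, the enum is a local mirror of ResultFeatures, and both the nested layout for BOTH and the idea that the normalizer maps a whole dictionary of counts are assumptions made for illustration.

```python
from enum import Enum
from typing import Callable, Dict, Optional


class ResultFeatures(Enum):
    """Local mirror of the documented enum, so the sketch runs on its own."""
    DISTINCT = 0
    TOTAL = 1
    BOTH = 2


def format_counts(
    total: Dict[str, int],
    distinct: Dict[str, int],
    features: ResultFeatures,
    normalizer: Optional[Callable[[Dict[str, int]], Dict[str, float]]] = None,
) -> Dict:
    """Hypothetical helper: shape per-label counts into a result dictionary."""
    if normalizer is not None:
        # Assumption: the normalizer maps a whole {label: count} dict at once.
        total = normalizer(total)
        distinct = normalizer(distinct)
    if features is ResultFeatures.TOTAL:
        return dict(total)
    if features is ResultFeatures.DISTINCT:
        return dict(distinct)
    # BOTH: nest the two counts under each label (assumed layout).
    return {
        label: {"distinct": distinct[label], "total": total[label]}
        for label in total
    }


def fractions(counts):
    """Example normalizer: divide each count by the sum of all counts."""
    denom = sum(counts.values())
    return {label: count / denom for label, count in counts.items()}


total = {"int": 8, "str": 2}
distinct = {"int": 3, "str": 1}

as_total = format_counts(total, distinct, ResultFeatures.TOTAL)
as_both = format_counts(total, distinct, ResultFeatures.BOTH)
as_share = format_counts(total, distinct, ResultFeatures.TOTAL, normalizer=fractions)
```

In this sketch a single requested feature yields a flat mapping from label to count, while BOTH yields one nested dictionary per label. The actual keys returned by close() should be checked against the library.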

consume(value: Union[int, float, str, datetime.datetime], count: int)

Consume a pair of (value, count) in the data stream. Collects all values in a counter dictionary.

Parameters
  • value (scalar) – Scalar column value from a dataset that is part of the data stream that is being profiled.

  • count (int) – Frequency of the value. Note that this count only relates to the given value and does not necessarily represent the total number of occurrences of the value in the stream.
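Because a count need not cover all occurrences of a value, the same value may reach consume more than once and its counts have to be accumulated. The sketch below illustrates that with a toy profiler that follows the open, consume, close call order. `CountingProfiler` and `classify` are hypothetical stand-ins, not part of openclean, and the result layout is an assumption.

```python
from collections import Counter


def classify(value):
    """Hypothetical stand-in for a ValueClassifier: label a scalar by type."""
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    return "text"


class CountingProfiler:
    """Toy profiler following the open / consume / close call order."""

    def open(self):
        # Reset all state at the beginning of the stream.
        self.total = Counter()
        self.values = {}

    def consume(self, value, count):
        label = classify(value)
        # Add the partial count to the running total for the label.
        self.total[label] += count
        # Track distinct values per label; repeated values are stored once.
        self.values.setdefault(label, set()).add(value)

    def close(self):
        # Report distinct and total counts for every label that was seen.
        return {
            label: {"distinct": len(self.values[label]), "total": n}
            for label, n in self.total.items()
        }


profiler = CountingProfiler()
profiler.open()
# The value "a" arrives twice with partial counts 2 and 3.
for value, count in [("a", 2), (7, 1), ("a", 3), ("b", 1)]:
    profiler.consume(value, count)
result = profiler.close()
```

Here the two partial counts for "a" add up in the total for the "text" label, while "a" contributes only once to its distinct count.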

open()

Initialize the counter for class label frequencies at the beginning of the stream.

class openclean.profiling.classifier.base.ResultFeatures(value)

Bases: enum.Enum

Enumerate accepted values for the datatype features argument.

BOTH = 2
DISTINCT = 0
TOTAL = 1