openclean.profiling.classifier.base module
Generic classifier that can be used as a profiling function.
- class openclean.profiling.classifier.base.Classifier(classifier: openclean.function.value.classifier.ValueClassifier, normalizer: Optional[Union[Callable, openclean.function.value.base.ValueFunction]] = None, features: Optional[openclean.profiling.classifier.base.ResultFeatures] = None, labels: Optional[Union[List[str], Tuple[str, str]]] = None)
Bases: openclean.profiling.base.DataStreamProfiler
The classifier wraps a ValueClassifier with functionality that allows it to be used as a profiling function.
- close() → Dict
Convert the total and distinct counts for class labels into the requested format. The result is a dictionary whose elements depend on the features that were requested at object construction and on whether a normalizer was given.
- Return type
dict
- consume(value: Union[int, float, str, datetime.datetime], count: int)
Consume a (value, count) pair from the data stream. Collects all values in a counter dictionary.
- Parameters
value (scalar) – Scalar column value from a dataset that is part of the data stream that is being profiled.
count (int) – Frequency of the value. Note that this count only relates to the given value and does not necessarily represent the total number of occurrences of the value in the stream.
- open()
Initialize the counter for class label frequencies at the beginning of the stream.
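The open(), consume(), close() life cycle described above can be illustrated with a small stand-alone sketch. It does not import openclean: the class LabelCountingProfiler, the classify_type function, and the "total" and "distinct" result keys are hypothetical names chosen for illustration, and the real Classifier may organize its counters and its result dictionary differently. The sketch only shows how per-label total and distinct counts could be accumulated when the same value arrives more than once with partial counts.

```python
from collections import Counter
from typing import Callable, Dict, Hashable


class LabelCountingProfiler:
    """Hypothetical stand-in for a stream profiler (not the openclean class)."""

    def __init__(self, classify: Callable[[Hashable], str]):
        # classify maps a scalar value to a class label (assumed interface).
        self.classify = classify
        self.total: Counter = Counter()
        self.distinct: Counter = Counter()
        self._seen: set = set()

    def open(self) -> None:
        # Reset all counters at the beginning of the stream.
        self.total = Counter()
        self.distinct = Counter()
        self._seen = set()

    def consume(self, value: Hashable, count: int) -> None:
        # The count may be partial: the same value can appear again later,
        # so totals are summed and distinct values are counted only once.
        label = self.classify(value)
        self.total[label] += count
        if value not in self._seen:
            self._seen.add(value)
            self.distinct[label] += 1

    def close(self) -> Dict[str, Dict[str, int]]:
        # Convert the collected counts into a plain dictionary.
        return {
            label: {"total": self.total[label], "distinct": self.distinct[label]}
            for label in self.total
        }


def classify_type(value: Hashable) -> str:
    # Toy classifier: label each value by its Python type.
    return "int" if isinstance(value, int) else "str"


profiler = LabelCountingProfiler(classify_type)
profiler.open()
# The value 1 occurs twice in the stream with partial counts 3 and 1.
for value, count in [(1, 3), ("a", 2), (1, 1), (2, 5)]:
    profiler.consume(value, count)
result = profiler.close()
print(result)
```

In this sketch the value 1 contributes 4 to the total count of its label but only 1 to the distinct count, which is the distinction the count parameter of consume() is meant to preserve.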