openclean.profiling.anomalies.conditional module

Generic conditional outlier detector. Identify values as outliers if they satisfy a given outlier predicate.

class openclean.profiling.anomalies.conditional.ConditionalOutliers(resultcls: typing.Optional[typing.Type] = <class 'list'>)

Bases: openclean.profiling.anomalies.base.AnomalyDetector

Detect outliers in a given value sequence by testing, for each value, whether it satisfies an implementation-specific outlier condition.

abstract outlier(value: Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]], count: int) → Any

Implementation-specific outlier condition. If the given value is classified as an outlier, the result is a dictionary containing the outlier value and any additional provenance information generated by the outlier detector. If the value is not an outlier, the result is None.

Parameters

value (scalar or tuple) – Value that is tested against the outlier condition.

Return type

any
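The return contract described above (a dictionary for outliers, None otherwise) can be illustrated with a minimal standalone sketch. It does not import openclean. The function name length_outlier, the MAX_LENGTH threshold, and the exact dictionary layout (keys 'value' and 'metadata') are assumptions made for illustration only; a real implementation would override outlier() in a subclass of ConditionalOutliers.

```python
from typing import Any, Dict, Optional

# Hypothetical threshold, chosen only for this illustration.
MAX_LENGTH = 5


def length_outlier(value: Any, count: int) -> Optional[Dict[str, Any]]:
    """Hypothetical outlier condition: flag strings longer than MAX_LENGTH.

    Returns a dictionary with the value and some provenance information
    if the value is classified as an outlier, and None otherwise.
    """
    if isinstance(value, str) and len(value) > MAX_LENGTH:
        # Assumed layout: the outlier value plus supporting evidence.
        return {"value": value, "metadata": {"length": len(value), "count": count}}
    return None


short = length_outlier("abc", 3)
long_value = length_outlier("abcdefgh", 1)
```

Here short is None because "abc" does not exceed the threshold, while long_value is a dictionary that carries the value together with the evidence that led to the classification.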

process(values: collections.Counter) → List

Identify the values in a given set of values that satisfy the outlier condition. This method is called if the outlier detector is part of a data profiler configuration. The result is a list containing either the outlier values or dictionaries that contain the outlier value (associated with the key ‘value’) and additional information that the outlier detector provided as supporting evidence (associated with the key ‘metadata’).

Parameters

values (collections.Counter) – Set of distinct scalar values or tuples of scalar values that are mapped to their respective frequency count.

Return type

list
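A rough sketch of how process() might apply the outlier condition to a frequency counter is shown below. It is a stand-in that mirrors the documented signatures and does not use openclean itself: the classes ConditionalOutliersSketch and RareValues, the max_count parameter, and the loop inside process() are all assumptions for illustration, not the library's actual implementation. The sketch also ignores the resultcls constructor argument and always returns a plain list of dictionaries.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Any, List


class ConditionalOutliersSketch(ABC):
    """Stand-in for the documented interface; not the openclean class."""

    @abstractmethod
    def outlier(self, value: Any, count: int) -> Any:
        """Return a dictionary for outliers and None for all other values."""

    def process(self, values: Counter) -> List:
        # Assumed behavior: test every distinct value once and keep the
        # non-None results returned by the outlier condition.
        result = []
        for value, count in values.items():
            hit = self.outlier(value, count)
            if hit is not None:
                result.append(hit)
        return result


class RareValues(ConditionalOutliersSketch):
    """Hypothetical detector: flag values that occur at most max_count times."""

    def __init__(self, max_count: int = 1):
        self.max_count = max_count

    def outlier(self, value: Any, count: int) -> Any:
        if count <= self.max_count:
            return {"value": value, "metadata": {"count": count}}
        return None


# Distinct values mapped to their frequency counts.
counts = Counter(["NY", "NY", "NY", "CA", "CA", "N.Y."])
outliers = RareValues().process(counts)
print(outliers)  # [{'value': 'N.Y.', 'metadata': {'count': 1}}]
```

Because the input is a Counter, each distinct value is tested once regardless of how often it occurs, and the frequency is available to the condition as supporting evidence.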