openclean.profiling.anomalies.base module

Abstract base class for anomaly and outlier detection operators.

class openclean.profiling.anomalies.base.AnomalyDetector

Bases: openclean.profiling.base.DistinctSetProfiler

Interface for generic anomaly and outlier detectors. Each implementation should take a stream of distinct values (e.g., from a column in a data frame or a metadata object) as input and return a list of values that were identified as outliers.
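The contract above boils down to one abstract method, `find()`, that subclasses override. The sketch below illustrates that pattern without importing openclean. `AnomalyDetectorSketch` is a hypothetical stand-in for `AnomalyDetector`, not the library class. `IQROutliers` is a hypothetical example detector that uses Tukey's interquartile-range fences, a swapped-in technique chosen only for illustration.

```python
import statistics
from abc import ABCMeta, abstractmethod
from collections import Counter
from typing import Iterable, List, Union


class AnomalyDetectorSketch(metaclass=ABCMeta):
    """Hypothetical stand-in mirroring the documented find() contract."""

    @abstractmethod
    def find(self, values: Union[Iterable, Counter]) -> List:
        """Return the subset of input values classified as outliers."""
        raise NotImplementedError()


class IQROutliers(AnomalyDetectorSketch):
    """Hypothetical detector: flags numeric values outside Tukey's fences
    [Q1 - k * IQR, Q3 + k * IQR]."""

    def __init__(self, k: float = 1.5):
        self.k = k

    def find(self, values: Union[Iterable, Counter]) -> List:
        # Iterating a Counter yields its keys, so both input forms reduce
        # to the same sorted list of distinct values.
        data = sorted(set(values))
        if len(data) < 4:
            # Too few distinct values for meaningful quartiles.
            return []
        q1, _, q3 = statistics.quantiles(data, n=4)
        iqr = q3 - q1
        low, high = q1 - self.k * iqr, q3 + self.k * iqr
        return [v for v in data if v < low or v > high]
```

Because the abstract method is declared with `abstractmethod`, instantiating the base class directly raises a `TypeError`. Only subclasses that implement `find()` can be used. For example, `IQROutliers().find([10, 11, 12, 13, 14, 100])` returns `[100]`.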

find(values: Union[Iterable[Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]], collections.Counter]) → List[Union[Dict, int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]]

Identify values in a given set of values that are classified as outliers or anomalies. Returns a list of the identified values.

Parameters

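values (iterable of values or collections.Counter) – Input values, given either as an iterable of distinct values or as a Counter that maps each value to its frequency.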

Return type

list
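Because `values` may be either a plain iterable or a `collections.Counter`, an implementation typically normalizes the input first. The sketch below shows one way to do that. Both `_as_counter` and `TypeMismatchDetector` are hypothetical names, not part of openclean. The detector flags values whose Python type differs from the type that accounts for the most value occurrences, for example a string such as `"N/A"` in an otherwise numeric column.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Union

Scalar = Union[int, float, str, datetime]


def _as_counter(values: Union[Iterable[Scalar], Counter]) -> Counter:
    """Normalize either accepted input form to a Counter of frequencies."""
    if isinstance(values, Counter):
        return values
    # A plain iterable is counted; for distinct values every count is 1.
    return Counter(values)


class TypeMismatchDetector:
    """Hypothetical detector with the same find() shape as AnomalyDetector:
    flags values whose type is not the dominant (most frequent) type."""

    def find(self, values: Union[Iterable[Scalar], Counter]) -> List[Scalar]:
        counts = _as_counter(values)
        if not counts:
            return []
        # Weight each type by how often its values occur.
        type_weight = Counter()
        for value, freq in counts.items():
            type_weight[type(value)] += freq
        dominant, _ = type_weight.most_common(1)[0]
        # Preserve the input order of the distinct values.
        return [v for v in counts if type(v) is not dominant]
```

With a plain iterable every distinct value carries equal weight, so `find([1, 2, 3, "N/A", 4])` returns `["N/A"]`. With a `Counter`, frequencies decide the dominant type. `find(Counter({"N/A": 5, 1: 2, 2: 2}))` treats `str` as dominant and returns `[1, 2]`. This illustrates why the frequency-carrying input form can change the result.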