openclean.profiling.stats module
Collection of statistics helper functions and classes for profiling.
- class openclean.profiling.stats.MinMaxCollector(first_value: Optional[Union[int, float, str, datetime.datetime]] = None, minmax: Optional[Tuple[Union[int, float, str, datetime.datetime], Union[int, float, str, datetime.datetime]]] = None)
Bases: dict
Consumer that identifies the minimum and maximum value over a stream of data. The class extends a dictionary for integration into profiling result dictionaries.
- consume(value: Union[int, float, str, datetime.datetime])
Consume a value in the data stream and adjust the minimum and maximum if necessary.
- Parameters
value (scalar) – Value in the data stream.
- property maximum
Get the current maximum over all consumed values.
- Return type
scalar
- property minimum
Get the current minimum over all consumed values.
- Return type
scalar
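The consumer pattern described above can be sketched without the library installed. The class below is a hypothetical stand-in named MinMaxSketch, not the library's implementation. The dictionary keys "minimum" and "maximum" and the way first_value and minmax seed the bounds are assumptions made for illustration.

```python
from datetime import datetime
from typing import Optional, Tuple, Union

Scalar = Union[int, float, str, datetime]


class MinMaxSketch(dict):
    """Hypothetical stand-in for MinMaxCollector (not the library's code)."""

    def __init__(
        self,
        first_value: Optional[Scalar] = None,
        minmax: Optional[Tuple[Scalar, Scalar]] = None,
    ):
        super().__init__()
        if minmax is not None:
            # Assumption: a (minimum, maximum) pair seeds both bounds.
            self["minimum"], self["maximum"] = minmax
        else:
            # Assumption: a single first value seeds both bounds.
            self["minimum"] = self["maximum"] = first_value

    def consume(self, value: Scalar) -> None:
        """Adjust the stored bounds if the value falls outside them."""
        if self["minimum"] is None or value < self["minimum"]:
            self["minimum"] = value
        if self["maximum"] is None or value > self["maximum"]:
            self["maximum"] = value

    @property
    def minimum(self) -> Optional[Scalar]:
        return self["minimum"]

    @property
    def maximum(self) -> Optional[Scalar]:
        return self["maximum"]


collector = MinMaxSketch(first_value=5)
for v in [3, 9, 4]:
    collector.consume(v)
```

Because the sketch is itself a dictionary, it can be placed directly into a larger result dictionary, which is the integration the class description refers to.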
- openclean.profiling.stats.entropy(values: collections.Counter, default: Optional[float] = None) → float
Compute the entropy for a given set of distinct values and their frequency counts.
Returns the default value if the given counter is empty.
- Parameters
values (collections.Counter) – Counter with frequencies for a set of distinct values.
default (float, default=None) – Value that is returned if the given counter is empty.
- Return type
float
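As a rough illustration of the computation described above, the following sketch assumes Shannon entropy with a base-2 logarithm. The logarithm base is an assumption, since this section does not state it, and entropy_sketch is a hypothetical name rather than the library function.

```python
import math
from collections import Counter
from typing import Optional


def entropy_sketch(values: Counter, default: Optional[float] = None) -> Optional[float]:
    """Shannon entropy (base 2, an assumption) of a frequency counter."""
    total = sum(values.values())
    if total == 0:
        # Empty counter: fall back to the caller-supplied default.
        return default
    # Sum -p * log2(p) over the relative frequency p of each distinct value.
    return sum(-(c / total) * math.log2(c / total) for c in values.values())


uniform = entropy_sketch(Counter({"a": 2, "b": 2}))
empty = entropy_sketch(Counter(), default=0.0)
```

Under these assumptions, two equally frequent values yield an entropy of one bit, and an empty counter yields whatever default the caller supplies.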