openclean.function.eval.aggregate module
Collection of evaluation functions that compute a statistic over one or more data frame columns and return that statistic for every data frame row.
- class openclean.function.eval.aggregate.Avg(columns: Union[int, str, histore.document.schema.Column, openclean.function.eval.base.EvalFunction, List[Union[int, str, histore.document.schema.Column, openclean.function.eval.base.EvalFunction]]])
Bases:
openclean.function.eval.base.Eval
Evaluation function that returns the mean of values for one or more columns in a data frame.
- class openclean.function.eval.aggregate.ColumnAggregator(func: Callable)
Bases:
openclean.function.value.base.ValueFunction
Value function that computes an aggregate over a list of values. The aggregated value is computed when the function is prepared. It then returns a constant value function that is initialized with the aggregation result, i.e., that will return the aggregation result for any input value.
- eval(value: Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]])
Raises an error. The column aggregator can only be used to prepare a constant value function.
- Parameters
value (scalar or tuple) – Value from the list that was used to prepare the function.
- Raises
NotImplementedError –
- is_prepared() → bool
Indicates whether the function is ready to evaluate values. The column aggregator always has to be prepared before it can be used.
- Return type
bool
- prepare(values: List[Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]]) → openclean.function.value.base.ConstantValue
Optional step to prepare the function for a given set of values. This step makes it possible to compute additional statistics over the set of values.
While the given set of values likely represents the values for which the eval() function will be called, this is not guaranteed.
- Parameters
values (list) – List of scalar values or tuples of scalar values over which the aggregate is computed.
- Return type
openclean.function.value.base.ConstantValue
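The prepare-then-constant behavior described for ColumnAggregator can be illustrated with a minimal plain-Python sketch. The classes ColumnAggregatorSketch and ConstantValueSketch are hypothetical stand-ins written for this illustration, not the openclean implementation. In particular, the return value of is_prepared() and the error message are assumptions.

```python
from typing import Any, Callable, List


class ConstantValueSketch:
    """Hypothetical stand-in for a constant value function."""

    def __init__(self, value: Any):
        self.value = value

    def eval(self, value: Any) -> Any:
        # The input is ignored; the stored aggregation result is returned.
        return self.value


class ColumnAggregatorSketch:
    """Hypothetical stand-in that mirrors the documented behavior."""

    def __init__(self, func: Callable[[List[Any]], Any]):
        self.func = func

    def eval(self, value: Any) -> Any:
        # The aggregator itself cannot evaluate single values.
        raise NotImplementedError("call prepare() to obtain a value function")

    def is_prepared(self) -> bool:
        # Assumption: always False, so callers are forced to call prepare().
        return False

    def prepare(self, values: List[Any]) -> ConstantValueSketch:
        # Compute the aggregate once and wrap it in a constant function.
        return ConstantValueSketch(self.func(values))


column = [3, 9, 4]
aggregator = ColumnAggregatorSketch(max)
constant = aggregator.prepare(column)
results = [constant.eval(v) for v in column]
print(results)  # [9, 9, 9]
```

The aggregate is computed exactly once in prepare(), so evaluating the returned function for each row costs constant time regardless of the column size.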
- class openclean.function.eval.aggregate.Count(columns: Union[int, str, histore.document.schema.Column, openclean.function.eval.base.EvalFunction, List[Union[int, str, histore.document.schema.Column, openclean.function.eval.base.EvalFunction]]], value: Optional[Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]] = True)
Bases:
openclean.function.eval.base.Eval
Evaluation function that counts the number of values in one or more columns that match a given value (True by default).
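The counting behavior can be sketched in plain Python for a single column. The helper count_matches is hypothetical and only illustrates the idea. Matching by equality and returning one count per row are assumptions based on the descriptions above, not confirmed details of the openclean implementation.

```python
from typing import Any, List


def count_matches(column: List[Any], value: Any = True) -> List[int]:
    """Hypothetical helper: count entries equal to `value` and
    repeat that count once for every row of the column."""
    # Assumption: values are compared with ==.
    matches = sum(1 for v in column if v == value)
    return [matches] * len(column)


flags = [True, False, True, True]
cities = ["NYC", "BOS", "NYC"]

print(count_matches(flags))          # [3, 3, 3, 3]
print(count_matches(cities, "NYC"))  # [2, 2, 2]
```

Note that in Python the integer 1 compares equal to True, so a sketch based on equality would also count 1 when matching against the default value.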
- class openclean.function.eval.aggregate.Max(columns: Union[int, str, histore.document.schema.Column, openclean.function.eval.base.EvalFunction, List[Union[int, str, histore.document.schema.Column, openclean.function.eval.base.EvalFunction]]])
Bases:
openclean.function.eval.base.Eval
Evaluation function that returns the maximum of values for one or more columns in a data frame.
- class openclean.function.eval.aggregate.Min(columns: Union[int, str, histore.document.schema.Column, openclean.function.eval.base.EvalFunction, List[Union[int, str, histore.document.schema.Column, openclean.function.eval.base.EvalFunction]]])
Bases:
openclean.function.eval.base.Eval
Evaluation function that returns the minimum of values for one or more columns in a data frame.
- class openclean.function.eval.aggregate.Sum(columns: Union[int, str, histore.document.schema.Column, openclean.function.eval.base.EvalFunction, List[Union[int, str, histore.document.schema.Column, openclean.function.eval.base.EvalFunction]]])
Bases:
openclean.function.eval.base.Eval
Evaluation function that returns the sum over values for one or more columns in a data frame.
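The semantics shared by Avg, Max, Min, and Sum can be sketched for a single column in plain Python. The helper broadcast_aggregate is hypothetical, and the use of statistics.mean for the average is an assumption. The sketch shows only the general idea of computing one statistic and returning it for every row.

```python
from statistics import mean
from typing import Any, Callable, List


def broadcast_aggregate(column: List[Any], func: Callable[[List[Any]], Any]) -> List[Any]:
    """Hypothetical helper: compute one statistic over the column and
    return it once per row."""
    result = func(column)
    return [result] * len(column)


ages = [30, 40, 50]

avg_per_row = broadcast_aggregate(ages, mean)  # stands in for Avg
max_per_row = broadcast_aggregate(ages, max)   # stands in for Max
min_per_row = broadcast_aggregate(ages, min)   # stands in for Min
sum_per_row = broadcast_aggregate(ages, sum)   # stands in for Sum

print(max_per_row)  # [50, 50, 50]
print(sum_per_row)  # [120, 120, 120]
```

Returning the statistic for every row makes it possible to combine the aggregate with row-level values, for example to compare each value against the column maximum.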