openclean.operator.collector.aggregate module
Class that implements the DataframeMapper abstract class to perform groupby operations on a pandas dataframe.
- class openclean.operator.collector.aggregate.Aggregate(func: Union[Dict[str, Callable], Callable], schema: Optional[List[str]] = None)
Bases: openclean.operator.base.DataGroupReducer
Aggregate class that takes a DataFrameGrouping and one or more aggregate functions, applies the functions to the groups, and returns a dataframe.
- reduce(groups)
Reduces the groups using the aggregate functions and returns a dataframe.
- Parameters
groups (DataFrameGrouping) – grouping object returned by some groupby operation
- Return type
pd.DataFrame
- Raises
KeyError – if the input column isn’t found
TypeError – if the provided schema is invalid
- openclean.operator.collector.aggregate.aggregate(groups: openclean.data.groupby.DataFrameGrouping, func: Union[Dict[str, Callable], Callable], schema: Optional[List[str]] = None)
Aggregate helper function that takes a DataFrameGrouping, a schema, and one or more functions, and returns a dataframe created from the groupings by applying the functions and following that schema.
- Parameters
groups (DataFrameGrouping) – object returned from a GroupBy operation
schema (list of string, optional) – list of column names
func (callable or dict of str:callables) – If a single callable is provided, it must handle each dataframe group to create an aggregate value. If a dict of str:callables is provided, the keys are column names and the values are the aggregate functions for each of those columns.
- Return type
pd.DataFrame
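The two accepted forms of func can be illustrated with a small pandas-only sketch. It does not call openclean. The function name aggregate_sketch, the plain dict used in place of a DataFrameGrouping, the "value" column name, and the length check on schema are all assumptions made for illustration, so the library's actual output layout and validation may differ.

```python
from typing import Callable, Dict, List, Optional, Union

import pandas as pd


def aggregate_sketch(
    groups: Dict[object, pd.DataFrame],
    func: Union[Callable, Dict[str, Callable]],
    schema: Optional[List[str]] = None,
) -> pd.DataFrame:
    """Hypothetical stand-in for the documented behavior (not openclean code)."""
    records = []
    for group in groups.values():
        if callable(func):
            # Single callable: it receives the whole group dataframe.
            # The "value" column name is an assumption of this sketch.
            records.append({"value": func(group)})
        else:
            # Dict: keys are column names, values aggregate that column.
            # A missing column raises KeyError from the dataframe lookup.
            records.append({col: f(group[col]) for col, f in func.items()})
    result = pd.DataFrame(records, index=list(groups.keys()))
    if schema is not None:
        # Assumption: "invalid schema" means a column-count mismatch.
        if len(schema) != len(result.columns):
            raise TypeError("schema does not match the aggregated columns")
        result.columns = schema
    return result


df = pd.DataFrame(
    {"city": ["NYC", "NYC", "BOS"], "sales": [10, 20, 5], "units": [1, 2, 3]}
)
# A plain dict of group key -> group dataframe stands in for DataFrameGrouping.
groups = {key: group for key, group in df.groupby("city")}

by_column = aggregate_sketch(
    groups,
    {"sales": lambda s: s.sum(), "units": lambda s: s.max()},
    schema=["total_sales", "max_units"],
)
row_counts = aggregate_sketch(groups, len)
```

Here by_column holds one row per city with the renamed columns, and row_counts holds the number of rows in each group.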
- openclean.operator.collector.aggregate.get_agg_funcs(func)
Helper method used to create a mapping of the aggregation functions with their columns.
- Parameters
func (dict of str:Callable or Callable) – Single callable that aggregates over the entire dataframe, or a dict of callables where the keys are column names and the values are the functions for the respective columns.
- Return type
dict
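One plausible way to build such a mapping is sketched below. This is not the openclean implementation: the name get_agg_funcs_sketch is hypothetical, and keying a single callable by its __name__ is an assumption, since the source only states that a dict is returned.

```python
from typing import Callable, Dict, Union


def get_agg_funcs_sketch(
    func: Union[Callable, Dict[str, Callable]]
) -> Dict[str, Callable]:
    """Hypothetical normalization of func into a name -> callable mapping."""
    if isinstance(func, dict):
        # Dict input: verify every value is callable, then return a copy.
        for column, f in func.items():
            if not callable(f):
                raise TypeError(f"aggregate for column {column!r} is not callable")
        return dict(func)
    if callable(func):
        # Assumption: a lone callable is keyed by its own name.
        return {getattr(func, "__name__", repr(func)): func}
    raise TypeError("func must be a callable or a dict of str:callable")


single = get_agg_funcs_sketch(len)
per_column = get_agg_funcs_sketch({"sales": sum, "units": max})
```

Normalizing both input forms to a dict lets the reducing code iterate over one structure regardless of how the caller specified the aggregates.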
- openclean.operator.collector.aggregate.is_single_or_dict(Y)