openclean.operator.collector.aggregate module

Class that implements the DataGroupReducer abstract class to perform groupby operations on a pandas dataframe.

class openclean.operator.collector.aggregate.Aggregate(func: Union[Dict[str, Callable], Callable], schema: Optional[List[str]] = None)

Bases: openclean.operator.base.DataGroupReducer

Aggregate class that takes a DataFrameGrouping and one or more aggregate functions, applies the functions to the groups, and returns a dataframe.

reduce(groups)

Reduces the groups using the aggregate functions and returns a dataframe.

Parameters

groups (DataFrameGrouping) – grouping object returned by some groupby operation

Return type

pd.DataFrame

Raises
  • KeyError – if the input column isn’t found

  • TypeError – if the provided schema is invalid

openclean.operator.collector.aggregate.aggregate(groups: openclean.data.groupby.DataFrameGrouping, func: Union[Dict[str, Callable], Callable], schema: Optional[List[str]] = None)

Aggregate helper function that takes a DataFrameGrouping, a schema, and one or more functions, and returns a dataframe created from the groupings by applying the functions and following that schema.

Parameters
  • groups (DataFrameGrouping) – object returned from a GroupBy operation

  • schema (list of string, optional) – list of column names

  • func (callable, or dict of str:callables) – If a single callable is provided, it must handle each dataframe group to create an aggregate value. If a dict of str:callables is provided, the keys are column names and the values are the aggregate functions for each of those columns.

Return type

pd.DataFrame
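The two accepted shapes of func can be illustrated with plain pandas. The following is a minimal sketch, not openclean's implementation: aggregate_sketch is a hypothetical stand-in, a regular pandas GroupBy object takes the place of DataFrameGrouping, the output column name "value" for the single-callable case is made up, and treating schema as a one-to-one rename of the output columns is an assumption.

```python
from typing import Callable, Dict, List, Optional, Union

import pandas as pd


def aggregate_sketch(
    groups,
    func: Union[Dict[str, Callable], Callable],
    schema: Optional[List[str]] = None,
) -> pd.DataFrame:
    """Hypothetical stand-in for the helper; not openclean's implementation."""
    keys, records = [], []
    for key, group in groups:  # a pandas GroupBy yields (key, sub-dataframe)
        if callable(func):
            # Single callable: it receives the whole group dataframe.
            # The column name "value" is made up for this sketch.
            record = {"value": func(group)}
        else:
            # Dict: each function receives only its own column of the group.
            # A missing column raises KeyError from pandas.
            record = {col: f(group[col]) for col, f in func.items()}
        keys.append(key)
        records.append(record)
    result = pd.DataFrame(records, index=keys)
    if schema is not None:
        # Assumption: the schema renames the output columns one-to-one.
        if len(schema) != len(result.columns):
            raise TypeError("schema length does not match the output columns")
        result.columns = schema
    return result


df = pd.DataFrame(
    {"city": ["NYC", "NYC", "LA"], "price": [10, 20, 5], "qty": [1, 2, 3]}
)
groups = df.groupby("city")

# Dict form: one aggregate function per column.
by_column = aggregate_sketch(
    groups, {"price": lambda s: s.sum(), "qty": lambda s: s.max()}
)

# Single-callable form: len counts the rows of each group dataframe.
row_counts = aggregate_sketch(groups, len, schema=["n_rows"])
```

In the dict form each function only sees one column, so the result has one output column per key. In the single-callable form the function sees the whole group, which suits aggregates that combine several columns.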

openclean.operator.collector.aggregate.get_agg_funcs(func)

Helper method used to create a mapping of the aggregation functions with their columns.

Parameters

func (dict of str:Callable or Callable) – a single Callable that aggregates over the entire dataframe, or a dict of Callables where the keys are column names and the values are the functions for the respective columns

Return type

dict
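One way such a normalization step could look is sketched below. This is an illustration under assumptions, not the library's code: get_agg_funcs_sketch is a hypothetical name, keying a single callable under None to mean "apply to the whole group" is a choice made for this sketch, and the TypeError on invalid input is likewise assumed.

```python
from typing import Callable, Dict, Optional


def get_agg_funcs_sketch(func) -> Dict[Optional[str], Callable]:
    """Hypothetical normalizer: always return a {column: function} mapping."""
    if callable(func):
        # Assumption: None marks a function applied to the whole group.
        return {None: func}
    if isinstance(func, dict):
        for column, f in func.items():
            if not isinstance(column, str) or not callable(f):
                raise TypeError("expected a dict of str:callable")
        # Copy so later changes to the caller's dict do not leak in.
        return dict(func)
    raise TypeError("expected a callable or a dict of str:callable")


single = get_agg_funcs_sketch(len)
per_column = get_agg_funcs_sketch({"price": sum, "qty": max})
```

Normalizing both input shapes into one mapping lets the reducing code loop over (column, function) pairs without branching on the input type each time.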

openclean.operator.collector.aggregate.is_single_or_dict(Y)