openclean.operator.split.split module

Functions and classes that implement the split operator in openclean.

class openclean.operator.split.split.Split(predicate: openclean.function.eval.base.EvalFunction)

Bases: openclean.operator.base.DataFrameSplitter

Data frame splitter that evaluates a Boolean predicate on the rows of a data frame. The output consists of two data frames: one containing the rows for which the predicate was satisfied and one containing the rows for which it was not.
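To make the splitting behaviour concrete, here is a minimal pandas-only sketch of a class with the same shape as Split. It is not openclean's implementation. The hypothetical RowSplitter class accepts a plain Python callable that receives one row (a pandas.Series) and returns a Boolean. That callable is an assumption standing in for an EvalFunction.

```python
from typing import Callable, Tuple

import pandas as pd


class RowSplitter:
    """Hypothetical stand-in for Split: partitions rows by a Boolean predicate."""

    def __init__(self, predicate: Callable[[pd.Series], bool]):
        # Assumption: a plain row-wise callable replaces openclean's EvalFunction.
        self.predicate = predicate

    def split(self, df: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame]:
        # Evaluate the predicate once per row to build a Boolean mask
        # aligned with the input index (also works for an empty frame).
        mask = pd.Series(
            [bool(self.predicate(row)) for _, row in df.iterrows()],
            index=df.index,
            dtype=bool,
        )
        # Rows where the predicate holds come first, the remaining rows second.
        return df[mask], df[~mask]


# Usage: separate adults from minors in a small illustrative frame.
df = pd.DataFrame({"name": ["Alice", "Bob", "Claire"], "age": [34, 17, 52]})
adults, minors = RowSplitter(lambda row: row["age"] >= 18).split(df)
```

Here adults holds the rows for Alice and Claire, and minors holds the row for Bob.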

split(df: pandas.core.frame.DataFrame) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Split the data frame into two data frames, returned as a tuple. The first element contains all rows for which the predicate evaluated to True; the second element contains the rows for which it evaluated to False.

Parameters

df (pandas.DataFrame) – Input data frame.

Return type

pandas.DataFrame, pandas.DataFrame
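The contract of the returned tuple can be illustrated with plain pandas Boolean indexing. This sketch assumes, as pandas Boolean indexing does, that both output frames keep the original row index; whether openclean preserves the index is not stated above. The Boolean mask stands in for the predicate's per-row results, and the column names are illustrative only. The assertions check that the two frames are disjoint and together reproduce the input.

```python
import pandas as pd

df = pd.DataFrame(
    {"borough": ["Manhattan", "Bronx", "Manhattan", "Queens"], "count": [10, 4, 7, 2]}
)

# Boolean mask standing in for the predicate result (one value per row).
mask = df["borough"] == "Manhattan"

# Element 0: rows where the predicate is True; element 1: rows where it is False.
satisfied, violated = df[mask], df[~mask]

# The two frames share no rows and, recombined, cover every input row.
assert satisfied.index.intersection(violated.index).empty
assert pd.concat([satisfied, violated]).sort_index().equals(df)
```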

openclean.operator.split.split.split(df: pandas.core.frame.DataFrame, predicate: openclean.function.eval.base.EvalFunction) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Split function for data frames. Evaluates a Boolean predicate on the rows of a given data frame. The output comprises two data frames. The first data frame contains the rows for which the predicate was satisfied and the second contains the rows for which the predicate was not satisfied.

Parameters
  • df (pandas.DataFrame) – Input data frame.

  • predicate (openclean.function.eval.base.EvalFunction) – Evaluation function applied to each row of the data frame. Its result for a row determines which of the two output data frames the row is placed in.

Return type

pandas.DataFrame, pandas.DataFrame
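The function form takes the data frame and the predicate in a single call. Below is a hypothetical pandas-only analogue, split_frame, that mirrors that call shape. Its predicate is an assumption standing in for EvalFunction: a vectorized callable that receives the whole frame and returns one Boolean per row. The ZIP-code column is illustrative only.

```python
from typing import Callable, Iterable, Tuple

import pandas as pd


def split_frame(
    df: pd.DataFrame,
    predicate: Callable[[pd.DataFrame], Iterable[bool]],
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Hypothetical functional analogue of split(df, predicate)."""
    # Assumption: the predicate is evaluated on the whole frame at once and
    # yields one Boolean value per row, in row order.
    values = list(predicate(df))
    if len(values) != len(df):
        raise ValueError("predicate must return exactly one value per row")
    mask = pd.Series(values, index=df.index, dtype=bool)
    # Satisfied rows first, non-satisfied rows second.
    return df[mask], df[~mask]


# Usage: separate well-formed five-digit ZIP codes from the rest.
df = pd.DataFrame({"zip": ["10001", "1002", "11201", None]})
valid, invalid = split_frame(
    df, lambda d: d["zip"].str.fullmatch(r"\d{5}", na=False)
)
```

Evaluating the predicate on the whole frame lets it use vectorized pandas methods such as Series.str.fullmatch instead of one Python call per row, which is usually faster on large frames.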