openclean.operator.split.split module
Functions and classes that implement the split operator in openclean.
- class openclean.operator.split.split.Split(predicate: openclean.function.eval.base.EvalFunction)
Bases: openclean.operator.base.DataFrameSplitter
Data frame splitter that evaluates a Boolean predicate on each row of a data frame. The output consists of two data frames: one containing the rows that satisfy the predicate and one containing the rows that do not.
- split(df: pandas.core.frame.DataFrame) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]
Split the data frame into two data frames. The output is a tuple. The first element is the data frame that contains all rows for which the predicate evaluated to True. The second element is a data frame containing the rows for which the predicate was False.
- Parameters
df (pandas.DataFrame) – Input data frame.
- Return type
pandas.DataFrame, pandas.DataFrame
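The behavior described for Split can be illustrated with a minimal pandas-only sketch. The class name RowPredicateSplitter is hypothetical and not part of openclean, and a plain Python callable stands in for the EvalFunction predicate; this mirrors the documented semantics only and is not the library's implementation.

```python
from typing import Callable, Tuple

import numpy as np
import pandas as pd


class RowPredicateSplitter:
    """Hypothetical stand-in for Split; not part of openclean."""

    def __init__(self, predicate: Callable[[pd.Series], bool]):
        # Assumption: a plain callable on a row replaces the EvalFunction.
        self.predicate = predicate

    def split(self, df: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame]:
        # Evaluate the predicate once per row to build a Boolean mask.
        mask = np.array(
            [bool(self.predicate(row)) for _, row in df.iterrows()],
            dtype=bool,
        )
        # First frame: rows where the predicate held. Second frame: the rest.
        return df.loc[mask], df.loc[~mask]


df = pd.DataFrame({"name": ["Alice", "Bob", "Carol"], "age": [34, 19, 52]})
over_30 = RowPredicateSplitter(lambda row: row["age"] > 30)
matched, unmatched = over_30.split(df)
```

Because the predicate is stored on the object, the same splitter can be reused on several data frames that share the relevant columns.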
- openclean.operator.split.split.split(df: pandas.core.frame.DataFrame, predicate: openclean.function.eval.base.EvalFunction) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]
Split function for data frames. Evaluates a Boolean predicate on the rows of a given data frame. The output comprises two data frames. The first data frame contains the rows for which the predicate was satisfied and the second contains the rows for which the predicate was not satisfied.
- Parameters
df (pandas.DataFrame) – Input data frame.
predicate (openclean.function.eval.base.EvalFunction) – Evaluation function that is applied to each data frame row. The result determines which of the two output data frames the row is placed in.
- Return type
pandas.DataFrame, pandas.DataFrame
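For comparison, the same two-way split can be written in plain pandas with a vectorized Boolean mask. The helper split_by_mask below is hypothetical and not part of openclean; it takes a precomputed mask instead of an EvalFunction, so it is only a sketch of the resulting partition.

```python
from typing import Tuple

import pandas as pd


def split_by_mask(
    df: pd.DataFrame, mask: pd.Series
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Hypothetical helper: partition df by a Boolean mask over its rows."""
    mask = mask.astype(bool)
    # Rows where the mask is True come first, all remaining rows second.
    return df[mask], df[~mask]


df = pd.DataFrame(
    {"city": ["Oslo", "Lima", "Pune"], "population": [0.7, None, 3.1]}
)
# In pandas, a comparison against a missing value evaluates to False,
# so the row with the missing population lands in the second frame.
large, rest = split_by_mask(df, df["population"] > 1.0)
```

Every input row ends up in exactly one of the two frames. How missing values are treated in this sketch follows pandas comparison rules and may differ from what a given openclean predicate does.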