openclean.operator.transform.filter module
Functions and classes that implement the filter operators in openclean.
- class openclean.operator.transform.filter.Filter(predicate: openclean.function.eval.base.EvalFunction, negated: Optional[bool] = False)
Bases: openclean.operator.stream.processor.StreamProcessor, openclean.operator.base.DataFrameTransformer
Data frame transformer that evaluates a Boolean predicate on the rows of a data frame. The transformed output contains only those rows for which the predicate evaluated to True (or False if the negated flag is True).
- open(schema: List[Union[str, histore.document.schema.Column]]) openclean.operator.stream.consumer.StreamFunctionHandler
Factory method for stream consumers. Returns an instance of a stream consumer that filters rows in a data stream using a stream function that represents the filter predicate.
- Parameters
schema (list of string) – List of column names in the data stream schema.
- Return type
openclean.operator.stream.consumer.StreamFunctionHandler
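The idea of preparing a row filter from the stream schema can be illustrated with a minimal plain-Python sketch. It does not use openclean and is not the library's implementation: the helper `make_row_filter`, the `(rowid, row)` tuple layout, and the sample data are all hypothetical names and assumptions chosen for illustration.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Assumption: a stream yields (row identifier, list of cell values) pairs.
Row = Tuple[int, List[object]]


def make_row_filter(
    schema: List[str],
    column: str,
    condition: Callable[[object], bool],
    negated: bool = False,
) -> Callable[[Iterable[Row]], Iterator[Row]]:
    """Hypothetical helper: bind a predicate to a schema and return a
    consumer that passes on only the rows that satisfy it."""
    # Resolve the column position once, when the schema becomes known.
    col_idx = schema.index(column)

    def consume(rows: Iterable[Row]) -> Iterator[Row]:
        for rowid, values in rows:
            # Keep the row if the predicate result differs from `negated`.
            if bool(condition(values[col_idx])) != negated:
                yield rowid, values

    return consume


schema = ["name", "age"]
stream = [(0, ["Alice", 34]), (1, ["Bob", 17]), (2, ["Carol", 52])]

adults = list(make_row_filter(schema, "age", lambda v: v >= 18)(stream))
minors = list(
    make_row_filter(schema, "age", lambda v: v >= 18, negated=True)(stream)
)
```

Resolving the column index when the schema is supplied, rather than once per row, mirrors the reason a factory step is useful here: the predicate is prepared once and then applied to each row as it streams through.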
- transform(df: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Return a data frame that contains only those rows from the given input data frame that satisfy the filter condition.
- Parameters
df (pd.DataFrame) – Input data frame.
- Return type
pd.DataFrame
- openclean.operator.transform.filter.delete(df: pandas.core.frame.DataFrame, predicate: openclean.function.eval.base.EvalFunction) pandas.core.frame.DataFrame
Delete rows in a data frame. The delete operator evaluates a given predicate on all rows in a data frame. It returns a new data frame where those rows that satisfied the predicate are deleted.
- Parameters
df (pd.DataFrame) – Input data frame.
predicate (openclean.function.eval.base.EvalFunction) – Evaluation function that is expected to return a Boolean value when evaluated on a data frame row. All rows in the input data frame that satisfy the predicate will be deleted.
- Return type
pd.DataFrame
- openclean.operator.transform.filter.filter(df: pandas.core.frame.DataFrame, predicate: openclean.function.eval.base.EvalFunction, negated: Optional[bool] = False) pandas.core.frame.DataFrame
Filter function for data frames. Returns a data frame that only contains the rows of the input data frame for which the given predicate evaluates to True (or to False if the negated flag is True).
- Parameters
df (pd.DataFrame) – Input data frame.
predicate (openclean.function.eval.base.EvalFunction) – Evaluation function that is expected to return a Boolean value when evaluated on a data frame row. Only those rows in the input data frame that satisfy the predicate will be included in the result.
negated (bool, default=False) – Negate the predicate value to get an inverted result.
- Return type
pd.DataFrame
- Raises
ValueError –
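The relationship between the filter and delete semantics described above can be sketched with pandas alone. This is an illustration under stated assumptions, not openclean's implementation: `filter_rows` and `delete_rows` are hypothetical stand-ins, and the predicate is a plain Python callable on a row rather than an `EvalFunction`.

```python
import pandas as pd


def filter_rows(df: pd.DataFrame, predicate, negated: bool = False) -> pd.DataFrame:
    """Hypothetical stand-in: keep rows where the predicate is True
    (or False when `negated` is set)."""
    # Evaluate the predicate once per row and build a Boolean mask.
    mask = pd.Series(
        [bool(predicate(row)) for _, row in df.iterrows()],
        index=df.index,
        dtype=bool,
    )
    return df[~mask] if negated else df[mask]


def delete_rows(df: pd.DataFrame, predicate) -> pd.DataFrame:
    """Hypothetical stand-in: drop the rows that satisfy the predicate,
    which matches a negated filter."""
    return filter_rows(df, predicate, negated=True)


df = pd.DataFrame({"name": ["Alice", "Bob", "Carol"], "age": [34, 17, 52]})


def is_adult(row) -> bool:
    return row["age"] >= 18


adults = filter_rows(df, is_adult)                # rows satisfying the predicate
minors = filter_rows(df, is_adult, negated=True)  # rows failing the predicate
no_adults = delete_rows(df, is_adult)             # same rows as `minors`
```

In this sketch the input data frame is left unmodified and each call returns a new data frame, consistent with the descriptions of the operators above.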