openclean.engine.operator module

Stream operators are subclasses or wrappers for StreamProcessor. These operators can directly be applied on archive snapshots.

class openclean.engine.operator.StreamOp(func: openclean.operator.stream.processor.StreamProcessor, description: Optional[str] = None)

Bases: object

Wrapper for a stream processor and an optioanl snapshot description.

open(schema: List[Union[str, histore.document.schema.Column]]) openclean.engine.operator.StreamOperator

Factory pattern for stream consumer. Returns an instance of the stream consumer that corresponds to the action that is defined by the stream processor.

Parameters

schema (list of string) – List of column names in the data stream schema.

Return type

openclean.engine.operator.DatasetOperator

class openclean.engine.operator.StreamOperator(func: openclean.operator.stream.consumer.StreamFunctionHandler, description: Optional[str] = None)

Bases: histore.document.operator.DatasetOperator

Operator for processing rows in a data stream from a dataset archive snapshot.

handle(rowid: int, row: List[Union[int, float, str, datetime.datetime]]) List[Union[int, float, str, datetime.datetime]]

Evaluate the operator on the given row.

Returns the processed row. If the result is None this signals that the given row should not be part of the collected result.

Parameters
  • rowid (int) – Unique row identifier

  • row (list) – List of values in the row.

Return type

list