openclean.engine.operator module
Stream operators are subclasses or wrappers for StreamProcessor
.
These operators can directly be applied on archive snapshots.
- class openclean.engine.operator.StreamOp(func: openclean.operator.stream.processor.StreamProcessor, description: Optional[str] = None)
Bases:
object
Wrapper for a stream processor and an optioanl snapshot description.
- open(schema: List[Union[str, histore.document.schema.Column]]) openclean.engine.operator.StreamOperator
Factory pattern for stream consumer. Returns an instance of the stream consumer that corresponds to the action that is defined by the stream processor.
- Parameters
schema (list of string) – List of column names in the data stream schema.
- Return type
openclean.engine.operator.DatasetOperator
- class openclean.engine.operator.StreamOperator(func: openclean.operator.stream.consumer.StreamFunctionHandler, description: Optional[str] = None)
Bases:
histore.document.operator.DatasetOperator
Operator for processing rows in a data stream from a dataset archive snapshot.
- handle(rowid: int, row: List[Union[int, float, str, datetime.datetime]]) List[Union[int, float, str, datetime.datetime]]
Evaluate the operator on the given row.
Returns the processed row. If the result is None this signals that the given row should not be part of the collected result.
- Parameters
rowid (int) – Unique row identifier
row (list) – List of values in the row.
- Return type
list