openclean.operator.stream.processor module

Operators are the actors in the definition of a data stream pipeline. Each operator follows the factory pattern: before a data streaming pipeline is executed, the operator is instantiated for the given data stream.

class openclean.operator.stream.processor.StreamProcessor

Bases: object

Stream processors define the actors in a data stream processing pipeline. Each processor implements the open method, which creates a prepared instance (a consumer) of the operator. The pipeline uses that consumer to filter, manipulate, or profile data stream rows.

abstract open(schema: List[Union[str, histore.document.schema.Column]]) → openclean.operator.stream.consumer.StreamConsumer

Factory method for stream consumers. Returns the stream consumer instance that performs the action defined by this stream processor.

Parameters

schema (list of string or histore.document.schema.Column) – List of columns (names or Column objects) in the data stream schema.

Return type

openclean.operator.stream.consumer.StreamConsumer
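
The factory pattern described above can be illustrated with a minimal, self-contained sketch. It does not import openclean: the StreamProcessor and StreamConsumer classes below are simplified stand-ins, and the consume and close method names, the NonEmptyFilter processor, and the sample data are all assumptions made for illustration rather than details confirmed by this page. The schema is also simplified to a plain list of column names.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Optional


class StreamConsumer(ABC):
    """Stand-in for a stream consumer (method names are assumptions)."""

    @abstractmethod
    def consume(self, rowid: int, row: List[Any]) -> Optional[List[Any]]:
        """Process one row; return the row to keep it or None to drop it."""

    @abstractmethod
    def close(self) -> Any:
        """Signal the end of the stream and return an optional result."""


class StreamProcessor(ABC):
    """Stand-in for the processor: a factory for prepared consumers."""

    @abstractmethod
    def open(self, schema: List[str]) -> StreamConsumer:
        """Create a consumer that is bound to the given stream schema."""


class NonEmptyConsumer(StreamConsumer):
    """Hypothetical consumer that drops rows with an empty value."""

    def __init__(self, index: int):
        self.index = index  # column position resolved by the processor
        self.kept = 0       # number of rows that passed the filter

    def consume(self, rowid: int, row: List[Any]) -> Optional[List[Any]]:
        if row[self.index] in (None, ""):
            return None
        self.kept += 1
        return row

    def close(self) -> int:
        return self.kept


class NonEmptyFilter(StreamProcessor):
    """Hypothetical processor: the schema-independent definition."""

    def __init__(self, column: str):
        self.column = column

    def open(self, schema: List[str]) -> StreamConsumer:
        # Resolve the column name against the schema once, before any
        # row is streamed, and hand the position to the consumer.
        return NonEmptyConsumer(index=schema.index(self.column))


schema = ["name", "city"]
rows = [["Alice", "NYC"], ["Bob", ""], ["Carol", "LA"]]

consumer = NonEmptyFilter("city").open(schema)
kept = [row for rowid, row in enumerate(rows)
        if consumer.consume(rowid, row) is not None]
count = consumer.close()
```

In this sketch the processor holds only the definition (which column to check), while the consumer returned by open holds the state that depends on a concrete schema. One processor definition could therefore be opened against several streams with different column orders.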