openclean.profiling.datatype.operator module

Datatype conversion consumer and processor for data pipelines.

class openclean.profiling.datatype.operator.Typecast(converter: Optional[openclean.profiling.datatype.convert.DatatypeConverter] = None, columns: Optional[List[Union[str, histore.document.schema.Column]]] = None, consumer: Optional[openclean.operator.stream.consumer.StreamConsumer] = None)

Bases: openclean.operator.stream.consumer.ProducingConsumer, openclean.operator.stream.processor.StreamProcessor

Consumer for rows that casts all values in a row using a given type converter.

handle(rowid: int, row: List[Union[int, float, str, datetime.datetime]]) → List[Union[int, float, str, datetime.datetime]]

Convert all values in the given row to a datatype that is defined by the associated converter.

Parameters
  • rowid (int) – Unique row identifier.

  • row (list) – List of values in the row.

Return type

list
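
The row-wise conversion described above can be illustrated with a minimal, self-contained sketch. It does not use openclean itself: `cast_value` is a hypothetical stand-in for the associated converter, and the module-level `handle` function only mirrors the signature of the method documented here. The order in which types are tried (int, float, ISO datetime) is an assumption made for illustration and may differ from what an actual DatatypeConverter does.

```python
from datetime import datetime
from typing import List, Union

Scalar = Union[int, float, str, datetime]


def cast_value(value: Scalar) -> Scalar:
    """Hypothetical converter: try int, then float, then ISO datetime."""
    if not isinstance(value, str):
        # Values that are not strings are passed through unchanged.
        return value
    for parse in (int, float, datetime.fromisoformat):
        try:
            return parse(value)
        except ValueError:
            continue
    # No parser accepted the value, so keep the original string.
    return value


def handle(rowid: int, row: List[Scalar]) -> List[Scalar]:
    """Return a new row in which every value has been cast."""
    return [cast_value(value) for value in row]


converted = handle(0, ["1", "2.5", "2021-03-04", "abc"])
```

In this sketch the row identifier is not used for the conversion; it is kept only so that the function has the same shape as the documented method.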

open(schema: List[Union[str, histore.document.schema.Column]]) → openclean.operator.stream.consumer.StreamConsumer

Factory pattern for stream consumers. Returns an instance of the stream consumer that casts the values in every data frame row.

Parameters

schema (list of string or histore.document.schema.Column) – List of columns (names or Column objects) in the data stream schema.

Return type

openclean.operator.stream.consumer.StreamConsumer
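
The factory pattern described for open() can be sketched without openclean as follows. All class and function names below (`Collector`, `CastingConsumer`, `TypecastSketch`, `to_int`) are hypothetical and exist only for this illustration, as does the `consume` method name; the sketch assumes that the processor is configured once and that each call to `open` binds a fresh consumer to the schema of one data stream.

```python
from typing import Callable, List, Optional


class Collector:
    """Hypothetical downstream consumer that stores the rows it receives."""

    def __init__(self) -> None:
        self.rows: list = []

    def consume(self, rowid: int, row: list) -> None:
        self.rows.append((rowid, row))


class CastingConsumer:
    """Hypothetical consumer that casts each row and passes it downstream."""

    def __init__(self, columns: List[str], cast: Callable,
                 downstream: Optional[Collector] = None) -> None:
        self.columns = columns
        self.cast = cast
        self.downstream = downstream

    def consume(self, rowid: int, row: list) -> list:
        converted = [self.cast(value) for value in row]
        if self.downstream is not None:
            self.downstream.consume(rowid, converted)
        return converted


class TypecastSketch:
    """Hypothetical processor: holds the configuration, creates consumers."""

    def __init__(self, cast: Callable,
                 downstream: Optional[Collector] = None) -> None:
        self.cast = cast
        self.downstream = downstream

    def open(self, schema: List[str]) -> CastingConsumer:
        # The factory step: bind the stored configuration to one schema.
        return CastingConsumer(list(schema), self.cast, self.downstream)


def to_int(value):
    """Hypothetical cast: convert to int where possible, else keep the value."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return value


sink = Collector()
consumer = TypecastSketch(cast=to_int, downstream=sink).open(["id", "name"])
result = consumer.consume(0, ["42", "alice"])
```

Separating the processor from the consumer it creates means that the same configuration can be reused for several streams, with each consumer holding only the state that belongs to its own stream.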