openclean.profiling.datatype.operator module
Datatype conversion consumer and processor for data pipelines.
- class openclean.profiling.datatype.operator.Typecast(converter: Optional[openclean.profiling.datatype.convert.DatatypeConverter] = None, columns: Optional[List[Union[str, histore.document.schema.Column]]] = None, consumer: Optional[openclean.operator.stream.consumer.StreamConsumer] = None)
Bases: openclean.operator.stream.consumer.ProducingConsumer, openclean.operator.stream.processor.StreamProcessor
Consumer for rows that casts all values in a row using a given type converter.
- handle(rowid: int, row: List[Union[int, float, str, datetime.datetime]]) → List[Union[int, float, str, datetime.datetime]]
Convert all values in the given row to a datatype that is defined by the associated converter.
- Parameters
rowid (int) – Unique row identifier
row (list) – List of values in the row.
- Return type
list
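The row-level casting described for handle can be sketched as follows. This is a minimal illustration and not the openclean implementation: `cast_value` and `TypecastSketch` are hypothetical stand-ins for the converter and the consumer, and the conversion rule (try int, then float, otherwise keep the value) is an assumption chosen only to keep the example short.

```python
from datetime import datetime
from typing import Callable, List, Union

Scalar = Union[int, float, str, datetime]


def cast_value(value: Scalar) -> Scalar:
    """Hypothetical converter: try int, then float, else keep the value."""
    if not isinstance(value, str):
        # Values that are already typed are passed through unchanged.
        return value
    for convert in (int, float):
        try:
            return convert(value)
        except ValueError:
            continue
    return value


class TypecastSketch:
    """Stand-in for a row consumer (assumed name, not part of openclean)."""

    def __init__(self, converter: Callable[[Scalar], Scalar] = cast_value):
        self.converter = converter

    def handle(self, rowid: int, row: List[Scalar]) -> List[Scalar]:
        # Apply the converter to every value; rowid is not used in this sketch.
        return [self.converter(value) for value in row]


caster = TypecastSketch()
result = caster.handle(0, ["42", "3.14", "abc"])
```

In this sketch the returned list has the same length and order as the input row; only the value types change.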
- open(schema: List[Union[str, histore.document.schema.Column]]) → openclean.operator.stream.consumer.StreamConsumer
Factory pattern for the stream consumer. Returns an instance of the stream consumer that performs the type casting for all data frame rows.
- Parameters
schema (list of string) – List of column names in the data stream schema.
- Return type