openclean.profiling.datatype.convert module

Datatype converter for profiling purposes. The converter returns a converted value together with the type label for datatype counting and (min,max) computations.

class openclean.profiling.datatype.convert.DatatypeConverter(datatypes: List[Tuple[Type, str, Callable]], default_label: str)

Bases: object

Converter for scalar values that is used for profiling purposes. The converter maintains a list of datatype converters (callables) each of which is assigned to a raw type and a type label.

Each converter and its associated label represent a raw data type. Converters are expected to return None for values that cannot be converted to their raw data type.

The datatype converter returns the converted value and type label for the first converter that represents a raw data type to which the given value could be converted, i.e., the first converter that accepted the given value. If a value has a raw type that matches the raw type of one of the converters, the value itself and the label for that converter are returned (Issue #45).

Note that the raw type for a given converter can be None. In this case the converter is ignored when checking whether the raw type of an input value matches the type represented by a converter. The converter may still be used when the raw type of the input value does not match the raw type of any other converter and we attempt to cast the value.

If no converter accepted the value, the original value and the default type label are returned.
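The lookup order described above can be sketched in a few lines. This is a minimal illustration and not the openclean implementation: the class name `SketchConverter`, the helper functions `to_int` and `to_float`, and the labels `'int'`, `'float'`, and `'text'` are assumptions made for the example.

```python
from typing import Any, Callable, List, Optional, Tuple


def to_int(value: Any) -> Optional[int]:
    """Hypothetical converter: return an int, or None if the value is rejected."""
    try:
        return int(value)
    except (ValueError, TypeError):
        return None


def to_float(value: Any) -> Optional[float]:
    """Hypothetical converter: return a float, or None if the value is rejected."""
    try:
        return float(value)
    except (ValueError, TypeError):
        return None


class SketchConverter:
    """Illustrative stand-in that follows the documented lookup order."""

    def __init__(
        self,
        datatypes: List[Tuple[Optional[type], str, Callable]],
        default_label: str,
    ):
        self.datatypes = datatypes
        self.default_label = default_label

    def convert(self, value: Any) -> Tuple[Any, str]:
        # Step 1: if the raw type of the value matches a converter's raw
        # type, return the value unchanged. Converters whose raw type is
        # None are skipped in this step.
        for raw_type, label, _ in self.datatypes:
            if raw_type is not None and isinstance(value, raw_type):
                return value, label
        # Step 2: try to cast the value; the first converter that does not
        # return None wins. Converters with raw type None take part here.
        for _, label, func in self.datatypes:
            converted = func(value)
            if converted is not None:
                return converted, label
        # Step 3: no converter accepted the value.
        return value, self.default_label

    def cast(self, value: Any) -> Any:
        # Same as convert(), without the type label.
        return self.convert(value)[0]


converter = SketchConverter(
    datatypes=[(int, 'int', to_int), (float, 'float', to_float)],
    default_label='text',
)
print(converter.convert(42))
print(converter.convert('3.5'))
print(converter.convert('abc'))
```

Under these assumptions, the integer `42` is returned unchanged with the label `'int'`, the string `'3.5'` is rejected by `to_int` and cast by `to_float`, and `'abc'` falls through to the default label.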

cast(value: Union[int, float, str, datetime.datetime]) → Union[int, float, str, datetime.datetime]

Convert a given value. Returns the resulting value without the type label.

Parameters

value (scalar) – Value that is converted for data type detection.

Return type

scalar

convert(value: Union[int, float, str, datetime.datetime]) → Tuple[Union[int, float, str, datetime.datetime], str]

Convert a given value. Returns a tuple of converted value and type label.

Parameters

value (scalar) – Value that is converted for data type detection.

Return type

tuple of scalar and string (type label)

openclean.profiling.datatype.convert.DefaultConverter() → openclean.profiling.datatype.convert.DatatypeConverter

Get an instance of the default data type converter for data profiling. The default converter distinguishes between integer, float, datetime, and text.

Return type

openclean.profiling.datatype.convert.DatatypeConverter
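To illustrate how the converted value and type label can feed datatype counting and (min, max) computations, the following sketch profiles a small list of values. It does not call openclean: the function `sketch_default_convert`, the labels `'int'`, `'float'`, `'date'`, and `'text'`, and the use of `datetime.fromisoformat` for date parsing are all assumptions chosen to mimic a converter that distinguishes the four types named above.

```python
from collections import Counter
from datetime import datetime


def sketch_default_convert(value):
    """Hypothetical stand-in for the convert() method of a default converter."""
    # Values that already have one of the raw types are returned unchanged.
    if isinstance(value, int):
        return value, 'int'
    if isinstance(value, float):
        return value, 'float'
    if isinstance(value, datetime):
        return value, 'date'
    # Otherwise attempt the casts in order; the first one that succeeds wins.
    casts = (('int', int), ('float', float), ('date', datetime.fromisoformat))
    for label, func in casts:
        try:
            return func(value), label
        except (ValueError, TypeError):
            continue
    # No cast succeeded: return the original value with the default label.
    return value, 'text'


values = ['7', '2.5', 3, '2020-05-01', 'n/a', '12']

counts = Counter()  # number of values per type label
ranges = {}         # (min, max) of the converted values per type label
for raw in values:
    converted, label = sketch_default_convert(raw)
    counts[label] += 1
    lo, hi = ranges.get(label, (converted, converted))
    ranges[label] = (min(lo, converted), max(hi, converted))

print(counts)
print(ranges['int'])
```

Because the minimum and maximum are computed on the converted values, the strings `'7'` and `'12'` are compared as integers rather than lexicographically.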