openclean.function.value.datatype module

Collection of unary functions for checking and converting the data type of given scalar values. Includes functions for type cast and predicates for type checking.

class openclean.function.value.datatype.Datetime(label: Optional[str] = 'datetime', formats: Optional[Union[str, List[str]]] = None, typecast: Optional[bool] = True)

Bases: openclean.function.value.classifier.ClassLabel

Class label assigner for datetime values.

openclean.function.value.datatype.DefaultDatatypeClassifier() → openclean.function.value.classifier.ValueClassifier

Return an instance of the value classifier initialized with a default set of class labels.

Return type

openclean.function.value.classifier.ValueClassifier

class openclean.function.value.datatype.Float(label: Optional[str] = 'float', typecast: Optional[bool] = True)

Bases: openclean.function.value.classifier.ClassLabel

Class label assigner for float values.

class openclean.function.value.datatype.Int(label: Optional[str] = 'int', typecast: Optional[bool] = True)

Bases: openclean.function.value.classifier.ClassLabel

Class label assigner for integer values.

openclean.function.value.datatype.cast(value: Union[int, float, str, datetime.datetime], func: Callable, default_value: Optional[Union[int, float, str, datetime.datetime]] = None, raise_error: Optional[bool] = False) → Union[int, float, str, datetime.datetime]

Generic type cast function. Attempts to cast a value to a type by using a provided callable. If the type cast fails (i.e., the callable raises a ValueError), then (i) an error is raised if the raise_error flag is True, or (ii) the given default value is returned otherwise.

Parameters
  • value (scalar) – Scalar value that is being converted using the type conversion function.

  • func (callable) – Function that converts the data type of a given scalar value.

  • default_value (scalar, default=None) – Default value that is returned for values that cannot be cast to the specified type if the raise_error flag is False.

  • raise_error (bool, default=False) – Raise ValueError if the value cannot be cast to the specified type.

Return type

scalar
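
The contract described above can be sketched in plain Python. This is a minimal illustration and not the library's implementation: cast_sketch is a hypothetical name, and catching only ValueError follows the description above.

```python
from typing import Any, Callable, Optional


def cast_sketch(
    value: Any,
    func: Callable,
    default_value: Optional[Any] = None,
    raise_error: bool = False,
) -> Any:
    """Hypothetical stand-in mirroring the documented contract of cast()."""
    try:
        return func(value)
    except ValueError:
        # Conversion failed: propagate the error or fall back to the default.
        if raise_error:
            raise
        return default_value


converted = cast_sketch("42", int)                    # conversion succeeds
fallback = cast_sketch("abc", int, default_value=-1)  # conversion fails
```

With raise_error left at False, a failed conversion never interrupts processing; the caller decides which default value marks an unconvertible input.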

openclean.function.value.datatype.has_two_spec_chars(value: Union[int, float, str, datetime.datetime]) → bool

Returns True if the given string has at least two non-alphanumeric characters.

Parameters

value (scalar) – Scalar value in a data stream.

Return type

bool
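
One way to read this predicate is as a count of characters that are neither letters nor digits. The sketch below is an assumption-laden illustration, not the library code: has_two_spec_chars_sketch is a hypothetical name, whitespace is counted as non-alphanumeric, and non-string values yield False.

```python
def has_two_spec_chars_sketch(value) -> bool:
    """Hypothetical check for at least two non-alphanumeric characters."""
    # Assumption: values that are not strings never satisfy the predicate.
    if not isinstance(value, str):
        return False
    # str.isalnum() is False for punctuation, symbols, and whitespace.
    special = sum(1 for ch in value if not ch.isalnum())
    return special >= 2
```

Under these assumptions, a value such as "a-b-c" satisfies the predicate, while "a-b" and "abc" do not.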

openclean.function.value.datatype.is_datetime(value: Union[int, float, str, datetime.datetime], formats: Optional[Union[str, List[str]]] = None, typecast: Optional[bool] = True) → bool

Test if a given string value can be converted into a datetime object for a given date format. The function accepts a single date format or a list of formats. If no format is given, ISO format is assumed as the default.

Parameters
  • value (scalar) – Scalar value that is tested for being a date.

  • formats (string or list(string)) – Date format string using Python strptime() format directives. This can be a list of date formats.

  • typecast (bool, default=True) – Attempt to parse string values as dates if True.

Return type

bool
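
The format handling described above can be approximated with the standard library. The sketch below is not the library's implementation: is_datetime_sketch is a hypothetical name, datetime.fromisoformat() stands in for "ISO format", and treating existing datetime objects as dates regardless of the typecast flag is an assumption.

```python
from datetime import datetime
from typing import List, Optional, Union


def is_datetime_sketch(
    value,
    formats: Optional[Union[str, List[str]]] = None,
    typecast: bool = True,
) -> bool:
    """Hypothetical date test following the documented parameters."""
    if isinstance(value, datetime):
        return True
    # Without type casting, strings are never accepted as dates.
    if not typecast or not isinstance(value, str):
        return False
    if formats is None:
        # No format given: fall back to ISO parsing.
        try:
            datetime.fromisoformat(value)
            return True
        except ValueError:
            return False
    if isinstance(formats, str):
        formats = [formats]
    # Accept the value as soon as one format matches.
    for fmt in formats:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False
```

For example, "31/01/2020" is rejected under the ISO default but accepted once "%d/%m/%Y" is passed as a format.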

openclean.function.value.datatype.is_float(value: Union[int, float, str, datetime.datetime], typecast: Optional[bool] = True) → bool

Test if a given value is of type float. If the type cast flag is True, any string value that can successfully be converted to float will also be accepted.

Parameters
  • value (scalar) – Scalar value that is tested for being a float.

  • typecast (bool, default=True) – Cast string values to float if True.

Return type

bool

openclean.function.value.datatype.is_int(value: Union[int, float, str, datetime.datetime], typecast: Optional[bool] = True) → bool

Test if a given value is of type integer. If the type cast flag is True, any string value that can successfully be converted to integer will also be accepted.

Parameters
  • value (scalar) – Scalar value that is tested for being an integer.

  • typecast (bool, default=True) – Cast string values to integer if True.

Return type

bool
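
The effect of the typecast flag can be illustrated with a small sketch. It is not the library's implementation: is_int_sketch is a hypothetical name, and excluding bool values (which are a subclass of int in Python) is an assumption.

```python
def is_int_sketch(value, typecast: bool = True) -> bool:
    """Hypothetical integer test following the documented parameters."""
    # Assumption: True/False are not treated as integers.
    if isinstance(value, int) and not isinstance(value, bool):
        return True
    # Strings only qualify when type casting is enabled.
    if typecast and isinstance(value, str):
        try:
            int(value)
            return True
        except ValueError:
            return False
    return False
```

Under this sketch, "7" is accepted only when typecast is True, and "7.5" is rejected in both cases because Python's int() does not parse decimal strings.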

openclean.function.value.datatype.is_nan(value: Union[int, float, str, datetime.datetime]) → bool

Test if a given value is not a number (NaN). Returns True if the given value is not a number.

Parameters

value (scalar) – Scalar value that is tested for being a number.

Return type

bool

openclean.function.value.datatype.is_numeric(value: Union[int, float, str, datetime.datetime], typecast: Optional[bool] = True, ignore_nan: Optional[bool] = True) → bool

Test if a given value is of type integer or float. If the type cast flag is True, any string value that can successfully be converted to integer or float will also be accepted.

Parameters
  • value (scalar) – Scalar value that is tested for being a number.

  • typecast (bool, default=True) – Cast string values to integer or float if True.

  • ignore_nan (bool, default=True) – Do not consider NaN as numeric if the flag is True.

Return type

bool
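
The interaction between typecast and ignore_nan is easiest to see in code. The following is a sketch under stated assumptions, not the library's implementation: is_numeric_sketch is a hypothetical name, bool values are excluded, and string parsing relies on Python's float(), which accepts "nan".

```python
import math


def is_numeric_sketch(value, typecast: bool = True, ignore_nan: bool = True) -> bool:
    """Hypothetical numeric test following the documented parameters."""
    # Assumption: True/False are not treated as numbers.
    if isinstance(value, bool):
        return False
    if isinstance(value, (int, float)):
        number = value
    elif typecast and isinstance(value, str):
        try:
            number = float(value)
        except ValueError:
            return False
    else:
        return False
    # With ignore_nan=True, NaN does not count as numeric.
    if ignore_nan and isinstance(number, float) and math.isnan(number):
        return False
    return True
```

Because float("nan") parses successfully in Python, the ignore_nan flag is what decides whether the string "nan" passes the test in this sketch.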

openclean.function.value.datatype.is_numeric_type(value: Union[int, float, str, datetime.datetime]) → bool

Test if a given value is of type integer or float (or the numpy equivalent). Does not attempt to cast string values.

Parameters

value (scalar) – Scalar value that is tested for being a number.

Return type

bool

openclean.function.value.datatype.to_datetime(value: Union[int, float, str, datetime.datetime], default_value: Optional[Union[int, float, str, datetime.datetime]] = None, raise_error: Optional[bool] = False) → Union[int, float, str, datetime.datetime]

Converts a timestamp string in ISO format into a datetime object in UTC timezone.

Parameters
  • value (string) – String value that is being converted to datetime.

  • default_value (scalar, default=None) – Default value that is being returned for values that cannot be converted to datetime if the raise_error flag is False.

  • raise_error (bool, default=False) – Raise ValueError if the value cannot be converted to datetime.

Return type

scalar
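
A conversion of this kind might look as follows with the standard library. This is a sketch, not the library's implementation: to_datetime_sketch is a hypothetical name, and interpreting timestamps without an offset as already being in UTC is an assumption that the description above does not confirm.

```python
from datetime import datetime, timezone


def to_datetime_sketch(value, default_value=None, raise_error: bool = False):
    """Hypothetical ISO-to-UTC conversion following the documented parameters."""
    try:
        parsed = datetime.fromisoformat(value)
    except (TypeError, ValueError):
        if raise_error:
            raise ValueError(f"cannot convert {value!r} to datetime")
        return default_value
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are taken to be UTC already.
        return parsed.replace(tzinfo=timezone.utc)
    # Timestamps with an offset are shifted to UTC.
    return parsed.astimezone(timezone.utc)


shifted = to_datetime_sketch("2020-01-31T12:00:00+02:00")
```

Here shifted represents 10:00 UTC on the same day, since the two-hour offset is removed during normalization.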

openclean.function.value.datatype.to_datetime_format(value: Union[int, float, str, datetime.datetime], formats: Optional[Union[str, List[str]]] = None) → datetime.datetime

Convert a given value to a datetime object for a given date format. If a list of format specifications is given, an attempt is made to convert the value with each format (in the given order) until the conversion succeeds for one of them. If none of the formats match the given value, None is returned.

Parameters
  • value (scalar) – Scalar value that is converted to a date.

  • formats (string or list(string)) – Date format string using Python strptime() format directives. This can be a list of date formats.

Return type

datetime.datetime
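
The first-match strategy described above can be sketched with datetime.strptime(). This is an illustration rather than the library's implementation: to_datetime_format_sketch is a hypothetical name, and returning None when no format is supplied is an assumption.

```python
from datetime import datetime
from typing import List, Optional, Union


def to_datetime_format_sketch(
    value,
    formats: Optional[Union[str, List[str]]] = None,
) -> Optional[datetime]:
    """Hypothetical conversion that tries each format in the given order."""
    if isinstance(value, datetime):
        return value
    if isinstance(formats, str):
        formats = [formats]
    # Assumption: without any format there is nothing to try.
    for fmt in formats or []:
        try:
            # The first format that parses the value wins.
            return datetime.strptime(str(value), fmt)
        except ValueError:
            continue
    return None


parsed = to_datetime_format_sketch("31/01/2020", ["%Y-%m-%d", "%d/%m/%Y"])
```

The order of the formats matters for ambiguous inputs: "01/02/2020" parses as 1 February with "%d/%m/%Y" listed first, and as 2 January with "%m/%d/%Y" listed first.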

openclean.function.value.datatype.to_float(value: Union[int, float, str, datetime.datetime], default_value: Optional[Union[int, float, str, datetime.datetime]] = None, raise_error: Optional[bool] = False) → Union[int, float, str, datetime.datetime]

Convert a given value to float. Raises an error if the given value cannot be converted to float and the raise_error flag is True. If the flag is False, the given default value is returned for those values that cannot be converted to float.

Parameters
  • value (scalar) – Scalar value that is being converted to float.

  • default_value (scalar, default=None) – Default value that is being returned for values that cannot be cast to float if the raise_error flag is False.

  • raise_error (bool, default=False) – Raise ValueError if the value cannot be cast to float.

Return type

scalar

openclean.function.value.datatype.to_int(value: Union[int, float, str, datetime.datetime], default_value: Optional[Union[int, float, str, datetime.datetime]] = None, raise_error: Optional[bool] = False) → Union[int, float, str, datetime.datetime]

Convert a given value to integer. Raises an error if the given value cannot be converted to integer and the raise_error flag is True. If the flag is False, the given default value is returned for those values that cannot be converted to integer.

Parameters
  • value (scalar) – Scalar value that is being converted to integer.

  • default_value (scalar, default=None) – Default value that is being returned for values that cannot be cast to integer if the raise_error flag is False.

  • raise_error (bool, default=False) – Raise ValueError if the value cannot be cast to integer.

Return type

scalar
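
As a usage illustration, the sketch below applies the documented contract to a small list of raw values. to_int_sketch is a hypothetical stand-in built on Python's int(), not the library function; in particular, also catching TypeError for values such as None is an assumption.

```python
def to_int_sketch(value, default_value=None, raise_error: bool = False):
    """Hypothetical integer conversion following the documented parameters."""
    try:
        return int(value)
    except (TypeError, ValueError):
        if raise_error:
            raise ValueError(f"cannot convert {value!r} to integer")
        return default_value


raw = ["10", " 7 ", "3.7", "n/a", None]
# Replace every value that cannot be converted with 0.
cleaned = [to_int_sketch(v, default_value=0) for v in raw]
```

Note that Python's int() rejects the string "3.7" but truncates the float 3.7 to 3; whether the library function behaves the same way is not stated here.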

openclean.function.value.datatype.to_string(value: Union[int, float, str, datetime.datetime]) → bool

Type cast function that tests if a given value is of type string. Returns the value if it is of type string, or None otherwise.

Parameters

value (scalar) – Scalar value that is tested for being a string.

Return type

bool
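
The described behavior amounts to a type filter, which can be sketched in a few lines. to_string_sketch is a hypothetical name; this mirrors only the description above and is not the library's implementation.

```python
def to_string_sketch(value):
    """Hypothetical filter: pass strings through, map everything else to None."""
    return value if isinstance(value, str) else None


mixed = ["abc", 5, 2.5, "x"]
# Keep only the values that are already strings.
strings = [v for v in map(to_string_sketch, mixed) if v is not None]
```

No conversion takes place in this sketch: the integer 5 is mapped to None rather than to the string "5".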