openclean.function.value.datatype module
Collection of unary functions for checking and converting the data type of given scalar values. Includes functions for type casting and predicates for type checking.
- class openclean.function.value.datatype.Datetime(label: Optional[str] = 'datetime', formats: Optional[Union[str, List[str]]] = None, typecast: Optional[bool] = True)
Bases: openclean.function.value.classifier.ClassLabel
Class label assigner for datetime values.
- openclean.function.value.datatype.DefaultDatatypeClassifier() → openclean.function.value.classifier.ValueClassifier
Returns an instance of the value classifier initialized with a default set of class labels.
- class openclean.function.value.datatype.Float(label: Optional[str] = 'float', typecast: Optional[bool] = True)
Bases: openclean.function.value.classifier.ClassLabel
Class label assigner for float values.
- class openclean.function.value.datatype.Int(label: Optional[str] = 'int', typecast: Optional[bool] = True)
Bases: openclean.function.value.classifier.ClassLabel
Class label assigner for integer values.
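The three label assigners above, together with the default classifier, suggest a first-match classification scheme. The following is a minimal, self-contained sketch of that idea under stated assumptions: it does not use openclean's `ClassLabel` or `ValueClassifier` classes, and the names `classify` and `RULES`, the rule order, and the fallback label `"text"` are hypothetical choices, not the library's defaults.

```python
from datetime import datetime
from typing import Any, Callable, List, Tuple

# A rule pairs a class label with a predicate (hypothetical stand-ins for
# the Int, Float and Datetime label assigners).
Rule = Tuple[str, Callable[[Any], bool]]


def _parses(func: Callable[[Any], Any], value: Any) -> bool:
    """Return True if func(value) succeeds without ValueError or TypeError."""
    try:
        func(value)
        return True
    except (ValueError, TypeError):
        return False


# Order matters: "42" parses as both int and float, so int is tested first.
RULES: List[Rule] = [
    ("int", lambda v: isinstance(v, int) or (
        isinstance(v, str) and _parses(int, v))),
    ("float", lambda v: isinstance(v, float) or (
        isinstance(v, str) and _parses(float, v))),
    ("datetime", lambda v: isinstance(v, datetime) or (
        isinstance(v, str) and _parses(datetime.fromisoformat, v))),
]


def classify(value: Any, rules: List[Rule] = RULES, default: str = "text") -> str:
    """Return the label of the first rule that accepts value, else default."""
    for label, predicate in rules:
        if predicate(value):
            return label
    return default
```

Putting the stricter integer rule ahead of the float rule is what keeps integer-like strings from being labeled as floats; any classifier of this shape depends on such an ordering.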
- openclean.function.value.datatype.cast(value: Union[int, float, str, datetime.datetime], func: Callable, default_value: Optional[Union[int, float, str, datetime.datetime]] = None, raise_error: Optional[bool] = False) → Union[int, float, str, datetime.datetime]
Generic type cast function. Attempts to cast a value to a type using the provided callable. If the type cast fails (i.e., the callable raises a ValueError), then (i) an error is raised if the raise_error flag is True, or (ii) the given default value is returned otherwise.
- Parameters
value (scalar) – Scalar value that is being converted using the type conversion function.
func (callable) – Function that converts the data type of a given scalar value.
default_value (scalar, default=None) – Default value that is returned for values that cannot be cast to the specified type if the raise_error flag is False.
raise_error (bool, default=False) – Raise ValueError if the value cannot be cast to the specified type.
- Return type
scalar
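A minimal sketch of the documented contract (apply the callable; on ValueError either re-raise or return the default). This is an illustrative re-implementation in plain Python, not openclean's source. Like the description above, it only intercepts ValueError, so a TypeError raised by the callable (for example by `int(None)`) propagates unchanged.

```python
from typing import Any, Callable, Optional


def cast(
    value: Any,
    func: Callable[[Any], Any],
    default_value: Optional[Any] = None,
    raise_error: bool = False,
) -> Any:
    """Convert value with func; handle ValueError according to raise_error."""
    try:
        return func(value)
    except ValueError:
        if raise_error:
            raise  # propagate the original ValueError
        return default_value


# Usage: the converter can be any callable, e.g. the built-in int or float.
converted = cast("42", int)                     # 42
fallback = cast("abc", int, default_value=-1)   # -1
```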
- openclean.function.value.datatype.has_two_spec_chars(value: Union[int, float, str, datetime.datetime]) → bool
Returns True if the given string has at least two non-alphanumeric characters.
- Parameters
value (scalar) – Scalar value in a data stream.
- Return type
bool
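One way to express this check, as a sketch: count the characters for which `str.isalnum()` is False. Converting non-string inputs with `str(value)` is an assumption of this sketch. Python's `isalnum()` rules decide what counts as special, so a space is counted while a Unicode letter such as `é` is not.

```python
from typing import Any


def has_two_spec_chars(value: Any) -> bool:
    """True if str(value) contains at least two non-alphanumeric characters."""
    text = str(value)  # assumption: non-strings are checked via their str() form
    special = sum(1 for ch in text if not ch.isalnum())
    return special >= 2
```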
- openclean.function.value.datatype.is_datetime(value: Union[int, float, str, datetime.datetime], formats: Optional[Union[str, List[str]]] = None, typecast: Optional[bool] = True) → bool
Test if a given string value can be converted into a datetime object for a given date format. The function accepts a single date format or a list of formats. If no format is given, ISO format is assumed as the default.
- Parameters
value (scalar) – Scalar value that is tested for being a date.
formats (string or list(string)) – Date format string using Python strptime() format directives. This can be a list of date formats.
typecast (bool, default=True) – Attempt to parse string values as dates if True.
- Return type
bool
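A sketch of this predicate using only the standard library. It assumes that `datetime.fromisoformat()` is an acceptable stand-in for "ISO format" (openclean may use a different parser) and that values which already are `datetime` objects are accepted as-is.

```python
from datetime import datetime
from typing import Any, List, Optional, Union


def is_datetime(
    value: Any,
    formats: Optional[Union[str, List[str]]] = None,
    typecast: bool = True,
) -> bool:
    """Sketch: True if value is a datetime or a string matching a format."""
    if isinstance(value, datetime):
        return True  # assumption: datetime objects pass without parsing
    if not typecast or not isinstance(value, str):
        return False
    if formats is None:
        # No format given: fall back to ISO 8601 parsing.
        try:
            datetime.fromisoformat(value)
            return True
        except ValueError:
            return False
    if isinstance(formats, str):
        formats = [formats]  # normalize a single format to a list
    for fmt in formats:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue  # try the next format
    return False
```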
- openclean.function.value.datatype.is_float(value: Union[int, float, str, datetime.datetime], typecast: Optional[bool] = True) → bool
Test if a given value is of type float. If the type cast flag is True, any string value that can successfully be converted to float will also be accepted.
- Parameters
value (scalar) – Scalar value that is tested for being a float.
typecast (bool, default=True) – Cast string values to float if True.
- Return type
bool
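A sketch that follows the description literally: it accepts `float` instances and, when `typecast` is True, any string that `float()` can parse. Under this reading the string `"1"` is accepted while the integer `1` is not; the description does not say how openclean itself treats integers. Note also that `float()` accepts strings such as `"nan"`, `"inf"` and `"1e3"`, as well as surrounding whitespace.

```python
from typing import Any


def is_float(value: Any, typecast: bool = True) -> bool:
    """Sketch: float instances, or (if typecast) strings parseable by float()."""
    if isinstance(value, float):
        return True
    if typecast and isinstance(value, str):
        try:
            float(value)
            return True
        except ValueError:
            return False
    return False
```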
- openclean.function.value.datatype.is_int(value: Union[int, float, str, datetime.datetime], typecast: Optional[bool] = True) → bool
Test if a given value is of type integer. If the type cast flag is True, any string value that can successfully be converted to integer will also be accepted.
- Parameters
value (scalar) – Scalar value that is tested for being an integer.
typecast (bool, default=True) – Cast string values to integer if True.
- Return type
bool
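A corresponding sketch for integers. Two choices here are assumptions rather than documented behavior: `bool` values are rejected even though `bool` is a subclass of `int` in Python, and strings are parsed with `int()`, so `"7.0"` is not accepted.

```python
from typing import Any


def is_int(value: Any, typecast: bool = True) -> bool:
    """Sketch: int instances, or (if typecast) strings parseable by int()."""
    if isinstance(value, bool):
        return False  # assumption: True/False are not treated as integers
    if isinstance(value, int):
        return True
    if typecast and isinstance(value, str):
        try:
            int(value)  # rejects decimal strings such as "7.0"
            return True
        except ValueError:
            return False
    return False
```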
- openclean.function.value.datatype.is_nan(value: Union[int, float, str, datetime.datetime]) → bool
Test if a given value is not a number (NaN). Returns True if the given value is not a number.
- Parameters
value (scalar) – Scalar value that is tested for being a number.
- Return type
bool
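NaN is the only floating-point value that compares unequal to itself, and `math.isnan()` tests for it directly. The following is a minimal sketch. It assumes that only float values (including `float` subclasses) can be NaN and that every other input, including the string `"nan"`, yields False.

```python
import math
from typing import Any


def is_nan(value: Any) -> bool:
    """Sketch: True only for float NaN; strings and other types give False."""
    return isinstance(value, float) and math.isnan(value)
```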
- openclean.function.value.datatype.is_numeric(value: Union[int, float, str, datetime.datetime], typecast: Optional[bool] = True, ignore_nan: Optional[bool] = True) → bool
Test if a given value is of type integer or float. If the type cast flag is True, any string value that can successfully be converted to integer or float will also be accepted.
- Parameters
value (scalar) – Scalar value that is tested for being a number.
typecast (bool, default=True) – Cast string values to integer or float if True.
ignore_nan (bool, default=True) – Do not consider NaN as numeric if the flag is True.
- Return type
bool
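A sketch that combines the integer and float checks with the `ignore_nan` flag. Assumptions: strings are parsed with `float()` (which also accepts integer strings), `bool` is rejected, and with `ignore_nan=True` both a float NaN and the string `"nan"` are treated as non-numeric.

```python
import math
from typing import Any


def is_numeric(value: Any, typecast: bool = True, ignore_nan: bool = True) -> bool:
    """Sketch: int/float values or, if typecast, strings parseable as numbers."""
    if isinstance(value, bool):
        return False  # assumption: booleans are not counted as numbers
    if isinstance(value, (int, float)):
        number = value
    elif typecast and isinstance(value, str):
        try:
            number = float(value)  # float() also accepts integer strings like "10"
        except ValueError:
            return False
    else:
        return False
    # With ignore_nan=True, NaN is not considered numeric.
    if ignore_nan and isinstance(number, float) and math.isnan(number):
        return False
    return True
```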
- openclean.function.value.datatype.is_numeric_type(value: Union[int, float, str, datetime.datetime]) → bool
Test if a given value is of type integer or float (or the numpy equivalent). Does not attempt to cast string values.
- Parameters
value (scalar) – Scalar value that is tested for being a number.
- Return type
bool
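Because no string parsing is involved, a type check is enough. One way to cover both the built-in and the NumPy scalar types is the `numbers.Real` abstract base class, since NumPy registers its integer and floating types with the `numbers` hierarchy. The sketch below uses that approach. Excluding `bool` is an assumption, and `numbers.Real` also admits types such as `fractions.Fraction`, which openclean may not accept.

```python
import numbers
from typing import Any


def is_numeric_type(value: Any) -> bool:
    """Sketch: True for real-number instances; strings are never parsed."""
    # bool is a subclass of int; this sketch chooses not to count it.
    return isinstance(value, numbers.Real) and not isinstance(value, bool)
```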
- openclean.function.value.datatype.to_datetime(value: Union[int, float, str, datetime.datetime], default_value: Optional[Union[int, float, str, datetime.datetime]] = None, raise_error: Optional[bool] = False) → Union[int, float, str, datetime.datetime]
Converts a timestamp string in ISO format into a datetime object in the UTC timezone.
- Parameters
value (string) – String value that is being converted to datetime.
default_value (scalar, default=None) – Default value that is being returned for values that cannot be converted to datetime if the raise_error flag is False.
raise_error (bool, default=False) – Raise ValueError if the value cannot be converted to datetime.
- Return type
scalar
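A sketch of the ISO-to-UTC conversion using `datetime.fromisoformat()`. The handling of naive timestamps (those without an offset) is an assumption: here they are taken to be in UTC already, whereas the library might interpret them differently. Timestamps with an offset are shifted to UTC with `astimezone(timezone.utc)`.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def to_datetime(value: Any, default_value: Optional[Any] = None,
                raise_error: bool = False) -> Any:
    """Sketch: parse an ISO timestamp string into a UTC-aware datetime."""
    try:
        dt = datetime.fromisoformat(value)
    except (TypeError, ValueError):
        if raise_error:
            raise ValueError(f"cannot convert {value!r} to datetime") from None
        return default_value
    if dt.tzinfo is None:
        # Assumption: a timestamp without an offset is already in UTC.
        return dt.replace(tzinfo=timezone.utc)
    # Shift aware timestamps to UTC (same instant, different wall-clock time).
    return dt.astimezone(timezone.utc)
```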
- openclean.function.value.datatype.to_datetime_format(value: Union[int, float, str, datetime.datetime], formats: Optional[Union[str, List[str]]] = None) → datetime.datetime
Convert a given value to a datetime object for a given date format. If a list of format specifications is given, the formats are tried in the given order and the result for the first format for which the conversion succeeds is returned. If none of the formats match the given value, None is returned.
- Parameters
value (scalar) – Scalar value that is converted to a date.
formats (string or list(string)) – Date format string using Python strptime() format directives. This can be a list of date formats.
- Return type
datetime.datetime
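The first-match loop can be sketched with `datetime.strptime()`. In this sketch `formats` is required, because the description does not state what happens when it is None. Format order matters for ambiguous inputs: with `["%m/%d/%Y", "%d/%m/%Y"]`, the string `"03/04/2021"` is read as March 4.

```python
from datetime import datetime
from typing import Any, List, Optional, Union


def to_datetime_format(value: Any, formats: Union[str, List[str]]) -> Optional[datetime]:
    """Sketch: return the parse for the first matching format, else None."""
    if isinstance(formats, str):
        formats = [formats]  # normalize a single format to a list
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except (TypeError, ValueError):
            continue  # non-string input or no match: try the next format
    return None
```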
- openclean.function.value.datatype.to_float(value: Union[int, float, str, datetime.datetime], default_value: Optional[Union[int, float, str, datetime.datetime]] = None, raise_error: Optional[bool] = False) → Union[int, float, str, datetime.datetime]
Convert a given value to float. Raises an error if the given value cannot be converted to float and the raise_error flag is True. If the flag is False, the given default value is returned for those values that cannot be converted to float.
- Parameters
value (scalar) – Scalar value that is being converted to float.
default_value (scalar, default=None) – Default value that is being returned for values that cannot be cast to float if the raise_error flag is False.
raise_error (bool, default=False) – Raise ValueError if the value cannot be cast to float.
- Return type
scalar
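A sketch of the documented contract for floats. Catching `TypeError` as well as `ValueError`, so that inputs such as `None` also fall back to `default_value`, is an assumption of this sketch.

```python
from typing import Any, Optional


def to_float(value: Any, default_value: Optional[Any] = None,
             raise_error: bool = False) -> Any:
    """Sketch: convert value with float(); fall back to default_value on failure."""
    try:
        return float(value)
    except (TypeError, ValueError):  # assumption: None etc. also use the default
        if raise_error:
            raise ValueError(f"cannot cast {value!r} to float") from None
        return default_value
```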
- openclean.function.value.datatype.to_int(value: Union[int, float, str, datetime.datetime], default_value: Optional[Union[int, float, str, datetime.datetime]] = None, raise_error: Optional[bool] = False) → Union[int, float, str, datetime.datetime]
Convert a given value to integer. Raises an error if the given value cannot be converted to integer and the raise_error flag is True. If the flag is False, the given default value is returned for those values that cannot be converted to integer.
- Parameters
value (scalar) – Scalar value that is being converted to integer.
default_value (scalar, default=None) – Default value that is being returned for values that cannot be cast to integer if the raise_error flag is False.
raise_error (bool, default=False) – Raise ValueError if the value cannot be cast to integer.
- Return type
scalar
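The integer variant can be sketched the same way, assuming it relies on Python's built-in `int()`. If so, two behaviors of `int()` matter. Floats are truncated toward zero (`int(-3.9)` is `-3`). Decimal strings such as `"3.0"` are rejected and therefore fall back to `default_value`.

```python
from typing import Any, Optional


def to_int(value: Any, default_value: Optional[Any] = None,
           raise_error: bool = False) -> Any:
    """Sketch: convert value with int(); fall back to default_value on failure."""
    try:
        return int(value)  # int(3.9) == 3, while int("3.0") raises ValueError
    except (TypeError, ValueError):  # assumption: None etc. also use the default
        if raise_error:
            raise ValueError(f"cannot cast {value!r} to integer") from None
        return default_value
```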
- openclean.function.value.datatype.to_string(value: Union[int, float, str, datetime.datetime]) → bool
Type cast function that tests if a given value is of type string. Returns the value if it is of type string, or None otherwise.
- Parameters
value (scalar) – Scalar value that is tested for being a string.
- Return type
bool
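Unlike the built-in `str()`, this function filters values rather than converting them. Here is a one-line sketch of the described behavior (not the library source):

```python
from typing import Any, Optional


def to_string(value: Any) -> Optional[str]:
    """Return value unchanged if it is a str; otherwise return None."""
    return value if isinstance(value, str) else None
```

With this behavior, `to_string(42)` yields None, whereas `str(42)` yields `"42"`.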