openclean.function.value.normalize.numeric module
Collection of functions to normalize numeric values in a list (e.g., a data frame column).
- class openclean.function.value.normalize.numeric.DivideByTotal(raise_error=True, default_value=<function scalar_pass_through>, sum=None)
Bases: openclean.function.value.normalize.numeric.NumericNormalizer
Divide values in a list by the sum over all values.
- compute(value)
Divide the given value by the pre-computed sum over all values in the list. If the sum is zero, the result is zero.
If the given value is not numeric, either a ValueError is raised (when raise_error is True) or the default value is returned.
- Parameters
value (scalar) – Scalar value from the list that was used to prepare the function.
- Return type
float
- is_prepared()
The function requires preparation if the sum is not set.
- Return type
bool
- prepare(values)
Compute the total sum over all values in the given list.
- Parameters
values (list) – List of scalar values or tuples of scalar values.
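The prepare/compute split described above can be sketched as follows. This is a minimal, self-contained illustration, not openclean's implementation. It assumes that prepare() sums only the int and float entries and skips everything else, and it handles plain scalars only (no tuples). The class name DivideByTotalSketch and the helper is_number are hypothetical.

```python
from typing import Any, List, Optional


def is_number(value: Any) -> bool:
    # Assumption: only int and float (excluding bool) count as numeric.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


class DivideByTotalSketch:
    """Hypothetical stand-in for DivideByTotal (not openclean's code)."""

    def __init__(self, total: Optional[float] = None):
        # Passing a total up front mirrors the optional `sum` argument.
        self.total = total

    def is_prepared(self) -> bool:
        # Preparation is required until the sum is known.
        return self.total is not None

    def prepare(self, values: List[Any]) -> None:
        # Sum the numeric entries; non-numeric entries are skipped (assumption).
        self.total = sum(v for v in values if is_number(v))

    def compute(self, value: float) -> float:
        # A zero sum yields 0.0 instead of raising ZeroDivisionError.
        if self.total == 0:
            return 0.0
        return value / self.total


values = [1, 2, 3, 4]
normalizer = DivideByTotalSketch()
normalizer.prepare(values)
print([normalizer.compute(v) for v in values])  # [0.1, 0.2, 0.3, 0.4]
```

The sum is computed once in prepare(), so each compute() call is a single division. Whenever the total is non-zero, the normalized values sum to 1, up to floating-point rounding.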
- class openclean.function.value.normalize.numeric.MaxAbsScale(raise_error=True, default_value=<function scalar_pass_through>, maximum=None)
Bases: openclean.function.value.normalize.numeric.NumericNormalizer
Divide values in a list by the absolute maximum over all values.
- compute(value)
Divide the given value by the pre-computed absolute maximum over all values in the list. If the maximum is zero, the result is zero.
If the given value is not numeric, either a ValueError is raised (when raise_error is True) or the default value is returned.
- Parameters
value (scalar) – Scalar value from the list that was used to prepare the function.
- Return type
float
- is_prepared()
The function requires preparation if the maximum is not set.
- Return type
bool
- prepare(values)
Compute the absolute maximum over all values in the given list.
- Parameters
values (list) – List of scalar values or tuples of scalar values.
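Here is a minimal sketch of max-abs scaling. It is not openclean's code. It assumes that prepare() takes the largest absolute value among the int and float entries, falls back to 0 when there are none, and handles plain scalars only. MaxAbsScaleSketch and is_number are hypothetical names.

```python
from typing import Any, List, Optional


def is_number(value: Any) -> bool:
    # Assumption: only int and float (excluding bool) count as numeric.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


class MaxAbsScaleSketch:
    """Hypothetical stand-in for MaxAbsScale (not openclean's code)."""

    def __init__(self, maximum: Optional[float] = None):
        self.maximum = maximum

    def is_prepared(self) -> bool:
        # Preparation is required until the maximum is known.
        return self.maximum is not None

    def prepare(self, values: List[Any]) -> None:
        # Largest absolute value among the numeric entries; 0 if there are none.
        self.maximum = max((abs(v) for v in values if is_number(v)), default=0)

    def compute(self, value: float) -> float:
        # A zero maximum yields 0.0 instead of raising ZeroDivisionError.
        if self.maximum == 0:
            return 0.0
        return value / self.maximum


values = [-8, 2, 4]
scaler = MaxAbsScaleSketch()
scaler.prepare(values)
print([scaler.compute(v) for v in values])  # [-1.0, 0.25, 0.5]
```

Dividing by the absolute maximum maps the values into [-1, 1]. Signs are preserved, and zero stays at zero.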
- class openclean.function.value.normalize.numeric.MinMaxScale(raise_error=True, default_value=<function scalar_pass_through>, minimum=None, maximum=None)
Bases: openclean.function.value.normalize.numeric.NumericNormalizer
Normalize values in a list using min-max feature scaling.
- compute(value)
Normalize the value using min-max feature scaling, i.e., (value - minimum) / (maximum - minimum). If the pre-computed minimum and maximum for the value list are equal, the result is zero.
- Parameters
value (scalar) – Scalar value from the list that was used to prepare the function.
- Return type
float
- is_prepared()
The function requires preparation if the minimum or maximum is not set.
- Return type
bool
- prepare(values)
Compute the minimum and maximum over all values in the given list.
- Parameters
values (list) – List of scalar values or tuples of scalar values.
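The min-max formula can be sketched as below. This is a minimal illustration, not openclean's implementation. It assumes that prepare() computes both bounds over the int and float entries only and handles plain scalars only. MinMaxScaleSketch and is_number are hypothetical names.

```python
from typing import Any, List, Optional


def is_number(value: Any) -> bool:
    # Assumption: only int and float (excluding bool) count as numeric.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


class MinMaxScaleSketch:
    """Hypothetical stand-in for MinMaxScale (not openclean's code)."""

    def __init__(self, minimum: Optional[float] = None, maximum: Optional[float] = None):
        self.minimum = minimum
        self.maximum = maximum

    def is_prepared(self) -> bool:
        # Both bounds must be known before compute() can be used.
        return self.minimum is not None and self.maximum is not None

    def prepare(self, values: List[Any]) -> None:
        # min()/max() raise ValueError if the list has no numeric entries.
        numbers = [v for v in values if is_number(v)]
        self.minimum = min(numbers)
        self.maximum = max(numbers)

    def compute(self, value: float) -> float:
        # Equal bounds (a constant list) yield 0.0 instead of dividing by zero.
        if self.minimum == self.maximum:
            return 0.0
        return (value - self.minimum) / (self.maximum - self.minimum)


values = [10, 15, 20]
scaler = MinMaxScaleSketch()
scaler.prepare(values)
print([scaler.compute(v) for v in values])  # [0.0, 0.5, 1.0]
```

After preparation, the smallest value maps to 0 and the largest maps to 1. Values outside the prepared range, if passed later, would fall outside [0, 1].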
- class openclean.function.value.normalize.numeric.NumericNormalizer(raise_error=True, default_value=<function scalar_pass_through>)
Bases: openclean.function.value.base.ValueFunction
Abstract base class for numeric normalization functions. Subclasses must implement the compute and prepare methods.
- abstract compute(value)
Individual normalization function that is dependent on the implementing sub-class. At this point it is assumed that the argument value is numeric.
- Parameters
value (scalar) – Scalar value from the list that was used to prepare the function.
- Return type
float
- eval(value)
Normalize a given value by calling the compute method of the implementing class.
If the given value is not numeric, either a ValueError is raised (when raise_error is True) or the default value is returned.
- Parameters
value (scalar) – Scalar value from the list that was used to prepare the function.
- Return type
float
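The eval/compute dispatch described above can be sketched with Python's abc module. This is a minimal sketch, not openclean's base class. It assumes that "numeric" means int or float (excluding bool), and that a callable default_value is applied to the offending value while any other default is returned as-is. NumericNormalizerSketch, the local scalar_pass_through, and the toy subclass Halve are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any


def scalar_pass_through(value: Any) -> Any:
    # Hypothetical stand-in for the default: return the value unchanged.
    return value


class NumericNormalizerSketch(ABC):
    """Hypothetical stand-in for NumericNormalizer (not openclean's code)."""

    def __init__(self, raise_error: bool = True, default_value: Any = scalar_pass_through):
        self.raise_error = raise_error
        self.default_value = default_value

    @abstractmethod
    def compute(self, value: float) -> float:
        """Subclass-specific normalization; value is assumed to be numeric."""

    def eval(self, value: Any) -> Any:
        # Numeric values are delegated to the subclass.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return self.compute(value)
        if self.raise_error:
            raise ValueError(f"not a numeric value: {value!r}")
        # A callable default is evaluated on the value; any other default is substituted.
        if callable(self.default_value):
            return self.default_value(value)
        return self.default_value


class Halve(NumericNormalizerSketch):
    """Toy subclass used only to exercise eval()."""

    def compute(self, value: float) -> float:
        return value / 2


print(Halve().eval(3))                                         # 1.5
print(Halve(raise_error=False).eval("n/a"))                    # n/a
print(Halve(raise_error=False, default_value=0).eval("n/a"))   # 0
```

With the default settings, Halve().eval("n/a") raises ValueError. Keeping the type check in eval() lets each subclass's compute() assume a numeric argument.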
- openclean.function.value.normalize.numeric.divide_by_total(values, raise_error=True, default_value=<function scalar_pass_through>)
Divide values in a list by the sum over all values. Non-numeric values cause a ValueError if raise_error is True; otherwise they are replaced with the given default value.
- Parameters
values (list) – List of scalar values.
raise_error (bool, optional) – Raise a ValueError if the list contains values that are not integer or float. If False, non-numeric values are replaced using default_value.
default_value (scalar, tuple, or callable, default=scalar_pass_through) – Value (or function) that is used (evaluated) as a substitute for non-numeric values if no error is raised. By default, a value is returned as is.
- openclean.function.value.normalize.numeric.max_abs_scale(values, raise_error=True, default_value=<function scalar_pass_through>)
Divide values in a list by the absolute maximum over all values. Non-numeric values cause a ValueError if raise_error is True; otherwise they are replaced with the given default value.
- Parameters
values (list) – List of scalar values.
raise_error (bool, optional) – Raise a ValueError if the list contains values that are not integer or float. If False, non-numeric values are replaced using default_value.
default_value (scalar, tuple, or callable, default=scalar_pass_through) – Value (or function) that is used (evaluated) as a substitute for non-numeric values if no error is raised. By default, a value is returned as is.
- openclean.function.value.normalize.numeric.min_max_scale(values, raise_error=True, default_value=<function scalar_pass_through>)
Normalize values in a list using min-max feature scaling. Non-numeric values cause a ValueError if raise_error is True; otherwise they are replaced with the given default value.
- Parameters
values (list) – List of scalar values.
raise_error (bool, optional) – Raise a ValueError if the list contains values that are not integer or float. If False, non-numeric values are replaced using default_value.
default_value (scalar, tuple, or callable, default=scalar_pass_through) – Value (or function) that is used (evaluated) as a substitute for non-numeric values if no error is raised. By default, a value is returned as is.
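The three helper functions share one pattern: compute statistics over the whole list, then normalize each element. The sketch below shows that pattern for min-max scaling on a list with mixed types. It is not openclean's min_max_scale. It assumes that the helper returns a new list in input order, that non-numeric values are rejected up front when raise_error is True, and that a callable default_value is applied to each non-numeric value while any other default is substituted as-is. min_max_scale_sketch and the local scalar_pass_through are hypothetical.

```python
from typing import Any, List


def scalar_pass_through(value: Any) -> Any:
    # Hypothetical stand-in for the default: return the value unchanged.
    return value


def min_max_scale_sketch(
    values: List[Any], raise_error: bool = True, default_value: Any = scalar_pass_through
) -> List[Any]:
    """Hypothetical list-in, list-out helper; not openclean's min_max_scale."""

    def is_number(value: Any) -> bool:
        # Assumption: only int and float (excluding bool) count as numeric.
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    # Fail fast if non-numeric values are not allowed.
    if raise_error:
        for v in values:
            if not is_number(v):
                raise ValueError(f"not a numeric value: {v!r}")
    # Bounds come from the numeric entries only.
    numbers = [v for v in values if is_number(v)]
    lo, hi = min(numbers), max(numbers)
    result = []
    for v in values:
        if is_number(v):
            result.append(0.0 if lo == hi else (v - lo) / (hi - lo))
        elif callable(default_value):
            result.append(default_value(v))
        else:
            result.append(default_value)
    return result


print(min_max_scale_sketch([1, "x", 3, 5], raise_error=False))  # [0.0, 'x', 0.5, 1.0]
print(min_max_scale_sketch([1, "x", 3, 5], raise_error=False, default_value=None))  # [0.0, None, 0.5, 1.0]
```

With the default raise_error=True, the same input raises ValueError because of the "x" entry.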