openclean.function.value.normalize.numeric module

Collection of functions to normalize numeric values in a list (e.g., a data frame column).

class openclean.function.value.normalize.numeric.DivideByTotal(raise_error=True, default_value=<function scalar_pass_through>, sum=None)

Bases: openclean.function.value.normalize.numeric.NumericNormalizer

Divide values in a list by the sum over all values.


compute(value)

Divide the given value by the pre-computed sum over all values in the list. If the sum is zero, the result is zero.

If the given value is not numeric, a ValueError is raised if raise_error is True; otherwise the default value is returned.


value (scalar) – Scalar value from the list that was used to prepare the function.


is_prepared()

The function requires preparation if the sum is not set.


prepare(values)

Compute the total sum over all values in the given list.


values (list) – List of scalar values or tuples of scalar values.
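
The two-step behavior described above (pre-compute the sum, then divide each value by it) can be sketched in plain Python. This is an illustrative stand-in under stated assumptions, not the openclean implementation: the class name DivideByTotalSketch, the attribute total, and prepare returning the object itself are hypothetical.

```python
class DivideByTotalSketch:
    """Illustrative stand-in, not the openclean class."""

    def __init__(self, total=None):
        # Hypothetical attribute name; the documented constructor calls it `sum`.
        self.total = total

    def prepare(self, values):
        # Pre-compute the sum over all values once.
        self.total = sum(values)
        # Returning self is an assumption made for convenient chaining.
        return self

    def compute(self, value):
        # A zero sum yields zero instead of a division error.
        if self.total == 0:
            return 0
        return value / self.total


data = [1, 2, 3, 4]
f = DivideByTotalSketch().prepare(data)
normalized = [f.compute(v) for v in data]
```

Pre-computing the sum keeps each per-value call constant-time, which matters when the function is applied to every value of a long column.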

class openclean.function.value.normalize.numeric.MaxAbsScale(raise_error=True, default_value=<function scalar_pass_through>, maximum=None)

Bases: openclean.function.value.normalize.numeric.NumericNormalizer

Divide values in a list by the absolute maximum over all values.


compute(value)

Divide the given value by the pre-computed absolute maximum over all values in the list. If the maximum is zero, the result is zero.

If the given value is not numeric, a ValueError is raised if raise_error is True; otherwise the default value is returned.


value (scalar) – Scalar value from the list that was used to prepare the function.


is_prepared()

The function requires preparation if the maximum is not set.


prepare(values)

Compute the absolute maximum over all values in the given list.


values (list) – List of scalar values or tuples of scalar values.
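
As a rough illustration of max-abs scaling over a whole list, the following standalone function divides every value by the largest absolute value. The function name max_abs_scale_sketch is hypothetical, and returning zero when the maximum is zero is an assumption that mirrors the zero handling described for the class; this is not the openclean code.

```python
def max_abs_scale_sketch(values):
    """Divide each value by the largest absolute value in the list."""
    # default=0 covers the empty list.
    maximum = max((abs(v) for v in values), default=0)
    # Assumed zero handling: avoid dividing by zero and return zeros.
    if maximum == 0:
        return [0 for _ in values]
    return [v / maximum for v in values]


scaled = max_abs_scale_sketch([-4, 2, 1])
```

Because the divisor is an absolute value, signs are preserved and every result lies in the range from -1 to 1.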

class openclean.function.value.normalize.numeric.MinMaxScale(raise_error=True, default_value=<function scalar_pass_through>, minimum=None, maximum=None)

Bases: openclean.function.value.normalize.numeric.NumericNormalizer

Normalize values in a list using min-max feature scaling.


compute(value)

Normalize the given value using min-max feature scaling. If the pre-computed minimum and maximum for the value list are equal, the result is zero.


value (scalar) – Scalar value from the list that was used to prepare the function.


is_prepared()

The function requires preparation if the minimum or maximum is not set.


prepare(values)

Compute the minimum and maximum over all values in the given list.


values (list) – List of scalar values or tuples of scalar values.
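
Min-max feature scaling maps each value x to (x - minimum) / (maximum - minimum). A minimal sketch of that formula, including the documented case where minimum and maximum are equal, might look as follows. The function name min_max_scale_sketch is hypothetical, the input is assumed to be a non-empty list of numbers, and this is not the openclean implementation.

```python
def min_max_scale_sketch(values):
    """Rescale a non-empty list of numbers to the range [0, 1]."""
    minimum, maximum = min(values), max(values)
    # Equal minimum and maximum would divide by zero; return zeros instead.
    if minimum == maximum:
        return [0 for _ in values]
    span = maximum - minimum
    return [(v - minimum) / span for v in values]


scaled = min_max_scale_sketch([2, 4, 6])
```

The smallest value always maps to 0 and the largest to 1, so the result does not depend on the original unit of measurement.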

class openclean.function.value.normalize.numeric.NumericNormalizer(raise_error=True, default_value=<function scalar_pass_through>)

Bases: openclean.function.value.base.ValueFunction

Abstract base class for numeric normalization functions. Implementing classes need to implement the compute and prepare methods.

abstract compute(value)

Individual normalization function that is dependent on the implementing sub-class. At this point it is assumed that the argument value is numeric.


value (scalar) – Scalar value from the list that was used to prepare the function.


eval(value)

Normalize a given value by calling the compute function of the implementing class.

If the given value is not numeric, a ValueError is raised if raise_error is True; otherwise the default value is returned.


value (scalar) – Scalar value from the list that was used to prepare the function.
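
The division of labor described above (the base class checks for numeric input, the sub-class supplies compute) can be sketched with Python's abc module. All names here (NormalizerSketch, normalize, Halve) are hypothetical, and the numeric check and the default handling are assumptions based on the parameter descriptions in this module, not the openclean implementation.

```python
from abc import ABC, abstractmethod


class NormalizerSketch(ABC):
    """Illustrative base class: validates input, delegates to compute()."""

    def __init__(self, raise_error=True, default_value=lambda v: v):
        self.raise_error = raise_error
        # Either a constant or a callable applied to the offending value.
        self.default_value = default_value

    @abstractmethod
    def compute(self, value):
        """Sub-class specific normalization; value is assumed numeric."""

    def normalize(self, value):
        # Treat int and float as numeric; bool is excluded on purpose
        # because it is a subclass of int (an assumption of this sketch).
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number:
            return self.compute(value)
        if self.raise_error:
            raise ValueError(f"not a numeric value: {value!r}")
        if callable(self.default_value):
            return self.default_value(value)
        return self.default_value


class Halve(NormalizerSketch):
    """Toy sub-class used only to exercise the base class."""

    def compute(self, value):
        return value / 2


lenient = Halve(raise_error=False)
results = [lenient.normalize(v) for v in [4, "n/a", 1.0]]
```

With the pass-through default, non-numeric entries survive unchanged, so the output list keeps the same length and order as the input.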


openclean.function.value.normalize.numeric.divide_by_total(values, raise_error=True, default_value=<function scalar_pass_through>)

Divide values in a list by the sum over all values. Non-numeric values are either replaced with a given default value, or a ValueError is raised if raise_error is True.

  • values (list) – List of scalar values.

  • raise_error (bool, optional) – Raise ValueError if the list contains values that are not integer or float. If False, non-numeric values are substituted as specified by default_value.

  • default_value (scalar, tuple, or callable, default=scalar_pass_through) – Value (or function) that is used (evaluated) as substitute for non-numeric values if no error is raised. By default, a value is returned as is.

openclean.function.value.normalize.numeric.max_abs_scale(values, raise_error=True, default_value=<function scalar_pass_through>)

Divide values in a list by the absolute maximum over all values. Non-numeric values are either replaced with a given default value, or a ValueError is raised if raise_error is True.

  • values (list) – List of scalar values.

  • raise_error (bool, optional) – Raise ValueError if the list contains values that are not integer or float. If False, non-numeric values are substituted as specified by default_value.

  • default_value (scalar, tuple, or callable, default=scalar_pass_through) – Value (or function) that is used (evaluated) as substitute for non-numeric values if no error is raised. By default, a value is returned as is.

openclean.function.value.normalize.numeric.min_max_scale(values, raise_error=True, default_value=<function scalar_pass_through>)

Normalize values in a list using min-max feature scaling. Non-numeric values are either replaced with a given default value, or a ValueError is raised if raise_error is True.

  • values (list) – List of scalar values.

  • raise_error (bool, optional) – Raise ValueError if the list contains values that are not integer or float. If False, non-numeric values are substituted as specified by default_value.

  • default_value (scalar, tuple, or callable, default=scalar_pass_through) – Value (or function) that is used (evaluated) as substitute for non-numeric values if no error is raised. By default, a value is returned as is.