openclean.function.value.normalize.numeric module

Collection of functions to normalize numeric values in a list (e.g., a data frame column).

class openclean.function.value.normalize.numeric.DivideByTotal(raise_error=True, default_value=<function scalar_pass_through>, sum=None)

Bases: openclean.function.value.normalize.numeric.NumericNormalizer

Divide values in a list by the sum over all values.

compute(value)

Divide given value by the pre-computed sum over all values in the list. If the sum was zero the result will be zero.

If the given value is not numeric, a ValueError is raised when the raise_error flag is True; otherwise the default value is returned.

Parameters

value (scalar) – Scalar value from the list that was used to prepare the function.

Return type

float

is_prepared()

The function requires preparation if the sum is not set.

Return type

bool

prepare(values)

Compute the total sum over all values in the given list.

Parameters

values (list) – List of scalar values or tuples of scalar values.
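The prepare/compute life cycle described above can be illustrated with a minimal, self-contained sketch. It does not import openclean; the class name DivideByTotalSketch and the attribute total are hypothetical stand-ins, and the handling of non-numeric values is left out.

```python
class DivideByTotalSketch:
    """Hypothetical stand-in that mimics the documented behavior."""

    def __init__(self, total=None):
        # The sum may be supplied up front; otherwise prepare() sets it.
        self.total = total

    def is_prepared(self):
        # Prepared once the sum over all values is known.
        return self.total is not None

    def prepare(self, values):
        # Compute the total sum over all values in the list.
        self.total = sum(values)

    def compute(self, value):
        # A zero sum yields zero instead of a division error.
        if self.total == 0:
            return 0.0
        return float(value) / self.total


f = DivideByTotalSketch()
if not f.is_prepared():
    f.prepare([1, 3, 4])
shares = [f.compute(v) for v in [1, 3, 4]]
```

Passing the sum to the constructor skips the preparation step, which is useful when the total is already known from an earlier pass over the data.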

class openclean.function.value.normalize.numeric.MaxAbsScale(raise_error=True, default_value=<function scalar_pass_through>, maximum=None)

Bases: openclean.function.value.normalize.numeric.NumericNormalizer

Divide values in a list by the absolute maximum over all values.

compute(value)

Divide given value by the pre-computed absolute maximum over all values in the list. If the maximum was zero the result will be zero.

If the given value is not numeric, a ValueError is raised when the raise_error flag is True; otherwise the default value is returned.

Parameters

value (scalar) – Scalar value from the list that was used to prepare the function.

Return type

float

is_prepared()

The function requires preparation if the maximum is not set.

Return type

bool

prepare(values)

Compute the absolute maximum over all values in the given list.

Parameters

values (list) – List of scalar values or tuples of scalar values.
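As a rough illustration of the arithmetic, the sketch below scales a list by its absolute maximum. It is a standalone approximation, not the library code: the function name max_abs_scale_sketch is hypothetical, and it assumes the divisor is the largest absolute value in the list.

```python
def max_abs_scale_sketch(values):
    """Divide each value by the largest absolute value in the list."""
    # Assumption: the "absolute maximum" is max(|v|) over all values.
    maximum = max(abs(v) for v in values)
    if maximum == 0:
        # All values are zero; return zeros rather than dividing by zero.
        return [0.0 for _ in values]
    return [float(v) / maximum for v in values]


scaled = max_abs_scale_sketch([-4, 2, 1])
```

Under this assumption the results fall in the range [-1, 1] and the sign of each value is preserved.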

class openclean.function.value.normalize.numeric.MinMaxScale(raise_error=True, default_value=<function scalar_pass_through>, minimum=None, maximum=None)

Bases: openclean.function.value.normalize.numeric.NumericNormalizer

Normalize values in a list using min-max feature scaling.

compute(value)

Normalize value using min-max feature scaling. If the pre-computed minimum and maximum for the value list are equal the result will be zero.

Parameters

value (scalar) – Scalar value from the list that was used to prepare the function.

Return type

float

is_prepared()

The function requires preparation if the minimum and maximum are not set.

Return type

bool

prepare(values)

Compute the minimum and maximum over all values in the given list.

Parameters

values (list) – List of scalar values or tuples of scalar values.
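Min-max feature scaling maps a value v to (v - minimum) / (maximum - minimum). The following sketch applies that formula to a plain list; the function name min_max_scale_sketch is hypothetical and the code is independent of the openclean implementation.

```python
def min_max_scale_sketch(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    minimum, maximum = min(values), max(values)
    if minimum == maximum:
        # Degenerate range: every value maps to zero, as documented.
        return [0.0 for _ in values]
    span = maximum - minimum
    return [(float(v) - minimum) / span for v in values]


scaled = min_max_scale_sketch([2, 4, 6])
```

The equal-bounds check mirrors the documented rule that the result is zero when the pre-computed minimum and maximum coincide.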

class openclean.function.value.normalize.numeric.NumericNormalizer(raise_error=True, default_value=<function scalar_pass_through>)

Bases: openclean.function.value.base.ValueFunction

Abstract base class for numeric normalization functions. Implementing classes need to implement the compute and prepare methods.

abstract compute(value)

Individual normalization function that is dependent on the implementing sub-class. At this point it is assumed that the argument value is numeric.

Parameters

value (scalar) – Scalar value from the list that was used to prepare the function.

Return type

float

eval(value)

Normalize a given value by calling the compute function of the implementing class.

If the given value is not numeric, a ValueError is raised when the raise_error flag is True; otherwise the default value is returned.

Parameters

value (scalar) – Scalar value from the list that was used to prepare the function.

Return type

float
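The division of labor between eval and compute can be sketched as follows. This is an illustrative approximation only: the class names NormalizerSketch and HalveSketch are hypothetical, the numeric test shown here (int or float, excluding bool) is an assumption, and an identity lambda stands in for scalar_pass_through.

```python
from abc import ABC, abstractmethod


class NormalizerSketch(ABC):
    """Hypothetical base class mirroring the documented eval/compute split."""

    def __init__(self, raise_error=True, default_value=lambda v: v):
        self.raise_error = raise_error
        self.default_value = default_value

    @abstractmethod
    def compute(self, value):
        """Normalize a value that is already known to be numeric."""

    def eval(self, value):
        # Assumption: numeric means int or float (bool excluded).
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return self.compute(value)
        if self.raise_error:
            raise ValueError(f"not a numeric value: {value!r}")
        # The default may be a constant or a function of the offending value.
        if callable(self.default_value):
            return self.default_value(value)
        return self.default_value


class HalveSketch(NormalizerSketch):
    """Toy sub-class used only to exercise eval()."""

    def compute(self, value):
        return value / 2.0


lenient = HalveSketch(raise_error=False)
halved = lenient.eval(4)
passed_through = lenient.eval("n/a")
```

Keeping the type check in eval means each sub-class only has to implement the arithmetic in compute.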

openclean.function.value.normalize.numeric.divide_by_total(values, raise_error=True, default_value=<function scalar_pass_through>)

Divide values in a list by the sum over all values. Non-numeric values either raise an error (if the raise_error flag is True) or are replaced with the given default value.

Parameters
  • values (list) – List of scalar values.

  • raise_error (bool, optional) – Raise ValueError if the list contains values that are not integer or float. If False, non-numeric values are replaced using default_value.

  • default_value (scalar, tuple, or callable, default=scalar_pass_through) – Value (or function) that is used (evaluated) as substitute for non-numeric values if no error is raised. By default, a value is returned as is.

openclean.function.value.normalize.numeric.max_abs_scale(values, raise_error=True, default_value=<function scalar_pass_through>)

Divide values in a list by the absolute maximum over all values. Non-numeric values either raise an error (if the raise_error flag is True) or are replaced with the given default value.

Parameters
  • values (list) – List of scalar values.

  • raise_error (bool, optional) – Raise ValueError if the list contains values that are not integer or float. If False, non-numeric values are replaced using default_value.

  • default_value (scalar, tuple, or callable, default=scalar_pass_through) – Value (or function) that is used (evaluated) as substitute for non-numeric values if no error is raised. By default, a value is returned as is.

openclean.function.value.normalize.numeric.min_max_scale(values, raise_error=True, default_value=<function scalar_pass_through>)

Normalize values in a list using min-max feature scaling. Non-numeric values either raise an error (if the raise_error flag is True) or are replaced with the given default value.

Parameters
  • values (list) – List of scalar values.

  • raise_error (bool, optional) – Raise ValueError if the list contains values that are not integer or float. If False, non-numeric values are replaced using default_value.

  • default_value (scalar, tuple, or callable, default=scalar_pass_through) – Value (or function) that is used (evaluated) as substitute for non-numeric values if no error is raised. By default, a value is returned as is.
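The three list-level helpers share the same raise_error and default_value parameters. The sketch below shows one plausible reading of their interaction, using division by the total as the example. It is not the library implementation: the function name divide_list_sketch is hypothetical, and excluding non-numeric entries from the sum is an assumption.

```python
def divide_list_sketch(values, raise_error=True, default_value=lambda v: v):
    """Divide numeric entries by their total; substitute the rest."""

    def is_numeric(v):
        # Assumption: numeric means int or float (bool excluded).
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    numeric = [v for v in values if is_numeric(v)]
    if raise_error and len(numeric) != len(values):
        raise ValueError("list contains non-numeric values")
    # Assumption: non-numeric entries do not contribute to the sum.
    total = sum(numeric)
    result = []
    for v in values:
        if is_numeric(v):
            result.append(float(v) / total if total != 0 else 0.0)
        elif callable(default_value):
            result.append(default_value(v))
        else:
            result.append(default_value)
    return result


kept = divide_list_sketch([1, "n/a", 3], raise_error=False)
masked = divide_list_sketch([1, "n/a", 3], raise_error=False, default_value=None)
```

With the pass-through default the offending entry is returned unchanged, while a constant default such as None makes the substituted positions easy to find afterwards.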