openclean.function.value.base module
Base classes for value functions, together with a collection of basic helper functions.
- class openclean.function.value.base.CallableWrapper(func: Callable)
Bases:
openclean.function.value.base.PreparedFunction
Wrapper for callable functions as value functions. This value function does not prepare the wrapped callable.
- eval(value: Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]) Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]
Evaluate the wrapped function on a given value. The value may either be a scalar or a tuple. The return value of the function is dependent on the wrapped function.
- Parameters
value (scalar or tuple) – Value from the list that was used to prepare the function.
- Return type
scalar or tuple
- class openclean.function.value.base.ConstantValue(value: Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]])
Bases:
openclean.function.value.base.PreparedFunction
Value function that returns a given constant value for all inputs.
- eval(value: Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]) Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]
Return the constant result value.
- Parameters
value (scalar or tuple) – Value from the list that was used to prepare the function.
- Return type
any
- class openclean.function.value.base.CounterConverter(func: Callable)
Bases:
openclean.function.value.base.PreparedFunction
Wrapper for callable functions that are applied to items of a value counter.
- eval(value: Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]) Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]
Evaluate the wrapped function on a given value.
The value is expected to be a tuple (an item from a collections.Counter object) that contains a value and its count. The wrapped callable is applied to the value, and a tuple with the modified value and the original count is returned.
- Parameters
value (scalar or tuple) – Value from the list that was used to prepare the function.
- Return type
scalar or tuple
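The behavior described for CounterConverter.eval can be illustrated with a minimal, self-contained sketch. It does not import openclean; the helper name convert_counter_item is hypothetical and only mirrors the description above (apply the callable to the value, keep the count).

```python
from collections import Counter


def convert_counter_item(func, item):
    """Apply func to the value of a (value, count) pair and keep the count.

    Hypothetical stand-in for the behavior described for CounterConverter.eval.
    """
    value, count = item
    return func(value), count


# Items of a Counter are (value, count) tuples.
counts = Counter(["a", "b", "a"])
converted = [convert_counter_item(str.upper, item) for item in counts.items()]
```

Each element of converted is a tuple holding the modified value and the original count.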
- class openclean.function.value.base.PreparedFunction
Bases:
openclean.function.value.base.ValueFunction
Abstract base class for value functions that do not make use of the prepare method. These functions are considered initialized and ready to operate without the prepare method being called first.
- is_prepared() bool
Instances of this class do not need to be further prepared.
- Return type
bool
- prepare(values: List[Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]]) openclean.function.value.base.ValueFunction
The prepare step is ignored for a wrapped callable.
- Parameters
values (dict) – Set of distinct scalar values or tuples of scalar values that are mapped to their respective frequency count.
- class openclean.function.value.base.UnpreparedFunction
Bases:
openclean.function.value.base.ValueFunction
Abstract base class for value functions that make use of the prepare method. These functions are expected to return a new instance of a different value function class as the result of the prepare step.
- eval(value: Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]) Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]
Raise an error if the eval method is called since this indicates that the function has not been prepared.
- Parameters
value (scalar or tuple) – Value from the list that was used to prepare the function.
- Return type
scalar or tuple
- is_prepared() bool
Returns False because the function needs to be prepared.
- Return type
bool
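The prepare pattern described for UnpreparedFunction can be sketched as follows. This is an illustration only, written without openclean: the class names MaxScaler and PreparedMaxScaler are hypothetical, and the exception type raised by eval is an assumption, since the documentation only states that an error is raised.

```python
class PreparedMaxScaler:
    """Hypothetical prepared function: divides each value by a fixed maximum."""

    def __init__(self, maximum):
        self.maximum = maximum

    def is_prepared(self):
        return True

    def prepare(self, values):
        # Already prepared, so the prepare step is ignored.
        return self

    def eval(self, value):
        return value / self.maximum


class MaxScaler:
    """Hypothetical unprepared function: needs the maximum before it can run."""

    def is_prepared(self):
        return False

    def prepare(self, values):
        # Compute a statistic over the values and return a NEW function
        # object of a different class, as the documentation describes.
        return PreparedMaxScaler(max(values))

    def eval(self, value):
        # Assumption: the concrete error type is not given in the docs.
        raise NotImplementedError("prepare() must be called before eval()")


scaler = MaxScaler().prepare([1, 2, 4])
```

After prepare, scaler is an instance of the prepared class and can evaluate individual values.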
- class openclean.function.value.base.ValueFunction
Bases:
object
The abstract class for value functions defines the interface for methods that need to be implemented for preparing and evaluating the function.
- apply(values: Union[List[Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]], collections.Counter], threads: Optional[int] = None) Union[List[Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]], collections.Counter]
Apply the function to each value in a given set.
Depending on the type of the input, the result is either a list of values that are the result of the eval method for the respective input values or a new counter object where keys are the modified values.
Calls the prepare method before executing the eval method on each individual value in the given list.
- Parameters
values (list) – List of scalar values or tuples of scalar values.
threads (int, default=None) – Number of parallel threads to use for processing. If None the value from the environment variable ‘OPENCLEAN_THREADS’ is used as the default.
- Return type
list
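The two result shapes of apply (a list for list input, a counter for counter input) can be sketched with a plain callable. The function apply_sketch is hypothetical and omits the prepare step and the threads parameter. Summing the counts when two input values map to the same output key is an assumption of this sketch, not something the documentation states.

```python
from collections import Counter


def apply_sketch(func, values):
    """Apply func to every value; keep the container type of the input."""
    if isinstance(values, Counter):
        result = Counter()
        for value, count in values.items():
            # Assumption: counts of values that collapse to the same key add up.
            result[func(value)] += count
        return result
    return [func(value) for value in values]


as_list = apply_sketch(str.lower, ["A", "b"])
as_counter = apply_sketch(str.lower, Counter({"A": 2, "a": 1, "B": 1}))
```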
- abstract eval(value: Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]) Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]
Evaluate the function on a given value. The value may either be a scalar or a tuple. The value will be from the list of values that was passed to the object in the prepare call.
The return value of the function is implementation dependent.
- Parameters
value (scalar or tuple) – Value from the list that was used to prepare the function.
- Return type
scalar or tuple
- abstract is_prepared() bool
Returns True if the prepare method is ignored by an implementation of this function. Containing classes will only call the prepare method for those value functions that are not prepared.
- Return type
bool
- map(values: List[Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]]) Dict
The map function takes a list of values and outputs a dictionary. The keys in the returned dictionary are the distinct values in the input list. The values that are associated with the keys are the result of applying the eval function of this class on the key value.
- Parameters
values (list) – List of scalar values or tuples of scalar values.
- Return type
dict
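As a rough illustration of the mapping described above, the following sketch builds the dictionary from the distinct input values. The name map_sketch is hypothetical and a plain callable stands in for the eval method. Whether the real method prepares the function first is not covered here.

```python
def map_sketch(func, values):
    """Map each distinct value in the list to the result of func(value)."""
    return {value: func(value) for value in set(values)}


lengths = map_sketch(len, ["aa", "b", "aa"])
```

Duplicate input values appear only once as keys in the result.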
- abstract prepare(values: List[Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]]) openclean.function.value.base.ValueFunction
Optional step to prepare the function for a given set of values. This step allows the function to compute additional statistics over the set of values.
While it is likely that the given set of values represents the values for which the eval() function will be called, this property is not guaranteed.
- Parameters
values (list) – List of scalar values or tuples of scalar values.
- Return type
openclean.function.value.base.ValueFunction
- openclean.function.value.base.extract(values, label, raise_error=True, default_value=None)
Create a flat dictionary from a nested one. The resulting dictionary contains the same keys as the input dictionary. The associated values are the values from the nested dictionaries under the given label.
If a nested dictionary does not contain the given label as a key, a KeyError is raised when the raise_error flag is True. If the flag is False, the given default value is used instead.
- Parameters
values (dict) – Nested dictionary from which the values with the given label are extracted.
label (string) – Key in the nested dictionaries whose associated value is extracted.
raise_error (bool, default=True) – Raise a KeyError if a nested dictionary value does not contain the given label as a key.
default_value (any, default=None) – Default value for values that do not contain the given label as a key.
- Return type
dict
- Raises
KeyError –
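The flattening described for extract can be sketched in a few lines. The function extract_sketch is a hypothetical re-implementation based only on the description above; it is not the library's code.

```python
def extract_sketch(values, label, raise_error=True, default_value=None):
    """Flatten a nested dictionary by picking one label from each entry."""
    result = {}
    for key, nested in values.items():
        if label in nested:
            result[key] = nested[label]
        elif raise_error:
            raise KeyError(label)
        else:
            result[key] = default_value
    return result


nested = {"a": {"count": 2, "rank": 1}, "b": {"count": 5}}
counts = extract_sketch(nested, "count")
ranks = extract_sketch(nested, "rank", raise_error=False, default_value=-1)
```

With raise_error left at True, the second call would raise a KeyError because the entry for "b" has no "rank" label.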
- openclean.function.value.base.merge(values_1, values_2, labels, join='inner')
Merge two dictionaries. The resulting dictionary will map key values to dictionaries. Each nested dictionary has two elements, representing the values from the respective merged dictionary. The labels for these elements are defined by the labels argument.
The join method allows for four types of merging:
- inner: Keep only those keys that are in the intersection of both dictionaries.
- outer: Keep all keys from the union of both dictionaries.
- left-outer: Keep all keys from the first dictionary.
- right-outer: Keep all keys from the second dictionary.
Raises a ValueError if the number of given labels is not two or if an invalid join method is specified.
- Parameters
values_1 (dict) – Left side of the join.
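values_2 (dict) – Right side of the join.
labels (list or tuple) – Labels for the two elements in each nested dictionary. Exactly two labels are required.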
join (enum['inner', 'outer', 'left-outer', 'right-outer'], default='inner') – Join method identifier.
- Return type
dict
- Raises
ValueError –
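The four join methods can be sketched with set operations on the dictionary keys. The function merge_sketch is a hypothetical illustration. Filling the missing side with None for keys that exist in only one dictionary is an assumption of this sketch, since the documentation does not say which value is used.

```python
def merge_sketch(values_1, values_2, labels, join="inner"):
    """Merge two dictionaries into a dictionary of two-element dictionaries."""
    if len(labels) != 2:
        raise ValueError("expected exactly two labels")
    if join == "inner":
        keys = values_1.keys() & values_2.keys()
    elif join == "outer":
        keys = values_1.keys() | values_2.keys()
    elif join == "left-outer":
        keys = values_1.keys()
    elif join == "right-outer":
        keys = values_2.keys()
    else:
        raise ValueError("invalid join method '{}'".format(join))
    left, right = labels
    # Assumption: a key missing on one side is represented by None.
    return {
        key: {left: values_1.get(key), right: values_2.get(key)}
        for key in keys
    }


a = {"x": 1, "y": 2}
b = {"y": 20, "z": 30}
inner = merge_sketch(a, b, labels=("left", "right"))
outer = merge_sketch(a, b, labels=("left", "right"), join="outer")
```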
- openclean.function.value.base.normalize(values, normalizer, keep_original=False, labels=None)
Normalize frequency counts in a given dictionary. Expects a dictionary where keys are mapped to numeric values. Applies the given normalization function on all values. Returns a dictionary where keys are mapped to the normalized values.
If the keep_original flag is True, the original values are also included in the result. In this case, the keys in the resulting dictionary are mapped to dictionaries with two values. The default key values for the nested dictionary values are ‘absolute’ for the original value and ‘normalized’ for the normalized value. These names can be overridden by providing a list or tuple of labels with exactly two elements.
- Parameters
values (dict) – Dictionary that maps arbitrary key values to numeric values.
normalizer (callable or openclean.function.value.base.ValueFunction) – Normalization function that is used to normalize the numeric values in the given dictionary.
keep_original (bool, default=False) – If the keep original value is set to True, the resulting dictionary will map key values to dictionaries. Each nested dictionary will have two elements, the original (‘absolute’) value and the normalized value.
labels (list or tuple, default=('absolute', 'normalized')) – List or tuple with exactly two elements. The labels will only be used if the keep_original flag is True. The first element is the label for the original value in the returned nested dictionary and the second element is the label for the normalized value.
- Return type
dict
- Raises
ValueError –
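The effect of the keep_original flag can be sketched as follows. The function normalize_sketch is hypothetical and accepts only a plain callable as normalizer; how the library handles a ValueFunction normalizer (for example, whether it is prepared on the values first) is not modelled here.

```python
def normalize_sketch(values, normalizer, keep_original=False, labels=None):
    """Normalize the numeric values of a dictionary with a plain callable."""
    if not keep_original:
        return {key: normalizer(val) for key, val in values.items()}
    if labels is None:
        labels = ("absolute", "normalized")
    if len(labels) != 2:
        raise ValueError("expected exactly two labels")
    original, normalized = labels
    return {
        key: {original: val, normalized: normalizer(val)}
        for key, val in values.items()
    }


counts = {"a": 1, "b": 3}
total = sum(counts.values())


def divide_by_total(value):
    # Example normalizer: relative frequency.
    return value / total


flat = normalize_sketch(counts, divide_by_total)
nested = normalize_sketch(counts, divide_by_total, keep_original=True)
```

In the flat result each key maps directly to its normalized value; in the nested result each key maps to a dictionary with the 'absolute' and 'normalized' entries.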
- openclean.function.value.base.to_value_function(arg)
Ensure that a given argument is a ValueFunction. If the arg is callable it will be wrapped. Otherwise, a constant value function is returned.
- Parameters
arg (any) – Argument that is tested for being a ValueFunction.
- Return type
openclean.function.value.base.ValueFunction
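The dispatch described for to_value_function (return value functions unchanged, wrap callables, turn everything else into a constant) can be sketched with two small stand-in classes. CallableSketch, ConstantSketch and to_value_function_sketch are hypothetical names that only mirror the roles of CallableWrapper and ConstantValue; they are not the library's classes.

```python
class CallableSketch:
    """Stand-in for a wrapper that evaluates a plain callable."""

    def __init__(self, func):
        self.func = func

    def eval(self, value):
        return self.func(value)


class ConstantSketch:
    """Stand-in for a function that returns the same value for all inputs."""

    def __init__(self, value):
        self.value = value

    def eval(self, value):
        return self.value


def to_value_function_sketch(arg):
    """Return arg as a value-function-like object."""
    if isinstance(arg, (CallableSketch, ConstantSketch)):
        return arg
    if callable(arg):
        return CallableSketch(arg)
    return ConstantSketch(arg)


upper = to_value_function_sketch(str.upper)
constant = to_value_function_sketch(42)
```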