openclean.function.value.picker module

Majority picker functions that select a single value from a counter object.

class openclean.function.value.picker.MajorityVote(threshold: Optional[float] = None, normalizer: Optional[Callable] = None, raise_error: Optional[bool] = False)

Bases: openclean.function.value.picker.ValuePicker

Majority picker that selects the most frequent value. This picker returns the default value (or raises an error) if the given list of values is empty or there are multiple most-frequent values.

The picker allows defining an additional threshold (min. frequency) that the most frequent value has to satisfy.

pick(values: collections.Counter, default: Optional[Union[Callable, int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]] = None, raise_error: Optional[bool] = False) → Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]

Return the most frequent value in the counter. If the counter is empty or contains multiple most-frequent values, the default value is returned or a ValueError is raised.

Parameters
  • values (collections.Counter) – Frequency counter for values.

  • default (scalar, tuple, or callable, default=None) – Default value that is returned if the counter contains no values or multiple most-frequent values.

  • raise_error (bool, default=False) – Raise a ValueError if the counter contains no values or multiple most-frequent values.

Return type

scalar, tuple, or callable

Raises

ValueError
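
The selection rule described above can be sketched with the standard library alone. This is a minimal illustration and not the openclean implementation: the function name majority_vote_sketch is hypothetical, the threshold is assumed to be a minimum relative frequency between 0 and 1, and the normalizer argument is not modeled.

```python
from collections import Counter


def majority_vote_sketch(values, threshold=None, default=None, raise_error=False):
    """Pick the unique most frequent value from a Counter (illustrative only)."""
    # Inspect the two most frequent entries to detect a tie for first place.
    top = values.most_common(2)
    has_winner = len(top) == 1 or (len(top) == 2 and top[0][1] > top[1][1])
    if has_winner and threshold is not None:
        # Assumption: threshold is a minimum relative frequency in [0, 1].
        total = sum(values.values())
        has_winner = top[0][1] / total >= threshold
    if has_winner:
        return top[0][0]
    if raise_error:
        raise ValueError("no unique most-frequent value")
    return default


winner = majority_vote_sketch(Counter(["a", "b", "a"]))
tie = majority_vote_sketch(Counter(["a", "b"]), default="?")
below = majority_vote_sketch(Counter(["a", "b", "a"]), threshold=0.75, default="?")
```

Here winner is "a", while tie and below both fall back to the default: the first because two values share the top count, the second because 2 out of 3 is below the assumed 0.75 threshold.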

class openclean.function.value.picker.OnlyOneValue(raise_error: Optional[bool] = False)

Bases: openclean.function.value.picker.ValuePicker

Majority picker that only selects a value if the given counter contains exactly one value.

pick(values: collections.Counter, default: Optional[Union[Callable, int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]] = None, raise_error: Optional[bool] = False) → Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]

Return the only value in the given counter. If the counter contains no value or more than one value, the default value is returned or a ValueError is raised.

Parameters
  • values (collections.Counter) – Frequency counter for values. The most frequent value from the counter is returned if the counter contains exactly one value.

  • default (scalar, tuple, or callable, default=None) – Default value that is returned if the counter contains no values or multiple values.

  • raise_error (bool, default=False) – Raise a ValueError if the counter contains no values or multiple values.

Return type

scalar, tuple, or callable

Raises

ValueError
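
As a rough illustration of the "exactly one value" rule, the following standalone sketch mirrors the documented behavior using only collections.Counter. The function name only_one_value_sketch is hypothetical and the code is not taken from openclean.

```python
from collections import Counter


def only_one_value_sketch(values, default=None, raise_error=False):
    """Return the single distinct value in a Counter (illustrative only)."""
    # A value is selected only if the counter holds exactly one distinct key,
    # regardless of how often that key was counted.
    if len(values) == 1:
        return next(iter(values))
    if raise_error:
        raise ValueError("expected exactly one distinct value")
    return default


single = only_one_value_sketch(Counter(["x", "x", "x"]))
several = only_one_value_sketch(Counter(["x", "y"]), default="n/a")
```

In this sketch single is "x", and several is the default "n/a" because the counter contains two distinct values.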

class openclean.function.value.picker.ValuePicker(raise_error: Optional[bool] = False)

Bases: openclean.function.value.base.ValueFunction

Value function that is used to select a single value, e.g., the most frequent value, from a list of values. The picker acts as a value function that picks a value when the function is prepared and then returns a constant function that returns the picked value for any input value. This type of behavior is for example needed for majority voting where we want to replace all values in a given attribute with a single (most frequent) value.

eval(value: Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]) → Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]

The value picker must be prepared before it can be used. Preparation returns a new value function, so the eval method of this class raises an error if called.

Parameters

value (scalar or tuple) – Value from the list that was used to prepare the function.

Raises

NotImplementedError

is_prepared() → bool

The value picker needs to be prepared.

Return type

bool

abstract pick(values: collections.Counter, default: Optional[Union[Callable, int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]] = None, raise_error: Optional[bool] = False) → Union[Callable, int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]

Picker function that returns the most frequent value from the given counter. Different implementations may impose additional constraints on whether a value is selected or not. If no value is selected by the picker, either the default is returned or a ValueError is raised (if the raise_error flag is True).

Parameters
  • values (collections.Counter) – Frequency counter for values. The most frequent value from the counter is returned if it satisfies additional (implementation-specific) constraints.

  • default (scalar, tuple, or callable, default=None) – Default value that is returned if the most frequent value does not satisfy the imposed constraints and the raise_error flag is False.

  • raise_error (bool, default=False) – Raise a ValueError if the most frequent value does not satisfy the imposed constraints.

Return type

scalar, tuple, or callable

Raises

ValueError

prepare(values: List[Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]]) → openclean.function.value.base.ValueFunction

Optional step to prepare the function for a given set of values. This step allows additional statistics to be computed over the set of values.

While it is likely that the given set of values represents the values for which the eval() function will be called, this property is not guaranteed.

Parameters

values (dict) – Set of distinct scalar values or tuples of scalar values that are mapped to their respective frequency count.

Return type

openclean.function.value.base.ValueFunction
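
The prepare-then-evaluate pattern described for ValuePicker can be sketched as follows. Both classes are hypothetical stand-ins written for illustration (they do not subclass or reproduce openclean's ValueFunction), and the pick rule here is simply "most frequent value".

```python
from collections import Counter


class ConstantSketch:
    """Hypothetical stand-in for the value function returned by prepare()."""

    def __init__(self, value):
        self.value = value

    def eval(self, value):
        # Ignore the input and always return the picked value.
        return self.value


class PickerSketch:
    """Hypothetical picker: unusable until prepared on a collection of values."""

    def pick(self, values):
        top = values.most_common(1)
        return top[0][0] if top else None

    def is_prepared(self):
        # The picker itself always requires preparation.
        return False

    def eval(self, value):
        raise NotImplementedError()

    def prepare(self, values):
        # Counter accepts a list of values or a mapping of value to count.
        return ConstantSketch(self.pick(Counter(values)))


column = ["NY", "NY", "N.Y."]
func = PickerSketch().prepare(column)
cleaned = [func.eval(v) for v in column]
```

After preparation, cleaned contains "NY" three times: every input value is replaced by the single picked value, which is the behavior needed for majority voting over an attribute.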