openclean.function.value.mapping module

This module provides a mapping operator that returns a dictionary mapping the original values in one or more data frame columns to the results of applying a given value function to them.

Lookup functions represent mappings using dictionaries.

class openclean.function.value.mapping.Lookup(mapping: Dict, raise_error: Optional[bool] = False, default: Optional[Union[Callable, int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]] = None, as_string: Optional[bool] = False)

Bases: openclean.function.value.base.PreparedFunction

Dictionary lookup function. Uses a mapping dictionary to convert given input values to their pre-defined targets.

eval(value: Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]) → Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]

Return the defined target value for a given lookup value.

Parameters

value (scalar or tuple) – Value in a data stream.

Return type

any

class openclean.function.value.mapping.Standardize(mapping: Dict)

Bases: openclean.function.value.base.PreparedFunction

Use a mapping dictionary to standardize values. For a given value, if a mapping is defined in the dictionary the mapped value is returned. For all other values the original value is returned.

eval(value: Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]) → Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]

Return the defined target value for a given lookup value. If the given value is not included in the standardization mapping it will be returned as is.

Parameters

value (scalar or tuple) – Value in a data stream.

Return type

any
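
The pass-through behavior described for Standardize can be illustrated without the library. The following is a minimal sketch in plain Python, not the openclean implementation; the helper name `standardize` and the sample data are hypothetical and only mirror the documented rule (return the mapped value if one is defined, otherwise return the input unchanged).

```python
from typing import Any, Dict, Hashable


def standardize(mapping: Dict[Hashable, Any], value: Hashable) -> Any:
    """Hypothetical helper: return the mapped value if one is defined,
    otherwise return the input value as is."""
    return mapping.get(value, value)


# Sample mapping of spelling variants to a canonical form (illustrative data).
states = {"N.Y.": "NY", "New York": "NY"}

# "NY" and "Texas" have no entry in the mapping and pass through unchanged.
cleaned = [standardize(states, v) for v in ["N.Y.", "NY", "Texas"]]
print(cleaned)  # ['NY', 'NY', 'Texas']
```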

openclean.function.value.mapping.mapping(df: pandas.core.frame.DataFrame, columns: Union[int, str, List[Union[str, int]]], func: Union[Callable, openclean.function.value.base.ValueFunction]) → Dict

Get the mapping of values that are modified by a given value function.

Parameters
  • df (pandas.DataFrame) – Input data frame.

  • columns (int, string, or list(int or string), optional) – Single column or list of column index positions or column names.

  • func (callable or openclean.function.value.base.ValueFunction) – Callable or value function that accepts a single value as the argument.

Return type

dict

Raises

ValueError
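
The idea behind the mapping operator can be sketched on a plain list of column values. This is an illustrative sketch only, not the library's code: the helper name `value_mapping` is hypothetical, it takes a list rather than a data frame, and it keeps every distinct value in the result. Whether the library omits values that the function leaves unchanged is not stated in this reference.

```python
from typing import Any, Callable, Dict, Hashable, Iterable


def value_mapping(
    values: Iterable[Hashable], func: Callable[[Any], Any]
) -> Dict[Hashable, Any]:
    """Hypothetical helper: map each distinct input value to func(value)."""
    # Duplicate inputs collapse to a single dictionary key.
    return {v: func(v) for v in values}


# Values of a single column (illustrative data), with one duplicate.
column = ["Alice", "BOB", "alice", "BOB"]

result = value_mapping(column, str.lower)
print(result)  # {'Alice': 'alice', 'BOB': 'bob', 'alice': 'alice'}
```

A dictionary produced this way can be inspected before it is applied, which makes it easy to review what a value function would change.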

openclean.function.value.mapping.replace(predicate: Callable, value: Union[int, float, str, datetime.datetime, Tuple[Union[int, float, str, datetime.datetime]]]) → openclean.function.value.cond.ConditionalStatement

Return an instance of the Replace class for the given arguments.

Parameters
  • predicate (callable) – Predicate that is evaluated on input values.

  • value (scalar or tuple) – Replacement value for inputs that satisfy the predicate.

Return type

openclean.function.value.mapping.Replace
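
Conditional replacement as described above can be sketched with a closure. This is a plain-Python illustration, not the library's Replace class: the name `replace_if` is hypothetical, and the sketch assumes that inputs which do not satisfy the predicate are returned unchanged, which this reference does not state explicitly.

```python
from typing import Any, Callable


def replace_if(predicate: Callable[[Any], bool], value: Any) -> Callable[[Any], Any]:
    """Hypothetical helper: build a function that returns `value` for inputs
    satisfying `predicate` and (by assumption) the input itself otherwise."""

    def apply(x: Any) -> Any:
        return value if predicate(x) else x

    return apply


# Replace empty strings with a placeholder (illustrative data).
fill_blank = replace_if(lambda x: x == "", "N/A")

filled = [fill_blank(v) for v in ["a", "", "b"]]
print(filled)  # ['a', 'N/A', 'b']
```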