openclean.function.value.classifier module

Base classes to classify scalar values and to compute summaries over data frames that have class labels assigned to their data rows.

class openclean.function.value.classifier.ClassLabel(predicate, label, truth_value=True, none_label=None)

Bases: openclean.function.value.base.ValueFunction

Classifier for a single type. Assigns a pre-defined class label to values that belong to the type that is represented by the classifier. Type membership is determined by a given predicate. All values that do not satisfy the predicate are assigned a pre-defined none label (none_label).
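The single-type labeling described above can be sketched in plain Python. This is a standalone illustration that does not import openclean. The class name ClassLabelSketch is hypothetical, the predicate is assumed to be an ordinary callable, and the role given to truth_value here (the label is assigned when the predicate result equals truth_value) is an assumption, since the text above does not spell it out.

```python
from typing import Any, Callable


class ClassLabelSketch:
    """Hypothetical stand-in for the behavior described above (not openclean code)."""

    def __init__(self, predicate: Callable[[Any], bool], label: Any,
                 truth_value: bool = True, none_label: Any = None):
        self.predicate = predicate
        self.label = label
        self.truth_value = truth_value
        self.none_label = none_label

    def eval(self, value: Any) -> Any:
        # Assumption: the label applies when the predicate result equals
        # truth_value; every other value receives the none label.
        if self.predicate(value) == self.truth_value:
            return self.label
        return self.none_label


# Label integers as "int"; all other values get the default none label (None).
is_int = ClassLabelSketch(predicate=lambda v: isinstance(v, int), label="int")
labels = [is_int.eval(v) for v in [1, "a", 2.5]]

# With truth_value=False the label goes to values that fail the predicate.
not_int = ClassLabelSketch(
    predicate=lambda v: isinstance(v, int),
    label="other",
    truth_value=False,
    none_label="n/a",
)
```

In this sketch a value never raises an error: it either receives the class label or the none label, which is what allows several such classifiers to be chained.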

eval(value)

Evaluate the function on a given value. The value may either be a scalar or a tuple. The value will be from the list of values that was passed to the object in the prepare call.

The return value of the function is implementation dependent.

Parameters

value (scalar or tuple) – Data value that is being classified.

Return type

scalar or tuple

is_prepared()

Checks if the wrapped predicate requires preparation.

Return type

bool

prepare(values)

Call the prepare method of the associated predicate.

Parameters

values (list) – List of scalar values or tuples of scalar values.

class openclean.function.value.classifier.ValueClassifier(*args, **kwargs)

Bases: openclean.function.value.base.ValueFunction

The value classifier evaluates a list of predicates or conditions on a given value (scalar or tuple). Each predicate is associated with a class label. The class label of the first predicate that is satisfied by the value is returned as the classification result. If no predicate is satisfied by a given value, the result is a default label, unless the raise error flag is set to True, in which case a ValueError is raised.

eval(value)

Evaluate the classifiers on the given value. The classifiers are evaluated in the order of their appearance in the list. Returns the associated label for the first predicate that is satisfied (i.e., the first classifier that returns a label that is not the none label). If none of the classifier predicates is satisfied, the result is the default label. If the raise error flag is True, a ValueError is raised instead.

Parameters

value (scalar) – Scalar data value that is being classified.

Return type

scalar

Raises

ValueError
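The first-match evaluation described for eval can be illustrated with a small standalone function. This sketch does not use openclean. The function name classify, the (predicate, label) pair representation, and the parameter names default_label and raise_error are hypothetical, since the constructor signature above is only given as (*args, **kwargs).

```python
from typing import Any, Callable, List, Tuple

Rule = Tuple[Callable[[Any], bool], Any]


def classify(value: Any, rules: List[Rule], default_label: Any = None,
             raise_error: bool = False) -> Any:
    """Return the label of the first rule whose predicate accepts the value."""
    for predicate, label in rules:
        if predicate(value):
            return label
    # No predicate was satisfied: raise or fall back to the default label.
    if raise_error:
        raise ValueError(f"no class label for value {value!r}")
    return default_label


# Order matters: an int also satisfies the "number" rule, but the "int"
# rule appears first in the list and therefore wins.
rules = [
    (lambda v: isinstance(v, int), "int"),
    (lambda v: isinstance(v, (int, float)), "number"),
    (lambda v: isinstance(v, str), "text"),
]

results = [classify(v, rules, default_label="unknown") for v in [3, 2.5, "x", None]]
```

Because evaluation stops at the first satisfied predicate, more specific rules should be listed before more general ones.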

is_prepared()

Returns False if any of the wrapped classifiers needs preparation.

Return type

bool

prepare(values)

Call the prepare method of the associated classifiers.

Parameters

values (list) – List of scalar values or tuples of scalar values.
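How is_prepared and prepare might combine over a list of wrapped classifiers can be sketched as follows. This is an assumption-laden illustration, not openclean code: PreparableStub, all_prepared, and prepare_all are hypothetical names, and the stub simply records the values it was prepared with.

```python
from typing import Any, List, Optional


class PreparableStub:
    """Hypothetical classifier stub that may need to see the values first."""

    def __init__(self, needs_prepare: bool):
        self._prepared = not needs_prepare
        self.seen: Optional[List[Any]] = None

    def is_prepared(self) -> bool:
        return self._prepared

    def prepare(self, values: List[Any]) -> None:
        # Remember the values and mark the stub as ready for evaluation.
        self.seen = list(values)
        self._prepared = True


def all_prepared(classifiers: List[PreparableStub]) -> bool:
    # False as soon as any wrapped classifier still needs preparation.
    return all(c.is_prepared() for c in classifiers)


def prepare_all(classifiers: List[PreparableStub], values: List[Any]) -> None:
    # Forward the same list of values to every wrapped classifier.
    for c in classifiers:
        c.prepare(values)


stubs = [PreparableStub(needs_prepare=False), PreparableStub(needs_prepare=True)]
before = all_prepared(stubs)
prepare_all(stubs, [1, 2, 3])
after = all_prepared(stubs)
```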