openclean.profiling.constraints.fd module

Base classes for functional dependency (FD) discovery. FDs express relationships between attributes of a dataset. FDs were originally used in database design, especially schema normalization. They can also be used for data cleaning, to identify sets of rows (tuples) that violate a given constraint and are therefore candidates for data repair.
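To illustrate the data cleaning use, the following minimal sketch groups rows by their determinant values and reports every group that carries more than one distinct dependant value. It uses only the Python standard library and does not call openclean; the function name fd_violations and the sample rows are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def fd_violations(rows, lhs, rhs):
    """Return groups of rows that violate the dependency lhs -> rhs.

    Hypothetical helper (not part of openclean). `rows` is a list of
    dictionaries; `lhs` and `rhs` are lists of column names.
    """
    # Group rows by their values in the determinant columns.
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in lhs)
        groups[key].append(row)
    # A group is a violation if its rows disagree on the dependant columns.
    violations = {}
    for key, members in groups.items():
        rhs_values = {tuple(r[col] for col in rhs) for r in members}
        if len(rhs_values) > 1:
            violations[key] = members
    return violations


# Illustrative data: the last row misspells the city for zip 60601.
rows = [
    {"zip": "10001", "city": "New York", "name": "Ann"},
    {"zip": "10001", "city": "New York", "name": "Bob"},
    {"zip": "60601", "city": "Chicago", "name": "Cy"},
    {"zip": "60601", "city": "Chicgo", "name": "Di"},
]
conflicts = fd_violations(rows, lhs=["zip"], rhs=["city"])
```

Here conflicts contains a single group, keyed by ("60601",), whose two rows are the candidates for repair.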

class openclean.profiling.constraints.fd.FunctionalDependency(lhs: Union[int, str, List[Union[str, int]]], rhs: Union[int, str, List[Union[str, int]]])

Bases: object

Functional dependencies describe a relationship between two sets of attributes. These sets are referred to as the determinant (left-hand side) and the dependant (right-hand side).

property dependant: Union[int, str, List[Union[str, int]]]

Get the dependant (right-hand-side) of the functional dependency.

Return type

int, string, or list of int or string

property determinant: Union[int, str, List[Union[str, int]]]

Get the determinant (left-hand-side) of the functional dependency.

Return type

int, string, or list of int or string

class openclean.profiling.constraints.fd.FunctionalDependencyFinder

Bases: object

Interface for operators that discover functional dependencies in a given data frame.

abstract run(df: pandas.core.frame.DataFrame) → List[openclean.profiling.constraints.fd.FunctionalDependency]

Run the implemented functional dependency discovery algorithm on the given data frame. Returns a list of all discovered functional dependencies.

Parameters

df (pd.DataFrame) – Input data frame.

Return type

list of FunctionalDependency
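As a rough sketch of what an implementation of this interface might look like, the example below tests every ordered pair of columns and reports those where each determinant value maps to exactly one dependant value. It is self-contained and does not import openclean: SimpleFD and NaiveSingleColumnFinder are hypothetical stand-ins that only mirror the determinant and dependant properties and the run(df) signature documented above. The pairwise check is a naive illustration that ignores multi-column determinants and is not one of the library's discovery algorithms.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List

import pandas as pd


@dataclass
class SimpleFD:
    """Hypothetical stand-in for a functional dependency object."""
    determinant: List[str]
    dependant: List[str]


class NaiveSingleColumnFinder:
    """Hypothetical finder that mirrors the run(df) signature.

    Assumption: only single-column determinants and dependants are tested.
    """

    def run(self, df: pd.DataFrame) -> List[SimpleFD]:
        result = []
        for lhs, rhs in permutations(df.columns, 2):
            # Keep one row per distinct (lhs, rhs) combination.
            pairs = df[[lhs, rhs]].drop_duplicates()
            # lhs -> rhs holds if no lhs value occurs with two rhs values.
            if not pairs[lhs].duplicated().any():
                result.append(SimpleFD([lhs], [rhs]))
        return result


# Illustrative data only.
df = pd.DataFrame({
    "zip": ["10001", "10001", "60601"],
    "city": ["New York", "New York", "Chicago"],
    "name": ["Ann", "Bob", "Cy"],
})
fds = NaiveSingleColumnFinder().run(df)
found = [(fd.determinant, fd.dependant) for fd in fds]
```

On this tiny frame the sketch reports zip → city, city → zip, name → zip, and name → city. Dependencies found on so few rows may hold by coincidence, so discovered FDs are usually reviewed before they are used as cleaning constraints.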