openclean.profiling.constraints.fd module
Base classes for functional dependency (FD) discovery. FDs express relationships between attributes of a dataset. They were originally used in database design, especially for schema normalization. FDs can also be used for data cleaning, to identify sets of rows (tuples) that violate a given constraint and are therefore candidates for data repair.
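As a rough illustration of the data cleaning use case, the sketch below groups rows by their determinant values and reports groups that contain more than one distinct dependent value. It uses plain Python dictionaries rather than openclean classes; the helper name `fd_violations` and the sample rows are assumptions made for this example and are not part of the module.

```python
from collections import defaultdict


def fd_violations(rows, lhs, rhs):
    """Return groups of rows that violate the dependency lhs -> rhs.

    Hypothetical helper for illustration only (not part of openclean).
    Rows are dictionaries; lhs and rhs are lists of column names.
    """
    # Group rows by the values of the determinant (lhs) columns.
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in lhs)
        groups[key].append(row)
    # A group violates the FD if it has more than one distinct rhs value.
    violations = {}
    for key, members in groups.items():
        distinct = {tuple(r[col] for col in rhs) for r in members}
        if len(distinct) > 1:
            violations[key] = members
    return violations


rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "60601", "city": "Chicago"},
]

# The two rows with zip 10001 disagree on city, so zip -> city is violated.
conflicts = fd_violations(rows, lhs=["zip"], rhs=["city"])
```

The rows returned for each conflicting key are the candidates for repair; which value is kept is a separate decision that this sketch does not address.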
- class openclean.profiling.constraints.fd.FunctionalDependency(lhs: Union[int, str, List[Union[str, int]]], rhs: Union[int, str, List[Union[str, int]]])
Bases: object
Functional dependencies describe a relationship between two sets of attributes. These sets are referred to as the determinant (left-hand side) and the dependent (right-hand side). An FD holds if any two rows that agree on the determinant also agree on the dependent.
- class openclean.profiling.constraints.fd.FunctionalDependencyFinder
Bases: object
Interface for operators that discover functional dependencies in a given data frame.
- abstract run(df: pandas.core.frame.DataFrame) → List[openclean.profiling.constraints.fd.FunctionalDependency]
Run the implemented functional dependency discovery algorithm on the given data frame. Returns a list of all discovered functional dependencies.
- Parameters
df (pd.DataFrame) – Input data frame.
- Return type
list of FunctionalDependency
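The following is a minimal sketch of how a finder with this kind of interface could look. To stay self-contained it does not import openclean: the classes `FD` and `Finder` are hypothetical stand-ins that only mirror the documented signatures, and `NaiveSingleColumnFinder` is an assumed brute-force strategy that tests single-column dependencies only. It is not the discovery algorithm used by the library.

```python
from abc import ABC, abstractmethod
from itertools import permutations
from typing import List

import pandas as pd


class FD:
    """Hypothetical stand-in for FunctionalDependency (lhs -> rhs)."""

    def __init__(self, lhs, rhs):
        # Normalize single column names to lists.
        self.lhs = lhs if isinstance(lhs, list) else [lhs]
        self.rhs = rhs if isinstance(rhs, list) else [rhs]

    def __repr__(self):
        return f"{self.lhs} -> {self.rhs}"


class Finder(ABC):
    """Hypothetical stand-in for FunctionalDependencyFinder."""

    @abstractmethod
    def run(self, df: pd.DataFrame) -> List[FD]:
        """Return all dependencies discovered in the data frame."""


class NaiveSingleColumnFinder(Finder):
    """Assumed brute-force finder for single-column dependencies."""

    def run(self, df: pd.DataFrame) -> List[FD]:
        found = []
        for lhs, rhs in permutations(df.columns, 2):
            # lhs -> rhs holds if every lhs value maps to one rhs value.
            if (df.groupby(lhs)[rhs].nunique() <= 1).all():
                found.append(FD(lhs, rhs))
        return found


df = pd.DataFrame({
    "zip": ["10001", "10001", "60601"],
    "city": ["New York", "New York", "Chicago"],
    "name": ["a", "b", "c"],
})
fds = NaiveSingleColumnFinder().run(df)
```

Because `name` is unique in this sample, it trivially determines every other column. Checking all column pairs is only practical for small frames, and this sketch ignores multi-column determinants and missing values.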