openclean.cluster.index module
Index structure for value clusters.
- class openclean.cluster.index.ClusterIndex
Bases: object
Index structure to maintain a set of clusters. Implements a prefix tree.
- add(cluster: openclean.cluster.base.Cluster) bool
Add the given cluster to the index. Returns True if the cluster was added as a new cluster (i.e., it did not exist in the index before) and False otherwise.
- Parameters
cluster (openclean.cluster.base.Cluster) – Cluster of data values.
- Return type
bool
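The return-value contract of add can be illustrated with a minimal stand-in. This sketch assumes that a cluster behaves like a collections.Counter mapping values to frequencies. The class SimpleClusterIndex is hypothetical: it keeps a set of canonical keys instead of the prefix tree used by ClusterIndex, and only demonstrates when True and False are returned.

```python
from collections import Counter


class SimpleClusterIndex:
    """Hypothetical stand-in; not the openclean implementation."""

    def __init__(self):
        self._seen = set()

    def add(self, cluster: Counter) -> bool:
        # Canonical key: sorted (value, count) pairs, so the order in
        # which values were inserted into the cluster does not matter.
        key = tuple(sorted(cluster.items()))
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


index = SimpleClusterIndex()
first = index.add(Counter({"Brooklyn": 3, "brooklyn": 1}))
second = index.add(Counter({"brooklyn": 1, "Brooklyn": 3}))
print(first, second)  # True False
```

The second call returns False because both clusters contain the same values with the same frequencies.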
- class openclean.cluster.index.Node(key: str, count: int)
Bases: object
Node in the cluster index.
- add(values: List[Tuple[str, int]], pos: int) bool
Add the values in the given list, starting from pos, to the children of this node. Returns True if at the end the cluster was added as a new cluster to the index.
- Parameters
values (list of tuples of string and count) – List of values and their frequencies in the cluster that is being added to the cluster index.
pos (int) – Index position in the list that points to the child node that is added to this node.
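The recursive insertion described above can be sketched as follows. This is a sketch under stated assumptions and not the openclean source: the class name TrieNode, the children dictionary keyed by (value, count), and the terminal flag are all hypothetical, and the value list is assumed to arrive in a canonical (sorted) order so that equal clusters follow the same path.

```python
from typing import Dict, List, Tuple


class TrieNode:
    """Hypothetical prefix-tree node for a (value, count) pair."""

    def __init__(self, key: str, count: int):
        self.key = key
        self.count = count
        self.children: Dict[Tuple[str, int], "TrieNode"] = {}
        self.terminal = False  # True if some cluster ends at this node

    def add(self, values: List[Tuple[str, int]], pos: int) -> bool:
        if pos == len(values):
            # All values are consumed: the cluster is new only if no
            # earlier cluster ended at this node.
            is_new = not self.terminal
            self.terminal = True
            return is_new
        key, count = values[pos]
        child = self.children.get((key, count))
        if child is None:
            # No existing path for this pair: create the child node.
            child = TrieNode(key, count)
            self.children[(key, count)] = child
        # Continue with the next value in the list.
        return child.add(values, pos + 1)


root = TrieNode(key="", count=0)
a = [("Brooklyn", 3), ("brooklyn", 1)]
b = [("Brooklyn", 3), ("brooklyn", 1), ("brooklyn.", 2)]
r1 = root.add(a, 0)  # new cluster
r2 = root.add(a, 0)  # same cluster again
r3 = root.add(b, 0)  # shares the path of a, then extends it
print(r1, r2, r3)  # True False True
```

Clusters a and b share their first two nodes, which is the space saving a prefix tree provides. The terminal flag distinguishes a cluster that was actually added from one that is merely a prefix of a longer cluster.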