openclean.data.mapping module

class openclean.data.mapping.ExactMatch(term: str)

Bases: openclean.data.mapping.StringMatch

Short cut for creating an exact string match result.

score: float
term: str

class openclean.data.mapping.Mapping(values: Optional[Dict[str, Union[str, List[StringMatch]]]] = None)

Bases: collections.defaultdict

The mapping class is a lookup dictionary that maintains a mapping of values (e.g., from a data frame column) to their nearest match (or list of nearest matches) in a controlled vocabulary.

add(key: str, matches: List[openclean.data.mapping.StringMatch]) openclean.data.mapping.Mapping

Add a list of matches to the mapped values for a given term (key). The term that is identified by the key does not have to exist in the current mapping.

Returns a reference to the mapping itself.

Parameters
  • key (string) – Term to map. The term may already be in the mapping or be newly added by this call.

  • matches (list of openclean.data.mapping.StringMatch) – The list of matches returned from a matcher.

Return type

openclean.data.mapping.Mapping
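
The behavior described for add() can be illustrated with a minimal stand-alone sketch. It does not import openclean: MappingSketch is a hypothetical stand-in built on collections.defaultdict, and (term, score) tuples stand in for StringMatch objects. Whether the real method appends to or replaces existing matches is an assumption here (the sketch appends).

```python
from collections import defaultdict
from typing import List, Tuple

# Hypothetical stand-in: a (term, score) tuple plays the role of a StringMatch.
Match = Tuple[str, float]


class MappingSketch(defaultdict):
    """Illustrative stand-in for Mapping; not the library's implementation."""

    def __init__(self):
        # Unknown keys start with an empty list of matches.
        super().__init__(list)

    def add(self, key: str, matches: List[Match]) -> "MappingSketch":
        # Assumption: new matches are appended to any matches already stored.
        self[key].extend(matches)
        # Returning self allows calls to be chained.
        return self


mapping = MappingSketch()
mapping.add("Brookyn", [("Brooklyn", 0.9)]).add("Qeens", [("Queens", 0.85)])
```

Because add() returns the mapping itself, several terms can be registered in one chained expression, as in the last line.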

filter(terms: Iterable[str]) openclean.data.mapping.Mapping

Get a mapping that contains only the terms in a given list.

Returns the resulting Mapping.

Parameters

terms (Iterable of strings) – The keys to return from the mapping

Return type

openclean.data.mapping.Mapping
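
As a rough illustration of the filtering idea, the sketch below works on a plain dict rather than on the library class. The function name filter_mapping is hypothetical, and silently skipping terms that are not in the mapping is an assumption; the documentation above does not say how unknown terms are handled.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in: a (term, score) tuple plays the role of a StringMatch.
Match = Tuple[str, float]


def filter_mapping(mapping: Dict[str, List[Match]], terms: Iterable[str]) -> Dict[str, List[Match]]:
    """Return a new dict that contains only the requested keys."""
    wanted = set(terms)
    # Assumption: requested terms that are not in the mapping are skipped.
    return {key: matches for key, matches in mapping.items() if key in wanted}


mapping = {
    "Brookyn": [("Brooklyn", 0.9)],
    "Qeens": [("Queens", 0.85)],
    "Bronx?": [],
}
subset = filter_mapping(mapping, ["Brookyn", "Bronx?"])
```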

match_counts() collections.Counter

Counts the matches for each key in the map.

Returns the resulting Counter.

Return type

Counter of the number of matches per key
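
A minimal sketch of per-key counting, using a plain dict in place of the library class. The helper name count_matches is hypothetical; it only shows the idea of pairing each key with the length of its match list.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical stand-in: a (term, score) tuple plays the role of a StringMatch.
Match = Tuple[str, float]


def count_matches(mapping: Dict[str, List[Match]]) -> Counter:
    """Count how many matches are stored for each key."""
    return Counter({key: len(matches) for key, matches in mapping.items()})


mapping = {
    "Brookyn": [("Brooklyn", 0.9)],
    "Manhatan": [("Manhattan", 0.9), ("Manhasset", 0.6)],
    "Bronx?": [],
}
counts = count_matches(mapping)
```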

matched(single_match_only: Optional[bool] = False) openclean.data.mapping.Mapping

Identifies keys that have at least one match in the map or, if single_match_only is set, exactly one match.

Returns the resulting Mapping.

Parameters

single_match_only (bool) – If True, select only keys with exactly one match; otherwise, select keys with at least one match

Return type

openclean.data.mapping.Mapping
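
The effect of the single_match_only flag can be sketched on a plain dict. The function name matched_keys is hypothetical, and the code is an illustration of the documented selection rule rather than the library's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in: a (term, score) tuple plays the role of a StringMatch.
Match = Tuple[str, float]


def matched_keys(mapping: Dict[str, List[Match]], single_match_only: bool = False) -> Dict[str, List[Match]]:
    """Keep keys with exactly one match, or with at least one match."""
    if single_match_only:
        return {k: v for k, v in mapping.items() if len(v) == 1}
    return {k: v for k, v in mapping.items() if len(v) >= 1}


mapping = {
    "Brookyn": [("Brooklyn", 0.9)],
    "Manhatan": [("Manhattan", 0.9), ("Manhasset", 0.6)],
    "Bronx?": [],
}
any_match = matched_keys(mapping)
one_match = matched_keys(mapping, single_match_only=True)
```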

to_lookup(raise_error: bool = False) Dict[str, str]

Convert the map into a dict of key:match pairs.

Note: keys with multiple matches are ignored and a warning is issued.

Return type

dict of keys and their matches

Raises

RuntimeError
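
One possible reading of this behavior is sketched below on a plain dict. The function name to_lookup_sketch is hypothetical. Two points are assumptions that the documentation above does not confirm: that raise_error=True turns the warning about ambiguous keys into a RuntimeError, and that keys without any match are left out of the result.

```python
import warnings
from typing import Dict, List, Tuple

# Hypothetical stand-in: a (term, score) tuple plays the role of a StringMatch.
Match = Tuple[str, float]


def to_lookup_sketch(mapping: Dict[str, List[Match]], raise_error: bool = False) -> Dict[str, str]:
    """Map each unambiguous key to the term of its single match."""
    lookup = {}
    ambiguous = []
    for key, matches in mapping.items():
        if len(matches) == 1:
            lookup[key] = matches[0][0]
        elif len(matches) > 1:
            ambiguous.append(key)
        # Assumption: keys with no match are simply omitted.
    if ambiguous:
        message = f"ignored keys with multiple matches: {ambiguous}"
        if raise_error:
            # Assumption: this is the case in which RuntimeError is raised.
            raise RuntimeError(message)
        warnings.warn(message)
    return lookup


mapping = {
    "Brookyn": [("Brooklyn", 0.9)],
    "Manhatan": [("Manhattan", 0.9), ("Manhasset", 0.6)],
}
# Capture the warning so that it can be inspected instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lookup = to_lookup_sketch(mapping)
```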

unmatched() set

Identifies keys that have no matches.

Returns the resulting set.

Return type

Set of keys with no matches
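
The idea of collecting unmatched keys can be sketched on a plain dict as follows. The function name unmatched_keys is hypothetical, and treating an empty match list as "no matches" is an assumption about how such keys are stored.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in: a (term, score) tuple plays the role of a StringMatch.
Match = Tuple[str, float]


def unmatched_keys(mapping: Dict[str, List[Match]]) -> Set[str]:
    """Collect the keys whose list of matches is empty."""
    return {key for key, matches in mapping.items() if len(matches) == 0}


mapping = {
    "Brookyn": [("Brooklyn", 0.9)],
    "Bronx?": [],
    "???": [],
}
missing = unmatched_keys(mapping)
```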

update(updates: Optional[Dict[str, str]] = None) openclean.data.mapping.Mapping

Lets the user update values in the map with their own values. Raises an error if the provided dictionary contains keys that are not in the current mapping.

The updated values are treated as exact matches (i.e., with a score of 1.) since they are provided by the user.

Returns a self reference.

Parameters

updates (dict) – Dictionary of type {mapping_key: exact_match}

Return type

openclean.data.mapping.Mapping

Raises

KeyError
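
A minimal sketch of the update semantics described above, again on a plain dict. The function name apply_updates is hypothetical, and replacing (rather than extending) the existing matches of a key is an assumption. The unknown-key check and the score of 1.0 follow the description above.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in: a (term, score) tuple plays the role of a StringMatch.
Match = Tuple[str, float]


def apply_updates(mapping: Dict[str, List[Match]], updates: Optional[Dict[str, str]] = None) -> Dict[str, List[Match]]:
    """Overwrite matches with user-provided values, treated as exact matches."""
    updates = updates or {}
    unknown = sorted(set(updates) - set(mapping))
    if unknown:
        # Keys that are not in the current mapping are rejected.
        raise KeyError(f"unknown keys: {unknown}")
    for key, term in updates.items():
        # Assumption: the user value replaces all previous matches for the key.
        mapping[key] = [(term, 1.0)]
    return mapping


mapping = {
    "Manhatan": [("Manhattan", 0.9), ("Manhasset", 0.6)],
    "Bronx?": [],
}
apply_updates(mapping, {"Manhatan": "Manhattan", "Bronx?": "Bronx"})
```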

class openclean.data.mapping.NoMatch(term: str)

Bases: openclean.data.mapping.StringMatch

Short cut for creating a no-match result for a string matcher.

score: float
term: str

class openclean.data.mapping.StringMatch(term: str, score: float)

Bases: object

String matching result that contains the matched term and the match score. A score of 1. indicates a perfect match and a score of 0. a no-match.

score: float
term: str
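
The relationship between the three match classes can be sketched with stand-alone stand-ins. The classes below reuse the documented names but are illustrative re-creations, not the library's code; they assume that ExactMatch fixes the score at 1 and NoMatch fixes it at 0, in line with the score semantics stated above.

```python
from dataclasses import dataclass


@dataclass
class StringMatch:
    """Stand-in: a matched term together with its match score."""
    term: str
    score: float


class ExactMatch(StringMatch):
    """Stand-in shortcut: assumed to fix the score at 1 (perfect match)."""

    def __init__(self, term: str):
        super().__init__(term=term, score=1.0)


class NoMatch(StringMatch):
    """Stand-in shortcut: assumed to fix the score at 0 (no match)."""

    def __init__(self, term: str):
        super().__init__(term=term, score=0.0)


exact = ExactMatch("Brooklyn")
miss = NoMatch("Brooklyn")
partial = StringMatch(term="Brooklyn", score=0.8)
```

With this layout, code that consumes matches only needs to read the term and score attributes, regardless of which shortcut created the object.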