openclean.data.mapping module
- class openclean.data.mapping.ExactMatch(term: str)
Bases: openclean.data.mapping.StringMatch
Shortcut for creating an exact string match result.
- score: float
- term: str
- class openclean.data.mapping.Mapping(values: Optional[Dict[str, Union[str, List[StringMatch]]]] = None)
Bases: collections.defaultdict
The mapping class is a lookup dictionary that maintains a mapping of values (e.g., from a data frame column) to their nearest match (or list of nearest matches) in a controlled vocabulary.
- add(key: str, matches: List[openclean.data.mapping.StringMatch]) → openclean.data.mapping.Mapping
Add a list of matches to the mapped values for a given term (key). The term that is identified by the key does not have to exist in the current mapping.
Returns a reference to the mapping itself.
- Parameters
key (string) – Term in the mapped vocabulary, or a new term that is added to the mapping.
matches (list of openclean.data.mapping.StringMatch) – The list of matches returned from a matcher.
- Return type
openclean.data.mapping.Mapping
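The behaviour described for add() can be sketched with plain Python stand-ins instead of openclean itself. Match and add_matches are hypothetical names used only for illustration, and the sketch assumes that new matches are appended to any matches already stored for the key.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Match:
    """Hypothetical stand-in for a string match (term plus score)."""
    term: str
    score: float


def add_matches(
    mapping: Dict[str, List[Match]], key: str, matches: List[Match]
) -> Dict[str, List[Match]]:
    """Append matches for key and return the mapping itself."""
    # Assumption: existing matches for the key are kept and extended.
    mapping[key].extend(matches)
    return mapping


mapping = defaultdict(list)
# 'Brookyln' is not in the mapping yet; defaultdict creates the entry on access.
add_matches(mapping, 'Brookyln', [Match('Brooklyn', 0.9)])
# Returning the mapping allows calls to be chained.
add_matches(
    add_matches(mapping, 'Qeens', [Match('Queens', 0.8)]),
    'Qeens',
    [Match('Queen', 0.6)],
)
```

Returning the mapping rather than None is what makes the chained call in the last statement possible.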
- filter(terms: Iterable[str]) → openclean.data.mapping.Mapping
Get a mapping that contains only the terms in a given list.
Returns the resulting Mapping.
- Parameters
terms (Iterable of strings) – The keys to retain in the returned mapping.
- Return type
openclean.data.mapping.Mapping
- match_counts() → collections.Counter
Counts the matches for each key in the map.
Returns the resulting Counter.
- Return type
Counter with the number of matches for each key
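A minimal sketch of per-key match counting, using a plain dict in place of a Mapping. It assumes the counter is keyed by the mapping key with the number of matches as its value; the sample data is made up for illustration.

```python
from collections import Counter

# Plain-dict stand-in: each key maps to a list of (term, score) pairs.
mapping = {
    'Brookyln': [('Brooklyn', 0.9)],
    'Qeens': [('Queens', 0.8), ('Queen', 0.6)],
    'Xyz': [],
}

# One counter entry per key, holding the number of matches for that key.
counts = Counter({key: len(matches) for key, matches in mapping.items()})

# most_common() orders keys by their number of matches, largest first.
most_ambiguous = counts.most_common(1)
```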
- matched(single_match_only: Optional[bool] = False) → openclean.data.mapping.Mapping
Identifies keys that have at least one match in the map, or exactly one match if single_match_only is set.
Returns the resulting Mapping.
- Parameters
single_match_only (bool) – If True, select only keys with exactly one match; otherwise select keys with at least one match.
- Return type
openclean.data.mapping.Mapping
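The two selection modes of matched() can be illustrated with a plain dict. This is a sketch under the assumption that single_match_only=True keeps keys with exactly one match and the default keeps keys with one or more; select_matched is a hypothetical helper, not part of openclean.

```python
from typing import Dict, List, Tuple

Matches = List[Tuple[str, float]]


def select_matched(
    mapping: Dict[str, Matches], single_match_only: bool = False
) -> Dict[str, Matches]:
    """Return the sub-mapping of keys that have matches."""
    if single_match_only:
        # Assumed behaviour: keep keys with exactly one match.
        return {k: v for k, v in mapping.items() if len(v) == 1}
    # Assumed default behaviour: keep keys with at least one match.
    return {k: v for k, v in mapping.items() if len(v) > 0}


mapping = {
    'Brookyln': [('Brooklyn', 0.9)],
    'Qeens': [('Queens', 0.8), ('Queen', 0.6)],
    'Xyz': [],
}

any_match = select_matched(mapping)
unambiguous = select_matched(mapping, single_match_only=True)
```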
- to_lookup(raise_error: bool = False) → Dict[str, str]
Convert the map into a dict of key:match pairs.
Note: keys with multiple matches are ignored and a warning is issued.
- Return type
dict of keys and their matches
- Raises
RuntimeError –
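The conversion can be sketched as follows, using plain tuples rather than openclean objects. The function name to_lookup_sketch is hypothetical, and two points are assumptions rather than documented behaviour: that raise_error=True turns the warning into the RuntimeError, and that keys without any match are left out of the result.

```python
import warnings
from typing import Dict, List, Tuple


def to_lookup_sketch(
    mapping: Dict[str, List[Tuple[str, float]]], raise_error: bool = False
) -> Dict[str, str]:
    """Flatten a mapping into {key: matched term}, skipping ambiguous keys."""
    lookup = {}
    ambiguous = []
    for key, matches in mapping.items():
        if len(matches) == 1:
            lookup[key] = matches[0][0]
        elif len(matches) > 1:
            ambiguous.append(key)
        # Assumption: keys with no matches are simply omitted.
    if ambiguous:
        message = 'ignored keys with multiple matches: {}'.format(ambiguous)
        if raise_error:
            # Assumption: raise_error escalates the warning to an exception.
            raise RuntimeError(message)
        warnings.warn(message)
    return lookup


mapping = {
    'Brookyln': [('Brooklyn', 0.9)],
    'Qeens': [('Queens', 0.8), ('Queen', 0.6)],
    'Xyz': [],
}

# Record the warning instead of printing it so that it can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    lookup = to_lookup_sketch(mapping)
```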
- unmatched() → set
Identifies keys that have no matches.
Returns the resulting set.
- Return type
Set of keys with no matches
- update(updates: Optional[Dict[str, str]] = None) → openclean.data.mapping.Mapping
Lets the user update values in the map with their own values. Raises an error if the provided dictionary contains keys that are not in the current mapping.
The updated values are treated as exact matches (i.e., with a score of 1.0) since they are provided by the user.
Returns a self reference.
- Parameters
updates (dict) – Dictionary of type {mapping_key: exact_match}
- Return type
openclean.data.mapping.Mapping
- Raises
KeyError –
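A minimal sketch of the update semantics, again with plain Python stand-ins. update_sketch is a hypothetical name; the sketch assumes that a user-supplied value replaces all existing matches for its key, and it checks every key before modifying anything, which is a design choice of the sketch rather than documented behaviour.

```python
from typing import Dict, List, Optional, Tuple

Matches = List[Tuple[str, float]]


def update_sketch(
    mapping: Dict[str, Matches], updates: Optional[Dict[str, str]] = None
) -> Dict[str, Matches]:
    """Overwrite matches with user-supplied exact matches (score 1.0)."""
    updates = updates or {}
    # Reject unknown keys up front so a failed call leaves the mapping unchanged.
    unknown = [key for key in updates if key not in mapping]
    if unknown:
        raise KeyError('keys not in mapping: {}'.format(unknown))
    for key, term in updates.items():
        # Assumption: the user value replaces any previous matches.
        mapping[key] = [(term, 1.0)]
    return mapping


mapping = {
    'Brookyln': [('Brooklyn', 0.9)],
    'Qeens': [('Queens', 0.8), ('Queen', 0.6)],
}

# Resolve the ambiguous key by hand.
update_sketch(mapping, {'Qeens': 'Queens'})
```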
- class openclean.data.mapping.NoMatch(term: str)
Bases: openclean.data.mapping.StringMatch
Shortcut for creating a no-match result for a string matcher.
- score: float
- term: str