openclean.operator.stream.matching module
- class openclean.operator.stream.matching.BestMatches(matcher: openclean.function.matching.base.StringMatcher, include_vocab: Optional[bool] = False, mapping: Optional[openclean.data.mapping.Mapping] = None)
Bases: openclean.operator.stream.consumer.StreamConsumer, openclean.operator.stream.processor.StreamProcessor
- close() → openclean.data.mapping.Mapping
Return the collected mapping at the end of the stream.
- Return type
openclean.data.mapping.Mapping
- consume(rowid: int, row: List[Union[int, float, str, datetime.datetime]]) → List[Union[int, float, str, datetime.datetime]]
Consume the given row. Assumes that the row contains exactly one column.
- Parameters
values (iterable of strings) – List of terms (e.g., from a data frame column) for which matches are computed and added to the returned mapping.
matcher (openclean.function.matching.base.VocabularyMatcher) – Matcher to compute matches for the terms in a controlled vocabulary.
include_vocab (bool, default=False) – If False, the resulting mapping contains matches only for terms that are not in the vocabulary associated with the given matcher.
- Return type
List[Union[int, float, str, datetime.datetime]]
- open(schema: List[Union[str, histore.document.schema.Column]]) → openclean.operator.stream.consumer.StreamConsumer
Factory pattern for stream consumer. Returns an instance of the best matches consumer.
Raises a ValueError if the given schema contains more than one column.
- Parameters
schema (list of strings) – List of column names in the data stream schema.
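- Return type
openclean.operator.stream.consumer.StreamConsumer
- Raises
ValueError – If the given schema contains more than one column.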
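The open/consume/close lifecycle described above can be illustrated with a minimal, self-contained sketch. It does not use openclean: `SketchBestMatches` is a hypothetical stand-in, the vocabulary is passed as a plain list, and similarity is computed with the standard library's `difflib.SequenceMatcher` rather than an openclean `StringMatcher`. The result is a plain dictionary, not an `openclean.data.mapping.Mapping`. Only the single-column check in `open()`, the per-row collection in `consume()`, the `include_vocab` flag, and the return of the collected mapping from `close()` follow the descriptions in this reference.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


class SketchBestMatches:
    """Hypothetical stand-in for illustration; not the openclean class."""

    def __init__(self, vocabulary: List[str], include_vocab: bool = False):
        # Assumption: the vocabulary is a plain list of strings.
        self.vocabulary = vocabulary
        self.include_vocab = include_vocab
        self.mapping: Dict[str, Tuple[str, float]] = {}

    def open(self, schema: List[str]) -> "SketchBestMatches":
        # Mirror the documented contract: reject multi-column schemas.
        if len(schema) > 1:
            raise ValueError("schema contains more than one column")
        self.mapping = {}
        return self

    def consume(self, rowid: int, row: List[str]) -> List[str]:
        # The row is assumed to contain exactly one column.
        term = row[0]
        if term in self.mapping:
            return row
        # Skip vocabulary terms unless include_vocab is set.
        if term in self.vocabulary and not self.include_vocab:
            return row
        # Pick the vocabulary entry with the highest similarity ratio.
        scored = [
            (SequenceMatcher(None, term, v).ratio(), v)
            for v in self.vocabulary
        ]
        score, best = max(scored)
        self.mapping[term] = (best, score)
        return row

    def close(self) -> Dict[str, Tuple[str, float]]:
        # Return everything collected while the stream was processed.
        return self.mapping


consumer = SketchBestMatches(["Brooklyn", "Queens"]).open(["borough"])
for rowid, row in enumerate([["Brooklyn"], ["Brooklin"], ["Quens"]]):
    consumer.consume(rowid, row)
result = consumer.close()
```

In this sketch, `result` holds entries only for the two misspelled terms, because "Brooklyn" is already in the vocabulary and `include_vocab` defaults to False. Constructing the stand-in with `include_vocab=True` would also record an entry for "Brooklyn".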