openclean.data.metadata.base module
Base classes for metadata stores and store factories.
- class openclean.data.metadata.base.MetadataStore
Bases: object
Abstract class for metadata stores that maintain annotations for individual snapshots (datasets) in an archive.
- delete_annotation(key: str, column_id: Optional[int] = None, row_id: Optional[int] = None)
Delete the annotation with the given key for the object identified by the given combination of column and row identifiers.
- Parameters
key (string) – Unique annotation key.
column_id (int, default=None) – Column identifier for the referenced object (None for rows or full datasets).
row_id (int, default=None) – Row identifier for the referenced object (None for columns or full datasets).
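Every method in this class addresses its target object through the same column_id/row_id pair. The helper below summarizes that convention. target_kind is hypothetical (not part of openclean), and reading a pair with both identifiers set as a single cell is an assumption that the documentation does not state explicitly.

```python
from typing import Optional


def target_kind(column_id: Optional[int] = None, row_id: Optional[int] = None) -> str:
    """Name the kind of object a (column_id, row_id) pair refers to.

    Hypothetical helper. Treating "both identifiers set" as a cell is an
    assumption; the documented cases are datasets, columns, and rows.
    """
    if column_id is None and row_id is None:
        return "dataset"
    if row_id is None:
        return "column"
    if column_id is None:
        return "row"
    return "cell"  # assumed meaning when both identifiers are given


print(target_kind())                       # dataset
print(target_kind(column_id=2))            # column
print(target_kind(row_id=7))               # row
print(target_kind(column_id=2, row_id=7))  # cell
```

Comparisons use `is None` so that identifier 0 is still treated as a valid column or row.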
- get_annotation(key: str, column_id: Optional[int] = None, row_id: Optional[int] = None, default_value: Optional[Any] = None) → Any
Get the annotation with the given key for the identified object. Returns the default value if no annotation with the given key exists for the object.
- Parameters
key (string) – Unique annotation key.
column_id (int, default=None) – Column identifier for the referenced object (None for rows or full datasets).
row_id (int, default=None) – Row identifier for the referenced object (None for columns or full datasets).
default_value (any, default=None) – Default value that is returned if no annotation with the given key exists for the identified object.
- Return type
Any
- has_annotation(key: str, column_id: Optional[int] = None, row_id: Optional[int] = None) → bool
Test if an annotation with the given key exists for the identified object.
- Parameters
key (string) – Unique annotation key.
column_id (int, default=None) – Column identifier for the referenced object (None for rows or full datasets).
row_id (int, default=None) – Row identifier for the referenced object (None for columns or full datasets).
- Return type
bool
- list_annotations(column_id: Optional[int] = None, row_id: Optional[int] = None) → Dict
Get all annotations for the identified object as a dictionary of key-value pairs.
- Parameters
column_id (int, default=None) – Column identifier for the referenced object (None for rows or full datasets).
row_id (int, default=None) – Row identifier for the referenced object (None for columns or full datasets).
- Return type
dict
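get_annotation, has_annotation, and list_annotations behave like lookups on the annotation dictionary of a single object. The standalone functions below sketch that behavior, assuming the dictionary is what read() returns for the object. They take the dictionary directly and are illustrations, not openclean's implementation.

```python
from typing import Any, Dict, Optional

# Hypothetical annotation dictionary, standing in for what read() might
# return for a single column.
doc: Dict[str, Any] = {"datatype": "int", "nullable": False}


def get_annotation(doc: Dict[str, Any], key: str, default_value: Optional[Any] = None) -> Any:
    # The default is used only when the key is absent, not when the stored value is falsy.
    return doc.get(key, default_value)


def has_annotation(doc: Dict[str, Any], key: str) -> bool:
    return key in doc


def list_annotations(doc: Dict[str, Any]) -> Dict[str, Any]:
    # Hand back a copy so callers cannot modify the source dictionary.
    return dict(doc)


print(get_annotation(doc, "datatype"))                     # int
print(get_annotation(doc, "unit", default_value="n/a"))    # n/a
print(get_annotation(doc, "nullable", default_value=True)) # False
print(has_annotation(doc, "nullable"))                     # True
```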
- abstract read(column_id: Optional[int] = None, row_id: Optional[int] = None) → Dict
Read the annotation dictionary for the specified object.
- Parameters
column_id (int, default=None) – Column identifier for the referenced object (None for rows or full datasets).
row_id (int, default=None) – Row identifier for the referenced object (None for columns or full datasets).
- Return type
dict
- set_annotation(key: str, value: Any, column_id: Optional[int] = None, row_id: Optional[int] = None)
Set annotation value for an identified object.
- Parameters
key (string) – Unique annotation key.
value (any) – Value that will be associated with the given key.
column_id (int, default=None) – Column identifier for the referenced object (None for rows or full datasets).
row_id (int, default=None) – Row identifier for the referenced object (None for columns or full datasets).
- abstract write(doc: Dict, column_id: Optional[int] = None, row_id: Optional[int] = None)
Write the annotation dictionary for the specified object.
- Parameters
doc (dict) – Annotation dictionary that is being written to file.
column_id (int, default=None) – Column identifier for the referenced object (None for rows or full datasets).
row_id (int, default=None) – Row identifier for the referenced object (None for columns or full datasets).
- Return type
dict
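Because only read() and write() are abstract, a concrete store presumably needs to supply just those two, with the annotation methods layered on top as read-modify-write operations. The sketch below illustrates that template-method design under this assumption. SketchMetadataStore and InMemoryMetadataStore are hypothetical names, the store keeps one dictionary per (column_id, row_id) pair, and nothing is persisted; this is not openclean's actual source.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional, Tuple


class SketchMetadataStore(ABC):
    """Hypothetical stand-in for MetadataStore: only read() and write() are abstract."""

    @abstractmethod
    def read(self, column_id: Optional[int] = None, row_id: Optional[int] = None) -> Dict:
        """Return the annotation dictionary for the referenced object."""

    @abstractmethod
    def write(self, doc: Dict, column_id: Optional[int] = None, row_id: Optional[int] = None):
        """Store the annotation dictionary for the referenced object."""

    def set_annotation(self, key: str, value: Any, column_id: Optional[int] = None, row_id: Optional[int] = None):
        # Read-modify-write: fetch the object's dictionary, update one key, store it back.
        doc = self.read(column_id=column_id, row_id=row_id)
        doc[key] = value
        self.write(doc, column_id=column_id, row_id=row_id)

    def delete_annotation(self, key: str, column_id: Optional[int] = None, row_id: Optional[int] = None):
        doc = self.read(column_id=column_id, row_id=row_id)
        if key in doc:  # deleting a missing key is treated as a no-op in this sketch
            del doc[key]
            self.write(doc, column_id=column_id, row_id=row_id)

    def get_annotation(self, key: str, column_id: Optional[int] = None, row_id: Optional[int] = None,
                       default_value: Optional[Any] = None) -> Any:
        return self.read(column_id=column_id, row_id=row_id).get(key, default_value)

    def has_annotation(self, key: str, column_id: Optional[int] = None, row_id: Optional[int] = None) -> bool:
        return key in self.read(column_id=column_id, row_id=row_id)

    def list_annotations(self, column_id: Optional[int] = None, row_id: Optional[int] = None) -> Dict:
        return self.read(column_id=column_id, row_id=row_id)


class InMemoryMetadataStore(SketchMetadataStore):
    """Keeps one annotation dictionary per (column_id, row_id) pair in memory."""

    def __init__(self):
        self._docs: Dict[Tuple[Optional[int], Optional[int]], Dict] = {}

    def read(self, column_id: Optional[int] = None, row_id: Optional[int] = None) -> Dict:
        # Return a copy so that changes only take effect through write().
        return dict(self._docs.get((column_id, row_id), {}))

    def write(self, doc: Dict, column_id: Optional[int] = None, row_id: Optional[int] = None):
        self._docs[(column_id, row_id)] = dict(doc)


store = InMemoryMetadataStore()
store.set_annotation("rows", 100)                     # dataset-level annotation
store.set_annotation("datatype", "int", column_id=0)  # column-level annotation
print(store.get_annotation("datatype", column_id=0))  # int
print(store.get_annotation("datatype"))               # None
store.delete_annotation("rows")
print(store.has_annotation("rows"))                   # False
```

Returning a copy from read() keeps the in-memory variant honest: a caller that mutates the dictionary without calling write() does not change the stored metadata, which mirrors how a file-backed store would behave.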
- class openclean.data.metadata.base.MetadataStoreFactory
Bases: object
Factory pattern for metadata stores. Metadata stores are created on a per-version basis; that is, each dataset snapshot has its own independent metadata store.
- abstract get_store(version: int) → openclean.data.metadata.base.MetadataStore
Get the metadata store for the dataset snapshot with the given version identifier.
- Parameters
version (int) – Unique version identifier.
- Return type
openclean.data.metadata.base.MetadataStore
- abstract rollback(version: int)
Remove metadata for all dataset versions that are after the given rollback version.
- Parameters
version (int) – Unique identifier of the rollback version.
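A minimal sketch of the per-version factory contract, assuming each version's metadata can be held in memory. InMemoryStoreFactory is a hypothetical name, and a plain dict stands in for a MetadataStore instance to keep the example self-contained. rollback keeps the rollback version itself and discards every later one.

```python
from typing import Any, Dict


class InMemoryStoreFactory:
    """Hypothetical factory: one independent metadata store per dataset version."""

    def __init__(self):
        # version identifier -> annotation store for that snapshot
        # (a plain dict stands in for a MetadataStore instance).
        self._stores: Dict[int, Dict[str, Any]] = {}

    def get_store(self, version: int) -> Dict[str, Any]:
        # Create an empty store the first time a version is requested.
        return self._stores.setdefault(version, {})

    def rollback(self, version: int):
        # Discard metadata for every version that comes after the rollback version.
        for v in [v for v in self._stores if v > version]:
            del self._stores[v]


factory = InMemoryStoreFactory()
for v in range(3):
    factory.get_store(v)["rows"] = 100 + v
factory.rollback(1)
print(sorted(factory._stores))       # [0, 1]
print(factory.get_store(1)["rows"])  # 101
print(factory.get_store(2))          # {}
```

After the rollback, requesting version 2 again yields a fresh, empty store because its earlier metadata was discarded.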