openclean.data.metadata.base module

Base classes for metadata stores and store factories.

class openclean.data.metadata.base.MetadataStore

Bases: object

Abstract class for metadata stores that maintain annotations for individual snapshots (datasets) in an archive.
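Only read() and write() are marked abstract in this interface, so a concrete store needs to supply at least those two methods. The sketch below uses a hypothetical stand-in class (MetadataStoreSketch) and a hypothetical in-memory subclass (InMemoryStore) rather than importing openclean. Keying annotation dictionaries by the (column_id, row_id) pair is an assumption about one way such a store could be organized.

```python
from abc import ABCMeta, abstractmethod
from typing import Dict, Optional, Tuple


class MetadataStoreSketch(metaclass=ABCMeta):
    """Hypothetical stand-in for MetadataStore: only read/write are abstract."""

    @abstractmethod
    def read(self, column_id: Optional[int] = None, row_id: Optional[int] = None) -> Dict:
        raise NotImplementedError()

    @abstractmethod
    def write(self, doc: Dict, column_id: Optional[int] = None, row_id: Optional[int] = None):
        raise NotImplementedError()


class InMemoryStore(MetadataStoreSketch):
    """Keeps one annotation dictionary per (column_id, row_id) pair in memory."""

    def __init__(self):
        self._docs: Dict[Tuple[Optional[int], Optional[int]], Dict] = {}

    def read(self, column_id: Optional[int] = None, row_id: Optional[int] = None) -> Dict:
        # Return a copy so that callers cannot mutate the stored dictionary.
        return dict(self._docs.get((column_id, row_id), {}))

    def write(self, doc: Dict, column_id: Optional[int] = None, row_id: Optional[int] = None):
        # Store a copy so later changes to `doc` do not leak into the store.
        self._docs[(column_id, row_id)] = dict(doc)


try:
    MetadataStoreSketch()
except TypeError:
    print('abstract class cannot be instantiated')

store = InMemoryStore()
store.write({'source': 'census.csv'})
print(store.read())             # {'source': 'census.csv'}
print(store.read(column_id=3))  # {}
```

Returning and storing copies keeps each object's dictionary isolated from caller-side mutation. Whether the real implementations do the same is not stated here.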

delete_annotation(key: str, column_id: Optional[int] = None, row_id: Optional[int] = None)

Delete the annotation with the given key for the object that is identified by the given combination of column and row identifiers.

Parameters
  • key (string) – Unique annotation key.

  • column_id (int, default=None) – Column identifier for the referenced object (None for rows or full datasets).

  • row_id (int, default=None) – Row identifier for the referenced object (None for columns or full datasets).
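The parameter descriptions above imply that the (column_id, row_id) combination selects the annotated object. The hypothetical helper below spells out that mapping. Treating the case where both identifiers are given as a single cell is an assumption that is not stated explicitly in this reference.

```python
from typing import Optional


def describe_object(column_id: Optional[int] = None, row_id: Optional[int] = None) -> str:
    """Return the kind of object addressed by a (column_id, row_id) combination."""
    # Compare against None explicitly: 0 is a valid column or row identifier.
    if column_id is None and row_id is None:
        return 'dataset'
    if row_id is None:
        return 'column'
    if column_id is None:
        return 'row'
    # Both identifiers given: assumed to address a single cell.
    return 'cell'


print(describe_object())                       # dataset
print(describe_object(column_id=0))            # column
print(describe_object(row_id=5))               # row
print(describe_object(column_id=0, row_id=5))  # cell
```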

get_annotation(key: str, column_id: Optional[int] = None, row_id: Optional[int] = None, default_value: Optional[Any] = None) → Any

Get the annotation with the given key for the identified object. Returns the default value if no annotation with the given key exists for the object.

Parameters
  • key (string) – Unique annotation key.

  • column_id (int, default=None) – Column identifier for the referenced object (None for rows or full datasets).

  • row_id (int, default=None) – Row identifier for the referenced object (None for columns or full datasets).

  • default_value (any, default=None) – Default value that is returned if no annotation with the given key exists for the identified object.

Return type

Any

has_annotation(key: str, column_id: Optional[int] = None, row_id: Optional[int] = None) → bool

Test if an annotation with the given key exists for the identified object.

Parameters
  • key (string) – Unique annotation key.

  • column_id (int, default=None) – Column identifier for the referenced object (None for rows or full datasets).

  • row_id (int, default=None) – Row identifier for the referenced object (None for columns or full datasets).

Return type

bool
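Because get_annotation() falls back to default_value for a missing key, a stored value that equals the default cannot be told apart from a missing key; has_annotation() resolves that ambiguity. The sketch below uses hypothetical module-level functions over a plain dictionary that stands in for the annotation dictionary of one object.

```python
from typing import Any, Dict, Optional

# Hypothetical annotation dictionary of one object; the value is explicitly None.
doc: Dict[str, Any] = {'null_count': None}


def get_annotation(doc: Dict, key: str, default_value: Optional[Any] = None) -> Any:
    # Missing keys fall back to the default value.
    return doc.get(key, default_value)


def has_annotation(doc: Dict, key: str) -> bool:
    # Only tests for the key; the stored value is irrelevant.
    return key in doc


# get_annotation cannot tell a stored None from a missing key ...
print(get_annotation(doc, 'null_count'))  # None
print(get_annotation(doc, 'distinct'))    # None
# ... but has_annotation can.
print(has_annotation(doc, 'null_count'))  # True
print(has_annotation(doc, 'distinct'))    # False
```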

list_annotations(column_id: Optional[int] = None, row_id: Optional[int] = None) → Dict

Get all annotations for an identified object as a dictionary of key-value pairs.

Parameters
  • column_id (int, default=None) – Column identifier for the referenced object (None for rows or full datasets).

  • row_id (int, default=None) – Row identifier for the referenced object (None for columns or full datasets).
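list_annotations() returns the annotations of exactly one identified object. The sketch below assumes that the annotations of related objects are not merged in. For example, listing a column does not include annotations of that column's cells, and listing the dataset does not include column annotations. The backing data and function are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical backing data: one annotation dictionary per object.
docs: Dict[Tuple[Optional[int], Optional[int]], Dict] = {
    (None, None): {'source': 'census.csv'},
    (2, None): {'type': 'int'},
    (2, 7): {'flag': 'outlier'},
}


def list_annotations(column_id: Optional[int] = None, row_id: Optional[int] = None) -> Dict:
    # Only the exact object is consulted; annotations of the dataset,
    # columns, rows, or cells are kept apart (assumed semantics).
    return dict(docs.get((column_id, row_id), {}))


print(list_annotations())             # {'source': 'census.csv'}
print(list_annotations(column_id=2))  # {'type': 'int'}
print(list_annotations(row_id=7))     # {}
```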

abstract read(column_id: Optional[int] = None, row_id: Optional[int] = None) → Dict

Read the annotation dictionary for the specified object.

Parameters
  • column_id (int, default=None) – Column identifier for the referenced object (None for rows or full datasets).

  • row_id (int, default=None) – Row identifier for the referenced object (None for columns or full datasets).

Return type

dict

set_annotation(key: str, value: Any, column_id: Optional[int] = None, row_id: Optional[int] = None)

Set the value of the annotation with the given key for an identified object.

Parameters
  • key (string) – Unique annotation key.

  • value (any) – Value that will be associated with the given key.

  • column_id (int, default=None) – Column identifier for the referenced object (None for rows or full datasets).

  • row_id (int, default=None) – Row identifier for the referenced object (None for columns or full datasets).
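Given only the abstract read() and write() methods, setting an annotation can be expressed as a read-modify-write on the object's dictionary. In this pattern, other keys are preserved and an existing value for the same key is overwritten. The module-level read, write, and set_annotation functions below are a hypothetical in-memory sketch, not the library's implementation.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical in-memory backing store keyed by (column_id, row_id).
_docs: Dict[Tuple[Optional[int], Optional[int]], Dict] = {}


def read(column_id: Optional[int] = None, row_id: Optional[int] = None) -> Dict:
    return dict(_docs.get((column_id, row_id), {}))


def write(doc: Dict, column_id: Optional[int] = None, row_id: Optional[int] = None) -> None:
    _docs[(column_id, row_id)] = dict(doc)


def set_annotation(key: str, value: Any, column_id: Optional[int] = None, row_id: Optional[int] = None) -> None:
    # Read-modify-write: other keys of the object are preserved and an
    # existing value for `key` is overwritten.
    doc = read(column_id=column_id, row_id=row_id)
    doc[key] = value
    write(doc, column_id=column_id, row_id=row_id)


set_annotation('type', 'str', column_id=0)
set_annotation('nullable', True, column_id=0)
set_annotation('type', 'int', column_id=0)
print(read(column_id=0))  # {'type': 'int', 'nullable': True}
```

A read-modify-write like this is not atomic. A store shared by concurrent writers would presumably need some form of locking, but this reference does not address that.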

abstract write(doc: Dict, column_id: Optional[int] = None, row_id: Optional[int] = None)

Write the annotation dictionary for the specified object.

Parameters
  • doc (dict) – Annotation dictionary that is being written to file.

  • column_id (int, default=None) – Column identifier for the referenced object (None for rows or full datasets).

  • row_id (int, default=None) – Row identifier for the referenced object (None for columns or full datasets).

Return type

dict
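Since the doc parameter is described as being written to file, a file-backed store could keep one JSON document per annotated object. The sketch below is a hypothetical JsonFileStore. Its file naming scheme (ds.json, col_<c>.json, row_<r>.json, cell_<c>_<r>.json) is an invented assumption, not the layout used by openclean. For simplicity, this sketch's write() returns nothing.

```python
import json
import os
import tempfile
from typing import Dict, Optional


class JsonFileStore:
    """Hypothetical file-backed store: one JSON file per annotated object."""

    def __init__(self, basedir: str):
        self.basedir = basedir

    def _path(self, column_id: Optional[int], row_id: Optional[int]) -> str:
        # Hypothetical naming scheme chosen for this sketch only.
        if column_id is None and row_id is None:
            name = 'ds.json'
        elif row_id is None:
            name = f'col_{column_id}.json'
        elif column_id is None:
            name = f'row_{row_id}.json'
        else:
            name = f'cell_{column_id}_{row_id}.json'
        return os.path.join(self.basedir, name)

    def read(self, column_id: Optional[int] = None, row_id: Optional[int] = None) -> Dict:
        path = self._path(column_id, row_id)
        # An object without a file simply has no annotations yet.
        if not os.path.isfile(path):
            return {}
        with open(path, 'r') as f:
            return json.load(f)

    def write(self, doc: Dict, column_id: Optional[int] = None, row_id: Optional[int] = None):
        # Overwrite the object's file with the full annotation dictionary.
        with open(self._path(column_id, row_id), 'w') as f:
            json.dump(doc, f)


with tempfile.TemporaryDirectory() as tmpdir:
    store = JsonFileStore(tmpdir)
    store.write({'type': 'int'}, column_id=1)
    print(store.read(column_id=1))  # {'type': 'int'}
    print(store.read(row_id=1))     # {}
```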

class openclean.data.metadata.base.MetadataStoreFactory

Bases: object

Factory for metadata stores. Metadata stores are created on a per-version basis, i.e., each dataset snapshot has its own independent metadata store.

abstract get_store(version: int) → openclean.data.metadata.base.MetadataStore

Get the metadata store for the dataset snapshot with the given version identifier.

Parameters
  • version (int) – Unique version identifier.

Return type

openclean.data.metadata.base.MetadataStore
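The per-version independence described for MetadataStoreFactory can be sketched with a hypothetical in-memory factory. Here a plain dictionary of dataset-level annotations stands in for a MetadataStore. Creating a store lazily on the first get_store() call is an assumption, because this reference does not say when stores are created.

```python
from typing import Any, Dict


class InMemoryStoreFactory:
    """Hypothetical factory that keeps one independent store per version.

    A plain dict of dataset-level annotations stands in for a MetadataStore.
    """

    def __init__(self):
        self._stores: Dict[int, Dict[str, Any]] = {}

    def get_store(self, version: int) -> Dict[str, Any]:
        # Create an empty store the first time a version is requested,
        # then return the same store on every later call.
        return self._stores.setdefault(version, {})


factory = InMemoryStoreFactory()
factory.get_store(1)['source'] = 'census.csv'
print(factory.get_store(1))  # {'source': 'census.csv'}
print(factory.get_store(2))  # {}  (version 2 does not see version 1's metadata)
```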

abstract rollback(version: int)

Remove the metadata for all dataset versions that come after the given rollback version.

Parameters
  • version (int) – Unique identifier of the rollback version.
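A rollback keeps the metadata of the rollback version itself and drops every later version. The hypothetical function below assumes that integer version identifiers increase over time, so "after" means numerically greater. It operates on a plain dictionary that maps versions to their metadata, and the example data is invented.

```python
from typing import Dict


def rollback(stores: Dict[int, Dict], version: int) -> None:
    """Drop the metadata of every version created after `version`."""
    # Collect the keys first; a dict must not change size while iterating over it.
    for v in [v for v in stores if v > version]:
        del stores[v]


stores = {0: {'rows': 10}, 1: {'rows': 12}, 2: {'rows': 15}}
rollback(stores, 1)
print(stores)  # {0: {'rows': 10}, 1: {'rows': 12}}
```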