openclean.engine.log module

Log of actions that defines the history of a dataset.

class openclean.engine.log.LogEntry(descriptor: Dict, action: Optional[openclean.engine.action.OpHandle] = None, version: Optional[int] = None)

Bases: object

Entry in an operation log for a dataset. Each entry maintains information about a committed or uncommitted snapshot of a dataset. Each log entry is associated with a unique identifier (UUID) and a descriptor for the action that created the snapshot.

For uncommitted snapshots, the entry also maintains the handle for the action that created the snapshot, together with the snapshot's version identifier in the data store for the dataset sample.

action: Optional[openclean.engine.action.OpHandle] = None
descriptor: Dict
version: Optional[int] = None
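The field layout above can be sketched with a plain dataclass. This is a minimal illustration, not the openclean implementation: SketchLogEntry and has_pending_action are hypothetical names, the descriptor keys are invented for the example, and the rule that only uncommitted entries carry an action handle is an assumption read from the description above.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SketchLogEntry:
    """Hypothetical stand-in that mirrors the documented LogEntry fields."""

    descriptor: Dict
    # Stand-in for openclean.engine.action.OpHandle; typed loosely here.
    action: Optional[Any] = None
    version: Optional[int] = None


def has_pending_action(entry: SketchLogEntry) -> bool:
    """Assumed convention: only uncommitted entries carry an action handle."""
    return entry.action is not None


# Entry for a committed snapshot: descriptor only (illustrative keys).
committed = SketchLogEntry(descriptor={"optype": "load"})

# Entry for an uncommitted snapshot: handle and sample-store version are set.
uncommitted = SketchLogEntry(
    descriptor={"optype": "update"},
    action=object(),  # placeholder for an operation handle
    version=3,
)
```

Both optional fields default to None, so an entry can be built from a descriptor alone.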

class openclean.engine.log.OperationLog(snapshots: List[histore.archive.snapshot.Snapshot])

Bases: object

The operation log maintains a list of entries containing provenance information for each snapshot of a dataset. Snapshots of a dataset can either be committed, i.e., persisted with the datastore that manages the full dataset, or uncommitted, i.e., persisted only with the datastore for a dataset sample but not with the one for the full dataset.

add(version: int, action: openclean.engine.action.OpHandle)

Append a record to the log.

Parameters
  • version (int) – Dataset snapshot version identifier.

  • action (openclean.engine.action.OpHandle) – Handle for the operation that created the dataset snapshot.
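
The append behavior described for add() can be sketched with a list-backed log. SketchLog is a hypothetical stand-in rather than the OperationLog class itself, and plain strings take the place of OpHandle objects; how the real class stores its records is not documented here, so the tuple layout is an assumption.

```python
from typing import Any, List, Tuple


class SketchLog:
    """Hypothetical list-backed log; not the openclean implementation."""

    def __init__(self) -> None:
        # Each record pairs a snapshot version with the handle that created it.
        self.entries: List[Tuple[int, Any]] = []

    def add(self, version: int, action: Any) -> None:
        """Append a record for the snapshot identified by version."""
        self.entries.append((version, action))


log = SketchLog()
# Strings stand in for operation handles in this sketch.
log.add(version=0, action="load")
log.add(version=1, action="update")
```

Records keep their insertion order, so the most recent snapshot is always the last element of the list.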

last_version() → int

Get version identifier of the last entry in the log.

Return type

int

truncate(pos: int)

Remove all log entries starting at the given index.

Parameters

pos (int) – List position from which all log entries are removed. The entry at pos itself is also removed.
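
The inclusive semantics of truncate() and its interaction with last_version() can be sketched with list slicing. SketchVersionLog is a hypothetical stand-in that tracks only version identifiers; the behavior of the real last_version() on an empty log is not documented here, so the sketch does not cover that case.

```python
from typing import List


class SketchVersionLog:
    """Hypothetical log that tracks version identifiers only."""

    def __init__(self, versions: List[int]) -> None:
        self.versions = list(versions)

    def last_version(self) -> int:
        """Return the version identifier of the last entry."""
        return self.versions[-1]

    def truncate(self, pos: int) -> None:
        """Drop the entry at index pos and every entry after it."""
        self.versions = self.versions[:pos]


log = SketchVersionLog([0, 1, 2, 3])
log.truncate(pos=2)          # removes the entries at indices 2 and 3
remaining = log.versions     # [0, 1]
latest = log.last_version()  # 1
```

Slicing with [:pos] keeps indices 0 through pos - 1, which matches the documented rule that the entry at pos is removed as well.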