openclean.data.source.socrata module

Data repository for accessing datasets via the Socrata Open Data API.

class openclean.data.source.socrata.SODADataset(doc: Dict, app_token: Optional[str] = None)

Bases: refdata.base.DatasetDescriptor

Handle for a SODA dataset.

load() → pandas.core.frame.DataFrame

Download the dataset as a pandas data frame.

Return type

pd.DataFrame

write(file: IO)

Write the dataset to the given file. The output is a tab-delimited file with the column names in the first line.

Parameters

file (file object) – File-like object that provides a write method.
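The write contract described above can be exercised without touching the file system by passing an in-memory buffer. The following is a minimal sketch, not part of the library: dataset_rows and StandInDataset are hypothetical names, the stand-in only mimics the documented behavior so the example runs offline, and it assumes that write() emits text rather than bytes.

```python
import csv
import io
from typing import List


def dataset_rows(dataset) -> List[List[str]]:
    """Write a dataset to an in-memory buffer and parse the rows back.

    Works with any object that offers write(file) as documented above.
    Assumption: write() produces text (str), not bytes.
    """
    buffer = io.StringIO()
    dataset.write(buffer)
    buffer.seek(0)
    # The documented output format is tab-delimited with a header line.
    return list(csv.reader(buffer, delimiter="\t"))


class StandInDataset:
    """Hypothetical stand-in that mimics the documented write() contract."""

    def write(self, file):
        file.write("id\tname\n")
        file.write("1\tAlice\n")
        file.write("2\tBob\n")


rows = dataset_rows(StandInDataset())
header, records = rows[0], rows[1:]
print(header)
print(records)
```

With the real library, a handle obtained from Socrata.dataset() would take the place of the stand-in, provided the text-mode assumption holds.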

class openclean.data.source.socrata.Socrata(app_token: Optional[str] = None)

Bases: refdata.base.Descriptor

Repository handle for the Socrata Open Data API.

catalog(domain: Optional[str] = None) → Iterable[openclean.data.source.socrata.SODADataset]

Generator for a listing of all datasets that are available from the repository. Provides the option to filter datasets by their domain.

Parameters

domain (string, optional=None) – Optional domain name filter for returned datasets.

Return type

iterable of openclean.data.source.socrata.SODADataset
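Because catalog() is a generator, a caller can stop early instead of materializing the full listing. The sketch below illustrates this with itertools.islice. The names first_datasets and StandInRepository are hypothetical, the stand-in yields plain strings instead of SODADataset handles, and its exact-match handling of the domain filter is an assumption made for the example rather than documented behavior.

```python
from itertools import islice
from typing import List, Optional


def first_datasets(repo, limit: int, domain: Optional[str] = None) -> List:
    """Return at most `limit` items from repo.catalog().

    islice stops consuming the generator once `limit` items were read.
    """
    return list(islice(repo.catalog(domain=domain), limit))


class StandInRepository:
    """Hypothetical stand-in for a repository handle (no network access)."""

    def __init__(self, entries):
        # entries: list of (domain name, dataset label) pairs
        self.entries = entries

    def catalog(self, domain=None):
        for entry_domain, label in self.entries:
            # Assumption: the filter is an exact match on the domain name.
            if domain is None or entry_domain == domain:
                yield label


repo = StandInRepository([
    ("data.example.org", "ds-1"),
    ("data.example.org", "ds-2"),
    ("open.example.net", "ds-3"),
])
sample = first_datasets(repo, limit=1, domain="data.example.org")
everything = first_datasets(repo, limit=10)
print(sample)
print(everything)
```

The same helper should accept an instance of the Socrata class described here, since it relies only on the documented catalog(domain=...) signature.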

dataset(identifier: str) → openclean.data.source.socrata.SODADataset

Get the handle for the dataset with the given identifier.

Parameters

identifier (string) – Unique dataset identifier.

Return type

openclean.data.source.socrata.SODADataset

domains(filter: Optional[str] = None) → List[Tuple[str, str]]

Get a list of domain names that are available from the Socrata Open Data API. Returns a list of tuples, each containing the catalog URL and the domain name.

If a domain filter is given, only the domain that matches the filter is returned.

Return type

list of tuples of string and string
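A list of tuples is convenient to index by domain name before looking up a catalog URL. The sketch below assumes the tuple order stated in the description (catalog URL first, domain name second). The helper name catalog_urls_by_domain and the sample values are hypothetical and do not come from the Socrata API.

```python
from typing import Dict, List, Tuple


def catalog_urls_by_domain(pairs: List[Tuple[str, str]]) -> Dict[str, str]:
    """Index (catalog URL, domain name) tuples by domain name."""
    return {domain: url for url, domain in pairs}


# Hypothetical value shaped like the documented return type of domains().
pairs = [
    ("https://catalog.example/one", "data.example.org"),
    ("https://catalog.example/two", "open.example.net"),
]
lookup = catalog_urls_by_domain(pairs)
print(lookup["data.example.org"])
```

If the actual tuple order differs from the description, only the unpacking in the dictionary comprehension needs to change.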