Data Enrichment

Master data using Socrata

Master data can be downloaded into openclean to enrich datasets and support the data cleaning process. This quick example downloads the ITU ICT Development Index (IDI) from Socrata and prints the first rows of the resulting data frame.

from openclean.data.source.socrata import Socrata

idi = Socrata().dataset('3bxy-wfk9').load()

print(idi.head())
   year country_id    country_name sub_index value_type  value
0  2015        KOR    Korea (Rep.)       NaN       rank    1.0
1  2015        DNK         Denmark       NaN       rank    2.0
2  2015        ISL         Iceland       NaN       rank    3.0
3  2015        GBR  United Kingdom       NaN       rank    4.0
4  2015        SWE          Sweden       NaN       rank    5.0
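
Once the master data is loaded, one common way to enrich another dataset with it is a left join on a shared key. The following is a minimal sketch using plain pandas, not an openclean-specific API. The `idi` frame is rebuilt by hand from a few of the rows printed above so that the example runs offline, and the `offices` frame and the `idi_rank` column name are hypothetical.

```python
import pandas as pd

# A few rows copied from the idi.head() output above (2015 rank values).
idi = pd.DataFrame({
    "year": [2015, 2015, 2015],
    "country_id": ["KOR", "DNK", "ISL"],
    "country_name": ["Korea (Rep.)", "Denmark", "Iceland"],
    "value_type": ["rank", "rank", "rank"],
    "value": [1.0, 2.0, 3.0],
})

# Hypothetical dataset to enrich. 'XXX' has no match in the master data.
offices = pd.DataFrame({
    "office": ["Seoul", "Copenhagen", "Unknown"],
    "country_id": ["KOR", "DNK", "XXX"],
})

# Keep only the rank rows and the columns needed for the join.
ranks = (
    idi.loc[idi["value_type"] == "rank", ["country_id", "value"]]
    .rename(columns={"value": "idi_rank"})
)

# A left join keeps every row of the original dataset; rows without a
# match in the master data get NaN in the new column.
enriched = offices.merge(ranks, on="country_id", how="left")
print(enriched)
```

Rows that come back with a missing `idi_rank` are worth inspecting, since they often point to misspelled or non-standard key values in the dataset being cleaned.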

Master data using Reference Data Repository

openclean integrates the refdata package to provide easy access to several reference datasets that are available online for download. Reference datasets are, for example, a good source of lookup tables and mappings, which are used in data cleaning for outlier detection and data standardization.
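
To illustrate how a lookup table and a mapping serve these two purposes, here is a small sketch in plain Python. It does not use the refdata API. The `COUNTRY_NAMES` and `SYNONYMS` dictionaries and the `standardize` helper are hypothetical stand-ins for data that would normally come from a downloaded reference dataset.

```python
# Hypothetical lookup table: ISO 3166-1 alpha-3 codes and country names.
COUNTRY_NAMES = {"KOR": "Korea (Rep.)", "DNK": "Denmark", "ISL": "Iceland"}

# Hypothetical mapping from non-standard spellings to reference codes.
SYNONYMS = {"south korea": "KOR", "denmark": "DNK", "iceland": "ISL"}


def standardize(value):
    """Return the reference code for a value, or None if it is unknown."""
    cleaned = value.strip()
    if cleaned.upper() in COUNTRY_NAMES:
        return cleaned.upper()
    return SYNONYMS.get(cleaned.lower())


raw = ["kor", " Denmark", "South Korea", "Atlantis"]

# Standardization: map every value to its reference code where possible.
standardized = [standardize(v) for v in raw]

# Outlier detection: values with no counterpart in the reference data.
outliers = [v for v, code in zip(raw, standardized) if code is None]

print(standardized)
print(outliers)
```

In this sketch a value counts as an outlier simply because it is absent from the reference data, so the quality of the result depends on how complete that data is.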

The Master data guide contains a couple of examples that show how the refdata package can be used to get master data for cleaning operations.

For more information, visit the official repo.