Data Enrichment
Master data using Socrata
Master data can be downloaded into openclean to enrich datasets and support the data cleaning process. This quick example downloads the ITU ICT Development Index (IDI) from Socrata.
from openclean.data.source.socrata import Socrata

# Download the Socrata dataset with ID '3bxy-wfk9' (the ITU ICT Development Index).
idi = Socrata().dataset('3bxy-wfk9').load()
print(idi.head())
   year country_id    country_name sub_index value_type  value
0  2015        KOR    Korea (Rep.)       NaN       rank    1.0
1  2015        DNK         Denmark       NaN       rank    2.0
2  2015        ISL         Iceland       NaN       rank    3.0
3  2015        GBR  United Kingdom       NaN       rank    4.0
4  2015        SWE          Sweden       NaN       rank    5.0
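Once the master data is available as a data frame, one possible way to enrich another dataset is a left join on the shared country code. The following is a minimal sketch using plain pandas rather than any openclean-specific API. The `idi_ranks` frame is rebuilt by hand from the first three rows of the output above so that the example runs offline, and the `offices` frame, its `office` column, and the code `XXX` are hypothetical.

```python
import pandas as pd

# First three rank rows of the IDI output shown above, rebuilt by hand
# so that the sketch does not need a network connection.
idi_ranks = pd.DataFrame({
    "country_id": ["KOR", "DNK", "ISL"],
    "country_name": ["Korea (Rep.)", "Denmark", "Iceland"],
    "value": [1.0, 2.0, 3.0],
})

# Hypothetical dataset to enrich; "XXX" is a made-up code with no IDI match.
offices = pd.DataFrame({
    "office": ["Seoul", "Copenhagen", "Unknown"],
    "country_id": ["KOR", "DNK", "XXX"],
})

# Left join keeps every row of `offices`; `indicator=True` adds a `_merge`
# column that records whether a matching master data row was found.
enriched = offices.merge(
    idi_ranks.rename(columns={"value": "idi_rank"}),
    on="country_id",
    how="left",
    indicator=True,
)

# Country codes that could not be enriched from the master data.
unmatched = enriched.loc[enriched["_merge"] == "left_only", "country_id"].tolist()

print(enriched)
print(unmatched)
```

With the full downloaded frame, it would likely be necessary to filter first (for example on `year` and `value_type`) so that each country contributes a single row to the join.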
Master data using Reference Data Repository
openclean integrates the refdata package to provide easy access to several reference datasets that are available online for download. Reference datasets are, for example, a good source of lookup tables and mappings that are used in data cleaning for outlier detection and data standardization.
The Master data guide contains a couple of examples that show how the refdata package can be used to get master data for cleaning operations.
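To illustrate how a lookup table can serve both standardization and outlier detection, here is a small library-free sketch. It does not use the refdata API. The mapping `NAME_TO_CODE` and the helper `standardize` are hypothetical names, and the entries reuse the countries from the IDI output above in place of a real reference dataset.

```python
# Hypothetical lookup table. In practice this mapping would be built from a
# downloaded reference dataset; here it is written out by hand.
NAME_TO_CODE = {
    "korea (rep.)": "KOR",
    "denmark": "DNK",
    "iceland": "ISL",
    "united kingdom": "GBR",
    "sweden": "SWE",
}


def standardize(values, lookup):
    """Map each value to its code and collect values missing from the lookup."""
    standardized, outliers = [], []
    for value in values:
        # Normalize whitespace and case before the lookup.
        key = value.strip().lower()
        if key in lookup:
            standardized.append(lookup[key])
        else:
            # Values absent from the reference data are flagged as outliers.
            standardized.append(None)
            outliers.append(value)
    return standardized, outliers


raw = ["Denmark", " sweden ", "Swedn", "ICELAND"]
codes, outliers = standardize(raw, NAME_TO_CODE)
print(codes)     # ['DNK', 'SWE', None, 'ISL']
print(outliers)  # ['Swedn']
```

Flagged values such as the misspelled `Swedn` can then be reviewed or repaired in a later cleaning step.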
For more information, visit the official refdata repository.