openclean.embedding.feature.default module
Default feature embedding for strings.
- class openclean.embedding.feature.default.StandardEmbedding
Bases: openclean.embedding.feature.base.FeatureEmbedding
Instance of the feature embedding function that uses a default set of seven value features to compute feature vectors. The computed features are:
- normalized value length
- normalized value frequency
- uniqueness of characters in the value string
- fraction of letter characters in the value string
- fraction of digits in the value string
- fraction of special characters in the value string (not digit, letter, or whitespace)
- fraction of whitespace characters in the value string
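To make the seven features concrete, the following is a minimal standalone sketch that computes them in plain Python. It does not call openclean and is not the library's implementation. The function names `char_fractions` and `standard_features` are hypothetical, and both the normalization (dividing by the maximum length and maximum frequency) and the uniqueness measure (distinct characters divided by string length) are assumptions chosen for illustration.

```python
from collections import Counter


def char_fractions(value):
    """Return (letter, digit, special, whitespace) fractions for a string."""
    n = len(value)
    if n == 0:
        return (0.0, 0.0, 0.0, 0.0)
    letters = sum(c.isalpha() for c in value)
    digits = sum(c.isdigit() for c in value)
    spaces = sum(c.isspace() for c in value)
    # Special characters are everything that is not a letter, digit, or whitespace.
    special = n - letters - digits - spaces
    return (letters / n, digits / n, special / n, spaces / n)


def standard_features(values):
    """Map each distinct value to a seven-element feature vector.

    Assumption: length and frequency are normalized by their maximum
    over the input; the library may use a different normalization.
    """
    counts = Counter(values)
    max_len = max((len(v) for v in counts), default=0)
    max_freq = max(counts.values(), default=0)
    vectors = {}
    for v, freq in counts.items():
        length = len(v) / max_len if max_len else 0.0
        frequency = freq / max_freq if max_freq else 0.0
        # Assumption: uniqueness = distinct characters / total characters.
        uniqueness = len(set(v)) / len(v) if v else 0.0
        vectors[v] = [length, frequency, uniqueness, *char_fractions(v)]
    return vectors


vectors = standard_features(["ab 1", "ab 1", "x-y"])
print(vectors["ab 1"])
```

Each vector lists the features in the same order as the list above, so position 1 holds the normalized frequency and position 5 the fraction of special characters.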
- class openclean.embedding.feature.default.UniqueSetEmbedding
Bases: openclean.embedding.feature.base.FeatureEmbedding
Instance of the feature embedding function for unique value sets. This embedding ignores value frequencies. It uses a set of six value features to compute feature vectors. The computed features are:
- normalized value length
- uniqueness of characters in the value string
- fraction of letter characters in the value string
- fraction of digits in the value string
- fraction of special characters in the value string (not digit, letter, or whitespace)
- fraction of whitespace characters in the value string
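The effect of ignoring value frequencies can be sketched in a few lines of plain Python. This is an illustration only, not openclean code: `unique_set_features` is a hypothetical name, and normalizing length by the maximum length and measuring uniqueness as distinct characters over string length are assumptions.

```python
def unique_set_features(values):
    """Map each distinct value to a six-element feature vector.

    Duplicates are collapsed first, so how often a value occurs
    has no influence on the result.
    """
    distinct = set(values)
    max_len = max((len(v) for v in distinct), default=0)
    vectors = {}
    for v in distinct:
        n = len(v)
        if n == 0:
            # An empty string has no characters to measure.
            vectors[v] = [0.0] * 6
            continue
        letters = sum(c.isalpha() for c in v)
        digits = sum(c.isdigit() for c in v)
        spaces = sum(c.isspace() for c in v)
        special = n - letters - digits - spaces
        vectors[v] = [
            n / max_len,      # normalized length (assumed: divide by max)
            len(set(v)) / n,  # uniqueness (assumed: distinct / total)
            letters / n,
            digits / n,
            special / n,
            spaces / n,
        ]
    return vectors


# Repeating a value does not change its vector.
print(unique_set_features(["b c!"] * 3) == unique_set_features(["b c!"]))
```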