openclean.embedding.feature.character module
Collection of functions that compute feature values for scalar cell values based on the character composition of the string representation of a value.
- openclean.embedding.feature.character.digits_count(value)
Count the number of digits in the string representation of a scalar value.
- Parameters
value (scalar) – Scalar value in a data stream.
- Return type
int
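As a rough illustration of what counting digits over a string representation could look like, the sketch below defines a local stand-in named `digits_count_sketch` (a hypothetical name, not part of openclean). It assumes the value is converted with `str()` and that `str.isdigit` decides what counts as a digit; this page confirms neither detail, so the library may behave differently in edge cases.

```python
def digits_count_sketch(value):
    """Hypothetical stand-in: count digit characters in str(value)."""
    # Assumption: a character is a digit if str.isdigit() says so.
    return sum(1 for c in str(value) if c.isdigit())


print(digits_count_sketch(12345))    # the integer is converted to '12345' first
print(digits_count_sketch("A1-B2"))  # only '1' and '2' are digits
print(digits_count_sketch(3.14))     # the decimal point is not a digit
```

Non-string scalars such as `3.14` go through their string form under this assumption, so the count depends on how Python renders the value.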
- openclean.embedding.feature.character.digits_fraction(value)
Compute the fraction of characters in a string value that are digits.
- Parameters
value (scalar) – Scalar value in a data stream.
- Return type
float
- openclean.embedding.feature.character.fraction(count, value)
Divide the given character count by the length of the string representation of the given value.
- Parameters
count (int) – Character count returned by one of the count functions.
value (scalar) – Scalar value in a data stream.
- Return type
float
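The division described for `fraction` can be sketched as follows. `fraction_sketch` is a hypothetical local helper, not the library function. Returning `0.0` for an empty string is an assumption made here to avoid a division by zero; how openclean treats that case is not stated on this page.

```python
def fraction_sketch(count, value):
    """Hypothetical stand-in: count divided by len(str(value))."""
    length = len(str(value))
    # Assumption: an empty string yields 0.0 rather than raising
    # ZeroDivisionError. The library's behavior may differ.
    if length == 0:
        return 0.0
    return count / length


value = "AB12"
# Count digits inline so the example stays self-contained.
digit_count = sum(1 for c in value if c.isdigit())
print(fraction_sketch(digit_count, value))  # 2 digits out of 4 characters
```

Under this reading, each `*_fraction` function amounts to the matching `*_count` result passed through the same division.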
- openclean.embedding.feature.character.letters_count(value)
Count the number of letters in the string representation of a scalar value.
- Parameters
value (scalar) – Scalar value in a data stream.
- Return type
int
- openclean.embedding.feature.character.letters_fraction(value)
Compute the fraction of characters in a string value that are letters.
- Parameters
value (scalar) – Scalar value in a data stream.
- Return type
float
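What counts as a "letter" matters for non-ASCII data. The sketch below assumes `str.isalpha` is the test, which also accepts accented and non-Latin letters; whether openclean uses the same definition is not confirmed here. Both function names are hypothetical.

```python
def letters_count_sketch(value):
    """Hypothetical stand-in: count letters in str(value)."""
    # Assumption: str.isalpha() defines a letter, so Unicode letters
    # such as U+00E9 (e with acute accent) are counted as well.
    return sum(1 for c in str(value) if c.isalpha())


def letters_fraction_sketch(value):
    """Hypothetical stand-in: letters divided by total characters."""
    text = str(value)
    # Assumption: an empty string yields 0.0.
    return letters_count_sketch(text) / len(text) if text else 0.0


print(letters_count_sketch("caf\u00e9 42"))  # 'c', 'a', 'f', and U+00E9
print(letters_fraction_sketch("ab12"))       # half of the characters are letters
```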
- openclean.embedding.feature.character.spec_char_count(value)
Count the number of characters in the string representation of a scalar value that are not digits, letters, or white space characters.
- Parameters
value (scalar) – Scalar value in a data stream.
- Return type
int
- openclean.embedding.feature.character.spec_char_fraction(value)
Compute the fraction of characters in a string value that are not digits, letters, or white space characters.
- Parameters
value (scalar) – Scalar value in a data stream.
- Return type
float
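Special characters are defined by exclusion, which the following sketch tries to make concrete. It assumes the three excluded classes are tested with `str.isdigit`, `str.isalpha`, and `str.isspace`; the function names are hypothetical and the library's exact character classes may differ.

```python
def spec_char_count_sketch(value):
    """Hypothetical stand-in: count characters that are neither
    digits, letters, nor white space."""
    return sum(
        1
        for c in str(value)
        # Assumption: the built-in str predicates define the three classes.
        if not (c.isdigit() or c.isalpha() or c.isspace())
    )


def spec_char_fraction_sketch(value):
    """Hypothetical stand-in: special characters divided by length."""
    text = str(value)
    # Assumption: an empty string yields 0.0.
    return spec_char_count_sketch(text) / len(text) if text else 0.0


print(spec_char_count_sketch("a-b c!1"))           # '-' and '!'
print(spec_char_count_sketch("user@example.com"))  # '@' and '.'
```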
- openclean.embedding.feature.character.unique_count(value)
Count the number of unique characters in the string representation of a scalar value.
- Parameters
value (scalar) – Scalar value in a data stream.
- Return type
int
- openclean.embedding.feature.character.unique_fraction(value)
Compute the uniqueness of characters for a string value, i.e., the number of unique characters relative to the length of the string.
- Parameters
value (scalar) – Scalar value in a data stream.
- Return type
float
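One plausible reading of character uniqueness is the number of distinct characters divided by the string length. The sketch below follows that reading with hypothetical helper names. It assumes the comparison is case-sensitive, which this page does not confirm.

```python
def unique_count_sketch(value):
    """Hypothetical stand-in: number of distinct characters in str(value)."""
    # Assumption: characters are compared exactly, so 'A' and 'a' differ.
    return len(set(str(value)))


def unique_fraction_sketch(value):
    """Hypothetical stand-in: distinct characters divided by length."""
    text = str(value)
    # Assumption: an empty string yields 0.0.
    return unique_count_sketch(text) / len(text) if text else 0.0


print(unique_count_sketch("hello"))    # 'h', 'e', 'l', 'o'
print(unique_fraction_sketch("aaaa"))  # one distinct character out of four
```

Under this reading, a value with no repeated characters scores 1.0, and highly repetitive values approach 0.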
- openclean.embedding.feature.character.whitespace_count(value)
Count the number of white space characters in the string representation of a scalar value.
- Parameters
value (scalar) – Scalar value in a data stream.
- Return type
int
- openclean.embedding.feature.character.whitespace_fraction(value)
Compute the fraction of characters in a string value that are white space characters.
- Parameters
value (scalar) – Scalar value in a data stream.
- Return type
float
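To show how the digit, letter, white space, and special character counts relate, the sketch below classifies every character into exactly one of the four classes. `char_profile_sketch` is a hypothetical helper, not part of openclean. It assumes the classes are defined by `str.isdigit`, `str.isalpha`, and `str.isspace`, with everything else treated as special.

```python
def char_profile_sketch(value):
    """Hypothetical helper: count characters per class in str(value)."""
    counts = {"digits": 0, "letters": 0, "whitespace": 0, "special": 0}
    for c in str(value):
        # Assumption: the built-in str predicates define the classes.
        if c.isdigit():
            counts["digits"] += 1
        elif c.isalpha():
            counts["letters"] += 1
        elif c.isspace():
            counts["whitespace"] += 1
        else:
            counts["special"] += 1
    return counts


text = "Room 12\tB-4"
profile = char_profile_sketch(text)
print(profile)
# Each character falls into exactly one class, so the counts add up
# to the length of the string.
print(sum(profile.values()) == len(text))
```

Because the four classes partition the string under these assumptions, the corresponding fractions would sum to 1 for any non-empty value.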