openclean.embedding.feature.character module

Collection of functions that compute feature values for scalar cell values based on the character composition of the string representation of a value.

openclean.embedding.feature.character.digits_count(value)

Count the number of digits in the string representation of a scalar value.

Parameters

value (scalar) – Scalar value in a data stream.

Return type

int

openclean.embedding.feature.character.digits_fraction(value)

Compute the fraction of characters in a string value that are digits.

Parameters

value (scalar) – Scalar value in a data stream.

Return type

float
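
As an illustration of the two digit features described above, the following is a minimal sketch in plain Python. The names sketch_digits_count and sketch_digits_fraction are hypothetical stand-ins, not the openclean source. The use of str.isdigit() and the result of 0.0 for an empty string are assumptions.

```python
def sketch_digits_count(value) -> int:
    """Count digit characters in the string representation of a value."""
    # Assumption: a "digit" is any character for which str.isdigit() is true.
    return sum(1 for c in str(value) if c.isdigit())


def sketch_digits_fraction(value) -> float:
    """Fraction of characters in str(value) that are digits."""
    text = str(value)
    # Assumption: an empty string yields 0.0 instead of a division error.
    return sketch_digits_count(text) / len(text) if text else 0.0


print(sketch_digits_count("AB-1234"))     # 4
print(sketch_digits_fraction("AB-1234"))  # 4 of 7 characters are digits
print(sketch_digits_count(3.14))          # str(3.14) is "3.14", so 3
```

Non-string scalars are converted with str() first, so the float 3.14 is treated as the four-character string "3.14".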

openclean.embedding.feature.character.fraction(count, value)

Divide the given character count by the length of the string representation of the given value.

Parameters

  • count (int) – Character count returned by one of the count functions.
  • value (scalar) – Scalar value in a data stream.

Return type

float
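
The division described above can be sketched as follows. This is a hypothetical helper named sketch_fraction, not the library code. Returning 0.0 for an empty string is an assumption made here to avoid a ZeroDivisionError.

```python
def sketch_fraction(count: int, value) -> float:
    """Divide a character count by the length of str(value)."""
    text = str(value)
    if not text:
        # Assumption: empty values have a fraction of 0.0.
        return 0.0
    return count / len(text)


print(sketch_fraction(2, "ab12"))  # 0.5
print(sketch_fraction(1, 1234))    # 0.25, since str(1234) has four characters
```

Keeping the division in one helper means each count function can be paired with a fraction function without repeating the empty-string check.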

openclean.embedding.feature.character.letters_count(value)

Count the number of letters in the string representation of a scalar value.

Parameters

value (scalar) – Scalar value in a data stream.

Return type

int

openclean.embedding.feature.character.letters_fraction(value)

Compute the fraction of characters in a string value that are letters.

Parameters

value (scalar) – Scalar value in a data stream.

Return type

float
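
A minimal sketch of the two letter features, assuming that a "letter" is any character for which str.isalpha() is true. The function names are hypothetical and the code is not taken from openclean.

```python
def sketch_letters_count(value) -> int:
    """Count alphabetic characters in the string representation of a value."""
    # Assumption: str.isalpha() decides what counts as a letter.
    return sum(1 for c in str(value) if c.isalpha())


def sketch_letters_fraction(value) -> float:
    """Fraction of characters in str(value) that are letters."""
    text = str(value)
    # Assumption: an empty string yields 0.0.
    return sketch_letters_count(text) / len(text) if text else 0.0


print(sketch_letters_count("R2-D2"))     # 2
print(sketch_letters_fraction("R2-D2"))  # 0.4
```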

openclean.embedding.feature.character.spec_char_count(value)

Count the number of characters in the string representation of a scalar value that are not digits, letters, or white space characters.

Parameters

value (scalar) – Scalar value in a data stream.

Return type

int

openclean.embedding.feature.character.spec_char_fraction(value)

Compute the fraction of characters in a string value that are not digits, letters, or white space characters.

Parameters

value (scalar) – Scalar value in a data stream.

Return type

float
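
The special-character features can be sketched as the complement of the digit, letter, and white space tests. The names below are hypothetical, and the use of str.isdigit(), str.isalpha(), and str.isspace() as the three tests is an assumption rather than the confirmed library behavior.

```python
def sketch_spec_char_count(value) -> int:
    """Count characters that are not digits, letters, or white space."""
    return sum(
        1
        for c in str(value)
        # Assumption: the three built-in predicates define the excluded classes.
        if not (c.isdigit() or c.isalpha() or c.isspace())
    )


def sketch_spec_char_fraction(value) -> float:
    """Fraction of characters in str(value) that are special characters."""
    text = str(value)
    # Assumption: an empty string yields 0.0.
    return sketch_spec_char_count(text) / len(text) if text else 0.0


print(sketch_spec_char_count("a.b@c d!"))     # 3 ('.', '@', '!')
print(sketch_spec_char_fraction("a.b@c d!"))  # 0.375
```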

openclean.embedding.feature.character.unique_count(value)

Count the number of unique characters in the string representation of a scalar value.

Parameters

value (scalar) – Scalar value in a data stream.

Return type

int

openclean.embedding.feature.character.unique_fraction(value)

Compute the uniqueness of characters for a string value, i.e., the number of unique characters relative to the length of the string.

Parameters

value (scalar) – Scalar value in a data stream.

Return type

float
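
One way to sketch the two uniqueness features is with a set of characters. The names are hypothetical. Reading "uniqueness" as the unique-character count divided by the string length is an assumption based on the function name.

```python
def sketch_unique_count(value) -> int:
    """Count distinct characters in the string representation of a value."""
    return len(set(str(value)))


def sketch_unique_fraction(value) -> float:
    """Distinct characters divided by the total number of characters."""
    text = str(value)
    # Assumption: an empty string yields 0.0.
    return sketch_unique_count(text) / len(text) if text else 0.0


print(sketch_unique_count("banana"))     # 3 ('b', 'a', 'n')
print(sketch_unique_fraction("banana"))  # 0.5
print(sketch_unique_count(1001))         # 2 ('1', '0')
```

Under this reading, a value of 1.0 means that no character repeats, and values near 0 indicate long runs of repeated characters.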

openclean.embedding.feature.character.whitespace_count(value)

Count the number of white space characters in the string representation of a scalar value.

Parameters

value (scalar) – Scalar value in a data stream.

Return type

int

openclean.embedding.feature.character.whitespace_fraction(value)

Compute the fraction of characters in a string value that are white space characters.

Parameters

value (scalar) – Scalar value in a data stream.

Return type

float
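
A minimal sketch of the two white space features, assuming that str.isspace() defines a white space character (so tabs and newlines count as well as blanks). The function names are hypothetical and not part of openclean.

```python
def sketch_whitespace_count(value) -> int:
    """Count white space characters in the string representation of a value."""
    # Assumption: str.isspace() decides what counts as white space.
    return sum(1 for c in str(value) if c.isspace())


def sketch_whitespace_fraction(value) -> float:
    """Fraction of characters in str(value) that are white space."""
    text = str(value)
    # Assumption: an empty string yields 0.0.
    return sketch_whitespace_count(text) / len(text) if text else 0.0


print(sketch_whitespace_count("New York\tNY"))     # 2 (one blank, one tab)
print(sketch_whitespace_fraction("New York\tNY"))  # 2 of 11 characters
```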