openclean.operator.transform.rename module
Functions and classes that implement the column renaming operator in openclean.
- class openclean.operator.transform.rename.Rename(columns: Union[int, str, List[Union[str, int]]], names: List[Union[str, histore.document.schema.Column]])
Bases: openclean.operator.stream.processor.StreamProcessor, openclean.operator.base.DataFrameTransformer
Data frame transformer that renames a selected list of columns in a data frame. The output is a data frame that contains all rows and columns of the input data frame, with each column listed in the given column list renamed to the corresponding value in the given names list.
- open(schema: List[Union[str, histore.document.schema.Column]]) → openclean.operator.stream.consumer.StreamFunctionHandler
Factory method for the stream consumer. Returns a stream consumer whose schema contains the renamed columns. The associated stream function does not modify any of the rows.
- Parameters
schema (list of string) – List of column names in the data stream schema.
- Return type
openclean.operator.stream.consumer.StreamFunctionHandler
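The following is a minimal, stdlib-only sketch of the behavior described for `open`: the handler exposes a renamed schema and passes rows through untouched. `RenameStreamHandler`, its `process` method, and the dict-based `names` mapping (old name to new name) are hypothetical illustrations, not the actual `StreamFunctionHandler` API, and rows are assumed to arrive as `(rowid, values)` pairs.

```python
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

# Hypothetical row representation: (row id, list of cell values).
Row = Tuple[int, list]


class RenameStreamHandler:
    """Hypothetical stand-in for the handler returned by Rename.open().

    It exposes the renamed output schema and forwards rows unchanged.
    """

    def __init__(self, schema: Sequence[str], names: Dict[str, str]):
        # Output schema: use the new name where one is given, else keep the old one.
        self.columns: List[str] = [names.get(col, col) for col in schema]

    def process(self, rows: Iterable[Row]) -> Iterator[Row]:
        # Renaming only affects the schema, so every row is yielded as-is.
        for rowid, values in rows:
            yield rowid, values


handler = RenameStreamHandler(["id", "city"], {"city": "town"})
print(handler.columns)  # ['id', 'town']
print(list(handler.process([(0, [1, "Berlin"])])))  # [(0, [1, 'Berlin'])]
```

Keeping the row function as a pass-through reflects the statement that the stream function does not manipulate any rows; only the schema differs between input and output.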
- rename(schema: List[Union[str, histore.document.schema.Column]]) → List[Union[str, histore.document.schema.Column]]
Create a modified dataset schema with renamed columns.
- Parameters
schema (list of string) – Dataset input schema.
- Return type
list of string
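A minimal sketch of the schema renaming that `rename` describes, written in plain Python so it runs without openclean. The helper name `rename_schema` is hypothetical, and resolving string references against the original (not partially renamed) schema is an assumption about the intended semantics.

```python
from typing import List, Sequence, Union


def rename_schema(
    schema: Sequence[str],
    columns: Sequence[Union[int, str]],
    names: Sequence[str],
) -> List[str]:
    """Return a new schema list in which the selected columns are renamed.

    Columns are referenced by index position (int) or by name (str).
    """
    if len(columns) != len(names):
        raise ValueError("columns and names must have the same length")
    original = list(schema)
    renamed = list(original)  # copy, so the input schema is not modified
    for col, name in zip(columns, names):
        # Resolve a name to its position in the original schema; list.index
        # raises ValueError for an unknown column name.
        pos = col if isinstance(col, int) else original.index(col)
        renamed[pos] = name
    return renamed


print(rename_schema(["id", "city", "zip"], ["city", 2], ["town", "postcode"]))
# ['id', 'town', 'postcode']
```

Returning a fresh list rather than editing the input in place matches the description of creating a *modified* schema while leaving the original untouched.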
- transform(df)
Return a data frame that contains all rows and columns of the input data frame, with each column listed in the given column list renamed to the corresponding value in the given names list.
- Parameters
df (pandas.DataFrame) – Input data frame.
- Return type
pandas.DataFrame
- openclean.operator.transform.rename.rename(df, columns, names)
The column rename operator returns a data frame in which a given list of columns has been renamed. The renaming does not have to include all columns in the data frame. However, the list of columns and the list of new column names must have the same length.
- Parameters
df (pandas.DataFrame) – Input data frame.
columns (int, string, or list(int or string)) – Single column or list of column index positions or column names.
names (string or list(string)) – Single name or list of names for the renamed columns.
- Return type
pandas.DataFrame
- Raises
ValueError – If the list of columns and the list of names do not have the same length.
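The argument rules of `rename` (a single column or name may be given instead of a list, and both lists must match in length) can be sketched as below. `normalize_rename_args` is a hypothetical helper, not part of openclean, and treating only `list` values as multi-column selections is an assumption of this sketch.

```python
from typing import List, Tuple, Union

ColumnRef = Union[int, str]


def normalize_rename_args(
    columns: Union[ColumnRef, List[ColumnRef]],
    names: Union[str, List[str]],
) -> Tuple[List[ColumnRef], List[str]]:
    # Wrap a single column reference or a single name in a one-element list.
    if not isinstance(columns, list):
        columns = [columns]
    if not isinstance(names, list):
        names = [names]
    # Each selected column needs exactly one new name.
    if len(columns) != len(names):
        raise ValueError(
            f"got {len(columns)} column(s) but {len(names)} name(s)"
        )
    return columns, names


print(normalize_rename_args("city", "town"))  # (['city'], ['town'])
```

Calling `normalize_rename_args([0, 1], ["a"])` raises `ValueError`, mirroring the length requirement stated for `rename`.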