openclean.operator.transform.rename module

Functions and classes that implement the column renaming operator in openclean.

class openclean.operator.transform.rename.Rename(columns: Union[int, str, List[Union[str, int]]], names: List[Union[str, histore.document.schema.Column]])

Bases: openclean.operator.stream.processor.StreamProcessor, openclean.operator.base.DataFrameTransformer

Data frame transformer that renames a selected list of columns in a data frame. The output is a data frame that contains all rows and columns from the input data frame, with each column listed in the given column list renamed to the corresponding value in the given names list.

open(schema: List[Union[str, histore.document.schema.Column]]) → openclean.operator.stream.consumer.StreamFunctionHandler

Factory pattern for the stream consumer. Returns an instance of a stream consumer whose schema contains the renamed columns. The associated stream function passes all rows through unchanged.

Parameters

schema (list of string) – List of column names in the data stream schema.

Return type

openclean.operator.stream.consumer.StreamFunctionHandler

rename(schema: List[Union[str, histore.document.schema.Column]]) → List[Union[str, histore.document.schema.Column]]

Create a modified dataset schema with renamed columns.

Parameters

schema (list of string) – Dataset input schema.

Return type

list of string
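
To illustrate what a schema rename of this kind amounts to, here is a minimal sketch in plain Python. It is not openclean's implementation: the helper name `rename_schema_sketch` is hypothetical, the schema is assumed to be a plain list of strings, and raising ValueError on a length mismatch is an assumption based on the same-length requirement stated for the rename operator.

```python
from typing import List, Union


def rename_schema_sketch(
    schema: List[str],
    columns: List[Union[int, str]],
    names: List[str],
) -> List[str]:
    """Hypothetical helper: return a copy of schema with selected columns renamed."""
    # Assumption: mismatched list lengths are reported as a ValueError.
    if len(columns) != len(names):
        raise ValueError("columns and names must have the same length")
    renamed = list(schema)  # copy so the input schema is left untouched
    for col, name in zip(columns, names):
        # A column is referenced either by index position or by its current name.
        idx = col if isinstance(col, int) else schema.index(col)
        renamed[idx] = name
    return renamed


schema = ["id", "fname", "lname"]
new_schema = rename_schema_sketch(schema, ["fname", 2], ["first_name", "last_name"])
print(new_schema)
```

Only the schema changes in this sketch; no row data is involved, which matches the description of the stream function as leaving rows unmodified.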

transform(df)

Return a data frame that contains all rows and columns from the input data frame, with each column listed in the given column list renamed to the corresponding value in the given names list.

Parameters

df (pandas.DataFrame) – Input data frame.

Return type

pandas.DataFrame

openclean.operator.transform.rename.rename(df, columns, names)

The column rename operator returns a data frame where a given list of columns has been renamed. The renaming does not have to include all columns in the data frame. However, the given list of columns and new column names have to be of the same length.

Parameters
  • df (pandas.DataFrame) – Input data frame.

  • columns (int, string, or list(int or string)) – Single column or list of column index positions or column names.

  • names (string or list(string)) – Single name or list of names for the renamed columns.

Return type

pandas.DataFrame

Raises

ValueError
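
The behavior described for this operator can be approximated with pandas alone. The sketch below is an illustration under stated assumptions, not the library's code: `rename_columns_sketch` is a hypothetical stand-in, it normalizes single values to lists, resolves index positions to column labels, and delegates to `pandas.DataFrame.rename`. Treating a length mismatch as the cause of the ValueError is an assumption; the documentation above does not say when the error is raised.

```python
import pandas as pd


def rename_columns_sketch(df, columns, names):
    """Hypothetical stand-in for the rename operator, built on pandas only."""
    # Accept a single column or name as well as lists of them.
    if not isinstance(columns, list):
        columns = [columns]
    if not isinstance(names, list):
        names = [names]
    # Assumption: a length mismatch is what triggers the ValueError.
    if len(columns) != len(names):
        raise ValueError("columns and names must have the same length")
    # Resolve index positions to the current column labels.
    mapping = {
        (df.columns[col] if isinstance(col, int) else col): name
        for col, name in zip(columns, names)
    }
    # DataFrame.rename returns a new data frame; the input is not modified.
    return df.rename(columns=mapping)


df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
renamed = rename_columns_sketch(df, columns=["A", 2], names=["X", "Z"])
print(list(renamed.columns))
```

Columns not mentioned in `columns` (here `B`) keep their names, reflecting the statement that the renaming does not have to include all columns in the data frame.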