openclean.operator.transform.move module

Data frame transformation operators for moving columns and rows of a data frame to a specified position.

class openclean.operator.transform.move.MoveCols(columns: Union[int, str, List[Union[str, int]]], pos: int)

Bases: openclean.operator.stream.processor.StreamProcessor, openclean.operator.base.DataFrameTransformer

Operator to move one or more columns to a specified index position.

open(schema: List[Union[str, histore.document.schema.Column]]) → openclean.operator.stream.consumer.StreamFunctionHandler

Factory pattern for stream consumer. Returns an instance of a stream consumer that re-orders values in a data stream row.

Parameters

schema (list of string) – List of column names in the data stream schema.

Return type

openclean.operator.stream.consumer.StreamFunctionHandler
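The stream consumer returned by open() re-orders the values of every row. A minimal sketch of that per-row step, assuming the handler applies a column order of the kind returned by reorder() (the helper make_row_reorderer is hypothetical and not part of openclean):

```python
from typing import Any, Callable, List


def make_row_reorderer(order: List[int]) -> Callable[[List[Any]], List[Any]]:
    """Return a function that rearranges the values of a single stream row.

    Hypothetical helper: order[i] is the index of the input value that
    ends up at output position i.
    """
    def reorder_row(row: List[Any]) -> List[Any]:
        # Pick the input values in the new column order.
        return [row[i] for i in order]
    return reorder_row


# Move column 'C' (index 2) of the schema ['A', 'B', 'C'] to the front.
reorder_row = make_row_reorderer([2, 0, 1])
rows = [[1, 'x', True], [2, 'y', False]]
print([reorder_row(r) for r in rows])  # [[True, 1, 'x'], [False, 2, 'y']]
```

Because the order is computed once from the schema, each row is rearranged with a single list comprehension and no per-row lookups by column name.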

reorder(schema: List[Union[str, histore.document.schema.Column]]) → List[int]

Get the order of columns in the modified data schema. The new column order is represented as a list of the original column index positions, arranged in the order in which the columns appear in the modified schema.

Parameters

schema (list of string) – Dataset input schema.

Return type

list of int
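The following sketch shows one way to compute such an order list. It is a hypothetical helper (reorder_columns), not the openclean implementation, and it assumes that the schema is a list of plain column names, that integer column references are index positions, and that pos is the insert position among the columns that are not being moved:

```python
from typing import List, Union


def reorder_columns(
    schema: List[str], columns: Union[int, str, List[Union[int, str]]], pos: int
) -> List[int]:
    """Compute the new column order as a list of original index positions.

    Hypothetical sketch: the moved columns are taken out of the schema and
    re-inserted, in the given order, at position pos of the remaining columns.
    """
    if not isinstance(columns, list):
        columns = [columns]
    moved: List[int] = []
    for col in columns:
        if isinstance(col, int):
            # Integer references are treated as column index positions.
            if not 0 <= col < len(schema):
                raise ValueError(f"invalid column index {col}")
            moved.append(col)
        elif col in schema:
            # String references are treated as column names.
            moved.append(schema.index(col))
        else:
            raise ValueError(f"unknown column {col!r}")
    rest = [i for i in range(len(schema)) if i not in moved]
    return rest[:pos] + moved + rest[pos:]


print(reorder_columns(['A', 'B', 'C', 'D'], 'C', 0))  # [2, 0, 1, 3]
```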

transform(df)

Return a data frame that contains all rows and columns of the given input data frame, with the selected columns moved to the specified index position.

Raises a ValueError if the list of columns contains an item that cannot be matched to a column in the given data frame.

Parameters

df (pandas.DataFrame) – Input data frame.

Return type

pandas.DataFrame

class openclean.operator.transform.move.MoveRows(rows, pos)

Bases: openclean.operator.base.DataFrameTransformer

Operator to move one or more rows to a specified index position.

transform(df)

Return a data frame that contains all rows and columns of the given input data frame, with the selected rows moved to the specified index position.

Raises a ValueError if the list of rows contains an identifier that cannot be matched to a row in the given data frame.

Parameters

df (pandas.DataFrame) – Input data frame.

Return type

pandas.DataFrame

openclean.operator.transform.move.move_rows(df: pandas.core.frame.DataFrame, rowids: Union[int, List[int]], pos: int)

Move one or more rows in a data frame to a given position.

Parameters
  • df (pandas.DataFrame) – Input data frame.

  • rowids (int or list(int)) – Identifiers of the rows that are being moved.

  • pos (int) – Insert position for the moved rows.

Return type

pandas.DataFrame

Raises

ValueError
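A minimal pandas sketch of the row move, for illustration only. The helper move_rows_sketch is hypothetical; it assumes that rowids are index labels of a data frame with a unique index and that pos is the insert position among the rows that stay in place:

```python
from typing import List, Union

import pandas as pd


def move_rows_sketch(
    df: pd.DataFrame, rowids: Union[int, List[int]], pos: int
) -> pd.DataFrame:
    """Move the rows with the given index labels so that they start at pos.

    Hypothetical illustration, not the openclean implementation.
    """
    if not isinstance(rowids, list):
        rowids = [rowids]
    # Reject identifiers that do not match any row label.
    missing = [r for r in rowids if r not in df.index]
    if missing:
        raise ValueError(f"unknown row identifiers {missing}")
    rest = [r for r in df.index if r not in rowids]
    new_order = rest[:pos] + rowids + rest[pos:]
    # Select the rows by label in the new order.
    return df.loc[new_order]


df = pd.DataFrame({'name': ['a', 'b', 'c', 'd']})
print(move_rows_sketch(df, 3, 0)['name'].tolist())  # ['d', 'a', 'b', 'c']
```

The original index labels are kept, so the moved rows can still be identified after the operation.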

openclean.operator.transform.move.movecols(df: pandas.core.frame.DataFrame, columns: Union[int, str, List[Union[str, int]]], pos: int)

Move one or more columns in a data frame to a given position.

Parameters
  • df (pandas.DataFrame) – Input data frame.

  • columns (int, string, or list(int or string)) – Single column or list of column index positions or column names.

  • pos (int) – Insert position for the moved columns.

Return type

pandas.DataFrame

Raises

ValueError
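A minimal pandas sketch of moving columns, for illustration only. The helper movecols_sketch is hypothetical; it assumes unique string column names, treats integers as column index positions and strings as column names, and interprets pos as the insert position among the columns that are not moved:

```python
from typing import List, Union

import pandas as pd


def movecols_sketch(
    df: pd.DataFrame, columns: Union[int, str, List[Union[int, str]]], pos: int
) -> pd.DataFrame:
    """Return a new data frame with the given columns moved to position pos.

    Hypothetical illustration, not the openclean implementation.
    """
    if not isinstance(columns, list):
        columns = [columns]
    names = list(df.columns)
    moved = []
    for col in columns:
        if isinstance(col, int) and 0 <= col < len(names):
            moved.append(names[col])
        elif isinstance(col, str) and col in names:
            moved.append(col)
        else:
            # Mirrors the documented behavior for unmatched columns.
            raise ValueError(f"cannot match column {col!r}")
    rest = [c for c in names if c not in moved]
    return df[rest[:pos] + moved + rest[pos:]]


df = pd.DataFrame({'A': [1], 'B': [2], 'C': [3], 'D': [4]})
print(movecols_sketch(df, ['D', 0], 1).columns.tolist())  # ['B', 'D', 'A', 'C']
```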