openclean.operator.transform.move module
Data frame transformation operators for moving columns and rows to a given position.
- class openclean.operator.transform.move.MoveCols(columns: Union[int, str, List[Union[str, int]]], pos: int)
Bases: openclean.operator.stream.processor.StreamProcessor, openclean.operator.base.DataFrameTransformer
Operator to move one or more columns to a specified index position.
- open(schema: List[Union[str, histore.document.schema.Column]]) → openclean.operator.stream.consumer.StreamFunctionHandler
Factory pattern for stream consumer. Returns an instance of a stream consumer that re-orders values in a data stream row.
- Parameters
schema (list of string) – List of column names in the data stream schema.
- Return type
openclean.operator.stream.consumer.StreamFunctionHandler
- reorder(schema: List[Union[str, histore.document.schema.Column]]) → List[int]
Get the order of columns in the modified data schema. The new column order is represented as a list of the original column index positions.
- Parameters
schema (list of string) – Dataset input schema.
- Return type
list of int
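The column order returned by reorder can be illustrated with a minimal sketch in plain Python. This is not the library's implementation: the helper name reorder_sketch is hypothetical, and the sketch assumes that pos refers to the insert position among the columns that remain after the moved columns have been taken out. Applying the resulting index list to a row shows how a stream consumer could re-order row values.

```python
from typing import List, Union


def reorder_sketch(
    schema: List[str],
    columns: Union[int, str, List[Union[int, str]]],
    pos: int
) -> List[int]:
    """Hypothetical helper: compute the new column order as a list of
    original column index positions."""
    cols = columns if isinstance(columns, list) else [columns]
    moved = []
    for col in cols:
        if isinstance(col, int):
            # Column given by index position.
            if not 0 <= col < len(schema):
                raise ValueError("unknown column index {}".format(col))
            moved.append(col)
        elif col in schema:
            # Column given by name.
            moved.append(schema.index(col))
        else:
            raise ValueError("unknown column {}".format(col))
    # Assumption: pos is an index into the list of columns that are not moved.
    rest = [i for i in range(len(schema)) if i not in moved]
    return rest[:pos] + moved + rest[pos:]


schema = ["A", "B", "C", "D"]
order = reorder_sketch(schema, ["D", "A"], pos=1)

# Re-order the values of a single data stream row using the index list.
row = [10, 20, 30, 40]
reordered = [row[i] for i in order]
```

Under the stated assumption, order is [1, 3, 0, 2] and the re-ordered row is [20, 40, 10, 30].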
- transform(df)
Return a data frame that contains all rows and columns of the given input data frame, with the specified columns moved to the given index position.
Raises a value error if the list of columns contains an item that cannot be matched to a column in the given data frame.
- Parameters
df (pandas.DataFrame) – Input data frame.
- Return type
pandas.DataFrame
- class openclean.operator.transform.move.MoveRows(rows, pos)
Bases: openclean.operator.base.DataFrameTransformer
Operator to move one or more rows to a specified index position.
- transform(df)
Return a data frame that contains all rows and columns of the given input data frame, with the specified rows moved to the given index position.
Raises a value error if the list of columns contains an item that cannot be matched to a column in the given data frame.
- Parameters
df (pandas.DataFrame) – Input data frame.
- Return type
pandas.DataFrame
- openclean.operator.transform.move.move_rows(df: pandas.core.frame.DataFrame, rowids: Union[int, List[int]], pos: int)
Move one or more rows in a data frame to a given position.
- Parameters
df (pandas.DataFrame) – Input data frame.
rowids (int or list(int)) – Identifier of rows that are being moved.
pos (int) – Insert position for the moved rows.
- Return type
pandas.DataFrame
- Raises
ValueError –
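The effect of moving rows can be sketched with pandas alone. The function move_rows_sketch below is a hypothetical stand-in, not the openclean implementation. It assumes that row identifiers are positional row indexes, that pos is the insert position among the rows that are not moved, and that an out-of-range row index is what triggers the ValueError.

```python
from typing import List, Union

import pandas as pd


def move_rows_sketch(
    df: pd.DataFrame,
    rowids: Union[int, List[int]],
    pos: int
) -> pd.DataFrame:
    """Hypothetical helper: move rows (by positional index) to pos."""
    ids = rowids if isinstance(rowids, list) else [rowids]
    nrows = len(df.index)
    for rid in ids:
        # Assumption: invalid row positions raise a ValueError.
        if not 0 <= rid < nrows:
            raise ValueError("unknown row {}".format(rid))
    # Assumption: pos is an index into the list of rows that are not moved.
    rest = [i for i in range(nrows) if i not in ids]
    order = rest[:pos] + ids + rest[pos:]
    return df.iloc[order]


df = pd.DataFrame({"name": ["alice", "bob", "claire", "dave"]})
moved = move_rows_sketch(df, rowids=3, pos=0)
names = moved["name"].tolist()
```

Here the last row is moved to the front, so names is ["dave", "alice", "bob", "claire"]. The original index labels are kept by iloc; call reset_index(drop=True) on the result if a fresh index is wanted.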
- openclean.operator.transform.move.movecols(df: pandas.core.frame.DataFrame, columns: Union[int, str, List[Union[str, int]]], pos: int)
Move one or more columns in a data frame to a given position.
- Parameters
df (pandas.DataFrame) – Input data frame.
columns (int, string, or list(int or string)) – Single column or list of column index positions or column names.
pos (int) – Insert position for the moved columns.
- Return type
pandas.DataFrame
- Raises
ValueError –
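A comparable sketch for columns, again using only pandas. The name movecols_sketch is hypothetical and the code is not taken from openclean. It assumes that integer arguments are column index positions, that strings are column names, that pos is the insert position among the columns that are not moved, and that an unmatched column is what raises the ValueError.

```python
from typing import List, Union

import pandas as pd


def movecols_sketch(
    df: pd.DataFrame,
    columns: Union[int, str, List[Union[int, str]]],
    pos: int
) -> pd.DataFrame:
    """Hypothetical helper: move one or more columns to pos."""
    cols = columns if isinstance(columns, list) else [columns]
    names = list(df.columns)
    moved = []
    for col in cols:
        if isinstance(col, int):
            # Column given by index position.
            if not 0 <= col < len(names):
                raise ValueError("unknown column index {}".format(col))
            moved.append(col)
        elif col in names:
            # Column given by name.
            moved.append(names.index(col))
        else:
            raise ValueError("unknown column {}".format(col))
    # Assumption: pos is an index into the list of columns that are not moved.
    rest = [i for i in range(len(names)) if i not in moved]
    order = rest[:pos] + moved + rest[pos:]
    return df.iloc[:, order]


df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
result = movecols_sketch(df, columns="C", pos=0)
column_names = list(result.columns)
```

With these inputs column_names is ["C", "A", "B"], and the cell values travel with their columns.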