openclean.operator.transform.insert module
Data frame transformation operator that inserts new columns and rows into a data frame.
- class openclean.operator.transform.insert.InsCol(names: Union[str, List[str]], pos: Optional[int] = None, values: Optional[Union[Callable, openclean.function.eval.base.EvalFunction, List, int, float, str, datetime.datetime, Tuple]] = None)
Bases: openclean.operator.stream.processor.StreamProcessor, openclean.operator.base.DataFrameTransformer
Data frame transformer that inserts columns into a data frame. Values for the new column(s) are generated using a given value generator function.
- inspos(schema: List[Union[str, histore.document.schema.Column]]) int
Get the insert position for the new column.
Raises a ValueError if the position is invalid.
- Parameters
schema (list of string) – Dataset input schema.
- Return type
int
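The position rule described here (append when no position is given, reject positions outside the schema) can be sketched in plain Python. This is a minimal illustration and not the openclean implementation: the function name `resolve_insert_position` is hypothetical, and treating both 0 and len(schema) as valid bounds is an assumption based on the range stated for inscol.

```python
from typing import List, Optional


def resolve_insert_position(schema: List[str], pos: Optional[int] = None) -> int:
    """Return the index at which new columns would be inserted.

    Hypothetical helper that mirrors the documented behavior only.
    """
    # An undefined position means the new columns are appended.
    if pos is None:
        return len(schema)
    # Assumption: valid positions lie in the closed range [0, len(schema)].
    if pos < 0 or pos > len(schema):
        raise ValueError(f"invalid insert position {pos}")
    return pos


schema = ["name", "age", "city"]
append_at = resolve_insert_position(schema)       # no position given
insert_at = resolve_insert_position(schema, 1)    # explicit position
```

Whether the library accepts negative positions is not stated in this reference; the sketch rejects them.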
- open(schema: List[Union[str, histore.document.schema.Column]]) openclean.operator.stream.consumer.StreamFunctionHandler
Factory pattern for stream consumer. Returns an instance of a stream consumer that re-orders values in a data stream row.
- Parameters
schema (list of string) – List of column names in the data stream schema.
- Return type
openclean.operator.stream.consumer.StreamFunctionHandler
- transform(df)
Modify rows in the given data frame. Returns a modified data frame where columns have been inserted containing results of evaluating the associated value generator function.
- Parameters
df (pandas.DataFrame) – Input data frame.
- Return type
pandas.DataFrame
- Raises
ValueError – If the insert position is invalid.
- class openclean.operator.transform.insert.InsRow(pos=None, values=None)
Bases: openclean.operator.base.DataFrameTransformer
Data frame transformer that inserts rows into a data frame. If values is None, a single row with all None values will be inserted. If values is a list of lists, multiple rows will be inserted.
- transform(df)
Insert rows in the given data frame. Returns a modified data frame where rows have been added. Raises a ValueError if the specified insert position is invalid or the number of values that are inserted does not match the schema of the given data frame.
- Parameters
df (pandas.DataFrame) – Input data frame.
- Return type
pandas.DataFrame
- Raises
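ValueError – If the insert position is invalid or the number of inserted values does not match the schema of the data frame.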
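The row-insertion rules above (a single all-None row when no values are given, one row for a flat list, several rows for a list of lists, and a ValueError on a bad position or width mismatch) can be sketched on plain lists. This is an illustrative sketch, not the openclean code: the name `insert_rows` and the explicit `width` argument are hypothetical stand-ins for a data frame and its schema.

```python
from typing import Any, List, Optional

Row = List[Any]


def insert_rows(
    rows: List[Row],
    width: int,
    pos: Optional[int] = None,
    values: Optional[list] = None,
) -> List[Row]:
    """Return a new list of rows with the given row(s) inserted."""
    if values is None:
        # No values given: insert a single row of None values.
        new_rows = [[None] * width]
    elif values and all(isinstance(v, list) for v in values):
        # List of lists: insert multiple rows.
        new_rows = [list(v) for v in values]
    else:
        # Flat list: insert exactly one row.
        new_rows = [list(values)]
    # Every inserted row needs one value per column.
    for row in new_rows:
        if len(row) != width:
            raise ValueError(f"expected {width} values, got {len(row)}")
    # An undefined position means the rows are appended.
    if pos is None:
        pos = len(rows)
    if pos < 0 or pos > len(rows):
        raise ValueError(f"invalid insert position {pos}")
    return rows[:pos] + new_rows + rows[pos:]


data = [[1, 2], [3, 4]]
appended = insert_rows(data, width=2)
single = insert_rows(data, width=2, pos=0, values=[9, 9])
multiple = insert_rows(data, width=2, pos=1, values=[[5, 6], [7, 8]])
```

The sketch returns a new list and leaves the input untouched, matching the description of a transformer that returns a modified data frame.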
- openclean.operator.transform.insert.inscol(df: pandas.core.frame.DataFrame, names: Union[str, List[str]], pos: Optional[int] = None, values: Optional[Union[int, float, str, datetime.datetime, openclean.function.eval.base.EvalFunction]] = None) pandas.core.frame.DataFrame
Insert function for data frame columns. Returns a modified data frame where columns have been inserted at a given position. Exactly one column is inserted for each given column name. If the insert position is undefined, columns are appended to the data frame. If the position is invalid (i.e., not between 0 and len(df.columns)), a ValueError is raised.
Values for the inserted columns are generated using a given constant value or evaluation function. If a function is given, it is expected to return exactly one value for each of the inserted columns (e.g., a tuple of length len(names)).
- Parameters
df (pd.DataFrame) – Input data frame.
names (string, or list(string)) – Names of the inserted columns.
pos (int, default=None) – Insert position for the new columns. If None, the columns will be appended.
values (scalar, tuple, or openclean.function.eval.base.EvalFunction, default=None) – Single value, tuple of values, or evaluation function that is used to generate the values for the inserted column(s). If no values are specified, all columns will contain None.
- Return type
pd.DataFrame
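The column-insertion behavior described for inscol can be sketched without the library, using a list of column names and a list of rows. This is a hedged illustration only: `insert_columns` is a hypothetical name, a plain Python callable stands in for an EvalFunction, and repeating a single constant across all inserted columns is an assumption rather than confirmed library behavior.

```python
from typing import Any, List, Optional, Tuple, Union

Row = List[Any]


def insert_columns(
    columns: List[str],
    rows: List[Row],
    names: Union[str, List[str]],
    pos: Optional[int] = None,
    values: Any = None,
) -> Tuple[List[str], List[Row]]:
    """Return a new (columns, rows) pair with the named columns inserted."""
    names = [names] if isinstance(names, str) else list(names)
    # An undefined position means append after the last existing column.
    if pos is None:
        pos = len(columns)
    if pos < 0 or pos > len(columns):
        raise ValueError(f"invalid insert position {pos}")

    def new_values(row: Row) -> List[Any]:
        # A callable is evaluated once per row; anything else is a constant.
        val = values(row) if callable(values) else values
        if isinstance(val, tuple):
            # A tuple must supply exactly one value per inserted column.
            if len(val) != len(names):
                raise ValueError("expected one value per inserted column")
            return list(val)
        # Assumption: a single value is repeated for every inserted column.
        return [val] * len(names)

    out_columns = columns[:pos] + names + columns[pos:]
    out_rows = [row[:pos] + new_values(row) + row[pos:] for row in rows]
    return out_columns, out_rows


columns = ["name", "age"]
rows = [["alice", 30], ["bob", 25]]

# Constant value inserted at position 1.
cols_const, rows_const = insert_columns(columns, rows, "country", pos=1, values="US")

# Function returning one value per new column; columns are appended.
cols_func, rows_func = insert_columns(
    columns, rows, ["upper", "next_age"], values=lambda r: (r[0].upper(), r[1] + 1)
)

# No values given: the new column contains None.
cols_none, rows_none = insert_columns(columns, rows, "notes")
```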
- openclean.operator.transform.insert.insrow(df, pos=None, values=None)
Insert a row into a data frame at a specified position. If a list of row values is given, there has to be exactly one value per column in the data frame.
- Parameters
df (pandas.DataFrame) – Input data frame.
pos (int, optional) – Insert position for the new row(s). If None, the rows will be appended.
values (list, optional) – List of values (to insert one row) or list of lists (to insert multiple rows).
- Return type
pandas.DataFrame