openclean.operator.transform.select module
Functions and classes that implement the column selection operator in openclean.
- class openclean.operator.transform.select.Select(columns: Union[int, str, List[Union[str, int]]], names: Optional[Union[str, List[str]]] = None)
Bases: openclean.operator.stream.processor.StreamProcessor, openclean.operator.base.DataFrameTransformer
Data frame transformer that selects a list of columns from a data frame. The output is a data frame that contains all rows from an input data frame but only those columns that are included in a given select clause.
- open(schema: List[Union[str, histore.document.schema.Column]]) → openclean.operator.stream.consumer.StreamFunctionHandler
Factory pattern for stream consumers. Returns an instance of a stream consumer that filters columns from data frame rows using the associated list of columns (i.e., the select clause).
- Parameters
schema (list of string or histore.document.schema.Column) – List of column names (or column objects) in the data stream schema.
- Return type
openclean.operator.stream.consumer.StreamFunctionHandler
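As a rough illustration of the factory pattern described for `open`, the sketch below resolves the select clause against the stream schema once and returns a function that projects each subsequent row. The names `open_select`, `stream_select`, and `filter_row`, as well as treating integers as index positions and strings as column names, are assumptions for illustration; this is not openclean's `StreamFunctionHandler`.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Union

# Hypothetical alias: a select-clause item is a column name or an index position.
ColumnRef = Union[int, str]


def open_select(schema: Sequence[str], columns: Union[ColumnRef, List[ColumnRef]]) -> Callable[[Sequence], list]:
    """Hypothetical factory: resolve the select clause once for a given schema."""
    clause = columns if isinstance(columns, list) else [columns]
    names = list(schema)
    # Assumption: ints are index positions, strings are column names.
    # list.index raises ValueError for a name that is not in the schema.
    positions = [c if isinstance(c, int) else names.index(c) for c in clause]

    def filter_row(row: Sequence) -> list:
        # Keep only the selected values, in select-clause order.
        return [row[i] for i in positions]

    return filter_row


def stream_select(schema: Sequence[str], rows: Iterable[Sequence], columns) -> Iterator[list]:
    """Apply the row filter lazily to every row of a data stream."""
    filter_row = open_select(schema, columns)
    for row in rows:
        yield filter_row(row)


schema = ["name", "age", "city"]
rows = [("Alice", 32, "NYC"), ("Bob", 27, "LA")]
print(list(stream_select(schema, rows, ["city", 0])))
# [['NYC', 'Alice'], ['LA', 'Bob']]
```

Resolving the clause once, when the schema becomes known, keeps the per-row work down to a list comprehension over precomputed positions.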
- transform(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
Return a data frame that contains all rows but only those columns from the given input data frame that are included in the select clause.
Raises a value error if the list of columns contains an item that cannot be matched to a column in the given data frame.
- Parameters
df (pandas.DataFrame) – Input data frame.
- Return type
pandas.DataFrame
- Raises
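ValueError – If the list of columns contains an item that cannot be matched to a column in the given data frame.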
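The matching rule behind this `ValueError` can be sketched as follows. This is a minimal sketch, assuming integers are index positions and strings are column names; `resolve_select_clause` is a hypothetical helper, not part of openclean's documented API. With pandas, the resolved positions could then be passed to `df.iloc[:, positions]` to keep all rows but only the selected columns.

```python
from typing import List, Sequence, Union


def resolve_select_clause(column_names: Sequence[str], columns: Union[int, str, List[Union[int, str]]]) -> List[int]:
    """Map each item of a select clause to a column index position.

    Raises ValueError if an item cannot be matched to a column.
    """
    clause = columns if isinstance(columns, list) else [columns]
    names = list(column_names)
    positions = []
    for col in clause:
        if isinstance(col, int):
            # Assumption: integers are index positions into the column list.
            if not 0 <= col < len(names):
                raise ValueError(f"unknown column index {col}")
            positions.append(col)
        elif col in names:
            # Strings are matched by column name (the first match wins).
            positions.append(names.index(col))
        else:
            raise ValueError(f"unknown column '{col}'")
    return positions


print(resolve_select_clause(["name", "age", "city"], ["city", 1]))  # [2, 1]
```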
- openclean.operator.transform.select.select(df: pandas.core.frame.DataFrame, columns: Union[int, str, List[Union[str, int]]], names: Optional[Union[str, List[str]]] = None) → pandas.core.frame.DataFrame
Projection operator that selects a list of columns from a data frame. Returns a data frame that contains only those columns that are included in the given select clause. The optional list of names can be used to rename the columns in the resulting data frame. If a list of names is given, it must have the same length as the list of columns.
- Parameters
df (pandas.DataFrame) – Input data frame.
columns (int, string, or list(int or string)) – Single column or list of column index positions or column names.
names (string or list(string)) – Single name or list of names for the resulting columns.
- Return type
pandas.DataFrame
- Raises
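ValueError – If the select clause contains an item that cannot be matched to a column in the given data frame, or if the list of names does not have the same length as the list of columns.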
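The optional `names` argument can be thought of as a header rewrite applied after the selection. The sketch below uses plain lists in place of a pandas data frame: it keeps every row, projects each row onto already resolved column positions, treats a single name as a one-element list, and enforces the same-length requirement stated above. `select_table` is a hypothetical helper, and raising `ValueError` on a length mismatch is an assumption about the behavior rather than confirmed openclean code.

```python
from typing import List, Optional, Sequence, Tuple, Union


def select_table(
    header: Sequence[str],
    rows: Sequence[Sequence],
    positions: List[int],
    names: Optional[Union[str, List[str]]] = None,
) -> Tuple[List[str], List[list]]:
    """Project a table onto the given column positions, optionally renaming them.

    `positions` is assumed to be an already resolved select clause.
    """
    if names is None:
        # Without names, the selected columns keep their original names.
        new_header = [header[i] for i in positions]
    else:
        # A single name is treated as a one-element list.
        new_header = list(names) if isinstance(names, list) else [names]
        if len(new_header) != len(positions):
            raise ValueError("names must match the number of selected columns")
    # Every row is kept; only the selected values survive, in clause order.
    new_rows = [[row[i] for i in positions] for row in rows]
    return new_header, new_rows


header = ["name", "age", "city"]
rows = [["Alice", 32, "NYC"], ["Bob", 27, "LA"]]
print(select_table(header, rows, [2, 0], names=["location", "person"]))
# (['location', 'person'], [['NYC', 'Alice'], ['LA', 'Bob']])
```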