`ForeTiS.preprocess.raw_data_functions`

Module Contents

Functions

`drop_columns`(df, columns)	Function dropping all columns specified
`drop_rows_by_dates`(df, start, end)	Function dropping rows within specified dates
`custom_resampler`(arraylike, target_column)	Custom resampling function when resampling frequency of dataset
`get_one_hot_encoded_df`(df, columns_to_encode)	Function delivering dataframe with specified columns one hot encoded
`get_simple_imputer`(df[, strategy])	Get simple imputer for each column according to specified strategy
`get_iter_imputer`(df[, sample_posterior, max_iter, ...])	Multivariate, iterative imputer fitted to df with specified parameters
`get_knn_imputer`(df[, n_neighbors])	Imputer of missing values according to k-nearest neighbors in feature space
`encode_cyclical_features`(df, columns)	Function that encodes cyclical features as sine and cosine components

ForeTiS.preprocess.raw_data_functions.drop_columns(df, columns)

Function dropping all columns specified

Parameters:

df (pandas.DataFrame) – dataset used for dropping
columns (list) – columns which should be dropped
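
A minimal sketch of the equivalent operation in plain pandas, assuming the function wraps `DataFrame.drop`. The column names are made up for illustration, and whether the library function modifies `df` in place or returns a copy is not stated on this page.

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({"sales": [10, 12], "store_id": [1, 1], "comment": ["a", "b"]})

# Drop the unwanted columns and keep the rest (returns a new DataFrame).
reduced = df.drop(columns=["store_id", "comment"])
print(list(reduced.columns))
```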

ForeTiS.preprocess.raw_data_functions.drop_rows_by_dates(df, start, end)

Function dropping rows within specified dates

Parameters:

df (pandas.DataFrame) – dataset used for dropping
start (datetime.date) – start date for dropped period
end (datetime.date) – end date for dropped period
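
A minimal sketch of dropping a date range with a boolean mask in plain pandas. It assumes a `DatetimeIndex` and treats both bounds as inclusive; this page does not say how the library function handles the boundaries.

```python
import datetime

import pandas as pd

# Hypothetical daily data indexed by date.
df = pd.DataFrame(
    {"sales": list(range(10))},
    index=pd.date_range("2021-01-01", periods=10, freq="D"),
)

start = datetime.date(2021, 1, 3)
end = datetime.date(2021, 1, 5)

# Assumption: rows on the start and end dates are dropped as well.
in_period = (df.index >= pd.Timestamp(start)) & (df.index <= pd.Timestamp(end))
remaining = df.loc[~in_period]
print(remaining.index.min(), remaining.index.max(), len(remaining))
```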

ForeTiS.preprocess.raw_data_functions.custom_resampler(arraylike, target_column)

Custom resampling function when resampling frequency of dataset

Parameters:

arraylike (pandas.Series) – Series to use for calculation
target_column (str) – chosen target column

Returns:

aggregated value for the resampled period: the sum or the mean of arraylike, or 1
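
The rule that decides between sum, mean, and 1 is not documented on this page. The sketch below only illustrates the general idea of column-dependent aggregation while downsampling, under the assumption that the target column is summed and every other column is averaged. The helper `pick_aggregation` and the column names are hypothetical and are not part of ForeTiS.

```python
import pandas as pd


def pick_aggregation(column_name: str, target_column: str) -> str:
    """Hypothetical rule: sum the target column, average all others."""
    return "sum" if column_name == target_column else "mean"


# Hypothetical daily data.
df = pd.DataFrame(
    {"sales": [1.0, 2.0, 3.0, 4.0], "temperature": [10.0, 12.0, 14.0, 16.0]},
    index=pd.date_range("2021-01-01", periods=4, freq="D"),
)

# Downsample from daily to two-day bins with one aggregation per column.
rules = {name: pick_aggregation(name, "sales") for name in df.columns}
resampled = df.resample("2D").agg(rules)
print(resampled)
```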

ForeTiS.preprocess.raw_data_functions.get_one_hot_encoded_df(df, columns_to_encode)

Function delivering dataframe with specified columns one hot encoded

Parameters:

df (pandas.DataFrame) – dataset to use for encoding
columns_to_encode (list) – columns to encode

Returns:

dataset with encoded columns

Return type:

pandas.DataFrame
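
A minimal sketch of one-hot encoding selected columns, assuming behavior comparable to `pandas.get_dummies`. The data and the resulting column names are illustrative; the naming scheme used by the library function may differ.

```python
import pandas as pd

# Hypothetical data with one categorical column.
df = pd.DataFrame({"sales": [10, 12, 9], "weekday": ["Mon", "Tue", "Mon"]})

# Replace "weekday" with one indicator column per category.
encoded = pd.get_dummies(df, columns=["weekday"])
print(list(encoded.columns))
```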

ForeTiS.preprocess.raw_data_functions.get_simple_imputer(df, strategy='mean')

Get simple imputer for each column according to specified strategy

Parameters:

df (pandas.DataFrame) – DataFrame to impute
strategy (str) – strategy to use, e.g. ‘mean’ or ‘median’

Returns:

imputer

Return type:

sklearn.impute.SimpleImputer
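
A minimal sketch of what the returned object can be used for, written directly against scikit-learn. It assumes the imputer is fitted on `df`; this page does not state whether the function returns it fitted or unfitted.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data with one missing value per column.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 6.0, np.nan]})

# Each column is imputed independently with its own mean.
imputer = SimpleImputer(strategy="mean")
imputer.fit(df)

# transform returns a NumPy array, so restore the labels afterwards.
filled = pd.DataFrame(imputer.transform(df), columns=df.columns, index=df.index)
print(filled)
```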

ForeTiS.preprocess.raw_data_functions.get_iter_imputer(df, sample_posterior=True, max_iter=100, min_value=0, max_value=None)

Multivariate, iterative imputer fitted to df with specified parameters

Parameters:

df (pandas.DataFrame) – DataFrame to fit for imputation
sample_posterior (bool) – whether to sample from the predictive posterior of the fitted estimator (default estimator: BayesianRidge())
max_iter (int) – maximum number of iterations for imputation
min_value (int) – minimum possible imputed value
max_value (int) – maximum possible imputed value

Returns:

imputer

Return type:

sklearn.impute.IterativeImputer
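
A minimal sketch of the underlying scikit-learn imputer, assuming the documented parameters are passed through unchanged. `max_iter` is reduced here to keep the example fast, `max_value` is left at scikit-learn's default (no upper bound), and `random_state` is added only to make the sampled values reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical data with two correlated columns and missing entries.
df = pd.DataFrame(
    {"a": [1.0, 2.0, np.nan, 4.0, 5.0], "b": [2.0, 4.0, 6.0, np.nan, 10.0]}
)

# Each feature with missing values is modelled from the other features.
imputer = IterativeImputer(
    sample_posterior=True,  # draw imputations from the predictive posterior
    max_iter=10,            # the documented default above is 100
    min_value=0,            # imputed values are clipped at zero
    random_state=0,         # assumption: fixed seed for reproducibility
)
imputer.fit(df)

filled = pd.DataFrame(imputer.transform(df), columns=df.columns, index=df.index)
print(filled)
```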

ForeTiS.preprocess.raw_data_functions.get_knn_imputer(df, n_neighbors=10)

Imputer of missing values according to k-nearest neighbors in feature space

Parameters:

df (pandas.DataFrame) – DataFrame to use for imputation
n_neighbors (int) – number of neighbors to use for imputation

Returns:

imputer

Return type:

sklearn.impute.KNNImputer
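
A minimal sketch of k-nearest-neighbors imputation with scikit-learn, assuming the function returns a `KNNImputer` fitted on `df`. A small `n_neighbors` is used because the hypothetical example data has only four rows.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical data: column "a" has one missing value.
df = pd.DataFrame({"a": [1.0, 2.0, np.nan, 4.0], "b": [1.0, 2.0, 3.0, 4.0]})

# The missing value is replaced by the mean of "a" over the nearest rows,
# where distance is measured on the features that are present.
imputer = KNNImputer(n_neighbors=2)
imputer.fit(df)

filled = imputer.transform(df)
print(filled)
```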

ForeTiS.preprocess.raw_data_functions.encode_cyclical_features(df, columns)

Function that encodes cyclical features as sine and cosine components

Parameters:

df (pandas.DataFrame) – DataFrame to use for encoding
columns (list) – columns that should be encoded
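
A minimal sketch of sine and cosine encoding for a cyclical feature. The period is set by hand here, and the `_sin` and `_cos` column suffixes are hypothetical; how the library function determines the period and names its output columns is not stated on this page.

```python
import numpy as np
import pandas as pd

# Hypothetical data: month of the year, which wraps around after 12.
df = pd.DataFrame({"month": [1, 4, 7, 10, 12]})

period = 12  # assumption: the cycle length is known in advance

# Map each value to an angle on the unit circle, then store both coordinates
# so that month 12 and month 1 end up close to each other.
angle = 2 * np.pi * df["month"] / period
df["month_sin"] = np.sin(angle)
df["month_cos"] = np.cos(angle)
print(df)
```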
