ForeTiS.preprocess.raw_data_functions

Module Contents

Functions

drop_columns(df, columns)

Function dropping all columns specified

drop_rows_by_dates(df, start, end)

Function dropping rows within specified dates

custom_resampler(arraylike, target_column)

Custom resampling function when resampling frequency of dataset

get_one_hot_encoded_df(df, columns_to_encode)

Function delivering dataframe with specified columns one hot encoded

get_simple_imputer(df[, strategy])

Get simple imputer for each column according to specified strategy

get_iter_imputer(df[, sample_posterior, max_iter, ...])

Multivariate, iterative imputer fitted to df with specified parameters

get_knn_imputer(df[, n_neighbors])

Imputer of missing values according to k-nearest neighbors in feature space

encode_cyclical_features(df, columns)

Function that encodes cyclical features as sine and cosine components

ForeTiS.preprocess.raw_data_functions.drop_columns(df, columns)

Function dropping all columns specified

Parameters:
  • df (pandas.DataFrame) – dataset used for dropping

  • columns (list) – columns which should be dropped
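As a rough illustration, the same operation in plain pandas might look like the sketch below. It assumes the function behaves like `DataFrame.drop(columns=...)`; the column names are hypothetical, and whether ForeTiS modifies `df` in place or returns a new frame is not stated here.

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({"sales": [10, 12], "store_id": [1, 1], "comment": ["a", "b"]})
columns = ["store_id", "comment"]

# Assumption: dropping is equivalent to pandas' own column drop.
reduced = df.drop(columns=columns)
print(list(reduced.columns))
```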

ForeTiS.preprocess.raw_data_functions.drop_rows_by_dates(df, start, end)

Function dropping rows within specified dates

Parameters:
  • df (pandas.DataFrame) – dataset used for dropping

  • start (datetime.date) – start date for dropped period

  • end (datetime.date) – end date for dropped period
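A minimal sketch of dropping a date range with plain pandas follows. It assumes the DataFrame has a `DatetimeIndex` and that both `start` and `end` are included in the dropped period; neither detail is confirmed by the description above.

```python
import datetime

import pandas as pd

# Hypothetical daily data indexed by date.
df = pd.DataFrame(
    {"sales": range(10)},
    index=pd.date_range("2021-01-01", periods=10, freq="D"),
)
start = datetime.date(2021, 1, 3)
end = datetime.date(2021, 1, 5)

# Assumption: the dropped period is inclusive on both ends.
in_period = (df.index >= pd.Timestamp(start)) & (df.index <= pd.Timestamp(end))
kept = df.loc[~in_period]
print(len(kept))
```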

ForeTiS.preprocess.raw_data_functions.custom_resampler(arraylike, target_column)

Custom resampling function when resampling frequency of dataset

Parameters:
  • arraylike (pandas.Series) – Series to use for calculation

  • target_column (str) – chosen target column

Returns:

sum or mean of arraylike or 1
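The sketch below shows one way a per-column aggregation rule can be plugged into pandas resampling. The rule used here (sum the target column, average every other column) and the helper `sum_or_mean` are assumptions made for illustration; the description above does not say when the function returns the sum, the mean, or 1.

```python
import pandas as pd

def sum_or_mean(series: pd.Series, use_sum: bool) -> float:
    # Hypothetical rule: totals for the target, averages for everything else.
    return series.sum() if use_sum else series.mean()

# Hypothetical daily data to be resampled to two-day bins.
df = pd.DataFrame(
    {"sales": [1.0, 2.0, 3.0, 4.0], "price": [10.0, 20.0, 30.0, 40.0]},
    index=pd.date_range("2021-01-01", periods=4, freq="D"),
)
target_column = "sales"

# One aggregator per column; each receives the Series of values in a bin.
aggregators = {
    col: (lambda s, is_target=(col == target_column): sum_or_mean(s, is_target))
    for col in df.columns
}
resampled = df.resample("2D").agg(aggregators)
print(resampled)
```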

ForeTiS.preprocess.raw_data_functions.get_one_hot_encoded_df(df, columns_to_encode)

Function delivering dataframe with specified columns one hot encoded

Parameters:
  • df (pandas.DataFrame) – dataset to use for encoding

  • columns_to_encode (list) – columns to encode

Returns:

dataset with encoded columns

Return type:

pandas.DataFrame
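For orientation, one-hot encoding of selected columns can be sketched with `pandas.get_dummies`. That ForeTiS uses this call internally is an assumption, and the example column names are hypothetical.

```python
import pandas as pd

# Hypothetical data with one categorical column.
df = pd.DataFrame({"sales": [10, 12, 9], "weekday": ["Mon", "Tue", "Mon"]})
columns_to_encode = ["weekday"]

# Each listed column is replaced by one indicator column per category.
encoded = pd.get_dummies(df, columns=columns_to_encode)
print(list(encoded.columns))
```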

ForeTiS.preprocess.raw_data_functions.get_simple_imputer(df, strategy='mean')

Get simple imputer for each column according to specified strategy

Parameters:
  • df (pandas.DataFrame) – DataFrame to impute

  • strategy (str) – strategy to use, e.g. ‘mean’ or ‘median’

Returns:

imputer

Return type:

sklearn.impute.SimpleImputer
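A minimal sketch of fitting and applying a `SimpleImputer` directly with scikit-learn is shown below. It assumes the returned imputer is already fitted to `df`, which the description suggests but does not state; the data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data with one missing value per column.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 6.0, np.nan]})

# Fit one statistic per column, then fill the gaps with it.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean").fit(df)
filled = pd.DataFrame(imputer.transform(df), columns=df.columns, index=df.index)
print(filled)
```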

ForeTiS.preprocess.raw_data_functions.get_iter_imputer(df, sample_posterior=True, max_iter=100, min_value=0, max_value=None)

Multivariate, iterative imputer fitted to df with specified parameters

Parameters:
  • df (pandas.DataFrame) – DataFrame to fit for imputation

  • sample_posterior (bool) – whether to sample from the predictive posterior of the fitted estimator (default estimator: BayesianRidge())

  • max_iter (int) – maximum number of iterations for imputation

  • min_value (int) – minimum allowed imputed value

  • max_value (int) – maximum allowed imputed value

Returns:

imputer

Return type:

sklearn.impute.IterativeImputer
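The parameters above map onto scikit-learn's `IterativeImputer`, which can be sketched as follows. The data are hypothetical, `max_iter` is reduced to keep the example quick, `random_state` is added only for reproducibility, and `np.inf` stands in for an unbounded `max_value`; none of these choices is taken from ForeTiS itself.

```python
import numpy as np
import pandas as pd
# IterativeImputer is experimental in scikit-learn and must be enabled first.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical data: two correlated columns with gaps.
df = pd.DataFrame(
    {"a": [1.0, 2.0, np.nan, 4.0, 5.0], "b": [2.0, np.nan, 6.0, 8.0, 10.0]}
)

imputer = IterativeImputer(
    sample_posterior=True,  # draw imputations from the predictive posterior
    max_iter=10,            # shortened for the example
    min_value=0,            # imputed values are clipped to be non-negative
    max_value=np.inf,       # no upper bound (assumption for max_value=None)
    random_state=0,         # reproducible sampling
).fit(df)
filled = pd.DataFrame(imputer.transform(df), columns=df.columns, index=df.index)
print(filled)
```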

ForeTiS.preprocess.raw_data_functions.get_knn_imputer(df, n_neighbors=10)

Imputer of missing values according to k-nearest neighbors in feature space

Parameters:
  • df (pandas.DataFrame) – DataFrame to use for imputation

  • n_neighbors (int) – number of neighbors to use for imputation

Returns:

imputer

Return type:

sklearn.impute.KNNImputer
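As an illustration of k-nearest-neighbor imputation, the sketch below uses scikit-learn's `KNNImputer` on hypothetical data. It uses `n_neighbors=2` rather than the default of 10 because the example has only four rows, and it assumes the returned imputer is fitted to `df`.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical data: column "a" has a gap in the third row.
df = pd.DataFrame({"a": [1.0, 2.0, np.nan, 4.0], "b": [1.0, 2.0, 3.0, 4.0]})

# The gap is filled with the mean of "a" over the rows closest in "b".
imputer = KNNImputer(n_neighbors=2).fit(df)
filled = pd.DataFrame(imputer.transform(df), columns=df.columns, index=df.index)
print(filled)
```

Here the two nearest rows to the incomplete one (by column `b`) hold the values 2.0 and 4.0 in column `a`, so the gap is filled with their average.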

ForeTiS.preprocess.raw_data_functions.encode_cyclical_features(df, columns)

Function that encodes cyclical features as sine and cosine components

Parameters:
  • df (pandas.DataFrame) – DataFrame to use for encoding

  • columns (list) – columns that should be encoded
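Cyclical encoding maps a periodic value onto the unit circle so that the end of a cycle sits next to its start. The sketch below shows the general technique; the `periods` mapping and the `_sin`/`_cos` column suffixes are hypothetical, since the description above does not say how ForeTiS determines the period or names the new columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data and cycle lengths.
df = pd.DataFrame({"month": [1, 4, 7, 10, 12]})
columns = ["month"]
periods = {"month": 12}  # assumption: the period of each column is known

for col in columns:
    # Convert the value to an angle, then store its sine and cosine.
    angle = 2 * np.pi * df[col] / periods[col]
    df[col + "_sin"] = np.sin(angle)
    df[col + "_cos"] = np.cos(angle)
print(df)
```

With this encoding, December (12) and January (1) end up close together, which a raw integer month column would not express.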