ForeTiS.optimization.optuna_optim

Module Contents

Classes

OptunaOptim

Class that contains all info for the whole optimization using optuna for one model and dataset.

class ForeTiS.optimization.optuna_optim.OptunaOptim(save_dir, data, config_file_section, featureset_name, datasplit, test_set_size_percentage, val_set_size_percentage, n_trials, save_final_model, batch_size, n_epochs, current_model_name, datasets, periodical_refit_frequency, refit_drops, refit_window, intermediate_results_interval, pca_transform, config, optimize_featureset, scale_thr, scale_seasons, scale_window_factor, cf_r, cf_order, cf_smooth, cf_thr_perc, scale_window_minimum, max_samples_factor, valtest_seasons, seasonal_valtest, n_splits, config_model_featureset)

Class that contains all info for the whole optimization using optuna for one model and dataset.

Attributes:

  • study (optuna.study.Study): optuna study for optimization run

  • current_best_val_result (float): the best validation result so far

  • early_stopping_point (int): point at which early stopping occurred (relevant for some models)

  • seasonal_periods (int): number of samples in one season of the used dataset

  • target_column (str): target column for which predictions shall be made

  • best_trials (list): list containing the numbers of the best trials

  • user_input_params (dict): all params handed over to the constructor that are needed in the whole class

  • base_path (str): base_path for save_path

  • save_path (str): path for model and results storing

Parameters:
  • save_dir (pathlib.Path) – directory for saving the results

  • data (str) – the dataset that you want to use

  • config_file_section (str) – the section of the config file for the used dataset

  • featureset_name (str) – name of the feature set used

  • datasplit (str) – the datasplit method to use, one of ‘timeseries-cv’, ‘train-val-test’, or ‘cv’

  • test_set_size_percentage (int) – size of the test set in percent, relevant for cv-test and train-val-test

  • val_set_size_percentage (int) – size of the validation set in percent, relevant for train-val-test

  • n_trials (int) – number of trials for optuna

  • save_final_model (bool) – specify if the final model should be saved

  • batch_size (int) – batch size for neural network models

  • n_epochs (int) – number of epochs for neural network models

  • current_model_name (str) – name of the current model according to naming of .py file in package model

  • datasets (ForeTiS.preprocess.base_dataset.Dataset) – the Dataset class containing the feature sets

  • periodical_refit_frequency (list) – if and for which intervals periodical refitting should be performed

  • refit_drops (int) – after how many periods the model should get updated

  • refit_window (int) – number of seasons used for refitting

  • intermediate_results_interval (int) – number of trials after which intermediate results will be saved

  • pca_transform (bool) – whether pca dimensionality reduction will be optimized or not

  • config (configparser.RawConfigParser) – the information from dataset_specific_config.ini

  • optimize_featureset (bool) – whether the feature set will be optimized or not

  • scale_thr (float) – only relevant for evars-gpr: output scale threshold

  • scale_seasons (int) – only relevant for evars-gpr: output scale seasons taken into account

  • scale_window_factor (float) – only relevant for evars-gpr: scale window factor based on seasonal periods

  • cf_r (float) – only relevant for evars-gpr: changefinder's r param (decay factor for older values)

  • cf_order (int) – only relevant for evars-gpr: changefinder's SDAR model order param

  • cf_smooth (int) – only relevant for evars-gpr: changefinder's smoothing param

  • cf_thr_perc (int) – only relevant for evars-gpr: percentile of the train set anomaly factors used as threshold for change point detection (cpd) with changefinder

  • scale_window_minimum (int) – only relevant for evars-gpr: scale window minimum

  • max_samples_factor (int) – only relevant for evars-gpr: max samples factor of seasons to keep for gpr pipeline

  • valtest_seasons (int) – define the number of seasons to be used when seasonal_valtest is True

  • seasonal_valtest (bool) – whether validation and test sets should be a multiple of the season length

  • n_splits (int) – number of splits to use for ‘timeseries-cv’ or ‘cv’

  • config_model_featureset (configparser.RawConfigParser) –

create_new_study()

Create a new optuna study.

Returns:

a new optuna study instance

Return type:

optuna.study.Study

objective(trial)

Objective function for optuna optimization that returns a score

Parameters:

trial (optuna.trial.Trial) – trial of optuna for optimization

Returns:

score of the current hyperparameter config

clean_up_after_exception(trial_number, trial_params, reason)

Clean up things after an exception: delete unfitted model if it exists and update runtime csv

Parameters:
  • trial_number (int) – number of the trial

  • trial_params (dict) – parameters of the trial

  • reason (str) – hint for the reason of the Exception
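A clean-up step of this kind might look as follows. This is only a sketch using the standard library: the explicit save_path argument, the file name pattern "unfitted_model_trial<N>", the file name "runtime.csv", and the column names are hypothetical and not taken from ForeTiS.

```python
import csv
import tempfile
from pathlib import Path


def clean_up_after_exception(save_path, trial_number, trial_params, reason):
    save_path = Path(save_path)
    # Hypothetical naming scheme for a model that was saved before fitting.
    model_file = save_path / f"unfitted_model_trial{trial_number}"
    if model_file.exists():
        model_file.unlink()
    # Append one row describing the failed trial to the runtime CSV.
    runtime_file = save_path / "runtime.csv"
    write_header = not runtime_file.exists()
    with runtime_file.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["trial", "params", "note"])
        if write_header:
            writer.writeheader()
        writer.writerow(
            {"trial": trial_number, "params": str(trial_params), "note": reason}
        )


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    (tmp_path / "unfitted_model_trial3").write_text("placeholder")
    clean_up_after_exception(tmp_path, 3, {"lr": 0.1}, "runtime error")
    model_deleted = not (tmp_path / "unfitted_model_trial3").exists()
    with (tmp_path / "runtime.csv").open(newline="") as f:
        logged_rows = list(csv.DictReader(f))
```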

write_runtime_csv(dict_runtime)

Write runtime info to runtime csv file

Parameters:

dict_runtime (dict) – dictionary with runtime information

calc_runtime_stats()

Calculate runtime stats for saved csv file.

Returns:

dict with runtime info enhanced with runtime stats

Return type:

dict
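Writing runtime information to a CSV file and deriving statistics from it could be sketched as below. The column names, the explicit csv_path argument, and the choice of mean, population standard deviation, and maximum are assumptions for illustration; the actual columns and statistics used by ForeTiS are not documented here.

```python
import csv
import statistics
import tempfile
from pathlib import Path

# Hypothetical columns of the runtime CSV.
FIELDS = ["trial", "process_time_s", "real_time_s"]


def write_runtime_csv(csv_path, dict_runtime):
    # Append one row; write the header only when the file is created.
    write_header = not csv_path.exists()
    with csv_path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(dict_runtime)


def calc_runtime_stats(csv_path):
    with csv_path.open(newline="") as f:
        rows = list(csv.DictReader(f))
    stats = {}
    for col in ("process_time_s", "real_time_s"):
        values = [float(row[col]) for row in rows]
        stats[f"{col}_mean"] = statistics.mean(values)
        stats[f"{col}_std"] = statistics.pstdev(values)
        stats[f"{col}_max"] = max(values)
    return stats


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "runtime.csv"
    write_runtime_csv(path, {"trial": 0, "process_time_s": 1.0, "real_time_s": 2.0})
    write_runtime_csv(path, {"trial": 1, "process_time_s": 3.0, "real_time_s": 4.0})
    runtime_stats = calc_runtime_stats(path)
```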

check_params_for_duplicate(current_params)

Check if params were already suggested, which might happen by design of the TPE sampler.

Parameters:

current_params (dict) – dictionary with current parameters

Returns:

bool reflecting if current params were already used in the same study

Return type:

bool
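The idea behind such a duplicate check can be shown with plain dictionaries. In this sketch the earlier parameter sets are passed in explicitly as past_params, which is an assumption made to keep the example independent of optuna; with a real study they could be collected from the params attribute of the trials in study.trials.

```python
def check_params_for_duplicate(past_params, current_params):
    # Dict equality ignores key order, so only names and values matter.
    return any(current_params == params for params in past_params)


# Hypothetical parameter sets of earlier trials.
past = [
    {"n_estimators": 100, "max_depth": 3},
    {"n_estimators": 200, "max_depth": 5},
]

is_dup = check_params_for_duplicate(past, {"max_depth": 3, "n_estimators": 100})
is_new = check_params_for_duplicate(past, {"n_estimators": 100, "max_depth": 4})
```

A trial flagged as duplicate can then be skipped or pruned instead of retraining a model with an identical configuration.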

pca_transform_train_test(train, test)

Deliver PCA transformed train and test set

Parameters:
  • train (pandas.DataFrame) – data for the training

  • test (pandas.DataFrame) – data for the testing

Returns:

tuple of transformed train and test dataset

Return type:

tuple

load_retrain_model(path, filename, retrain, early_stopping_point=None, test=None)

Load and retrain a persisted model.

Parameters:
  • path (str) – path where the model is saved

  • filename (str) – filename of the model

  • retrain (pandas.DataFrame) – data for retraining

  • early_stopping_point (int) – optional early stopping point relevant for some models

  • test (pandas.DataFrame) – data for testing

Returns:

model instance

Return type:

tuple

generate_results_on_test()

Generate the results on the testing data

Returns:

evaluation metrics dictionary

Return type:

dict

get_feature_importance(model, period)

Get feature importances for models that possess such a feature, e.g. XGBoost

Parameters:
  • model –

  • period –

Returns:

DataFrame with feature importance information

Return type:

pandas.DataFrame

plot_results(final_results)

Parameters:

final_results (pandas.DataFrame) –