ForeTiS.optim_pipeline
Module Contents
Functions
run – Run the whole optimization pipeline
- ForeTiS.optim_pipeline.run(data_dir, save_dir, datasplit='timeseries-cv', test_set_size_percentage=20, val_set_size_percentage=20, n_splits=3, imputation_method=None, windowsize_current_statistics=3, windowsize_lagged_statistics=3, models=None, n_trials=200, pca_transform=False, save_final_model=False, periodical_refit_frequency=None, refit_drops=0, data=None, config_file_path=None, config_file_section=None, refit_window=5, intermediate_results_interval=None, batch_size=32, n_epochs=100000, event_lags=None, optimize_featureset=False, scale_thr=0.1, scale_seasons=2, cf_thr_perc=70, scale_window_factor=0.1, cf_r=0.4, cf_order=1, cf_smooth=4, scale_window_minimum=2, max_samples_factor=10, valtest_seasons=1, seasonal_valtest=True)
Run the whole optimization pipeline
- Parameters:
data_dir (str) – directory where the dataset to be used is stored
save_dir (str) – directory for saving the results. If None, the results are saved in data_dir
datasplit (str) – data split to use. Options are: ‘timeseries-cv’, ‘cv’, ‘train-val-test’
test_set_size_percentage (int) – size of the test set as a percentage of the data
val_set_size_percentage (int) – size of the validation set as a percentage of the data, relevant for ‘train-val-test’
n_splits (int) – number of splits to use for ‘timeseries-cv’ or ‘cv’
imputation_method (str) – the imputation method to use. Options are: ‘mean’, ‘knn’, ‘iterative’
windowsize_current_statistics (int) – the window size for the feature engineering of the current statistics
windowsize_lagged_statistics (int) – the window size for the feature engineering of the lagged statistics
models (list) – list of models that should be optimized
n_trials (int) – number of Optuna trials
pca_transform (bool) – whether PCA dimensionality reduction is included in the optimization
save_final_model (bool) – whether the final model should be saved
periodical_refit_frequency (list) – intervals at which periodical refitting should be performed; if None, no periodical refitting is performed
refit_drops (int) – number of periods after which the model is updated
data (str) – name of the dataset to use
config_file_path (str) – the path of the config file
config_file_section (str) – the section of the config file for the used dataset
refit_window (int) – number of seasons used for refitting
intermediate_results_interval (int) – number of trials after which intermediate results will be saved
batch_size (int) – batch size for neural network models
n_epochs (int) – number of epochs for neural network models
event_lags (int) – the event lags for the counters
optimize_featureset (bool) – whether the feature set will be optimized
scale_thr (float) – only relevant for evars-gpr: output scale threshold
scale_seasons (int) – only relevant for evars-gpr: number of seasons taken into account for the output scale
cf_thr_perc (int) – only relevant for evars-gpr: percentile of the training-set anomaly factors used as threshold for change point detection (CPD) with ChangeFinder
scale_window_factor (float) – only relevant for evars-gpr: factor, relative to the seasonal period, that sets the scale window size
cf_r (float) – only relevant for evars-gpr: r parameter of ChangeFinder (decay factor for older values)
cf_order (int) – only relevant for evars-gpr: order parameter of ChangeFinder's SDAR model
cf_smooth (int) – only relevant for evars-gpr: smoothing parameter of ChangeFinder
scale_window_minimum (int) – only relevant for evars-gpr: minimum size of the scale window
max_samples_factor (int) – only relevant for evars-gpr: maximum number of samples kept for the GPR pipeline, given as a multiple of the seasonal period
valtest_seasons (int) – number of seasons used for the validation and test sets when seasonal_valtest is True
seasonal_valtest (bool) – whether validation and test sets should be a multiple of the season length
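How datasplit='timeseries-cv', n_splits, the two size percentages and seasonal_valtest/valtest_seasons could fit together is pictured in the minimal sketch below. It is not ForeTiS's implementation: it assumes an expanding-window scheme similar to scikit-learn's TimeSeriesSplit, a test set taken from the end of the series, and validation/test blocks of valtest_seasons whole seasons when seasonal_valtest is True. The helper names holdout_size and timeseries_cv_indices and the season_length argument are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical illustration of the split parameters; NOT ForeTiS's implementation.


def holdout_size(n_samples: int, size_percentage: int,
                 seasonal_valtest: bool = False, valtest_seasons: int = 1,
                 season_length: Optional[int] = None) -> int:
    """Return how many trailing samples form a validation or test block."""
    if seasonal_valtest:
        if season_length is None:
            raise ValueError("season_length is required when seasonal_valtest is True")
        # Assumed: the block spans valtest_seasons whole seasons.
        return valtest_seasons * season_length
    # Assumed: otherwise the block is a percentage of all samples, rounded down.
    return n_samples * size_percentage // 100


def timeseries_cv_indices(n_samples: int, n_splits: int, test_size: int,
                          val_size: int) -> Tuple[List[Tuple[range, range]], range]:
    """Hold out the last test_size samples, then build expanding-window folds.

    Every fold trains on all samples before its validation block, so the
    model is never trained on data from after the period it is validated on.
    """
    test_idx = range(n_samples - test_size, n_samples)
    n_trainval = n_samples - test_size
    folds = []
    for k in range(n_splits):
        # The last fold's validation block ends right before the test set;
        # each earlier fold is shifted back by one val_size.
        val_end = n_trainval - (n_splits - 1 - k) * val_size
        val_start = val_end - val_size
        if val_start <= 0:
            raise ValueError("not enough samples for the requested folds")
        folds.append((range(0, val_start), range(val_start, val_end)))
    return folds, test_idx


if __name__ == "__main__":
    n = 100
    test_size = holdout_size(n, 20)  # test_set_size_percentage=20
    val_size = holdout_size(n, 20)   # val_set_size_percentage=20
    folds, test_idx = timeseries_cv_indices(n, 3, test_size, val_size)  # n_splits=3
    for train_idx, val_idx in folds:
        print(f"train [0, {train_idx.stop}), validate [{val_idx.start}, {val_idx.stop})")
    print(f"test [{test_idx.start}, {test_idx.stop})")
```

With 100 samples and the default values test_set_size_percentage=20, val_set_size_percentage=20 and n_splits=3, the sketch validates on [20, 40), [40, 60) and [60, 80) and keeps [80, 100) as the test set. Ordering the folds in time, rather than shuffling, is what distinguishes a time-series split from plain k-fold cross-validation.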