Optuna Hyperparameter Tuning
Overview
PG-SUI automates hyperparameter optimization with Optuna. Tuning is available for the unsupervised neural network models (ImputeAutoencoder, ImputeVAE, ImputeUBP, ImputeNLPCA) and is also used by supervised tree models when enabled. Each model defines its own search space, while the Optuna framework handles sampling, pruning, and best-parameter persistence.
Workflow
Enable tuning via tune.enabled or the CLI --tune flag.
PG-SUI creates an Optuna study using a TPE sampler and Hyperband-based pruning with patience to stop weak trials early.
Each trial samples a model-specific set of hyperparameters, trains the model, and evaluates a validation metric (or metrics) defined by tune.metrics.
The best trial parameters are persisted and then used for the final full-data fit.
For unsupervised models, tuning uses the simulated-missing evaluation mask so trials are scored against known truth.
Configuration controls
Key fields in pgsui.data_processing.containers.TuneConfig:
tune.enabled: turn tuning on or off.
tune.metrics: metric name or a list of metrics for multi-objective tuning (for example, ["f1", "mcc"]).
tune.n_trials: number of Optuna trials to run.
tune.epochs / tune.batch_size: optional per-trial training limits used by model-specific tuning loops (when supported).
tune.patience: model-specific patience setting used during tuning (when supported).
tune.save_db / tune.resume: persist and optionally resume an Optuna SQLite database.
Parallelism is controlled by io.n_jobs. Use --n-jobs on the CLI or set
io.n_jobs in YAML to increase Optuna worker count.
Artifacts and outputs
PG-SUI stores tuning artifacts under <prefix>_output/<Family>/optimize/:
parameters/best_tuned_parameters.json: Optuna-selected parameters.
plots/: optional Optuna study visualizations.
study_database/: SQLite databases when tune.save_db is enabled.
After the final fit, the effective parameters are saved in
<prefix>_output/<Family>/parameters/best_parameters.json.
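The tuned parameters can be read back with the standard library. The helper names below are hypothetical, the "demo" prefix and "Unsupervised" family are placeholder values, and the file is assumed to contain a single JSON object; the demonstration writes a fabricated file to a temporary directory rather than reading real PG-SUI output.

```python
import json
import tempfile
from pathlib import Path


def tuned_params_path(prefix, family):
    """Build the relative path to the Optuna-selected parameters file."""
    return (
        Path(f"{prefix}_output")
        / family
        / "optimize"
        / "parameters"
        / "best_tuned_parameters.json"
    )


def load_tuned_params(prefix, family, root="."):
    """Load the tuned parameters (assumed to be one JSON object) as a dict."""
    path = Path(root) / tuned_params_path(prefix, family)
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


# Demonstration with a fabricated parameters file.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / tuned_params_path("demo", "Unsupervised")
    target.parent.mkdir(parents=True)
    target.write_text(json.dumps({"learning_rate": 0.001}), encoding="utf-8")
    params = load_tuned_params("demo", "Unsupervised", root=tmp)
```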
Examples
YAML configuration:
tune:
  enabled: true
  metrics: ["f1", "mcc"]
  n_trials: 50
  save_db: true
  resume: false
CLI usage:
pg-sui \
  --input cohort.vcf.gz \
  --models ImputeNLPCA ImputeUBP \
  --preset balanced \
  --tune \
  --tune-n-trials 50 \
  --n-jobs 4
Notes
--load-best-params disables tuning and uses the saved parameters from a previous run. This keeps runs reproducible and avoids conflicting overrides.