Config Reference

This page summarizes the configuration surface exposed through the containers.py dataclasses that back the CLI and Python APIs. Every block below lists defaults and the preset bundles applied by from_preset(...) helpers. Values are shown as declared in code; all options are ASCII-safe and can be overridden via YAML or dot-keys on the CLI.

Common Blocks (Unsupervised NN)

The four deep imputer families (Autoencoder, VAE, NLPCA, UBP) share these structures.

IOConfig

Field

Default

Description

prefix

"pgsui"

Run/output prefix used for directories and logging.

ploidy

2

Ploidy level (1 for haploid, 2 for diploid).

verbose / debug

False / False

Logging verbosity.

seed

None

RNG seed.

n_jobs

1

Parallel jobs for Optuna.

scoring_averaging

"macro"

Averaging mode for metrics (macro, micro, weighted).

ModelConfig

Field

Default

Description

latent_init

"random"

Latent init ("random" or "pca").

latent_dim

2

Latent width.

dropout_rate

0.2

Dropout applied to hidden layers.

num_hidden_layers

2

Count of hidden layers.

activation

"relu"

Hidden non-linearity (relu, elu, selu, leaky_relu).

layer_scaling_factor

5.0

Scales hidden widths.

layer_schedule

"pyramid"

Width layout ("pyramid", "linear").

TrainConfig

Field

Default

Description

batch_size

64

Mini-batch size.

learning_rate

1e-3

Base LR.

l1_penalty

0.0

L1 regularization.

early_stop_gen

25

Early-stop patience (epochs).

min_epochs / max_epochs

100 / 2000

Epoch bounds.

validation_split

0.2

Holdout fraction.

weights_max_ratio

None

Cap on class-weight ratio.

weights_power

1.0

Power scaling for class weights.

weights_normalize

True

Whether to normalize weights.

weights_inverse

False

Whether to invert weights.

gamma

0.0

Focal-loss gamma.

gamma_schedule

False

Whether to anneal gamma during training.

device

"cpu"

"cpu", "gpu", or "mps".

TuneConfig

Field

Default

Description

enabled

False

Turn on Optuna.

metrics

"f1"

Objective metric (or list of metrics) used for Optuna tuning (f1, accuracy, pr_macro, average_precision, roc_auc, precision, recall, mcc, jaccard).

n_trials

100

Number of trials.

resume / save_db

False / False

Reuse or persist Optuna DB.

epochs / batch_size

500 / 64

Per-trial training limits used by model-specific tuning loops (when supported).

patience

10

Model-specific patience setting used during tuning (when supported).

PlotConfig

Field

Default

Description

fmt

"pdf"

Output format.

dpi

300

Resolution.

fontsize

18

Base font size.

despine

True

Remove top/right spines.

show

True

Interactive display toggle.

SimConfig

Field

Default

Description

simulate_missing

False

Whether to simulate missingness for eval (required for unsupervised models).

sim_strategy

"random"

random, random_weighted, random_weighted_inv, nonrandom, nonrandom_weighted.

sim_prop

0.20

Proportion to mask.

sim_kwargs

None

Extra args forwarded to SimMissingTransformer.

Field-by-field notes

  • IOConfig - ploidy: Set to 1 for haploids; controls class count and decoding. - n_jobs: Controls Optuna parallelism.

  • ModelConfig - latent_dim: Governs compression strength; higher values retain more signal at the cost of capacity/overfit. - layer_schedule: pyramid shrinks toward the bottleneck; linear walks widths linearly.

  • TrainConfig - weights_power: Adjusts the aggression of class weighting (e.g., 0.5 for sqrt, 1.0 for standard inverse frequency). - gamma: Controls focal loss behavior. Use gamma_schedule=True to anneal it.

  • TuneConfig - metrics: Provide a string for single-objective tuning or a list/tuple for multi-objective tuning.

  • SimConfig - sim_strategy: nonrandom strategies require a tree parser.

Unsupervised NN presets

Each model exposes from_preset("fast" | "balanced" | "thorough") to seed a baseline, then allows overrides. The presets apply to Autoencoder, VAE, NLPCA, and UBP configs.

AutoencoderConfig

Preset baseline (all presets): - io: verbose=False, ploidy=2. - train: validation_split=0.20, weights_max_ratio=None, weights_power=1.0, weights_normalize=True, gamma=0.0. - model: activation="relu", layer_schedule="pyramid", layer_scaling_factor=2.0. - sim: simulate_missing=True, sim_strategy="random", sim_prop=0.2. - tune: enabled=False, n_trials=100.

Preset overrides:

  • fast - model: latent_dim=4, num_hidden_layers=1, dropout_rate=0.10. - train: batch_size=128, learning_rate=2e-3, early_stop_gen=15, max_epochs=200. - tune: patience=15.

  • balanced - model: latent_dim=8, num_hidden_layers=2, dropout_rate=0.20. - train: batch_size=64, learning_rate=1e-3, early_stop_gen=25, max_epochs=500. - tune: patience=25.

  • thorough - model: latent_dim=16, num_hidden_layers=3, dropout_rate=0.30. - train: batch_size=64, learning_rate=5e-4, early_stop_gen=50, max_epochs=1000. - tune: patience=50.

VAEConfig + VAEExtraConfig

Inherits structure from Autoencoder but adds VAEExtraConfig.

VAEExtraConfig

Field

Default

Description

kl_beta

1.0

Final KL weight.

kl_beta_schedule

False

Whether to anneal KL beta.

Preset overrides:

  • fast - model: latent_dim=4, num_hidden_layers=2, dropout_rate=0.10. - train: batch_size=128, learning_rate=2e-3, early_stop_gen=15, max_epochs=200. - vae: kl_beta=0.5. - tune: patience=15.

  • balanced - model: latent_dim=8, num_hidden_layers=4, dropout_rate=0.20. - train: batch_size=64, learning_rate=1e-3, early_stop_gen=25, max_epochs=500. - vae: kl_beta=1.0. - tune: patience=25.

  • thorough - model: latent_dim=16, num_hidden_layers=8, dropout_rate=0.30. - train: batch_size=64, learning_rate=5e-4, early_stop_gen=50, max_epochs=1000. - vae: kl_beta=1.0. - tune: patience=50.

NLPCAConfig + NLPCAExtraConfig

Inherits structure from Autoencoder and adds projection controls for latent refinement.

NLPCAExtraConfig

Field

Default

Description

projection_lr

0.05

Learning rate for projection refinement.

projection_epochs

100

Projection steps per evaluation or inference pass.

Preset overrides mirror Autoencoder presets; NLPCA adds only projection controls.

UBPConfig + UBPExtraConfig

Inherits structure from Autoencoder and adds projection controls for latent refinement.

UBPExtraConfig

Field

Default

Description

projection_lr

0.05

Learning rate for projection refinement.

projection_epochs

100

Projection steps per evaluation or inference pass.

Preset overrides mirror Autoencoder presets; UBP adds only projection controls.

Deterministic Imputers

These configurations use simpler blocks and do not use Neural Network specific settings like ModelConfig.

MostFrequentConfig / RefAlleleConfig

Common Fields: - split.test_size: Default 0.2. - sim.simulate_missing: Default False (enabled in presets). - algo.missing: Default -1.

MostFrequentAlgoConfig Extra Fields: - by_populations: False. - default: 0.

Supervised Wrappers (RF / HistGB)

Supervised models use distinct config classes ending in ConfigSupervised.

IOConfigSupervised

Field

Default

Description

prefix

"pgsui"

Run identity.

n_jobs

1

Parallel jobs.

seed

None

RNG seed.

TuningConfigSupervised

Field

Default

Description

enabled

True

Master toggle.

metric

"pr_macro"

Tuning metric.

n_trials

100

Trial count.

n_jobs

8

Parallel jobs for tuning.

fast

True

Whether to use faster settings (subsampling etc.).

RFModelConfig

Field

Default

Description

n_estimators

100

Forest size.

max_depth

None

Depth cap.

class_weight

"balanced"

Class weighting strategy.

HGBModelConfig

Field

Default

Description

n_estimators

100

Boosting iterations.

learning_rate

0.1

Step size.

n_iter_no_change

10

Early stopping patience.

Presets (Supervised)

RandomForest (RFConfig):

  • fast: n_estimators=50, max_iter=5, tune.enabled=False.

  • balanced: n_estimators=200, max_iter=10, tune.enabled=False.

  • thorough: n_estimators=500, max_depth=50, max_iter=20, tune.enabled=False.

HistGradientBoosting (HGBConfig):

  • fast: n_estimators=50, learning_rate=0.2, max_iter=5.

  • balanced: n_estimators=150, learning_rate=0.1, max_iter=10.

  • thorough: n_estimators=500, learning_rate=0.05, max_iter=20, n_iter_no_change=20.