pgsui.impute.supervised.imputers package

Submodules

pgsui.impute.supervised.imputers.hist_gradient_boosting module

pgsui.impute.supervised.imputers.hist_gradient_boosting.ensure_hgb_config(config: HGBConfig | Dict | str | None) → HGBConfig: Resolve the HGB configuration from a dataclass, a mapping, or a YAML path.
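The resolution pattern described above (dataclass, mapping, or YAML path in, dataclass out) can be sketched as follows. This is only an illustration, not the library's implementation: DemoConfig and ensure_demo_config are hypothetical stand-ins, the two fields are invented for the example, and returning defaults for None is an assumption. The YAML branch assumes PyYAML is installed and is only imported when a path is actually passed.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Union


@dataclass
class DemoConfig:
    """Hypothetical stand-in for HGBConfig; the real fields are not shown here."""
    n_estimators: int = 100
    learning_rate: float = 0.1


def ensure_demo_config(config: Union[DemoConfig, Dict[str, Any], str, None]) -> DemoConfig:
    """Resolve a config from a dataclass, a mapping, a YAML path, or None."""
    if config is None:
        return DemoConfig()  # assumption: None falls back to defaults
    if isinstance(config, DemoConfig):
        return config  # already resolved, return unchanged
    if isinstance(config, str):
        # Treat a string as a path to a YAML file (requires PyYAML).
        import yaml
        with open(config, "r", encoding="utf-8") as fh:
            config = yaml.safe_load(fh) or {}
    if isinstance(config, dict):
        # Reject keys that do not correspond to a dataclass field.
        known = {f.name for f in fields(DemoConfig)}
        unknown = set(config) - known
        if unknown:
            raise KeyError(f"Unknown config keys: {sorted(unknown)}")
        return DemoConfig(**config)
    raise TypeError(f"Unsupported config type: {type(config).__name__}")
```

Accepting all three input forms in one helper lets the imputer constructor take whatever the caller has on hand while the rest of the code works against a single typed object.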

class pgsui.impute.supervised.imputers.hist_gradient_boosting.ImputeHistGradientBoosting(genotype_data: GenotypeData, *, config: HGBConfig | Dict | str | None = None, overrides: Dict | None = None)

Bases: BaseImputer

Supervised histogram-based gradient boosting (HGB) imputer configured by HGBConfig.

fit() → BaseImputer

Fit the imputer on self.genotype_data; the method takes no arguments.

This method splits the samples into training and test sets and masks all originally observed genotype entries in the test set so that evaluation is unbiased. The model is trained on the training set only.

Steps:

1. Encode genotypes as 0/1/2, with -9/-1 marking missing values.
2. Split samples into train and test sets.
3. Train IterativeImputer on the training set (missing values converted to NaN).
4. Evaluate on the originally non-missing test positions (reconstruction metrics) and produce class reports and plots via _make_class_reports().
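The split-mask-evaluate protocol in the steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the package's code: a per-locus most-common-genotype predictor stands in for the IterativeImputer model, and every function name, the test fraction, and the toy data are hypothetical.

```python
import math
import random
from collections import Counter

MISSING_CODES = {-9, -1}


def to_nan(matrix):
    """Copy a 0/1/2 genotype matrix, replacing -9/-1 with NaN."""
    return [[math.nan if g in MISSING_CODES else float(g) for g in row]
            for row in matrix]


def split_samples(n_samples, test_fraction=0.25, seed=0):
    """Shuffle sample indices and split them into (train, test) lists."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(n_samples * test_fraction))
    return idx[n_test:], idx[:n_test]


def fit_locus_modes(train_rows):
    """Stand-in model: the most common observed genotype at each locus."""
    modes = []
    for j in range(len(train_rows[0])):
        observed = [row[j] for row in train_rows if not math.isnan(row[j])]
        modes.append(Counter(observed).most_common(1)[0][0] if observed else 0.0)
    return modes


def evaluate(test_rows, modes):
    """Score predictions on test entries that were originally observed.

    The stand-in predicts from training data only, so "masking" a test
    entry amounts to not looking at it before comparing with the truth.
    """
    hits = total = 0
    for row in test_rows:
        for j, truth in enumerate(row):
            if math.isnan(truth):
                continue  # originally missing: no ground truth to score
            hits += int(modes[j] == truth)
            total += 1
    return hits / total if total else math.nan


# Toy data: 8 samples x 4 loci, with a few missing calls.
genotypes = [
    [0, 1, 2, -9],
    [0, 1, 2, 0],
    [0, -1, 2, 0],
    [0, 1, 2, 0],
    [0, 1, -9, 0],
    [0, 1, 2, 0],
    [0, 1, 2, 0],
    [0, 1, 2, 0],
]
X = to_nan(genotypes)
train_idx, test_idx = split_samples(len(X))
modes = fit_locus_modes([X[i] for i in train_idx])
accuracy = evaluate([X[i] for i in test_idx], modes)
```

Scoring only the originally observed test entries matters because those are the only positions where a true genotype exists to compare against.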

Returns: self.
Return type: BaseImputer

transform() → ndarray

Impute all samples and return imputed genotypes.

This method applies the trained imputer to the entire dataset, filling in missing genotype values. It ensures that any remaining missing values after imputation are set to -9, and decodes the imputed 0/1/2 genotypes back to their original format.

Returns: Array of shape (n_samples, n_loci) containing IUPAC strings (single-character codes).
Return type: np.ndarray
Raises: NotFittedError – If fit() has not been called prior to transform().
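The post-processing that transform() describes (residual missing values set to -9, then 0/1/2 decoded to single-character IUPAC codes) can be sketched as below. The heterozygote table uses the standard IUPAC ambiguity codes; everything else is an assumption made for illustration: the function names are hypothetical, 0/2 are taken to mean homozygous reference/alternate, loci are assumed biallelic, missing calls are rendered as "N", and plain lists are used where the real method returns an np.ndarray.

```python
# Standard IUPAC ambiguity codes for heterozygous nucleotide pairs.
IUPAC_HET = {
    frozenset("AC"): "M",
    frozenset("AG"): "R",
    frozenset("AT"): "W",
    frozenset("CG"): "S",
    frozenset("CT"): "Y",
    frozenset("GT"): "K",
}
MISSING_INT = -9
MISSING_CHAR = "N"  # assumption: missing genotypes decode to "N"


def decode_genotype(code, ref, alt):
    """Map one 0/1/2 call to an IUPAC character for a biallelic locus."""
    if code == 0:
        return ref  # assumed homozygous reference
    if code == 2:
        return alt  # assumed homozygous alternate
    if code == 1:
        return IUPAC_HET[frozenset((ref, alt))]  # heterozygous
    return MISSING_CHAR


def finalize(imputed, refs, alts):
    """Set non-0/1/2 values (e.g. NaN) to -9, then decode every entry."""
    decoded = []
    for row in imputed:
        out = []
        for j, value in enumerate(row):
            code = int(value) if value in (0, 1, 2) else MISSING_INT
            out.append(decode_genotype(code, refs[j], alts[j]))
        decoded.append(out)
    return decoded


# Hypothetical 2 samples x 3 loci with REF/ALT alleles per locus.
result = finalize(
    [[0, 1.0, 2], [float("nan"), 1, -9]],
    refs=["A", "C", "G"],
    alts=["G", "T", "T"],
)
```

Normalising leftovers to a single missing code before decoding keeps the decoder to one lookup per entry.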

pgsui.impute.supervised.imputers.random_forest module

pgsui.impute.supervised.imputers.random_forest.ensure_rf_config(config: RFConfig | Dict | str | None) → RFConfig: Resolve the RF configuration from a dataclass, a mapping, or a YAML path.

class pgsui.impute.supervised.imputers.random_forest.ImputeRandomForest(genotype_data: GenotypeData, *, config: RFConfig | Dict | str | None = None, overrides: Dict | None = None)