pgsui.impute.supervised.imputers package
Submodules
pgsui.impute.supervised.imputers.hist_gradient_boosting module
- pgsui.impute.supervised.imputers.hist_gradient_boosting.ensure_hgb_config(config: HGBConfig | Dict | str | None) HGBConfig[source]
Resolve HGB configuration from dataclass, mapping, or YAML path.
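The dataclass / mapping / YAML-path resolution described here can be sketched roughly as follows. This is only an illustration, not the package's implementation: `DemoConfig`, its field names, and `ensure_demo_config` are hypothetical stand-ins for `HGBConfig` and `ensure_hgb_config`, and falling back to defaults for `None` is an assumption.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Union


@dataclass
class DemoConfig:
    """Hypothetical stand-in for HGBConfig; the field names are invented."""

    n_estimators: int = 100
    learning_rate: float = 0.1


def ensure_demo_config(
    config: Union[DemoConfig, Mapping[str, Any], str, None],
) -> DemoConfig:
    """Return a DemoConfig regardless of which accepted form was passed."""
    if config is None:
        return DemoConfig()  # assumption: None means "use defaults"
    if isinstance(config, DemoConfig):
        return config  # already a resolved dataclass
    if isinstance(config, str):
        # Treat a string as a path to a YAML file (needs PyYAML installed).
        import yaml

        with open(config, "r", encoding="utf-8") as fh:
            config = yaml.safe_load(fh) or {}
    if isinstance(config, Mapping):
        known = {f.name for f in fields(DemoConfig)}
        unknown = set(config) - known
        if unknown:
            raise KeyError(f"Unknown config keys: {sorted(unknown)}")
        return DemoConfig(**config)
    raise TypeError(f"Unsupported config type: {type(config).__name__}")
```

Rejecting unknown keys early is a design choice of this sketch; it surfaces typos in a mapping or YAML file before training starts.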
- class pgsui.impute.supervised.imputers.hist_gradient_boosting.ImputeHistGradientBoosting(genotype_data: GenotypeData, *, config: HGBConfig | Dict | str | None = None, overrides: Dict | None = None)[source]
Bases: BaseImputer
Supervised HGB imputer driven by HGBConfig.
- fit() BaseImputer[source]
Fit the imputer using self.genotype_data with no arguments.
This method splits the samples into training and test sets, trains the imputer on the training set, and evaluates it on the originally observed genotype entries of the test set, which are masked to allow unbiased evaluation.
- Steps:
1. Encode genotypes to 0/1/2, with -9/-1 as missing.
2. Split samples into train/test sets.
3. Train IterativeImputer on the training set (missing values converted to NaN).
4. Evaluate on non-missing test positions (reconstruction metrics) and produce reports and plots via _make_class_reports().
- Returns:
self.
- Return type:
BaseImputer
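The data-preparation part of the steps above (sample-wise split and conversion of the missing sentinels to NaN) might look like the following NumPy sketch. The toy matrix, the 4/2 split, and all variable names are assumptions made for illustration; the model-training step itself is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 0/1/2 genotype matrix (6 samples x 4 loci); -9 and -1 both mean missing.
X = np.array(
    [
        [0, 1, 2, -9],
        [1, 1, -1, 0],
        [2, 0, 2, 0],
        [0, -9, 1, 1],
        [1, 2, 2, 0],
        [0, 1, -9, 2],
    ]
)

# Split samples (rows) into train and test; the 4/2 ratio is arbitrary.
idx = rng.permutation(X.shape[0])
n_train = 4
train_idx, test_idx = idx[:n_train], idx[n_train:]

# Convert both missing sentinels to NaN, which requires a float array.
X_float = X.astype(float)
X_float[(X == -9) | (X == -1)] = np.nan

X_train, X_test = X_float[train_idx], X_float[test_idx]
```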
- transform() ndarray[source]
Impute all samples and return imputed genotypes.
This method applies the trained imputer to the entire dataset, filling in missing genotype values. It ensures that any remaining missing values after imputation are set to -9, and decodes the imputed 0/1/2 genotypes back to their original format.
- Returns:
(n_samples, n_loci) IUPAC strings (single-character codes).
- Return type:
np.ndarray
- Raises:
NotFittedError – If fit() has not been called prior to transform().
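The fit-before-transform contract implied by the Raises entry can be illustrated with a toy class. Everything here is hypothetical: `ToyImputer`, the `is_fit_` flag, and the locally defined `NotFittedError` are stand-ins, and the package may track fitted state and define the exception differently.

```python
class NotFittedError(RuntimeError):
    """Local stand-in; the package may use a different exception class."""


class ToyImputer:
    """Hypothetical imputer illustrating the fit-before-transform contract."""

    def __init__(self, data):
        self.data = data
        self.is_fit_ = False  # hypothetical flag name

    def fit(self):
        # Real training would happen here.
        self.is_fit_ = True
        return self  # returning self allows imputer.fit().transform()

    def transform(self):
        if not self.is_fit_:
            raise NotFittedError("Call fit() before transform().")
        return list(self.data)


imputer = ToyImputer(["A", "N", "G"])

# Calling transform() first triggers the error.
try:
    imputer.transform()
except NotFittedError as exc:
    message = str(exc)

# After fit(), transform() succeeds.
result = imputer.fit().transform()
```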
pgsui.impute.supervised.imputers.random_forest module
- pgsui.impute.supervised.imputers.random_forest.ensure_rf_config(config: RFConfig | Dict | str | None) RFConfig[source]
Resolve RF configuration from dataclass, mapping, or YAML path.
- class pgsui.impute.supervised.imputers.random_forest.ImputeRandomForest(genotype_data: GenotypeData, *, config: RFConfig | Dict | str | None = None, overrides: Dict | None = None)[source]
Bases: BaseImputer
Supervised RF imputer driven by RFConfig.
- fit() BaseImputer[source]
Fit the imputer using self.genotype_data with no arguments.
This method trains the imputer on the provided genotype data.
- Steps:
1. Encode genotypes to 0/1/2, with -9/-1 as missing.
2. Split samples into train/test sets.
3. Train IterativeImputer on the training set (missing values converted to NaN).
4. Evaluate on non-missing test positions (reconstruction metrics) and produce reports and plots via _make_class_reports().
- Returns:
self.
- Return type:
BaseImputer
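The evaluation step (scoring reconstructions at observed test positions) can be sketched as below. This is an assumption-laden illustration: a per-locus mode imputer is swapped in for the trained IterativeImputer to keep the example dependency-light, and hiding a random 30% of observed test entries is an arbitrary choice, not necessarily what the package does.

```python
import numpy as np

rng = np.random.default_rng(42)

# Fully observed toy data, so the true genotype is known everywhere.
X_train = rng.integers(0, 3, size=(20, 5)).astype(float)
X_test = rng.integers(0, 3, size=(8, 5)).astype(float)

# Hide a random subset of observed test entries; keep the truth for scoring.
observed = ~np.isnan(X_test)
eval_mask = observed & (rng.random(X_test.shape) < 0.3)
X_masked = X_test.copy()
X_masked[eval_mask] = np.nan

# Stand-in imputer (NOT IterativeImputer): most common genotype per locus.
modes = np.array(
    [np.bincount(col.astype(int), minlength=3).argmax() for col in X_train.T]
)
X_imputed = np.where(np.isnan(X_masked), modes, X_masked)

# Score only the deliberately hidden positions.
n_eval = int(eval_mask.sum())
accuracy = (
    float((X_imputed[eval_mask] == X_test[eval_mask]).mean())
    if n_eval
    else float("nan")
)
```

Scoring only the hidden positions keeps the metric from being inflated by entries the imputer was allowed to see.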
- transform() ndarray[source]
Impute all samples and return imputed genotypes.
This method applies the trained imputer to the entire dataset, filling in missing genotype values. It ensures that any remaining missing values after imputation are set to -9, and decodes the imputed 0/1/2 genotypes back to their original format.
- Returns:
(n_samples, n_loci) IUPAC strings (single-character codes).
- Return type:
np.ndarray
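Decoding 0/1/2 genotypes back to single-character IUPAC codes might work along these lines. The sketch assumes biallelic loci with 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate, and maps anything else (such as -9) to "N"; the function names and the per-locus `refs`/`alts` inputs are hypothetical and not part of the documented API.

```python
# Standard IUPAC ambiguity codes for heterozygous biallelic genotypes.
IUPAC_HET = {
    frozenset("AC"): "M",
    frozenset("AG"): "R",
    frozenset("AT"): "W",
    frozenset("CG"): "S",
    frozenset("CT"): "Y",
    frozenset("GT"): "K",
}


def decode_genotype(code, ref, alt):
    """Map one 0/1/2 code to an IUPAC character; anything else becomes 'N'."""
    if code == 0:
        return ref  # homozygous reference
    if code == 2:
        return alt  # homozygous alternate
    if code == 1:
        return IUPAC_HET[frozenset((ref, alt))]  # heterozygous
    return "N"  # missing (e.g. -9)


def decode_matrix(encoded, refs, alts):
    """Decode a samples x loci matrix using per-locus ref/alt alleles."""
    return [
        [decode_genotype(c, r, a) for c, r, a in zip(row, refs, alts)]
        for row in encoded
    ]


encoded = [[0, 1, 2], [2, -9, 1]]
refs = ["A", "C", "G"]
alts = ["G", "T", "T"]
decoded = decode_matrix(encoded, refs, alts)
# decoded == [["A", "Y", "T"], ["G", "N", "K"]]
```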