pgsui.impute.supervised.imputers package

Submodules

pgsui.impute.supervised.imputers.hist_gradient_boosting module

pgsui.impute.supervised.imputers.hist_gradient_boosting.ensure_hgb_config(config: HGBConfig | Dict | str | None) -> HGBConfig

Resolve HGB configuration from dataclass, mapping, or YAML path.
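The resolution logic described above can be sketched as a small type dispatch. This is an illustration only, not the library's code: ToyConfig, its fields, and ensure_toy_config are hypothetical stand-ins, and falling back to defaults when config is None is an assumption. The YAML branch uses PyYAML's yaml.safe_load and imports it lazily, so the rest of the sketch runs without PyYAML installed.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Union


@dataclass
class ToyConfig:
    """Hypothetical stand-in for HGBConfig; the field names are invented."""

    n_estimators: int = 100
    learning_rate: float = 0.1


def ensure_toy_config(config: Union[ToyConfig, Dict[str, Any], str, None]) -> ToyConfig:
    """Return a ToyConfig built from a dataclass, a mapping, or a YAML path."""
    if config is None:
        # Assumption: no config means "use the defaults".
        return ToyConfig()
    if isinstance(config, ToyConfig):
        return config
    if isinstance(config, str):
        # Treat strings as a path to a YAML file (requires PyYAML).
        import yaml

        with open(config, "r", encoding="utf-8") as fh:
            config = yaml.safe_load(fh) or {}
    if isinstance(config, dict):
        # Reject keys that do not correspond to a dataclass field.
        known = {f.name for f in fields(ToyConfig)}
        unknown = set(config) - known
        if unknown:
            raise KeyError(f"Unknown config keys: {sorted(unknown)}")
        return ToyConfig(**config)
    raise TypeError(f"Unsupported config type: {type(config).__name__}")


cfg_default = ensure_toy_config(None)
cfg_from_dict = ensure_toy_config({"n_estimators": 250})
```

Accepting all three input forms in one helper lets callers pass whichever is most convenient while downstream code only ever sees the dataclass.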

class pgsui.impute.supervised.imputers.hist_gradient_boosting.ImputeHistGradientBoosting(genotype_data: GenotypeData, *, config: HGBConfig | Dict | str | None = None, overrides: Dict | None = None)

Bases: BaseImputer

Supervised HGB imputer driven by HGBConfig.

fit() -> BaseImputer

Fit the imputer on self.genotype_data. The method takes no arguments.

This method splits the samples into training and test sets, trains the imputer on the training set, and evaluates it on originally observed genotype entries in the test set, which are masked to give an unbiased estimate of imputation accuracy.

Steps:
  1. Encode to 0/1/2 with -9/-1 as missing.

  2. Split samples into train/test.

  3. Train IterativeImputer on the training set (missing values are converted to NaN).

  4. Evaluate on the non-missing positions of the test set (reconstruction metrics) and generate class reports and plots via _make_class_reports().

Returns:

self.

Return type:

BaseImputer

transform() -> ndarray

Impute all samples and return imputed genotypes.

This method applies the trained imputer to the entire dataset, filling in missing genotype values. It ensures that any remaining missing values after imputation are set to -9, and decodes the imputed 0/1/2 genotypes back to their original format.

Returns:

Array of shape (n_samples, n_loci) containing single-character IUPAC codes.

Return type:

np.ndarray

Raises:

NotFittedError – If fit() has not been called prior to transform().
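The decoding step of transform() can be illustrated with a small NumPy helper. This is a sketch under stated assumptions, not the library's decoder: decode_012 is a hypothetical function, each locus is assumed to be biallelic with known reference and alternate alleles, and any call that is not 0, 1, or 2 (including -9) is assumed to decode to "N".

```python
import numpy as np

# IUPAC ambiguity codes for unordered heterozygous allele pairs.
IUPAC_HET = {
    frozenset("AC"): "M",
    frozenset("AG"): "R",
    frozenset("AT"): "W",
    frozenset("CG"): "S",
    frozenset("CT"): "Y",
    frozenset("GT"): "K",
}


def decode_012(calls, ref, alt):
    """Decode 0/1/2 calls to IUPAC characters; anything else becomes 'N'."""
    calls = np.asarray(calls)
    out = np.full(calls.shape, "N", dtype="<U1")
    for j, (r, a) in enumerate(zip(ref, alt)):
        col = calls[:, j]
        out[col == 0, j] = r  # homozygous reference
        out[col == 1, j] = IUPAC_HET[frozenset((r, a))]  # heterozygous
        out[col == 2, j] = a  # homozygous alternate
    return out


calls = [[0, 1, -9], [2, 1, 0]]
decoded = decode_012(calls, ref=["A", "C", "G"], alt=["G", "T", "T"])
# decoded -> [["A", "Y", "N"], ["G", "Y", "G"]]
```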

pgsui.impute.supervised.imputers.random_forest module

pgsui.impute.supervised.imputers.random_forest.ensure_rf_config(config: RFConfig | Dict | str | None) -> RFConfig

Resolve RF configuration from dataclass, mapping, or YAML path.

class pgsui.impute.supervised.imputers.random_forest.ImputeRandomForest(genotype_data: GenotypeData, *, config: RFConfig | Dict | str | None = None, overrides: Dict | None = None)

Bases: BaseImputer

Supervised RF imputer driven by RFConfig.
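The signature accepts both a config and an overrides mapping. One plausible reading is that overrides replaces individual fields of the resolved config; the sketch below shows that pattern with the standard library. ToyRFConfig, its fields, and apply_overrides are hypothetical, and the actual semantics of overrides in PG-SUI may differ (for example, it may accept nested or dotted keys).

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ToyRFConfig:
    """Hypothetical stand-in for RFConfig; the field names are invented."""

    n_estimators: int = 100
    max_depth: Optional[int] = None


def apply_overrides(cfg: ToyRFConfig, overrides: Optional[Dict[str, Any]]) -> ToyRFConfig:
    """Return a copy of cfg with the given fields replaced."""
    if not overrides:
        return cfg
    # dataclasses.replace raises TypeError for names that are not fields.
    return replace(cfg, **overrides)


base = ToyRFConfig()
tuned = apply_overrides(base, {"n_estimators": 500})
```

Returning a new object rather than mutating the base config keeps a shared default safe to reuse across several imputers.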

fit() -> BaseImputer

Fit the imputer on self.genotype_data. The method takes no arguments.

This method trains the imputer on the provided genotype data.

Steps:
  1. Encode to 0/1/2 with -9/-1 as missing.

  2. Split samples into train/test.

  3. Train IterativeImputer on the training set (missing values are converted to NaN).

  4. Evaluate on the non-missing positions of the test set (reconstruction metrics) and generate class reports and plots via _make_class_reports().
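The evaluation in step 4 can be sketched as per-class metrics computed only where the true genotype is known. This is an illustration, not what _make_class_reports() does internally: per_class_metrics is a hypothetical helper, and the choice of accuracy, precision, and recall as the reported metrics is an assumption.

```python
import numpy as np


def per_class_metrics(y_true, y_pred, classes=(0, 1, 2), missing=(-9, -1)):
    """Accuracy plus per-class precision/recall on non-missing positions."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Score only positions whose true genotype is known.
    keep = ~np.isin(y_true, missing)
    y_true, y_pred = y_true[keep], y_pred[keep]
    report = {"accuracy": float((y_true == y_pred).mean())}
    for c in classes:
        tp = int(((y_true == c) & (y_pred == c)).sum())
        n_pred = int((y_pred == c).sum())
        n_true = int((y_true == c).sum())
        report[c] = {
            "precision": tp / n_pred if n_pred else 0.0,
            "recall": tp / n_true if n_true else 0.0,
        }
    return report


truth = np.array([[0, 0, 1, 1], [2, 2, -9, 1]])
preds = np.array([[0, 1, 1, 1], [2, 2, 0, 1]])
report = per_class_metrics(truth, preds)
# Seven positions are scored (the -9 entry is skipped); six of them match.
```

Reporting each genotype class separately matters because heterozygotes (class 1) are often rarer than homozygotes, so overall accuracy alone can hide poor performance on them.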

Returns:

self.

Return type:

BaseImputer

transform() -> ndarray

Impute all samples and return imputed genotypes.

This method applies the trained imputer to the entire dataset, filling in missing genotype values. It ensures that any remaining missing values after imputation are set to -9, and decodes the imputed 0/1/2 genotypes back to their original format.

Returns:

Array of shape (n_samples, n_loci) containing single-character IUPAC codes.

Return type:

np.ndarray

Module contents