calc.lib.cache.targetvariants¶
Provides the TargetVariants class for representing hard-called genotypes and variants.
This module defines a lightweight container of variant information, including:
- chromosome name
- chromosome position
- reference allele
- alternate allele
Genotypes are stored in numpy unsigned 8-bit integer arrays (values from 0 to 255). Valid genotype values include 0, 1, and a sentinel value that represents missing data.
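As a minimal sketch of the encoding described above, the following builds a small uint8 array for one variant and masks missing calls before counting alternate alleles. The sentinel value 255 (`MISSING`) is an assumption made for illustration; the module's actual sentinel is not stated here.

```python
import numpy as np

# Assumption: 255 stands in for the missing-data sentinel; the real value may differ.
MISSING = np.uint8(255)

# One variant, three diploid samples: shape (n_samples, ploidy)
calls = np.array([[0, 1], [1, 1], [MISSING, MISSING]], dtype=np.uint8)

# True wherever a call is present
called = calls != MISSING

# Count alternate alleles per sample, treating missing calls as 0
alt_counts = np.where(called, calls, 0).sum(axis=1)

# Flag samples where any call is missing
sample_missing = ~called.all(axis=1)
```

Masking before summing matters because the sentinel is an ordinary integer in the array and would otherwise be added into the allele count.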
The class exposes accessors for:
- genotypes: a 3D numpy array of shape (n_variants, n_samples, ploidy)
- samples: a list of sample identifiers
- variant_df: a polars dataframe that contains variant metadata
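The shape convention of the genotype array can be illustrated with plain numpy. This sketch assumes each per-variant array has shape (n_samples, ploidy), which is consistent with the list-of-arrays `gts` argument in the constructor signature but is not confirmed by the source. The sample IDs are made up.

```python
import numpy as np

samples = ["sampleA", "sampleB", "sampleC"]  # hypothetical sample IDs

# Assumption: one (n_samples, ploidy) uint8 array per variant
per_variant = [
    np.array([[0, 0], [0, 1], [1, 1]], dtype=np.uint8),
    np.array([[1, 0], [0, 0], [0, 1]], dtype=np.uint8),
]

# Stack along a new leading axis to get (n_variants, n_samples, ploidy)
genotypes = np.stack(per_variant, axis=0)
print(genotypes.shape)

# All calls for the second variant: shape (n_samples, ploidy)
variant_calls = genotypes[1]

# Both calls for the third sample at the first variant: shape (ploidy,)
sample_calls = genotypes[0, 2]
```

With this layout, indexing the first axis selects a variant and indexing the second selects a sample, so the position of an ID in `samples` lines up with the second axis.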
The module depends on:
- numpy for genotype matrix operations
- polars for building variant metadata tables quickly
Attributes¶
logger
Classes¶
TargetVariants
Functions¶
add_missing_positions_to_lists: Mutates lists in place!
Module Contents¶
- class calc.lib.cache.targetvariants.TargetVariants(chr_name: list[str], pos: list[int], refs: list[str | None], alts: list[list[str] | None], gts: list[numpy.typing.NDArray[numpy.uint8]], samples: list[str], target_path: pgscatalog.calc.lib.types.Pathish, sampleset: str)¶
- write_zarr(zarr_group: zarr.Group) None¶
Write TargetVariants to a zarr group.
Sample IDs, variant metadata, and a genotype array are written to the zarr group.
The group must be at the file level of the hierarchy.
- property genotypes: numpy.typing.NDArray[numpy.uint8]¶
- property samples: list[str]¶
- property variant_ids: list[str]¶
- property variant_metadata: calc.lib.cache.zarrmodels.ZarrVariantMetadata¶
Convert variant metadata to a dict.
- calc.lib.cache.targetvariants.add_missing_positions_to_lists(*, chroms: list[str], positions: list[int], ref_alleles: list[str | None], alt_alleles: list[list[str] | None], hard_calls: list[numpy.typing.NDArray[numpy.uint8]], scoring_file_regions: list[tuple[str, int]], seen_positions: set[tuple[str, int]], n_samples: int) None¶
Mutates lists in place!
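Because the function returns None and mutates its arguments, callers should pass only lists they intend to have changed. The sketch below is not the library's implementation. It is a hypothetical stand-in, `fill_missing_positions_sketch`, that assumes the function appends a placeholder entry with all-missing genotypes for every scoring file position absent from `seen_positions`. The sentinel value 255, the `ploidy` parameter, and the placeholder shape are also assumptions.

```python
import numpy as np

MISSING = 255  # assumption: stand-in for the missing-data sentinel


def fill_missing_positions_sketch(
    *,
    chroms,
    positions,
    ref_alleles,
    alt_alleles,
    hard_calls,
    scoring_file_regions,
    seen_positions,
    n_samples,
    ploidy=2,  # hypothetical parameter, not in the documented signature
):
    """Append placeholders for unseen positions; returns None."""
    for chrom, pos in scoring_file_regions:
        if (chrom, pos) in seen_positions:
            continue  # already present in the target data
        # Keep all five lists the same length by appending to each one
        chroms.append(chrom)
        positions.append(pos)
        ref_alleles.append(None)
        alt_alleles.append(None)
        hard_calls.append(
            np.full((n_samples, ploidy), MISSING, dtype=np.uint8)
        )


chroms, positions = ["1"], [100]
refs, alts = ["A"], [["G"]]
calls = [np.zeros((2, 2), dtype=np.uint8)]

result = fill_missing_positions_sketch(
    chroms=chroms,
    positions=positions,
    ref_alleles=refs,
    alt_alleles=alts,
    hard_calls=calls,
    scoring_file_regions=[("1", 100), ("1", 200)],
    seen_positions={("1", 100)},
    n_samples=2,
)
```

After the call, the caller's own lists have grown by one entry each, and `result` is None. Keyword-only arguments, as in the documented signature, help avoid passing the parallel lists in the wrong order.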
- calc.lib.cache.targetvariants.logger¶