calc.lib.cache.targetvariants

Provides the TargetVariants class for representing hard-called genotypes and variants

This module defines a lightweight container of variant information, including:

  • chromosome name

  • chromosome position

  • reference allele

  • alternate allele

Genotypes are stored in numpy unsigned 8-bit integer arrays (values from 0 to 255). Valid genotype values include 0, 1, and a sentinel value that represents missing data.
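The storage scheme described above can be sketched as follows. This is a minimal illustration, not the module's own code: the sentinel value 255 and the name MISSING are assumptions, since the documentation does not state which value marks missing data.

```python
import numpy as np

# Hypothetical sentinel: the module does not document which uint8 value
# marks a missing call, so 255 is an assumption made for this sketch.
MISSING = np.uint8(255)

# Hard calls for one variant across three diploid samples:
# 0 = reference allele, 1 = alternate allele, MISSING = no call.
calls = np.array(
    [[0, 1], [1, 1], [MISSING, MISSING]],
    dtype=np.uint8,
)

# Boolean mask of missing alleles, then the number of samples
# that have at least one missing allele.
is_missing = calls == MISSING
n_missing_samples = int(is_missing.any(axis=1).sum())
```

Using a uint8 array keeps each allele call to a single byte, which matters when the sample and variant counts are large.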

The class exposes accessors for:

  • genotypes: a 3D numpy array of shape (n_variants, n_samples, ploidy)

  • samples: a list of sample identifiers

  • variant_df: a polars dataframe which contains variant metadata
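As a rough sketch of how an array with the documented (n_variants, n_samples, ploidy) shape might be consumed, the example below computes an alternate-allele dosage per variant and sample. The helper alt_dosage and the sentinel value 255 are hypothetical and not part of the module.

```python
import numpy as np

MISSING = 255  # hypothetical sentinel; the real value is not documented here


def alt_dosage(genotypes: np.ndarray, missing: int = MISSING) -> np.ndarray:
    """Sum allele calls over the ploidy axis.

    Returns a float array of shape (n_variants, n_samples), with NaN
    wherever any allele of a sample's call is missing.
    """
    missing_mask = (genotypes == missing).any(axis=2)
    # Cast before summing so the uint8 values cannot overflow.
    dosage = genotypes.astype(np.float64).sum(axis=2)
    dosage[missing_mask] = np.nan
    return dosage


# Two variants, three samples, diploid.
genotypes = np.zeros((2, 3, 2), dtype=np.uint8)
genotypes[0, 1] = [0, 1]              # heterozygous
genotypes[1, 0] = [1, 1]              # homozygous alternate
genotypes[1, 2] = [MISSING, MISSING]  # no call

dosage = alt_dosage(genotypes)
```

Casting to a floating-point type before the sum is deliberate: adding uint8 values directly would wrap around for the sentinel, and NaN needs a float array anyway.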

The module depends on:

  • numpy for genotype matrix ops

  • polars for building variant metadata tables quickly

Attributes

logger

Classes

TargetVariants

Functions

add_missing_positions_to_lists(→ None)

Mutates lists in place!

Module Contents

class calc.lib.cache.targetvariants.TargetVariants(chr_name: list[str], pos: list[int], refs: list[str | None], alts: list[list[str] | None], gts: list[numpy.typing.NDArray[numpy.uint8]], samples: list[str], target_path: pgscatalog.calc.lib.types.Pathish, sampleset: str)
write_zarr(zarr_group: zarr.Group) → None

Write TargetVariants to a zarr group

Sample IDs, variant metadata, and a genotype array are written to the zarr group

The group must be at a file level in the hierarchy

property genotypes: numpy.typing.NDArray[numpy.uint8]
property samples: list[str]
property variant_ids: list[str]
property variant_metadata: calc.lib.cache.zarrmodels.ZarrVariantMetadata

Convert variant metadata to a dict

calc.lib.cache.targetvariants.add_missing_positions_to_lists(*, chroms: list[str], positions: list[int], ref_alleles: list[str | None], alt_alleles: list[list[str] | None], hard_calls: list[numpy.typing.NDArray[numpy.uint8]], scoring_file_regions: list[tuple[str, int]], seen_positions: set[tuple[str, int]], n_samples: int) → None

Mutates lists in place!
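The documentation does not describe what the function appends, so the following is only a guess at the in-place pattern, written as a stand-in named pad_missing_positions. It assumes that every scoring file position not already seen is appended with None alleles and an all-missing genotype array; the sentinel 255 and the diploid ploidy are also assumptions.

```python
import numpy as np

MISSING = 255  # hypothetical sentinel for a missing call
PLOIDY = 2     # assumed diploid; not stated in the documentation


def pad_missing_positions(
    *,
    chroms: list,
    positions: list,
    ref_alleles: list,
    alt_alleles: list,
    hard_calls: list,
    scoring_file_regions: list,
    seen_positions: set,
    n_samples: int,
) -> None:
    """Hypothetical stand-in: append placeholders for unseen positions."""
    seen = set(seen_positions)  # local copy, so only the lists are mutated
    for chrom, pos in scoring_file_regions:
        if (chrom, pos) in seen:
            continue
        chroms.append(chrom)
        positions.append(pos)
        ref_alleles.append(None)
        alt_alleles.append(None)
        hard_calls.append(np.full((n_samples, PLOIDY), MISSING, dtype=np.uint8))
        seen.add((chrom, pos))


# The caller's lists grow in place; nothing is returned.
chroms, positions = ["1"], [100]
ref_alleles, alt_alleles = ["A"], [["G"]]
hard_calls = [np.zeros((3, PLOIDY), dtype=np.uint8)]

pad_missing_positions(
    chroms=chroms,
    positions=positions,
    ref_alleles=ref_alleles,
    alt_alleles=alt_alleles,
    hard_calls=hard_calls,
    scoring_file_regions=[("1", 100), ("2", 500)],
    seen_positions={("1", 100)},
    n_samples=3,
)
```

Because the function returns None, callers must keep their own references to the lists and should not expect a new object back.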

calc.lib.cache.targetvariants.logger