calc.lib.cache.zarrmodels

Pydantic models for loading and saving data to and from zarr attributes (metadata).

These models help read and write structured data about variants and samples. This data is helpful when working with the genotype array, where variants are the row names and samples are the column names.

Attributes

logger

Classes

ZarrSampleMetadata

ZarrVariantMetadata: A dataframe-y model, suitable for ingesting with pandas / polars / databases

Functions

is_valid_allele(...)

Module Contents

class calc.lib.cache.zarrmodels.ZarrSampleMetadata
root: list[str]
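The `root: list[str]` field suggests a pydantic v2 `RootModel` wrapping a plain list of sample IDs (the genotype array's column names). The sketch below assumes that base class and uses a plain dict as a stand-in for a zarr group's `.attrs`; the sample IDs are illustrative. It shows a JSON round trip, since zarr attributes must be JSON-serialisable.

```python
import json

from pydantic import RootModel


# Assumption: reconstructed from the documented `root: list[str]` field,
# which is the field name pydantic v2 uses for RootModel subclasses.
class ZarrSampleMetadata(RootModel[list[str]]):
    """Sample IDs, in the same order as the genotype array's columns."""


samples = ZarrSampleMetadata(["s1", "s2", "s3"])

# A plain dict stands in for a zarr group's .attrs (JSON-compatible values only)
attrs = {"samples": samples.model_dump(mode="json")}
payload = json.dumps(attrs)

# Loading re-validates the stored list back into the model
restored = ZarrSampleMetadata.model_validate(json.loads(payload)["samples"])
print(restored.root)  # ['s1', 's2', 's3']
```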
class calc.lib.cache.zarrmodels.ZarrVariantMetadata

A dataframe-y model, suitable for ingesting with pandas / polars / databases

Useful for saving data about variants to, and loading it from, a group-level zarr attribute.

Each target genome file will have its own variant metadata.

check_even_length() -> ZarrVariantMetadata
merge(other: ZarrVariantMetadata) -> ZarrVariantMetadata
to_df() -> polars.DataFrame
to_numpy() -> collections.abc.Mapping[str, numpy.typing.NDArray[Any]]

Convert the dataframe into a mapping of 1D NumPy arrays.

String columns are converted to a consistent fixed-width dtype.

alts: Annotated[list[list[str] | None], AfterValidator(is_valid_allele)]
chr_name: list[str]
chr_pos: list[pydantic.PositiveInt]
ref: Annotated[list[str | None], AfterValidator(is_valid_allele)]
variant_id: list[str]
calc.lib.cache.zarrmodels.is_valid_allele(alleles: list[list[str] | None] | list[str | None]) -> list[list[str] | None] | list[str | None]
calc.lib.cache.zarrmodels.logger