calc.lib.scorefile

Attributes

logger

Classes

Scorefiles

One or more scoring files processed with the pgscatalog-format program.

Functions

get_position_df(→ polars.DataFrame)

Get variants from the "meta" zarr group array

load_scoring_files(→ None)

Load scoring files into the score_variant_table

Module Contents

class calc.lib.scorefile.Scorefiles(paths: calc.lib.types.Pathish | calc.lib.types.PathishList)

One or more scoring files processed with the pgscatalog-format program.

get_unique_positions(chrom: str | None = None, zarr_group: zarr.Group | None = None) → list[tuple[str, int]]
column_types
property paths: list[pathlib.Path]

Return the list of scoring file paths.
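The constructor accepts either a single path or a list of paths, while the paths property always returns a list. The sketch below shows one way such input could be normalized. It is not the library's implementation: the Pathish alias shown here is an assumption about what calc.lib.types.Pathish covers, and normalize_paths is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Iterable, List, Union

# Assumption: Pathish is roughly "a string or an os.PathLike object".
Pathish = Union[str, os.PathLike]


def normalize_paths(paths: Union[Pathish, Iterable[Pathish]]) -> List[Path]:
    """Hypothetical helper: return a list of Path objects for one or many inputs."""
    # A single path must be checked first, because a str is itself iterable.
    if isinstance(paths, (str, os.PathLike)):
        return [Path(paths)]
    return [Path(p) for p in paths]


single = normalize_paths("scores/PGS000001.txt")
many = normalize_paths(["scores/PGS000001.txt", Path("scores/PGS000002.txt")])
```

Both calls return a list, so downstream code can iterate over the result without checking which form the caller used.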

calc.lib.scorefile.get_position_df(zarr_group: zarr.Group) → polars.DataFrame

Get variants from the “meta” zarr group array

calc.lib.scorefile.load_scoring_files(db_path: calc.lib.types.Pathish, scorefile_paths: calc.lib.types.PathishList, max_memory_gb: str, threads: int) → None

Load scoring files into the score_variant_table

Parameters

db_path : Pathish

Path to the DuckDB database file.

scorefile_paths : PathishList

A list of Pathish objects to the scoring CSV file(s). Scoring files must be in a structured format as created by pgscatalog-format.

max_memory_gb : str

Maximum memory DuckDB is allowed to use (e.g., “4GB”).

threads : int

Number of threads for DuckDB to use.

Notes

The score_variant_table is created or replaced each time this function is called.

Effect weights are stored as double precision floating-point numbers (np.float64 equivalent). All previous processing (e.g. by pgscatalog.core) treats effect weights as strings to prevent precision problems.
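The precision concern can be illustrated with the standard library alone. This is a minimal sketch, not taken from the package: the weight value is made up, and Decimal stands in for "keeping the original text exactly".

```python
from decimal import Decimal

# Hypothetical effect weight with 19 decimal places, as it might appear in a file.
weight_text = "0.1234567890123456789"

as_decimal = Decimal(weight_text)  # keeps every digit of the text
as_float = float(weight_text)      # nearest double precision (float64) value

# The text survives a round trip through Decimal unchanged.
text_preserved = str(as_decimal) == weight_text

# A float64 carries about 17 significant decimal digits, so the
# 19-digit text cannot be reproduced from the float.
float_matches_text = repr(as_float) == weight_text

# The stored float is a nearby binary value, not the exact decimal value.
float_is_exact = Decimal(as_float) == as_decimal
```

Here text_preserved is True, while float_matches_text and float_is_exact are both False. Keeping weights as strings until they are loaded confines this rounding to a single conversion step.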

pgscatalog-format can process scoring files from the PGS Catalog or custom scoring files.
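A call to load_scoring_files might look like the sketch below, which follows the signature documented above. The wrapper function, file names, and the memory and thread values are illustrative assumptions. The import is placed inside the function so the sketch can be read and loaded without the package installed.

```python
from pathlib import Path
from typing import List


def build_score_table(db_path: Path, scorefiles: List[Path]) -> None:
    """Hypothetical wrapper that loads scoring files into the DuckDB database."""
    # Lazy import: the package is only needed when the function is called.
    from calc.lib.scorefile import load_scoring_files

    load_scoring_files(
        db_path=db_path,
        scorefile_paths=scorefiles,
        max_memory_gb="4GB",  # a string with units, as in the parameter description
        threads=4,            # illustrative value
    )


# Example invocation (paths are placeholders):
# build_score_table(Path("scores.duckdb"), [Path("PGS000001_formatted.csv")])
```

Because the score_variant_table is created or replaced on every call, all scoring files for a run should be passed together in a single call.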

calc.lib.scorefile.logger