pygscatalog
pygscatalog provides a set of Python CLI applications and developer libraries for working with polygenic scores (PGS),
including integration with the PGS Catalog.
These applications and libraries are used internally by the PGS Catalog Calculator, which is an automated workflow for calculating PGS, including adjustment of scores in the context of genetic ancestry similarity.
If you’re interested in PGS but aren’t sure where to begin, the calculator is the best place to start.
If you’re working with PGS data and need a bespoke analysis that the calculator doesn’t support, these tools might be helpful.
Contents:
- How-to guides
  - How to download scoring files from the PGS Catalog
  - How to format scoring files from the PGS Catalog
  - How to match scoring file variants against target genomes
  - How to match variants across reference panels and target genomes
  - How to aggregate PGS split across multiple files
  - How to adjust PGS in the context of genetic ancestry
  - How to validate PGS Catalog scoring files
- API Reference
Credits
pygscatalog (aka pgscatalog_utils) is developed as part of the PGS Catalog project, a
collaboration between the University of Cambridge’s Department of Public Health and Primary Care (Michael Inouye,
Samuel Lambert) and the European Bioinformatics Institute (Helen Parkinson, Laura Harris).
This package contains code libraries and apps for working with PGS Catalog data and calculating PGS within the
PGS Catalog Calculator (pgsc_calc) workflow, and is based on an earlier
codebase (pgscatalog_utils) with contributions and input from members
of the PGS Catalog team (Samuel Lambert, Benjamin Wingfield, Aoife McMahon, Laurent Gil) and the Inouye lab
(Rodrigo Canovas, Scott Ritchie, Jingqin Wu).
If you use this package in your analysis, please cite:
Lambert, Wingfield, et al. (2024) Enhancing the Polygenic Score Catalog with tools for score calculation and ancestry normalization. Nature Genetics. doi:10.1038/s41588-024-01937-x.
All of our code is open source and permissively licensed under the Apache License 2.0.
This work has received funding from EMBL-EBI core funds, the Baker Institute, the University of Cambridge, Health Data Research UK (HDRUK), and the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101016775 INTERVENE.