core.lib.catalogapi
Classes and functions related to the PGS Catalog API
Attributes
logger |
Classes
CatalogCategory | The three main categories in the PGS Catalog
CatalogQuery | Efficiently query the PGS Catalog API using accessions
ScoreQueryResult | Class that holds score metadata with methods to extract important fields
Module Contents
- class core.lib.catalogapi.CatalogCategory(*args, **kwds)
The three main categories in the PGS Catalog
Enumeration values don’t mean anything and are automatically generated:
>>> CatalogCategory.SCORE
<CatalogCategory.SCORE: 1>
- PUBLICATION
- SCORE
- TRAIT
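As a rough illustration of what "automatically generated" values look like, the sketch below rebuilds a similar enum with `enum.auto()`. The member order is an assumption chosen to match the values shown in the doctests on this page, and `describe` is a hypothetical helper that is not part of the library.

```python
from enum import Enum, auto


class CatalogCategory(Enum):
    """Stand-in for the documented enum; the member order is an assumption."""

    SCORE = auto()        # auto() assigns 1 to the first member
    TRAIT = auto()        # 2
    PUBLICATION = auto()  # 3


def describe(category: CatalogCategory) -> str:
    """Hypothetical helper: branch on member identity, never on the integer."""
    if category is CatalogCategory.SCORE:
        return "score"
    if category is CatalogCategory.TRAIT:
        return "trait"
    return "publication"
```

Comparing members with `is` keeps calling code independent of the generated integers, which carry no meaning of their own.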
- class core.lib.catalogapi.CatalogQuery(*, accession: str | list[str], include_children: bool | None = False, **kwargs: Any)
Efficiently query the PGS Catalog API using accessions
Supports trait (EFO), score (PGS ID), or publication identifier (PGP ID)
>>> CatalogQuery(accession="PGS000001")
CatalogQuery(accession='PGS000001', category=CatalogCategory.SCORE, include_children=None)
Supports multiple PGS ID input in a list:
>>> CatalogQuery(accession=["PGS000001", "PGS000002"])
CatalogQuery(accession=['PGS000001', 'PGS000002'], category=CatalogCategory.SCORE, include_children=None)
Duplicates are automatically dropped:
>>> CatalogQuery(accession=["PGS000001", "PGS000001"])
CatalogQuery(accession=['PGS000001'], category=CatalogCategory.SCORE, include_children=None)
Publications and trait accessions are supported too:
>>> CatalogQuery(accession="PGP000001")
CatalogQuery(accession='PGP000001', category=CatalogCategory.PUBLICATION, include_children=None)
>>> CatalogQuery(accession="EFO_0001645")
CatalogQuery(accession='EFO_0001645', category=CatalogCategory.TRAIT, include_children=False)
- get_query_url() → list[str] | str
Automatically resolve a query URL for a PGS Catalog accession (or multiple score accessions).
A list is returned when querying multiple score accessions, because the accessions are split into batches:
>>> CatalogQuery(accession=["PGS000001","PGS000002"]).get_query_url()
['https://www.pgscatalog.org/rest/score/search?pgs_ids=PGS000001,PGS000002']
(each element in this list contains up to 50 score IDs)
Multiple score accessions are automatically deduplicated:
>>> CatalogQuery(accession = ["PGS000001"] * 100).get_query_url()
['https://www.pgscatalog.org/rest/score/search?pgs_ids=PGS000001']
Publications don’t batch because they natively support many scores:
>>> CatalogQuery(accession="PGP000001").get_query_url()
'https://www.pgscatalog.org/rest/publication/PGP000001'
Traits don’t batch for the same reason as publications:
>>> CatalogQuery(accession="EFO_0001645").get_query_url()
'https://www.pgscatalog.org/rest/trait/EFO_0001645?include_children=0'
Child trait terms aren’t included by default. Only traits can have children.
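The deduplication and batching described above can be sketched roughly as follows. This is not the library's implementation: `batched`, `score_urls`, and `trait_url` are hypothetical helpers, and preserving first-seen order while deduplicating is an assumption. The URL patterns and the batch size of 50 are taken from the doctests and the note above.

```python
from __future__ import annotations

from itertools import islice
from typing import Iterable, Iterator

API_ROOT = "https://www.pgscatalog.org/rest"  # as seen in the doctest output
BATCH_SIZE = 50  # documented upper bound on score IDs per URL


def batched(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks containing at most `size` items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


def score_urls(accessions: list[str]) -> list[str]:
    """Hypothetical helper: build one search URL per batch of unique PGS IDs."""
    # dict.fromkeys drops duplicates while keeping first-seen order (assumption)
    unique = list(dict.fromkeys(accessions))
    return [
        f"{API_ROOT}/score/search?pgs_ids={','.join(chunk)}"
        for chunk in batched(unique, BATCH_SIZE)
    ]


def trait_url(accession: str, include_children: bool = False) -> str:
    """Hypothetical helper: traits take a 0/1 include_children flag."""
    return f"{API_ROOT}/trait/{accession}?include_children={int(include_children)}"
```

Under these assumptions, 120 distinct score IDs would produce three URLs (50, 50, and 20 IDs), while 100 copies of the same ID collapse to a single URL.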
- infer_category() → CatalogCategory
Inspect an accession and guess the Catalog category
>>> CatalogQuery(accession="PGS000001").infer_category()
<CatalogCategory.SCORE: 1>
>>> CatalogQuery(accession="EFO_0004346").infer_category()
<CatalogCategory.TRAIT: 2>
>>> CatalogQuery(accession="MONDO_0005041").infer_category()
<CatalogCategory.TRAIT: 2>
>>> CatalogQuery(accession="PGP000001").infer_category()
<CatalogCategory.PUBLICATION: 3>
Be careful: lists of accessions are assumed to contain only PGS IDs:
>>> CatalogQuery(accession=["PGS000001", "PGS000002"]).infer_category()
<CatalogCategory.SCORE: 1>
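One plausible way to guess a category is to look at the accession prefix. The sketch below is an illustration only: `Category` and `guess_category` are hypothetical names, and treating any accession containing an underscore as a trait is an assumption drawn from the EFO_ and MONDO_ examples above, not confirmed library behaviour.

```python
from __future__ import annotations

from enum import Enum, auto


class Category(Enum):
    """Stand-in enum so the sketch is self-contained."""

    SCORE = auto()
    TRAIT = auto()
    PUBLICATION = auto()


def guess_category(accession: str | list[str]) -> Category:
    """Hypothetical prefix-based guess; the rules here are assumptions."""
    if isinstance(accession, list):
        # Lists are only expected to hold PGS IDs, so reject anything else
        if accession and all(a.startswith("PGS") for a in accession):
            return Category.SCORE
        raise ValueError(f"Lists must contain only PGS IDs: {accession!r}")
    if accession.startswith("PGS"):
        return Category.SCORE
    if accession.startswith("PGP"):
        return Category.PUBLICATION
    if "_" in accession:
        # Ontology terms such as EFO_0004346 or MONDO_0005041
        return Category.TRAIT
    raise ValueError(f"Unrecognised accession: {accession!r}")
```

Raising on mixed lists, rather than guessing, makes the "lists only contain PGS IDs" assumption fail loudly in this sketch.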
- score_query() → ScoreQueryResult | list[ScoreQueryResult]
Query the PGS Catalog API and return ScoreQueryResult objects.
Information about a single score is returned as a single ScoreQueryResult:
>>> CatalogQuery(accession="PGS000001").score_query()
ScoreQueryResult(pgs_id='PGS000001', ftp_url=...
If information about multiple scores is found, it’s returned as a list:
>>> CatalogQuery(accession=["PGS000001", "PGS000002"]).score_query()
[ScoreQueryResult(pgs_id='PGS000001', ftp_url=...
Publications and traits always return a list of score information:
>>> CatalogQuery(accession="PGP000001").score_query()
[ScoreQueryResult(pgs_id='PGS000001', ftp_url=...
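Because the return value is either a single object or a list, calling code may want to normalise it before iterating. The sketch below shows one way to do that; `as_list` is a hypothetical helper that is not part of this module, and the strings in the usage lines merely stand in for ScoreQueryResult objects.

```python
from __future__ import annotations

from typing import TypeVar

T = TypeVar("T")


def as_list(result: T | list[T]) -> list[T]:
    """Hypothetical helper: wrap a single result so callers can always iterate."""
    return result if isinstance(result, list) else [result]


# Usage with placeholder values standing in for query results
single = as_list("PGS000001")
several = as_list(["PGS000001", "PGS000002"])
```

With this in place, a loop such as `for score in as_list(query.score_query())` would behave the same for score, trait, and publication accessions.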
- property accession: str | list[str]
- category
- class core.lib.catalogapi.ScoreQueryResult(*, pgs_id: str, ftp_url: str, ftp_grch37_url: str, ftp_grch38_url: str, license: str)
Class that holds score metadata with methods to extract important fields
- classmethod from_query(result_response) → ScoreQueryResult | list[ScoreQueryResult]
Parses PGS Catalog API JSON response
- Parameters:
result_response – PGS Catalog API JSON response
- Returns:
>>> fake_response = {"id": "fake", "ftp_harmonized_scoring_files":
... {"GRCh37": {"positions": "fake.txt.gz"}, "GRCh38": {"positions": "fake.txt.gz"}},
... "license": "fake", "ftp_scoring_file": "fake.txt.gz"}
>>> ScoreQueryResult.from_query(fake_response)
ScoreQueryResult(pgs_id='fake', ftp_url='fake.txt.gz',...
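To make the mapping from the JSON response to the constructor fields concrete, here is a minimal sketch for a single score record. `ScoreRecord` and `parse_score` are hypothetical stand-ins. The mapping for `pgs_id` and `ftp_url` follows the doctest above, while the mapping for the two harmonised URLs and the licence is an assumption based on field names. Handling of multi-score responses is left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ScoreRecord:
    """Stand-in with the same field names as the documented constructor."""

    pgs_id: str
    ftp_url: str
    ftp_grch37_url: str
    ftp_grch38_url: str
    license: str


def parse_score(payload: dict[str, Any]) -> ScoreRecord:
    """Hypothetical parser for one score record from the API response."""
    harmonised = payload["ftp_harmonized_scoring_files"]
    return ScoreRecord(
        pgs_id=payload["id"],
        ftp_url=payload["ftp_scoring_file"],
        # Assumed mapping: the "positions" entry of each build is the file URL
        ftp_grch37_url=harmonised["GRCh37"]["positions"],
        ftp_grch38_url=harmonised["GRCh38"]["positions"],
        license=payload["license"],
    )


fake_response = {
    "id": "fake",
    "ftp_harmonized_scoring_files": {
        "GRCh37": {"positions": "fake.txt.gz"},
        "GRCh38": {"positions": "fake.txt.gz"},
    },
    "license": "fake",
    "ftp_scoring_file": "fake.txt.gz",
}
record = parse_score(fake_response)
```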
- get_download_url(genome_build: pgscatalog.core.lib.genomebuild.GenomeBuild | None = None) → str
Returns scoring file download URL, with support for specifying harmonised data in a specific genome build
>>> query = CatalogQuery(accession="PGS000001").score_query()
>>> build = GenomeBuild.GRCh38
>>> query.get_download_url()
'https://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/PGS000001/ScoringFiles/PGS000001.txt.gz'
>>> query.get_download_url(build)
'https://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/PGS000001/ScoringFiles/Harmonized/PGS000001_hmPOS_GRCh38.txt.gz'
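The selection logic can be pictured as a simple lookup over the stored URLs. The sketch below uses hypothetical stand-ins (`Build`, `ScoreRecord`, `download_url`) rather than the real GenomeBuild and ScoreQueryResult classes. The GRCh37 URL is assumed to follow the same naming pattern as the GRCh38 URL shown above.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Build(Enum):
    """Stand-in for GenomeBuild; only the two builds named on this page."""

    GRCh37 = "GRCh37"
    GRCh38 = "GRCh38"


@dataclass(frozen=True)
class ScoreRecord:
    """Stand-in holding the three URLs that the documented class exposes."""

    ftp_url: str
    ftp_grch37_url: str
    ftp_grch38_url: str

    def download_url(self, build: Build | None = None) -> str:
        """Return the author-reported file, or a harmonised file for `build`."""
        if build is None:
            return self.ftp_url
        if build is Build.GRCh37:
            return self.ftp_grch37_url
        return self.ftp_grch38_url


BASE = "https://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/PGS000001/ScoringFiles"
record = ScoreRecord(
    ftp_url=f"{BASE}/PGS000001.txt.gz",
    # Assumed by analogy with the GRCh38 URL in the doctest above
    ftp_grch37_url=f"{BASE}/Harmonized/PGS000001_hmPOS_GRCh37.txt.gz",
    ftp_grch38_url=f"{BASE}/Harmonized/PGS000001_hmPOS_GRCh38.txt.gz",
)
```

Calling `record.download_url()` with no argument falls back to the unharmonised file, mirroring the default of `None` in the documented signature.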
- ftp_grch37_url
- ftp_grch38_url
- ftp_url
- license
- pgs_id
- core.lib.catalogapi.logger