core.lib.catalogapi¶

Classes and functions related to the PGS Catalog API

Attributes¶

logger

Classes¶

`CatalogCategory`	The three main categories in the PGS Catalog
`CatalogQuery`	Efficiently query the PGS Catalog API using accessions
`ScoreQueryResult`	Class that holds score metadata with methods to extract important fields

Module Contents¶

class core.lib.catalogapi.CatalogCategory(*args, **kwds)¶

The three main categories in the PGS Catalog

Enumeration values don’t mean anything and are automatically generated:

>>> CatalogCategory.SCORE
<CatalogCategory.SCORE: 1>

PUBLICATION¶

SCORE¶

TRAIT¶

class core.lib.catalogapi.CatalogQuery(*, accession: str | list[str], include_children: bool | None = False, **kwargs: Any)¶

Efficiently query the PGS Catalog API using accessions

Supports trait (EFO), score (PGS ID), or publication identifier (PGP ID)

>>> CatalogQuery(accession="PGS000001")
CatalogQuery(accession='PGS000001', category=CatalogCategory.SCORE, include_children=None)

Supports multiple PGS ID input in a list:

>>> CatalogQuery(accession=["PGS000001", "PGS000002"])
CatalogQuery(accession=['PGS000001', 'PGS000002'], category=CatalogCategory.SCORE, include_children=None)

Duplicates are automatically dropped:

>>> CatalogQuery(accession=["PGS000001", "PGS000001"])
CatalogQuery(accession=['PGS000001'], category=CatalogCategory.SCORE, include_children=None)

Publications and trait accessions are supported too:

>>> CatalogQuery(accession="PGP000001")
CatalogQuery(accession='PGP000001', category=CatalogCategory.PUBLICATION, include_children=None)

>>> CatalogQuery(accession="EFO_0001645")
CatalogQuery(accession='EFO_0001645', category=CatalogCategory.TRAIT, include_children=False)

get_query_url() → list[str] | str¶

Automatically resolve a query URL for a PGS Catalog accession (or multiple score accessions).

A list is returned because when querying multiple score accessions batches are created:

>>> CatalogQuery(accession=["PGS000001","PGS000002"]).get_query_url()
['https://www.pgscatalog.org/rest/score/search?pgs_ids=PGS000001,PGS000002']

(each element in this list contains up to 50 score IDs)

Multiple score accessions are automatically deduplicated:

>>> CatalogQuery(accession = ["PGS000001"] * 100).get_query_url()
['https://www.pgscatalog.org/rest/score/search?pgs_ids=PGS000001']

Publications don’t batch because they natively support many scores:

>>> CatalogQuery(accession="PGP000001").get_query_url()
'https://www.pgscatalog.org/rest/publication/PGP000001'

Traits don’t batch for the same reason as publications:

>>> CatalogQuery(accession="EFO_0001645").get_query_url()
'https://www.pgscatalog.org/rest/trait/EFO_0001645?include_children=0'

Child traits terms aren’t included by default. Only traits can have children.

infer_category() → CatalogCategory¶

Inspect an accession and guess the Catalog category

>>> CatalogQuery(accession="PGS000001").infer_category()
<CatalogCategory.SCORE: 1>

>>> CatalogQuery(accession="EFO_0004346").infer_category()
<CatalogCategory.TRAIT: 2>

>>> CatalogQuery(accession="MONDO_0005041").infer_category()
<CatalogCategory.TRAIT: 2>

>>> CatalogQuery(accession="PGP000001").infer_category()
<CatalogCategory.PUBLICATION: 3>

Be careful, assume lists of accessions only contain PGS IDs:

>>> CatalogQuery(accession=["PGS000001", "PGS000002"]).infer_category()
<CatalogCategory.SCORE: 1>

score_query() → ScoreQueryResult | list[ScoreQueryResult]¶

Query the PGS Catalog API and return ScoreQueryResult

Information about a single score is returned as a dict:

>>> CatalogQuery(accession="PGS000001").score_query()
ScoreQueryResult(pgs_id='PGS000001', ftp_url=...

If information about multiple scores is found, it’s returned as a list:

>>> CatalogQuery(accession=["PGS000001", "PGS000002"]).score_query()
[ScoreQueryResult(pgs_id='PGS000001', ftp_url=...

Publications and traits always return a list of score information:

>>> CatalogQuery(accession="PGP000001").score_query()
[ScoreQueryResult(pgs_id='PGS000001', ftp_url=...

property accession: str | list[str]¶

category¶

class core.lib.catalogapi.ScoreQueryResult(*, pgs_id: str, ftp_url: str, ftp_grch37_url: str, ftp_grch38_url: str, license: str)¶

Class that holds score metadata with methods to extract important fields

classmethod from_query(result_response) → ScoreQueryResult | list[ScoreQueryResult]¶

Parses PGS Catalog API JSON response

Parameters:: result_response – PGS Catalog API JSON response
Returns:: ScoreQueryResult

>>> fake_response = {"id": "fake", "ftp_harmonized_scoring_files":
... {"GRCh37": {"positions": "fake.txt.gz"}, "GRCh38": {"positions": "fake.txt.gz"}},
... "license": "fake", "ftp_scoring_file": "fake.txt.gz"}
>>> ScoreQueryResult.from_query(fake_response)
ScoreQueryResult(pgs_id='fake', ftp_url='fake.txt.gz',...

get_download_url(genome_build: pgscatalog.core.lib.genomebuild.GenomeBuild | None = None) → str¶

Returns scoring file download URL, with support for specifying harmonised data in a specific genome build

>>> query = CatalogQuery(accession="PGS000001").score_query()
>>> build = GenomeBuild.GRCh38
>>> query.get_download_url()
'https://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/PGS000001/ScoringFiles/PGS000001.txt.gz'
>>> query.get_download_url(build)
'https://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/PGS000001/ScoringFiles/Harmonized/PGS000001_hmPOS_GRCh38.txt.gz'

ftp_grch37_url¶

ftp_grch38_url¶

ftp_url¶

license¶

pgs_id¶

core.lib.catalogapi.logger¶

core.lib.catalogapi¶

Attributes¶

Classes¶

Module Contents¶

pygscatalog

Navigation

Related Topics