core.lib.models =============== .. py:module:: core.lib.models .. autoapi-nested-parse:: PGS Catalog pydantic models for data validation Best way to reuse: * `from pgscatalog.core import models` and use `models.CatalogScoreVariant(**d)` * `import pgscatalog.core` and use the fully qualified name: `pgscatalog.core.models.CatalogScoreVariant` Classes ------- .. autoapisummary:: core.lib.models.Allele core.lib.models.CatalogScoreHeader core.lib.models.CatalogScoreVariant core.lib.models.ScoreFormatVersion core.lib.models.ScoreHeader core.lib.models.ScoreLog core.lib.models.ScoreLogs core.lib.models.ScoreVariant core.lib.models.VariantLog core.lib.models.VariantType Module Contents --------------- .. py:class:: Allele A class that represents an allele found in PGS Catalog scoring files >>> simple_ea = Allele(**{"allele": "A"}) >>> simple_ea Allele(allele='A', is_snp=True) >>> str(simple_ea) 'A' >>> Allele(**{"allele": "AG"}) Allele(allele='AG', is_snp=True) >>> hla_example = Allele(**{"allele": "+"}) >>> hla_example Allele(allele='+', is_snp=False) >>> Allele(allele="A") Allele(allele='A', is_snp=True) >>> Allele(allele="A/T").has_multiple_alleles True .. py:method:: serialize() -> str When dumping the model, flatten it to just return the allele as a string .. py:attribute:: allele :type: str .. py:property:: has_multiple_alleles :type: bool .. py:property:: is_snp :type: bool SNPs are the most common type of effect allele in PGS Catalog scoring files. More complex effect alleles, like HLAs or APOE genes, often require extra work to represent in genomes. Users should be warned about complex effect alleles. ..
py:class:: CatalogScoreHeader A ScoreHeader that validates the PGS Catalog Scoring File header standard https://www.pgscatalog.org/downloads/#dl_ftp_scoring >>> from ._config import Config >>> testpath = Config.ROOT_DIR / "tests" / "data" / "PGS000001_hmPOS_GRCh38.txt.gz" >>> test = CatalogScoreHeader.from_path(testpath) # doctest: +ELLIPSIS >>> test # doctest: +ELLIPSIS CatalogScoreHeader(pgs_id='PGS000001', pgs_name='PRS77_BC', trait_reported='Breast cancer', genome_build=None, format_version=<ScoreFormatVersion.v2: '2.0'>, trait_mapped=['breast carcinoma'], trait_efo=['EFO_0000305'], variants_number=77, weight_type=None, pgp_id='PGP000001', citation='Mavaddat N et al. J Natl Cancer Inst (2015). doi:10.1093/jnci/djv036', HmPOS_build=GenomeBuild.GRCh38, HmPOS_date=datetime.date(2022, 7, 29), HmPOS_match_pos='{"True": null, "False": null}', HmPOS_match_chr='{"True": null, "False": null}') >>> test.variants_number == test.row_count True .. py:method:: check_format_version(version: ScoreFormatVersion) -> ScoreFormatVersion :classmethod: .. py:method:: check_pgp_id(pgp_id: str) -> str :classmethod: .. py:method:: check_pgs_id(pgs_id: str) -> str :classmethod: .. py:method:: parse_genome_build(value: str) -> pgscatalog.core.lib.genomebuild.GenomeBuild | None :classmethod: .. py:method:: parse_weight_type(value: str | None) -> str | None :classmethod: .. py:method:: serialize_genomebuild(genome_build: pgscatalog.core.lib.genomebuild.GenomeBuild | None, _info: pydantic.SerializationInfo) -> str .. py:method:: split_traits(trait: str) -> list[str] :classmethod: .. py:attribute:: HmPOS_build :type: Annotated[pgscatalog.core.lib.genomebuild.GenomeBuild | None, Field(default=None)] .. py:attribute:: HmPOS_date :type: Annotated[datetime.date | None, Field(default=None)] .. py:attribute:: HmPOS_match_chr :type: Annotated[str | None, Field(default=None)] .. py:attribute:: HmPOS_match_pos :type: Annotated[str | None, Field(default=None)] .. py:attribute:: citation :type: str ..
py:attribute:: format_version :type: ScoreFormatVersion .. py:property:: is_harmonised :type: bool .. py:attribute:: license :type: Annotated[str | None, Field('PGS obtained from the Catalog should be cited appropriately, and used in accordance with any licensing restrictions set by the authors. See EBI Terms of Use (https://www.ebi.ac.uk/about/terms-of-use/) for additional details.', repr=False)] .. py:attribute:: pgp_id :type: str .. py:attribute:: trait_efo :type: Annotated[list[str], Field(description="Ontology trait name, e.g. 'breast carcinoma")] .. py:attribute:: trait_mapped :type: Annotated[list[str], Field(description='Trait name')] .. py:attribute:: variants_number :type: Annotated[int, Field(gt=0, description='Number of variants listed in the PGS', default=None)] .. py:attribute:: weight_type :type: Annotated[str | None, Field(description='Variant weight type', default=None)] .. py:class:: CatalogScoreVariant A model representing a row from a PGS Catalog scoring file, defined here: https://www.pgscatalog.org/downloads/#scoring_columns Implementation notes: - You should instantiate effect weight fields with strings (e.g. with csv.reader, which returns data as a list of strings) - The model always handles effect weights internally as strings and will coerce numeric input to strings when instantiated - Our string obsession comes from a desire to faithfully reproduce author submitted data and avoid introducing precision errors Extra / dynamically named fields: Only one type of dynamic field is supported. Ancestry specific allele frequency information uses labels defined by authors. 
An example from the first row from PGS000662: >>> variant_with_allelefrequency = {"chr_name": "1", "chr_position": 5743196, "effect_allele": "T", "other_allele": "C", "effect_weight": 0.102298257, "allelefrequency_effect_European": 0.067, "allelefrequency_effect_African": 0.439, "allelefrequency_effect_Asian": 0.113, "allelefrequency_effect_Hispanic": 0.157} >>> CatalogScoreVariant(**variant_with_allelefrequency) # doctest: +ELLIPSIS CatalogScoreVariant(rsID=None, chr_name='1', chr_position=5743196..., allelefrequency_effect_European=0.067, allelefrequency_effect_African=0.439, allelefrequency_effect_Asian=0.113, allelefrequency_effect_Hispanic=0.157, ...) An example from the first row from PGS000018 with the edited column name 'rsid': >>> variant_with_rsid_column = {"rsid": "rs2843152", "chr_name": 1, "chr_position": 2245570, "effect_allele": "G", "other_allele": "C", "effect_weight": -2.76009e-02} >>> CatalogScoreVariant(**variant_with_rsid_column) CatalogScoreVariant(rsID='rs2843152', chr_name='1', chr_position=2245570..., effect_weight='-0.0276009', ...) Extra field names which don't follow the pattern "allelefrequency_effect_{label}" will raise a ValueError: >>> bad_extra_fields = variant_with_allelefrequency | {"favourite_ice_cream": "vanilla"} >>> CatalogScoreVariant(**bad_extra_fields) Traceback (most recent call last): ... pydantic_core._pydantic_core.ValidationError: 1 validation error for CatalogScoreVariant Value error, Invalid extra fields detected: ['favourite_ice_cream'] ... 
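The extra-field rule described above can be sketched with the standard library alone. This is a minimal illustration, not the model's actual validator: the regular expression and the helper names ``find_invalid_extra_fields`` and ``check_extra_fields_sketch`` are assumptions made for this example. Only the ``allelefrequency_effect_{label}`` pattern and the error message wording come from the doctest above.

```python
import re

# Assumed pattern: the fixed prefix followed by a non-empty, author-defined label.
ALLOWED_EXTRA = re.compile(r"^allelefrequency_effect_.+$")


def find_invalid_extra_fields(extra_names):
    """Return the extra field names that do not follow the supported pattern."""
    return [name for name in extra_names if not ALLOWED_EXTRA.match(name)]


def check_extra_fields_sketch(extra_names):
    """Raise ValueError listing unsupported extra field names (hypothetical helper)."""
    invalid = find_invalid_extra_fields(extra_names)
    if invalid:
        raise ValueError(f"Invalid extra fields detected: {invalid}")


extras = ["allelefrequency_effect_European", "favourite_ice_cream"]
print(find_invalid_extra_fields(extras))  # ['favourite_ice_cream']
```

In the real model the check runs as a pydantic model validator, so the ``ValueError`` surfaces as a ``ValidationError``, as the traceback above shows.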
Complex alleles are represented a little differently: >>> complex_allele = {"chr_name": 19, "effect_allele": "APOE_e2", "effect_weight": -0.5, "locus_name": "APOE", "is_haplotype": True, "variant_type": "APOE_allele", "variant_description": None} >>> CatalogScoreVariant(**complex_allele) CatalogScoreVariant(rsID=None, chr_name='19', chr_position=None, effect_allele=Allele(allele='APOE_e2', is_snp=False), other_allele=None, locus_name='APOE', is_haplotype=True, is_diplotype=False, imputation_method=None, variant_description=None, inclusion_criteria=None, effect_weight='-0.5', is_interaction=False, is_dominant=False, is_recessive=False, dosage_0_weight=None, dosage_1_weight=None, dosage_2_weight=None, OR=None, HR=None, allelefrequency_effect=None, hm_source=None, hm_rsID=None, hm_chr=None, hm_pos=None, hm_inferOtherAllele=None, hm_match_chr=None, hm_match_pos=None, variant_type=<VariantType.APOE_ALLELE: 'APOE_allele'>, variant_id='19::APOE_e2:', is_harmonised=False, is_complex=True, is_non_additive=False, effect_type=EffectType.ADDITIVE) Although effect weights are typed as optional, if all effect weight fields are missing then a model validator will raise a validation error: >>> CatalogScoreVariant(**{"chr_name": "19", "chr_position": 1, "effect_allele": "A", "effect_weight": None}) # doctest: +ELLIPSIS Traceback (most recent call last): ... pydantic_core._pydantic_core.ValidationError: 1 validation error for CatalogScoreVariant Value error, All effect weight fields are missing ... effect_weight can be missing if the dosage_n_weight (non-additive) fields are all present. However, if any dosage_n_weight field is present, they must _all_ be present: >>> CatalogScoreVariant(**{"chr_name": "19", "chr_position": 1, "effect_allele": "A", "dosage_0_weight": 0.1, "dosage_1_weight": None, "dosage_2_weight": None}) # doctest: +ELLIPSIS Traceback (most recent call last): ... pydantic_core._pydantic_core.ValidationError: 1 validation error for CatalogScoreVariant Value error, Dosage missing effect weight ...
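The two effect weight rules above can be sketched on a plain dict of column values. This is an illustration under stated assumptions rather than the model's own code: ``check_effect_weights_sketch`` is a hypothetical helper, and it assumes a partially filled set of dosage columns is rejected whether or not ``effect_weight`` is given. The column names and error messages are taken from the doctests.

```python
DOSAGE_FIELDS = ("dosage_0_weight", "dosage_1_weight", "dosage_2_weight")


def check_effect_weights_sketch(row):
    """Apply the two documented rules to a dict of column values.

    Values are compared against None rather than tested for truthiness,
    because 0 is a legitimate weight.
    """
    dosage = [row.get(name) for name in DOSAGE_FIELDS]
    any_dosage = any(weight is not None for weight in dosage)
    all_dosage = all(weight is not None for weight in dosage)

    # Rule 1: at least one kind of effect weight must be supplied.
    if row.get("effect_weight") is None and not any_dosage:
        raise ValueError("All effect weight fields are missing")

    # Rule 2 (assumed to apply regardless of effect_weight):
    # dosage weights are all-or-nothing.
    if any_dosage and not all_dosage:
        raise ValueError("Dosage missing effect weight")

    return row


# A complete set of dosage weights is accepted without effect_weight.
check_effect_weights_sketch(
    {"dosage_0_weight": 0, "dosage_1_weight": 0.1, "dosage_2_weight": 0.3}
)
```

Comparing against ``None`` matters here: a truthiness test would wrongly treat a dosage weight of ``0`` as missing.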
A variant may have all effect weight fields. During normalisation the standard effect_weight column will be used: >>> CatalogScoreVariant(**{"chr_name": "19", "chr_position": 1, "effect_allele": "A", "effect_weight": 0.05, "dosage_0_weight": 0, "dosage_1_weight": 0.1, "dosage_2_weight": 0.3}) CatalogScoreVariant(rsID=None, chr_name='19', ..., is_non_additive=False, ... Note that is_non_additive is false if effect_weight column exists, although non-additive fields do exist. .. py:method:: alleles_must_parse(value: Any) -> Allele :classmethod: .. py:method:: check_complex_variants() -> CatalogScoreVariant .. py:method:: check_effect_weights() -> CatalogScoreVariant .. py:method:: check_extra_fields() -> CatalogScoreVariant Only allelefrequency_effect_{ancestry} is supported as an extra field {ancestry} is dynamic and set by submitters .. py:method:: check_position() -> CatalogScoreVariant .. py:method:: effect_weight_must_float(weight: str | None) -> str | None :classmethod: .. py:method:: empty_string_to_none(v: Any) -> Any | None :classmethod: .. py:method:: set_missing_rsid(rsid: str | None) -> str | None :classmethod: .. py:attribute:: HR :type: Annotated[float | None, Field(default=None, title='Hazard Ratio', description='Author-reported effect sizes can be supplied to the Catalog. If no other effect_weight is given the weight is calculated using the log(OR) or log(HR).')] .. py:attribute:: OR :type: Annotated[float | None, Field(default=None, title='Odds Ratio', description='Author-reported effect sizes can be supplied to the Catalog. If no other effect_weight is given the weight is calculated using the log(OR) or log(HR).')] .. py:attribute:: allelefrequency_effect :type: Annotated[float | None, Field(default=None, title='Effect Allele Frequency', description='Reported effect allele frequency, if the associated locus is a haplotype then haplotype frequency will be extracted.', ge=0)] .. 
py:attribute:: chr_name :type: Annotated[str | None, Field(default=None, title='Location - Chromosome ', description='Chromosome name/number associated with the variant.', coerce_numbers_to_str=True)] .. py:attribute:: chr_position :type: Annotated[int | None, Field(default=None, title='Location within the Chromosome', description='Chromosomal position associated with the variant.', gt=0)] .. py:attribute:: complex_columns :type: ClassVar[tuple[str, str, str]] :value: ('is_haplotype', 'is_diplotype', 'is_interaction') .. py:attribute:: dosage_0_weight :type: Annotated[str | None, Field(default=None, title='Effect weight with 0 copy of the effect allele', description='Weights that are specific to different dosages of the effect_allele (e.g. {0, 1, 2} copies) can also be reported when the the contribution of the variants to the score is not encoded as additive, dominant, or recessive. In this case three columns are added corresponding to which variant weight should be applied for each dosage, where the column name is formated as dosage_#_weight where the # sign indicates the number of effect_allele copies.', coerce_numbers_to_str=True)] .. py:attribute:: dosage_1_weight :type: Annotated[str | None, Field(default=None, title='Effect weight with 1 copy of the effect allele', description='Weights that are specific to different dosages of the effect_allele (e.g. {0, 1, 2} copies) can also be reported when the the contribution of the variants to the score is not encoded as additive, dominant, or recessive. In this case three columns are added corresponding to which variant weight should be applied for each dosage, where the column name is formated as dosage_#_weight where the # sign indicates the number of effect_allele copies.', coerce_numbers_to_str=True)] .. 
py:attribute:: dosage_2_weight :type: Annotated[str | None, Field(default=None, title='Effect weight with 2 copies of the effect allele', description='Weights that are specific to different dosages of the effect_allele (e.g. {0, 1, 2} copies) can also be reported when the the contribution of the variants to the score is not encoded as additive, dominant, or recessive. In this case three columns are added corresponding to which variant weight should be applied for each dosage, where the column name is formated as dosage_#_weight where the # sign indicates the number of effect_allele copies.', coerce_numbers_to_str=True)] .. py:attribute:: effect_allele :type: Annotated[Allele | None, Field(default=None, title='Effect Allele', description="The allele that's dosage is counted (e.g. {0, 1, 2}) and multiplied by the variant's weight (effect_weight) when calculating score. The effect allele is also known as the 'risk allele'. Note: this does not necessarily need to correspond to the minor allele/alternative allele.")] .. py:property:: effect_type :type: pgscatalog.core.lib.effecttype.EffectType .. py:attribute:: effect_weight :type: Annotated[str | None, Field(default=None, title='Variant Weight', description='Value of the effect that is multiplied by the dosage of the effect allele (effect_allele) when calculating the score. Additional information on how the effect_weight was derived is in the weight_type field of the header, and score development method in the metadata downloads.', coerce_numbers_to_str=True)] .. py:attribute:: harmonised_columns :type: ClassVar[tuple[str, str, str, str]] :value: ('hm_source', 'hm_rsID', 'hm_chr', 'hm_pos') .. py:attribute:: hm_chr :type: Annotated[str | None, Field(default=None, title='Harmonized chromosome name', description='Chromosome that the harmonized variant is present on, preferring matches to chromosomes over patches present in later builds.')] .. 
py:attribute:: hm_inferOtherAllele :type: Annotated[Allele | None, Field(default=None, title='Harmonized other alleles', description='If only the effect_allele is given we attempt to infer the non-effect/other allele(s) using Ensembl/dbSNP alleles.')] .. py:attribute:: hm_match_chr :type: Annotated[bool | None, Field(default=None, title='FLAG: matching chromosome name', description='Used for QC. Only provided if the scoring file is being harmonized to the same genome build, and where the chromosome name is provided in the column chr_name.')] .. py:attribute:: hm_match_pos :type: Annotated[bool | None, Field(default=None, title='FLAG: matching chromosome position', description='Used for QC. Only provided if the scoring file is being harmonized to the same genome build, and where the chromosome name is provided in the column chr_position.')] .. py:attribute:: hm_pos :type: Annotated[int | None, Field(ge=0, default=None, title='Harmonized chromosome position', description='Chromosomal position (base pair location) where the variant is located, preferring matches to chromosomes over patches present in later builds.')] .. py:attribute:: hm_rsID :type: Annotated[str | None, Field(default=None, title='Harmonized rsID', description='Current rsID. Differences between this column and the author-reported column (rsID) indicate variant merges and annotation updates from dbSNP.')] .. py:attribute:: hm_source :type: Annotated[str | None, Field(default=None, title='Provider of the harmonized variant information', description='Data source of the variant position. Options include: ENSEMBL, liftover, author-reported (if being harmonized to the same build).')] .. py:attribute:: imputation_method :type: Annotated[str | None, Field(default=None, title='Imputation Method', description='This described whether the variant was specifically called with a specific imputation or variant calling method. This is mostly kept to describe HLA-genotyping methods (e.g. 
flag SNP2HLA, HLA*IMP) that gives alleles that are not referenced by genomic position.')] .. py:attribute:: inclusion_criteria :type: Annotated[str | None, Field(default=None, title='Score Inclusion Criteria', description='Explanation of when this variant gets included into the PGS (e.g. if it depends on the results from other variants).')] .. py:property:: is_complex :type: bool .. py:attribute:: is_diplotype :type: Annotated[bool | None, Field(default=False, title='FLAG: Diplotype', description='This is a TRUE/FALSE variable that flags whether the effect allele is a haplotype/diplotype rather than a single SNP. Constituent SNPs in the haplotype are semi-colon separated.')] .. py:attribute:: is_dominant :type: Annotated[bool | None, Field(default=False, title='FLAG: Dominant Inheritance Model', description='This is a TRUE/FALSE variable that flags whether the weight should be added to the PGS sum if there is at least 1 copy of the effect allele (e.g. it is a dominant allele).')] .. py:attribute:: is_haplotype :type: Annotated[bool | None, Field(default=False, title='FLAG: Haplotype', description='This is a TRUE/FALSE variable that flags whether the effect allele is a haplotype/diplotype rather than a single SNP. Constituent SNPs in the haplotype are semi-colon separated.')] .. py:property:: is_harmonised :type: bool .. py:property:: is_hm_bad :type: bool Was harmonisation OK? .. py:attribute:: is_interaction :type: Annotated[bool | None, Field(default=False, title='FLAG: Interaction', description='This is a TRUE/FALSE variable that flags whether the weight should be multiplied with the dosage of more than one variant. Interactions are demarcated with a _x_ between entries for each of the variants present in the interaction.')] .. py:property:: is_non_additive :type: bool .. 
py:attribute:: is_recessive :type: Annotated[bool | None, Field(default=False, title='FLAG: Recessive Inheritance Model', description='This is a TRUE/FALSE variable that flags whether the weight should be added to the PGS sum only if there are 2 copies of the effect allele (e.g. it is a recessive allele).')] .. py:attribute:: locus_name :type: Annotated[str | None, Field(default=None, title='Locus Name', description='This is kept in for loci where the variant may be referenced by the gene (APOE e4). It is also common (usually in smaller PGS) to see the variants named according to the genes they impact.')] .. py:attribute:: model_config .. py:attribute:: non_additive_columns :type: ClassVar[tuple[str, str, str]] :value: ('dosage_0_weight', 'dosage_1_weight', 'dosage_2_weight') .. py:attribute:: other_allele :type: Annotated[Allele | None, Field(default=None, title='Other allele(s)', description='The other allele(s) at the loci. Note: this does not necessarily need to correspond to the reference allele.')] .. py:attribute:: rsID :type: Annotated[str | None, Field(default=None, validation_alias=AliasChoices('rsID', 'rsid'), title='dbSNP Accession ID (rsID)', description='The SNP’s rsID. This column also contains HLA alleles in the standard notation (e.g. HLA-DQA1*0102) that aren’t always provided with chromosomal positions.')] .. py:attribute:: variant_description :type: Annotated[str | None, Field(default=None, title='Variant Description', description='This field describes any extra information about the variant (e.g. how it is genotyped or scored) that cannot be captured by the other fields.')] .. py:property:: variant_id :type: str ID = chr:pos:effect_allele:other_allele .. py:attribute:: variant_type :type: Annotated[VariantType | None, Field(default=None, title='Complex alleles only: how is the variant name formatted?')] .. py:class:: ScoreFormatVersion See https://www.pgscatalog.org/downloads/#scoring_changes v1 was deprecated in December 2021 .. 
py:attribute:: v2 :value: '2.0' .. py:class:: ScoreHeader Headers store useful metadata about a scoring file. Data validation is less strict than the CatalogScoreHeader, to make it easier for people to use custom scoring files with the PGS Catalog Calculator. >>> ScoreHeader(**{"pgs_id": "PGS123456", "trait_reported": "testtrait", "genome_build": "GRCh38"}) ScoreHeader(pgs_id='PGS123456', pgs_name=None, trait_reported='testtrait', genome_build=GenomeBuild.GRCh38) >>> ScoreHeader(**{"omicspred_id": "OPGS123456", "trait_reported": "testtrait", "genome_build": "GRCh38"}) ScoreHeader(pgs_id='OPGS123456', pgs_name=None, trait_reported='testtrait', genome_build=GenomeBuild.GRCh38) >>> ScoreHeader(**{"score_id": "SC1234B", "trait_reported": "testtrait", "genome_build": "GRCh37"}) ScoreHeader(pgs_id='SC1234B', pgs_name=None, trait_reported='testtrait', genome_build=GenomeBuild.GRCh37) >>> from ._config import Config >>> testpath = Config.ROOT_DIR / "tests" / "data" / "PGS000001_hmPOS_GRCh38.txt.gz" >>> ScoreHeader.from_path(testpath).row_count 77 >>> from ._config import Config >>> testpath = Config.ROOT_DIR / "tests" / "data" / "OPGS002493.txt.gz" >>> test = ScoreHeader.from_path(testpath) # doctest .. py:method:: from_path(path: str | pathlib.Path) -> Self :classmethod: .. py:method:: parse_genome_build(value: str) -> pgscatalog.core.lib.genomebuild.GenomeBuild | None :classmethod: .. py:method:: serialize_genomebuild(genome_build: pgscatalog.core.lib.genomebuild.GenomeBuild, _info: pydantic.SerializationInfo) -> str .. py:attribute:: genome_build :type: Annotated[pgscatalog.core.lib.genomebuild.GenomeBuild | None, Field(description='Genome build')] .. py:property:: is_harmonised :type: bool .. py:attribute:: pgs_id :type: Annotated[str | None, Field(title='PGS identifier', validation_alias=AliasChoices('pgs_id', 'omicspred_id', 'score_id'))] .. py:attribute:: pgs_name :type: Annotated[str | None, Field(description='PGS name', default=None)] .. 
py:property:: row_count :type: int Calculate the number of variants in the scoring file by counting the number of rows .. py:attribute:: trait_reported :type: Annotated[str, Field(description='Trait name')] .. py:class:: ScoreLog A log that includes header information and variant summary statistics >>> header = CatalogScoreHeader(pgs_id='PGS000001', pgs_name='PRS77_BC', trait_reported='Breast cancer', genome_build=None, format_version=ScoreFormatVersion.v2, trait_mapped='breast carcinoma', trait_efo='EFO_0000305', variants_number=77, weight_type="NR", pgp_id='PGP000001', citation='Mavaddat N et al. J Natl Cancer Inst (2015). doi:10.1093/jnci/djv036', HmPOS_build="GRCh38", HmPOS_date="2022-07-29") >>> harmonised_variant = ScoreVariant(**{"rsID": None, "chr_name": "1", "chr_position": 1, "effect_allele": "HLA-DQ", "effect_weight": 0.5, "hm_chr": "1", "hm_pos": 1, "hm_rsID": "rs1921", "hm_source": "ENSEMBL", "row_nr": 0, "accession": "test"}) >>> variant_log = harmonised_variant.model_dump(include={"hm_source", "is_complex"}) >>> scorelog = ScoreLog(header=header, compatible_effect_type=True, variant_logs=[VariantLog(**variant_log)]) # doctest: +ELLIPSIS >>> scorelog ScoreLog(header=CatalogScoreHeader(...), compatible_effect_type=True, has_complex_alleles=True, pgs_id='PGS000001', is_harmonised=True, sources=['ENSEMBL']) In the original scoring file header there were 77 variants: >>> scorelog.header.variants_number 77 But we've only got 1 ScoreVariant: >>> scorelog.n_actual_variants 1 >>> scorelog.variant_count_difference 76 >>> scorelog.variants_are_missing True Maybe they were all filtered out during normalisation. It's important to log and warn when this happens. >>> scorelog.sources ['ENSEMBL'] >>> scorelog.model_dump() # doctest: +ELLIPSIS {'header': {'pgs_id': 'PGS000001', ...}, 'compatible_effect_type': True, 'has_complex_alleles': True, 'pgs_id': 'PGS000001', 'is_harmonised': True, 'sources': ['ENSEMBL']} .. py:attribute:: compatible_effect_type :type: bool ..
py:property:: has_complex_alleles :type: bool Do any variants contain complex alleles? e.g. HLA/APOE .. py:attribute:: header :type: ScoreHeader | CatalogScoreHeader .. py:property:: is_harmonised :type: bool .. py:attribute:: model_config .. py:property:: n_actual_variants :type: int | None .. py:property:: pgs_id :type: str | None .. py:property:: sources :type: list[str] | None .. py:property:: variant_count_difference :type: int | None .. py:attribute:: variant_logs :type: list[VariantLog] | None .. py:property:: variants_are_missing :type: bool .. py:class:: ScoreLogs A container of ScoreLog to simplify serialising to a JSON list .. py:attribute:: root :type: list[ScoreLog] .. py:class:: ScoreVariant This model includes attributes useful for processing and normalising variants >>> variant = ScoreVariant(**{"rsID": None, "chr_name": "1", "chr_position": 1, "effect_allele": "A", "effect_weight": 0.5, "row_nr": 0, "accession": "test"}) >>> variant # doctest: +ELLIPSIS ScoreVariant(rsID=None, chr_name='1', chr_position=1, effect_allele=Allele(allele='A', ... >>> variant.is_complex False >>> variant.is_non_additive False >>> variant.is_harmonised False >>> variant.effect_type EffectType.ADDITIVE >>> variant_missing_positions = ScoreVariant(**{"rsID": None, "chr_name": None, "chr_position": None, "effect_allele": "A", "effect_weight": 0.5, "row_nr": 0, "accession": "test"}) # doctest: +ELLIPSIS Traceback (most recent call last): ... pydantic_core._pydantic_core.ValidationError: 1 validation error for ScoreVariant Value error, Bad position: self.rsID=None, self.chr_name=None, self.chr_position=None... ... 
>>> harmonised_variant = ScoreVariant(**{"rsID": None, "chr_name": "1", "chr_position": 1, "effect_allele": "A", "effect_weight": 0.5, "hm_chr": "1", "hm_pos": 1, "hm_rsID": "rs1921", "hm_source": "ENSEMBL", "row_nr": 0, "accession": "test"}) >>> harmonised_variant.is_harmonised True >>> variant_nonadditive = ScoreVariant(**{"rsID": None, "chr_name": "1", "chr_position": 1, "effect_allele": "A", "dosage_0_weight": 0, "dosage_1_weight": 1, "dosage_2_weight": 0, "row_nr": 0, "accession": "test"}) >>> variant_nonadditive.is_non_additive True >>> variant_nonadditive.is_complex False >>> variant_nonadditive.effect_type EffectType.NONADDITIVE >>> variant_complex = ScoreVariant(**{"rsID": None, "chr_name": "1", "chr_position": 1, "effect_allele": "A", "effect_weight": 0.5, "is_haplotype": True, "row_nr": 0, "accession": "test"}) >>> variant_complex.is_complex True The harmonisation process might fail, so variants can be missing mandatory fields. This must be supported by the model: >>> bad_hm_variant = ScoreVariant(**{"rsID": "a_weird_rsid", "chr_name": None, "chr_position": None, "effect_allele": "G", "effect_weight": 0.5, "hm_chr": None, "hm_pos": None, "hm_rsID": None, "hm_source": "Unknown", "row_nr": 0, "accession": "test"}) >>> bad_hm_variant.is_harmonised True >>> bad_hm_variant.is_hm_bad True >>> harmonised_variant.is_hm_bad False rsID format validation (i.e. starts with rs, ss...) is disabled when harmonisation fails: >>> bad_hm_variant.rsID 'a_weird_rsid' .. py:attribute:: accession :type: Annotated[str, Field(title='Accession', description='Accession of score variant')] .. py:attribute:: is_duplicated :type: Annotated[bool | None, Field(default=False, title='Duplicated variant', description='In a list of variants with the same accession, is ID duplicated?')] .. py:attribute:: model_config .. py:attribute:: output_fields :type: ClassVar[tuple[str, Ellipsis]] :value: ('chr_name', 'chr_position', 'effect_allele', 'other_allele', 'effect_weight', 'effect_type',... 
.. py:attribute:: row_nr :type: Annotated[int, Field(title='Row number', description='Row number of variant in scoring file (first variant = 0)', ge=0)] .. py:class:: VariantLog This model consists of the variant-level statistics that need to be summarised in the ScoreLog. ScoreVariants can't simply be reused, because failed harmonisation can create invalid ScoreVariants (e.g. missing genomic coordinates). If ScoreLogs were composed of ScoreVariants, the data would be revalidated on instantiation and raise ValidationErrors. Instead, VariantLogs are created from a subset of ScoreVariant fields. .. py:attribute:: hm_source :type: str | None :value: None .. py:attribute:: is_complex :type: bool .. py:class:: VariantType Complex alleles are usually haplotypes/diplotypes and the gametic phase must be known to apply them accurately. See PGS Catalog Curation Guidelines: Appendix A – Special Cases for more information. (Not supported by the calculator.) .. py:attribute:: APOE_ALLELE :value: 'APOE_allele' .. py:attribute:: CYP_ALLELE :value: 'CYP_allele' .. py:attribute:: HLA_AA :value: 'HLA_AA' .. py:attribute:: HLA_ALLELE :value: 'HLA_allele' .. py:attribute:: HLA_SEROTYPE :value: 'HLA_serotype'
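The design choice behind VariantLog (copy only the fields needed for a summary, so that incomplete variants are never revalidated) can be sketched with standard-library dataclasses. Everything named here is a stand-in for illustration: ``VariantLogSketch``, ``to_variant_log`` and ``summarise`` are hypothetical, and the way sources are de-duplicated and sorted is an assumption rather than the behaviour of ``ScoreLog.sources``.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VariantLogSketch:
    """Stand-in for VariantLog: only the two documented fields."""

    hm_source: Optional[str]
    is_complex: bool


def to_variant_log(variant: dict) -> VariantLogSketch:
    """Copy the logging subset from a dumped variant, ignoring everything else."""
    return VariantLogSketch(
        hm_source=variant.get("hm_source"),
        is_complex=bool(variant.get("is_complex", False)),
    )


def summarise(logs):
    """Aggregate variant-level logs (assumed: unique, sorted sources)."""
    sources = sorted({log.hm_source for log in logs if log.hm_source is not None})
    return {
        "has_complex_alleles": any(log.is_complex for log in logs),
        "sources": sources,
    }


# The second variant failed harmonisation and has no coordinates; that does
# not matter because coordinates are never copied into the log.
dumped = [
    {"chr_name": "1", "chr_position": 1, "hm_source": "ENSEMBL", "is_complex": True},
    {"chr_name": None, "chr_position": None, "hm_source": "Unknown", "is_complex": False},
]
print(summarise([to_variant_log(v) for v in dumped]))
# {'has_complex_alleles': True, 'sources': ['ENSEMBL', 'Unknown']}
```

With the real models, the equivalent subset is obtained with ``model_dump(include={"hm_source", "is_complex"})``, as in the ScoreLog doctest.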