core.lib.models
===============

.. py:module:: core.lib.models

.. autoapi-nested-parse::

   PGS Catalog pydantic models for data validation

   Best way to reuse:

     * `from pgscatalog.core import models` and use `models.CatalogScoreVariant(**d)`

     * `import pgscatalog.core` and use the fully qualified name: `pgscatalog.core.models.CatalogScoreVariant`


Classes
-------

.. autoapisummary::

   core.lib.models.Allele
   core.lib.models.CatalogScoreHeader
   core.lib.models.CatalogScoreVariant
   core.lib.models.ScoreFormatVersion
   core.lib.models.ScoreHeader
   core.lib.models.ScoreLog
   core.lib.models.ScoreLogs
   core.lib.models.ScoreVariant
   core.lib.models.VariantLog
   core.lib.models.VariantType


Module Contents
---------------

.. py:class:: Allele


   A class that represents an allele found in PGS Catalog scoring files

   >>> simple_ea = Allele(**{"allele": "A"})
   >>> simple_ea
   Allele(allele='A', is_snp=True)
   >>> str(simple_ea)
   'A'
   >>> Allele(**{"allele": "AG"})
   Allele(allele='AG', is_snp=True)
   >>> hla_example = Allele(**{"allele": "+"})
   >>> hla_example
   Allele(allele='+', is_snp=False)

   >>> Allele(allele="A")
   Allele(allele='A', is_snp=True)

   >>> Allele(allele="A/T").has_multiple_alleles
   True


   .. py:method:: serialize() -> str

      When dumping the model, flatten it to just return the allele as a string


   .. py:attribute:: allele
      :type:  str


   .. py:property:: has_multiple_alleles
      :type: bool


   .. py:property:: is_snp
      :type: bool


      SNPs are the most common type of effect allele in PGS Catalog scoring
      files. More complex effect alleles, like HLAs or APOE genes, often require
      extra work to represent in genomes. Users should be warned about complex
      effect alleles.
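
The distinction between simple and complex effect alleles can be sketched with the standard library alone. This is not the library's implementation: the helper names are hypothetical, and the rules (a simple allele contains only A, C, G, or T; several alleles are joined with ``/``) are assumptions chosen to agree with the examples shown for ``Allele``.

```python
import re

# Assumption: a simple allele is a non-empty string of A/C/G/T bases.
SIMPLE_ALLELE = re.compile(r"[ACGT]+")


def looks_like_snp(allele: str) -> bool:
    """Return True when the allele contains only A, C, G, or T."""
    return SIMPLE_ALLELE.fullmatch(allele) is not None


def lists_multiple_alleles(allele: str) -> bool:
    """Return True when alleles are joined with '/' (assumed separator)."""
    return "/" in allele


# Complex alleles (HLA, APOE) fail the simple pattern, so callers can warn.
examples = ["A", "AG", "+", "APOE_e2"]
needs_warning = [a for a in examples if not looks_like_snp(a)]
print(needs_warning)  # ['+', 'APOE_e2']
```

A caller could log a warning for every allele in ``needs_warning`` before attempting to match it against a genome.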


.. py:class:: CatalogScoreHeader


   A ScoreHeader that validates the PGS Catalog Scoring File header standard

   https://www.pgscatalog.org/downloads/#dl_ftp_scoring

   >>> from ._config import Config
   >>> testpath = Config.ROOT_DIR / "tests" / "data" / "PGS000001_hmPOS_GRCh38.txt.gz"
   >>> test = CatalogScoreHeader.from_path(testpath) # doctest: +ELLIPSIS
   >>> test # doctest: +ELLIPSIS
   CatalogScoreHeader(pgs_id='PGS000001', pgs_name='PRS77_BC', trait_reported='Breast cancer', genome_build=None, format_version=<ScoreFormatVersion.v2: '2.0'>, trait_mapped=['breast carcinoma'], trait_efo=['EFO_0000305'], variants_number=77, weight_type=None, pgp_id='PGP000001', citation='Mavaddat N et al. J Natl Cancer Inst (2015). doi:10.1093/jnci/djv036', HmPOS_build=GenomeBuild.GRCh38, HmPOS_date=datetime.date(2022, 7, 29), HmPOS_match_pos='{"True": null, "False": null}', HmPOS_match_chr='{"True": null, "False": null}')
   >>> test.variants_number == test.row_count
   True


   .. py:method:: check_format_version(version: ScoreFormatVersion) -> ScoreFormatVersion
      :classmethod:


   .. py:method:: check_pgp_id(pgp_id: str) -> str
      :classmethod:


   .. py:method:: check_pgs_id(pgs_id: str) -> str
      :classmethod:


   .. py:method:: parse_genome_build(value: str) -> pgscatalog.core.lib.genomebuild.GenomeBuild | None
      :classmethod:


   .. py:method:: parse_weight_type(value: str | None) -> str | None
      :classmethod:


   .. py:method:: serialize_genomebuild(genome_build: pgscatalog.core.lib.genomebuild.GenomeBuild | None, _info: pydantic.SerializationInfo) -> str


   .. py:method:: split_traits(trait: str) -> list[str]
      :classmethod:


   .. py:attribute:: HmPOS_build
      :type:  Annotated[pgscatalog.core.lib.genomebuild.GenomeBuild | None, Field(default=None)]


   .. py:attribute:: HmPOS_date
      :type:  Annotated[datetime.date | None, Field(default=None)]


   .. py:attribute:: HmPOS_match_chr
      :type:  Annotated[str | None, Field(default=None)]


   .. py:attribute:: HmPOS_match_pos
      :type:  Annotated[str | None, Field(default=None)]


   .. py:attribute:: citation
      :type:  str


   .. py:attribute:: format_version
      :type:  ScoreFormatVersion


   .. py:property:: is_harmonised
      :type: bool


   .. py:attribute:: license
      :type:  Annotated[str | None, Field('PGS obtained from the Catalog should be cited appropriately, and used in accordance with any licensing restrictions set by the authors. See EBI Terms of Use (https://www.ebi.ac.uk/about/terms-of-use/) for additional details.', repr=False)]


   .. py:attribute:: pgp_id
      :type:  str


   .. py:attribute:: trait_efo
      :type:  Annotated[list[str], Field(description="Ontology trait name, e.g. 'breast carcinoma'")]


   .. py:attribute:: trait_mapped
      :type:  Annotated[list[str], Field(description='Trait name')]


   .. py:attribute:: variants_number
      :type:  Annotated[int, Field(gt=0, description='Number of variants listed in the PGS', default=None)]


   .. py:attribute:: weight_type
      :type:  Annotated[str | None, Field(description='Variant weight type', default=None)]
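
As a rough illustration of what header validation involves, the sketch below reads ``#key=value`` header lines into a dictionary without using the library. The function name, the sample lines, the ``|`` separator for multiple traits, and the six-digit identifier pattern are all assumptions made for this example, not a description of how ``CatalogScoreHeader`` is implemented.

```python
import re

# Assumption: identifiers are a prefix followed by six digits, e.g. PGS000001.
PGS_ID = re.compile(r"PGS\d{6}")
LIST_FIELDS = {"trait_mapped", "trait_efo"}


def parse_header_lines(lines):
    """Collect '#key=value' header lines into a dict (hypothetical helper)."""
    header = {}
    for line in lines:
        line = line.strip()
        # Skip data rows (no '#') and section banners (two or more '#').
        if not line.startswith("#") or line.startswith("##"):
            continue
        key, sep, value = line[1:].partition("=")
        if not sep:
            continue
        # Assumption: multiple traits are separated with '|'.
        header[key] = value.split("|") if key in LIST_FIELDS else value
    return header


# Illustrative lines only; a real scoring file contains more fields.
sample = [
    "###PGS CATALOG SCORING FILE",
    "#format_version=2.0",
    "##POLYGENIC SCORE (PGS) INFORMATION",
    "#pgs_id=PGS000001",
    "#trait_mapped=breast carcinoma",
    "#trait_efo=EFO_0000305",
    "#variants_number=77",
    "rsID\tchr_name\teffect_allele\teffect_weight",
]
header = parse_header_lines(sample)
print(header["pgs_id"], header["trait_efo"])
```

The parsed values are still strings at this point; type coercion and stricter checks are what a validating model adds on top.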


.. py:class:: CatalogScoreVariant


   A model representing a row from a PGS Catalog scoring file, defined here:

   https://www.pgscatalog.org/downloads/#scoring_columns

   Implementation notes:

       - You should instantiate effect weight fields with strings (e.g. with csv.reader, which returns data as a list of strings)

       - The model always handles effect weights internally as strings and will coerce numeric input to strings when instantiated

       - Our string obsession comes from a desire to faithfully reproduce author-submitted data and avoid introducing precision errors

   Extra / dynamically named fields:

   Only one type of dynamic field is supported. Ancestry-specific allele frequency information uses labels defined by authors.

   An example from the first row from PGS000662:

   >>> variant_with_allelefrequency = {"chr_name": "1", "chr_position": 5743196, "effect_allele": "T", "other_allele": "C", "effect_weight": 0.102298257, "allelefrequency_effect_European": 0.067, "allelefrequency_effect_African": 0.439, "allelefrequency_effect_Asian": 0.113, "allelefrequency_effect_Hispanic": 0.157}
   >>> CatalogScoreVariant(**variant_with_allelefrequency)  # doctest: +ELLIPSIS
   CatalogScoreVariant(rsID=None, chr_name='1', chr_position=5743196..., allelefrequency_effect_European=0.067, allelefrequency_effect_African=0.439, allelefrequency_effect_Asian=0.113, allelefrequency_effect_Hispanic=0.157, ...)

   An example from the first row from PGS000018 with the edited column name 'rsid':

   >>> variant_with_rsid_column = {"rsid": "rs2843152", "chr_name": 1, "chr_position": 2245570, "effect_allele": "G", "other_allele": "C", "effect_weight": -2.76009e-02}
   >>> CatalogScoreVariant(**variant_with_rsid_column)
   CatalogScoreVariant(rsID='rs2843152', chr_name='1', chr_position=2245570..., effect_weight='-0.0276009', ...)

   Extra field names which don't follow the pattern "allelefrequency_effect_{label}" will raise a ValueError:

   >>> bad_extra_fields = variant_with_allelefrequency | {"favourite_ice_cream": "vanilla"}
   >>> CatalogScoreVariant(**bad_extra_fields)
   Traceback (most recent call last):
   ...
   pydantic_core._pydantic_core.ValidationError: 1 validation error for CatalogScoreVariant
     Value error, Invalid extra fields detected: ['favourite_ice_cream'] ...

   Complex alleles are represented a little differently:

   >>> complex_allele = {"chr_name": 19, "effect_allele": "APOE_e2", "effect_weight": -0.5, "locus_name": "APOE", "is_haplotype": True, "variant_type": "APOE_allele", "variant_description": None}
   >>> CatalogScoreVariant(**complex_allele)
   CatalogScoreVariant(rsID=None, chr_name='19', chr_position=None, effect_allele=Allele(allele='APOE_e2', is_snp=False), other_allele=None, locus_name='APOE', is_haplotype=True, is_diplotype=False, imputation_method=None, variant_description=None, inclusion_criteria=None, effect_weight='-0.5', is_interaction=False, is_dominant=False, is_recessive=False, dosage_0_weight=None, dosage_1_weight=None, dosage_2_weight=None, OR=None, HR=None, allelefrequency_effect=None, hm_source=None, hm_rsID=None, hm_chr=None, hm_pos=None, hm_inferOtherAllele=None, hm_match_chr=None, hm_match_pos=None, variant_type=<VariantType.APOE_ALLELE: 'APOE_allele'>, variant_id='19::APOE_e2:', is_harmonised=False, is_complex=True, is_non_additive=False, effect_type=EffectType.ADDITIVE)

   Although effect weights are typed as optional, if all effect weight fields are missing then a model validator will raise a validation error:

   >>> CatalogScoreVariant(**{"chr_name": "19", "chr_position": 1, "effect_allele": "A", "effect_weight": None})  # doctest: +ELLIPSIS
   Traceback (most recent call last):
   ...
   pydantic_core._pydantic_core.ValidationError: 1 validation error for CatalogScoreVariant
     Value error, All effect weight fields are missing ...

   effect_weight can be missing if all dosage_n_weight (non-additive) fields are present.

   However, if any dosage_n_weight field is present, then all of them must be present:

   >>> CatalogScoreVariant(**{"chr_name": "19", "chr_position": 1, "effect_allele": "A", "dosage_0_weight": 0.1, "dosage_1_weight": None, "dosage_2_weight": None})  # doctest: +ELLIPSIS
   Traceback (most recent call last):
   ...
   pydantic_core._pydantic_core.ValidationError: 1 validation error for CatalogScoreVariant
     Value error, Dosage missing effect weight ...

   A variant may have all effect weight fields. During normalisation the standard effect_weight column will be used:

   >>> CatalogScoreVariant(**{"chr_name": "19", "chr_position": 1, "effect_allele": "A", "effect_weight": 0.05, "dosage_0_weight": 0, "dosage_1_weight": 0.1, "dosage_2_weight": 0.3})
   CatalogScoreVariant(rsID=None, chr_name='19', ..., is_non_additive=False, ...

   Note that is_non_additive is False when the effect_weight column is present, even if the non-additive (dosage) fields are also present.


   .. py:method:: alleles_must_parse(value: Any) -> Allele
      :classmethod:


   .. py:method:: check_complex_variants() -> CatalogScoreVariant


   .. py:method:: check_effect_weights() -> CatalogScoreVariant


   .. py:method:: check_extra_fields() -> CatalogScoreVariant

      Only allelefrequency_effect_{ancestry} is supported as an extra field
      {ancestry} is dynamic and set by submitters


   .. py:method:: check_position() -> CatalogScoreVariant


   .. py:method:: effect_weight_must_float(weight: str | None) -> str | None
      :classmethod:


   .. py:method:: empty_string_to_none(v: Any) -> Any | None
      :classmethod:


   .. py:method:: set_missing_rsid(rsid: str | None) -> str | None
      :classmethod:


   .. py:attribute:: HR
      :type:  Annotated[float | None, Field(default=None, title='Hazard Ratio', description='Author-reported effect sizes can be supplied to the Catalog. If no other effect_weight is given the weight is calculated using the log(OR) or log(HR).')]


   .. py:attribute:: OR
      :type:  Annotated[float | None, Field(default=None, title='Odds Ratio', description='Author-reported effect sizes can be supplied to the Catalog. If no other effect_weight is given the weight is calculated using the log(OR) or log(HR).')]


   .. py:attribute:: allelefrequency_effect
      :type:  Annotated[float | None, Field(default=None, title='Effect Allele Frequency', description='Reported effect allele frequency, if the associated locus is a haplotype then haplotype frequency will be extracted.', ge=0)]


   .. py:attribute:: chr_name
      :type:  Annotated[str | None, Field(default=None, title='Location - Chromosome ', description='Chromosome name/number associated with the variant.', coerce_numbers_to_str=True)]


   .. py:attribute:: chr_position
      :type:  Annotated[int | None, Field(default=None, title='Location within the Chromosome', description='Chromosomal position associated with the variant.', gt=0)]


   .. py:attribute:: complex_columns
      :type:  ClassVar[tuple[str, str, str]]
      :value: ('is_haplotype', 'is_diplotype', 'is_interaction')


   .. py:attribute:: dosage_0_weight
      :type:  Annotated[str | None, Field(default=None, title='Effect weight with 0 copy of the effect allele', description='Weights that are specific to different dosages of the effect_allele (e.g. {0, 1, 2} copies) can also be reported when the contribution of the variants to the score is not encoded as additive, dominant, or recessive. In this case three columns are added corresponding to which variant weight should be applied for each dosage, where the column name is formatted as dosage_#_weight where the # sign indicates the number of effect_allele copies.', coerce_numbers_to_str=True)]


   .. py:attribute:: dosage_1_weight
      :type:  Annotated[str | None, Field(default=None, title='Effect weight with 1 copy of the effect allele', description='Weights that are specific to different dosages of the effect_allele (e.g. {0, 1, 2} copies) can also be reported when the contribution of the variants to the score is not encoded as additive, dominant, or recessive. In this case three columns are added corresponding to which variant weight should be applied for each dosage, where the column name is formatted as dosage_#_weight where the # sign indicates the number of effect_allele copies.', coerce_numbers_to_str=True)]


   .. py:attribute:: dosage_2_weight
      :type:  Annotated[str | None, Field(default=None, title='Effect weight with 2 copies of the effect allele', description='Weights that are specific to different dosages of the effect_allele (e.g. {0, 1, 2} copies) can also be reported when the contribution of the variants to the score is not encoded as additive, dominant, or recessive. In this case three columns are added corresponding to which variant weight should be applied for each dosage, where the column name is formatted as dosage_#_weight where the # sign indicates the number of effect_allele copies.', coerce_numbers_to_str=True)]


   .. py:attribute:: effect_allele
      :type:  Annotated[Allele | None, Field(default=None, title='Effect Allele', description="The allele that's dosage is counted (e.g. {0, 1, 2}) and multiplied by the variant's weight (effect_weight) when calculating score. The effect allele is also known as the 'risk allele'. Note: this does not necessarily need to correspond to the minor allele/alternative allele.")]


   .. py:property:: effect_type
      :type: pgscatalog.core.lib.effecttype.EffectType


   .. py:attribute:: effect_weight
      :type:  Annotated[str | None, Field(default=None, title='Variant Weight', description='Value of the effect that is multiplied by the dosage of the effect allele (effect_allele) when calculating the score. Additional information on how the effect_weight was derived is in the weight_type field of the header, and score development method in the metadata downloads.', coerce_numbers_to_str=True)]


   .. py:attribute:: harmonised_columns
      :type:  ClassVar[tuple[str, str, str, str]]
      :value: ('hm_source', 'hm_rsID', 'hm_chr', 'hm_pos')


   .. py:attribute:: hm_chr
      :type:  Annotated[str | None, Field(default=None, title='Harmonized chromosome name', description='Chromosome that the harmonized variant is present on, preferring matches to chromosomes over patches present in later builds.')]


   .. py:attribute:: hm_inferOtherAllele
      :type:  Annotated[Allele | None, Field(default=None, title='Harmonized other alleles', description='If only the effect_allele is given we attempt to infer the non-effect/other allele(s) using Ensembl/dbSNP alleles.')]


   .. py:attribute:: hm_match_chr
      :type:  Annotated[bool | None, Field(default=None, title='FLAG: matching chromosome name', description='Used for QC. Only provided if the scoring file is being harmonized to the same genome build, and where the chromosome name is provided in the column chr_name.')]


   .. py:attribute:: hm_match_pos
      :type:  Annotated[bool | None, Field(default=None, title='FLAG: matching chromosome position', description='Used for QC. Only provided if the scoring file is being harmonized to the same genome build, and where the chromosome name is provided in the column chr_position.')]


   .. py:attribute:: hm_pos
      :type:  Annotated[int | None, Field(ge=0, default=None, title='Harmonized chromosome position', description='Chromosomal position (base pair location) where the variant is located, preferring matches to chromosomes over patches present in later builds.')]


   .. py:attribute:: hm_rsID
      :type:  Annotated[str | None, Field(default=None, title='Harmonized rsID', description='Current rsID. Differences between this column and the author-reported column (rsID) indicate variant merges and annotation updates from dbSNP.')]


   .. py:attribute:: hm_source
      :type:  Annotated[str | None, Field(default=None, title='Provider of the harmonized variant information', description='Data source of the variant position. Options include: ENSEMBL, liftover, author-reported (if being harmonized to the same build).')]


   .. py:attribute:: imputation_method
      :type:  Annotated[str | None, Field(default=None, title='Imputation Method', description='This described whether the variant was specifically called with a specific imputation or variant calling method. This is mostly kept to describe HLA-genotyping methods (e.g. flag SNP2HLA, HLA*IMP) that gives alleles that are not referenced by genomic position.')]


   .. py:attribute:: inclusion_criteria
      :type:  Annotated[str | None, Field(default=None, title='Score Inclusion Criteria', description='Explanation of when this variant gets included into the PGS (e.g. if it depends on the results from other variants).')]


   .. py:property:: is_complex
      :type: bool


   .. py:attribute:: is_diplotype
      :type:  Annotated[bool | None, Field(default=False, title='FLAG: Diplotype', description='This is a TRUE/FALSE variable that flags whether the effect allele is a haplotype/diplotype rather than a single SNP. Constituent SNPs in the haplotype are semi-colon separated.')]


   .. py:attribute:: is_dominant
      :type:  Annotated[bool | None, Field(default=False, title='FLAG: Dominant Inheritance Model', description='This is a TRUE/FALSE variable that flags whether the weight should be added to the PGS sum if there is at least 1 copy of the effect allele (e.g. it is a dominant allele).')]


   .. py:attribute:: is_haplotype
      :type:  Annotated[bool | None, Field(default=False, title='FLAG: Haplotype', description='This is a TRUE/FALSE variable that flags whether the effect allele is a haplotype/diplotype rather than a single SNP. Constituent SNPs in the haplotype are semi-colon separated.')]


   .. py:property:: is_harmonised
      :type: bool


   .. py:property:: is_hm_bad
      :type: bool


      True if harmonisation failed for this variant.


   .. py:attribute:: is_interaction
      :type:  Annotated[bool | None, Field(default=False, title='FLAG: Interaction', description='This is a TRUE/FALSE variable that flags whether the weight should be multiplied with the dosage of more than one variant. Interactions are demarcated with a _x_ between entries for each of the variants present in the interaction.')]


   .. py:property:: is_non_additive
      :type: bool


   .. py:attribute:: is_recessive
      :type:  Annotated[bool | None, Field(default=False, title='FLAG: Recessive Inheritance Model', description='This is a TRUE/FALSE variable that flags whether the weight should be added to the PGS sum only if there are 2 copies of the effect allele (e.g. it is a recessive allele).')]


   .. py:attribute:: locus_name
      :type:  Annotated[str | None, Field(default=None, title='Locus Name', description='This is kept in for loci where the variant may be referenced by the gene (APOE e4). It is also common (usually in smaller PGS) to see the variants named according to the genes they impact.')]


   .. py:attribute:: model_config


   .. py:attribute:: non_additive_columns
      :type:  ClassVar[tuple[str, str, str]]
      :value: ('dosage_0_weight', 'dosage_1_weight', 'dosage_2_weight')


   .. py:attribute:: other_allele
      :type:  Annotated[Allele | None, Field(default=None, title='Other allele(s)', description='The other allele(s) at the loci. Note: this does not necessarily need to correspond to the reference allele.')]


   .. py:attribute:: rsID
      :type:  Annotated[str | None, Field(default=None, validation_alias=AliasChoices('rsID', 'rsid'), title='dbSNP Accession ID (rsID)', description='The SNP’s rsID. This column also contains HLA alleles in the standard notation (e.g. HLA-DQA1*0102) that aren’t always provided with chromosomal positions.')]


   .. py:attribute:: variant_description
      :type:  Annotated[str | None, Field(default=None, title='Variant Description', description='This field describes any extra information about the variant (e.g. how it is genotyped or scored) that cannot be captured by the other fields.')]


   .. py:property:: variant_id
      :type: str


      ID = chr:pos:effect_allele:other_allele


   .. py:attribute:: variant_type
      :type:  Annotated[VariantType | None, Field(default=None, title='Complex alleles only: how is the variant name formatted?')]
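
The validation rules described for ``CatalogScoreVariant`` can be restated as plain functions over a dictionary row. The sketch below is an approximation written for illustration: the function names are hypothetical, it uses no pydantic, and it only mirrors the behaviour shown in the examples above (extra field naming, the effect weight rules, and the ``chr:pos:effect_allele:other_allele`` identifier).

```python
import re

EXTRA_FIELD = re.compile(r"allelefrequency_effect_.+")
DOSAGE_FIELDS = ("dosage_0_weight", "dosage_1_weight", "dosage_2_weight")


def invalid_extra_fields(extra_names):
    """Return extra field names that do not match the supported pattern."""
    return [n for n in extra_names if EXTRA_FIELD.fullmatch(n) is None]


def is_non_additive(row):
    """Apply the documented weight rules; return True for dosage-only rows."""
    dosages = [row.get(name) for name in DOSAGE_FIELDS]
    has_effect_weight = row.get("effect_weight") is not None
    if not has_effect_weight and all(d is None for d in dosages):
        raise ValueError("All effect weight fields are missing")
    if any(d is not None for d in dosages) and any(d is None for d in dosages):
        raise ValueError("Dosage missing effect weight")
    # The standard effect_weight column takes priority when it is present.
    return not has_effect_weight


def variant_id(row):
    """Join chr:pos:effect_allele:other_allele, leaving missing parts empty."""
    keys = ("chr_name", "chr_position", "effect_allele", "other_allele")
    return ":".join("" if row.get(k) is None else str(row[k]) for k in keys)


apoe = {"chr_name": "19", "effect_allele": "APOE_e2", "effect_weight": "-0.5"}
print(variant_id(apoe))       # 19::APOE_e2:
print(is_non_additive(apoe))  # False
```

Weights are kept as strings in ``apoe`` to match the note above about avoiding precision errors.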


.. py:class:: ScoreFormatVersion


   See https://www.pgscatalog.org/downloads/#scoring_changes

   v1 was deprecated in December 2021.


   .. py:attribute:: v2
      :value: '2.0'
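
Because only one format version is listed, a version check reduces to an enum lookup. The sketch below uses a stand-in enum (``FormatVersion`` and ``parse_version`` are hypothetical names, not part of the library) to show how a deprecated version string could be rejected.

```python
from enum import Enum


class FormatVersion(Enum):
    """Stand-in enum with the single documented member."""

    v2 = "2.0"


def parse_version(text):
    """Return the matching member, or None for unsupported versions."""
    try:
        return FormatVersion(text)
    except ValueError:
        return None


print(parse_version("2.0"))  # FormatVersion.v2
print(parse_version("1.0"))  # None
```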


.. py:class:: ScoreHeader


   Headers store useful metadata about a scoring file.

   Data validation is less strict than the CatalogScoreHeader, to make
   it easier for people to use custom scoring files with the PGS Catalog Calculator.

   >>> ScoreHeader(**{"pgs_id": "PGS123456", "trait_reported": "testtrait", "genome_build": "GRCh38"})
   ScoreHeader(pgs_id='PGS123456', pgs_name=None, trait_reported='testtrait', genome_build=GenomeBuild.GRCh38)

   >>> ScoreHeader(**{"omicspred_id": "OPGS123456", "trait_reported": "testtrait", "genome_build": "GRCh38"})
   ScoreHeader(pgs_id='OPGS123456', pgs_name=None, trait_reported='testtrait', genome_build=GenomeBuild.GRCh38)

   >>> ScoreHeader(**{"score_id": "SC1234B", "trait_reported": "testtrait", "genome_build": "GRCh37"})
   ScoreHeader(pgs_id='SC1234B', pgs_name=None, trait_reported='testtrait', genome_build=GenomeBuild.GRCh37)

   >>> from ._config import Config
   >>> testpath = Config.ROOT_DIR / "tests" / "data" / "PGS000001_hmPOS_GRCh38.txt.gz"
   >>> ScoreHeader.from_path(testpath).row_count
   77

   >>> from ._config import Config
   >>> testpath = Config.ROOT_DIR / "tests" / "data" / "OPGS002493.txt.gz"
   >>> test = ScoreHeader.from_path(testpath)


   .. py:method:: from_path(path: str | pathlib.Path) -> Self
      :classmethod:


   .. py:method:: parse_genome_build(value: str) -> pgscatalog.core.lib.genomebuild.GenomeBuild | None
      :classmethod:


   .. py:method:: serialize_genomebuild(genome_build: pgscatalog.core.lib.genomebuild.GenomeBuild, _info: pydantic.SerializationInfo) -> str


   .. py:attribute:: genome_build
      :type:  Annotated[pgscatalog.core.lib.genomebuild.GenomeBuild | None, Field(description='Genome build')]


   .. py:property:: is_harmonised
      :type: bool


   .. py:attribute:: pgs_id
      :type:  Annotated[str | None, Field(title='PGS identifier', validation_alias=AliasChoices('pgs_id', 'omicspred_id', 'score_id'))]


   .. py:attribute:: pgs_name
      :type:  Annotated[str | None, Field(description='PGS name', default=None)]


   .. py:property:: row_count
      :type: int


      Calculate the number of variants in the scoring file by counting the number of rows


   .. py:attribute:: trait_reported
      :type:  Annotated[str, Field(description='Trait name')]
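
Two ideas from ``ScoreHeader`` lend themselves to a small standalone sketch: accepting several names for the score identifier, and counting variants by counting rows. The helpers below are hypothetical and use only the standard library. The row counting rule (skip lines starting with ``#``, then subtract one line of column names) is an assumption about the file layout, not the library's code.

```python
import gzip
import tempfile
from pathlib import Path

# Alias order follows the validation_alias shown for pgs_id.
ID_ALIASES = ("pgs_id", "omicspred_id", "score_id")


def resolve_score_id(fields):
    """Return the value of the first alias present in fields, else None."""
    for alias in ID_ALIASES:
        if alias in fields:
            return fields[alias]
    return None


def count_variant_rows(path):
    """Count data rows: skip '#' header lines and one column-name line."""
    with gzip.open(path, "rt") as handle:
        rows = [ln for ln in handle if ln.strip() and not ln.startswith("#")]
    return max(len(rows) - 1, 0)


# Write a tiny made-up scoring file to a temporary directory and count it.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.txt.gz"
    with gzip.open(demo, "wt") as handle:
        handle.write("#pgs_id=PGS123456\n")
        handle.write("rsID\teffect_allele\teffect_weight\n")
        handle.write("rs1\tA\t0.1\n")
        handle.write("rs2\tG\t-0.2\n")
    n_rows = count_variant_rows(demo)

print(n_rows)  # 2
print(resolve_score_id({"omicspred_id": "OPGS123456"}))  # OPGS123456
```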


.. py:class:: ScoreLog


   A log that includes header information and variant summary statistics

   >>> header = CatalogScoreHeader(pgs_id='PGS000001', pgs_name='PRS77_BC', trait_reported='Breast cancer', genome_build=None, format_version=ScoreFormatVersion.v2, trait_mapped='breast carcinoma', trait_efo='EFO_0000305', variants_number=77, weight_type="NR", pgp_id='PGP000001', citation='Mavaddat N et al. J Natl Cancer Inst (2015). doi:10.1093/jnci/djv036', HmPOS_build="GRCh38", HmPOS_date="2022-07-29")
   >>> harmonised_variant = ScoreVariant(**{"rsID": None, "chr_name": "1", "chr_position": 1, "effect_allele": "HLA-DQ", "effect_weight": 0.5, "hm_chr": "1", "hm_pos": 1, "hm_rsID": "rs1921", "hm_source": "ENSEMBL",  "row_nr": 0, "accession": "test"})
   >>> variant_log = harmonised_variant.model_dump(include={"hm_source", "is_complex"})
   >>> scorelog = ScoreLog(header=header, compatible_effect_type=True, variant_logs=[VariantLog(**variant_log)])  # doctest: +ELLIPSIS
   >>> scorelog
   ScoreLog(header=CatalogScoreHeader(...), compatible_effect_type=True, has_complex_alleles=True, pgs_id='PGS000001', is_harmonised=True, sources=['ENSEMBL'])

   In the original scoring file header there were 77 variants:

   >>> scorelog.header.variants_number
   77

   But we've only got 1 ScoreVariant:

   >>> scorelog.n_actual_variants
   1
   >>> scorelog.variant_count_difference
   76
   >>> scorelog.variants_are_missing
   True

   Maybe they were all filtered out during normalisation. It's important to log and warn when this happens.

   >>> scorelog.sources
   ['ENSEMBL']

   >>> scorelog.model_dump()  # doctest: +ELLIPSIS
   {'header': {'pgs_id': 'PGS000001', ...}, 'compatible_effect_type': True, 'has_complex_alleles': True, 'pgs_id': 'PGS000001', 'is_harmonised': True, 'sources': ['ENSEMBL']}


   .. py:attribute:: compatible_effect_type
      :type:  bool


   .. py:property:: has_complex_alleles
      :type: bool


      Do any variants contain complex alleles? e.g. HLA/APOE


   .. py:attribute:: header
      :type:  ScoreHeader | CatalogScoreHeader


   .. py:property:: is_harmonised
      :type: bool


   .. py:attribute:: model_config


   .. py:property:: n_actual_variants
      :type: int | None


   .. py:property:: pgs_id
      :type: str | None


   .. py:property:: sources
      :type: list[str] | None


   .. py:property:: variant_count_difference
      :type: int | None


   .. py:attribute:: variant_logs
      :type:  list[VariantLog] | None


   .. py:property:: variants_are_missing
      :type: bool
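
The summary properties of ``ScoreLog`` follow from simple arithmetic over the header and the variant logs. The sketch below reproduces the numbers from the example above with plain dataclasses. ``LogEntry`` and ``summarise`` are hypothetical names, and computing the difference as the header count minus the actual count is an assumption that matches the documented result of 76.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogEntry:
    """Hypothetical stand-in for a variant-level log record."""

    hm_source: Optional[str]
    is_complex: bool


def summarise(variants_number, entries):
    """Derive score-level statistics from variant-level records."""
    n_actual = len(entries)
    # Assumption: difference = count declared in header - variants processed.
    difference = variants_number - n_actual
    sources = sorted({e.hm_source for e in entries if e.hm_source is not None})
    return {
        "n_actual_variants": n_actual,
        "variant_count_difference": difference,
        "variants_are_missing": difference != 0,
        "has_complex_alleles": any(e.is_complex for e in entries),
        "sources": sources,
    }


summary = summarise(77, [LogEntry(hm_source="ENSEMBL", is_complex=True)])
print(summary["variant_count_difference"])  # 76
print(summary["variants_are_missing"])      # True
```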


.. py:class:: ScoreLogs


   A container of ScoreLog to simplify serialising to a JSON list


   .. py:attribute:: root
      :type:  list[ScoreLog]
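
Serialising to a JSON list means the top-level JSON value is an array with one element per score. A minimal sketch with the standard ``json`` module is shown below. The dictionaries are illustrative stand-ins for dumped ``ScoreLog`` objects, not real output from the library.

```python
import json

# Illustrative stand-ins for dumped score logs (values are examples only).
logs = [
    {"pgs_id": "PGS000001", "is_harmonised": True, "sources": ["ENSEMBL"]},
    {"pgs_id": "PGS123456", "is_harmonised": False, "sources": None},
]

payload = json.dumps(logs)      # one JSON array, one object per score
restored = json.loads(payload)  # round-trips back to a list of dicts
print(len(restored))  # 2
```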


.. py:class:: ScoreVariant


   This model includes attributes useful for processing and normalising variants

   >>> variant = ScoreVariant(**{"rsID": None, "chr_name": "1", "chr_position": 1, "effect_allele": "A", "effect_weight": 0.5, "row_nr": 0, "accession": "test"})
   >>> variant  # doctest: +ELLIPSIS
   ScoreVariant(rsID=None, chr_name='1', chr_position=1, effect_allele=Allele(allele='A', ...
   >>> variant.is_complex
   False
   >>> variant.is_non_additive
   False
   >>> variant.is_harmonised
   False
   >>> variant.effect_type
   EffectType.ADDITIVE

   >>> variant_missing_positions = ScoreVariant(**{"rsID": None, "chr_name": None, "chr_position": None, "effect_allele": "A", "effect_weight": 0.5,  "row_nr": 0, "accession": "test"}) # doctest: +ELLIPSIS
   Traceback (most recent call last):
   ...
   pydantic_core._pydantic_core.ValidationError: 1 validation error for ScoreVariant
     Value error, Bad position: self.rsID=None, self.chr_name=None, self.chr_position=None...
     ...

   >>> harmonised_variant = ScoreVariant(**{"rsID": None, "chr_name": "1", "chr_position": 1, "effect_allele": "A", "effect_weight": 0.5, "hm_chr": "1", "hm_pos": 1, "hm_rsID": "rs1921", "hm_source": "ENSEMBL",  "row_nr": 0, "accession": "test"})
   >>> harmonised_variant.is_harmonised
   True

   >>> variant_nonadditive = ScoreVariant(**{"rsID": None, "chr_name": "1", "chr_position": 1, "effect_allele": "A", "dosage_0_weight": 0, "dosage_1_weight": 1,  "dosage_2_weight": 0, "row_nr": 0, "accession": "test"})
   >>> variant_nonadditive.is_non_additive
   True
   >>> variant_nonadditive.is_complex
   False
   >>> variant_nonadditive.effect_type
   EffectType.NONADDITIVE

   >>> variant_complex = ScoreVariant(**{"rsID": None, "chr_name": "1", "chr_position": 1, "effect_allele": "A", "effect_weight": 0.5, "is_haplotype": True,  "row_nr": 0, "accession": "test"})
   >>> variant_complex.is_complex
   True

   The harmonisation process might fail, so variants can be missing mandatory fields.

   This must be supported by the model:

   >>> bad_hm_variant = ScoreVariant(**{"rsID": "a_weird_rsid", "chr_name": None, "chr_position": None, "effect_allele": "G", "effect_weight": 0.5, "hm_chr": None, "hm_pos": None, "hm_rsID": None, "hm_source": "Unknown",  "row_nr": 0, "accession": "test"})
   >>> bad_hm_variant.is_harmonised
   True
   >>> bad_hm_variant.is_hm_bad
   True
   >>> harmonised_variant.is_hm_bad
   False

   rsID format validation (e.g. the ID must start with rs, ss, ...) is disabled when harmonisation fails:

   >>> bad_hm_variant.rsID
   'a_weird_rsid'


   .. py:attribute:: accession
      :type:  Annotated[str, Field(title='Accession', description='Accession of score variant')]


   .. py:attribute:: is_duplicated
      :type:  Annotated[bool | None, Field(default=False, title='Duplicated variant', description='In a list of variants with the same accession, is ID duplicated?')]


   .. py:attribute:: model_config


   .. py:attribute:: output_fields
      :type:  ClassVar[tuple[str, Ellipsis]]
      :value: ('chr_name', 'chr_position', 'effect_allele', 'other_allele', 'effect_weight', 'effect_type',...


   .. py:attribute:: row_nr
      :type:  Annotated[int, Field(title='Row number', description='Row number of variant in scoring file (first variant = 0)', ge=0)]
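
The ``is_duplicated`` flag asks whether a variant ID has already appeared among variants with the same accession. One way to compute such a flag is a single pass with a set of seen keys, sketched below. ``flag_duplicates`` is a hypothetical helper, and flagging only the second and later occurrences is an assumption, not confirmed behaviour of the library.

```python
def flag_duplicates(variants):
    """Return one flag per variant: True if (accession, ID) was seen before."""
    seen = set()
    flags = []
    for variant in variants:
        key = (variant["accession"], variant["variant_id"])
        flags.append(key in seen)
        seen.add(key)
    return flags


rows = [
    {"accession": "test", "variant_id": "1:1:A:"},
    {"accession": "test", "variant_id": "1:1:A:"},   # same ID, same accession
    {"accession": "other", "variant_id": "1:1:A:"},  # same ID, other accession
]
flags = flag_duplicates(rows)
print(flags)  # [False, True, False]
```

The same ID under a different accession is not flagged, because duplicates are only meaningful within one scoring file.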


.. py:class:: VariantLog


   This model consists of the variant-level statistics that need to be summarised in the ScoreLog.

   ScoreVariants can't be reused directly, because failed harmonisation can create invalid ScoreVariants (e.g. with missing genomic coordinates).

   If ScoreLogs were composed of ScoreVariants, the data would be revalidated on instantiation and raise ValidationErrors.

   Instead, VariantLogs are created from a subset of ScoreVariant fields.


   .. py:attribute:: hm_source
      :type:  str | None
      :value: None


   .. py:attribute:: is_complex
      :type:  bool
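
Creating a log record from a subset of fields can be pictured as a dictionary projection. The sketch below is a plain-Python analogy for the ``model_dump(include=...)`` call shown in the ``ScoreLog`` example. ``to_variant_log`` is a hypothetical helper, and the variant dictionary is made up for illustration.

```python
LOG_FIELDS = ("hm_source", "is_complex")


def to_variant_log(variant):
    """Keep only the fields needed for logging; ignore everything else."""
    return {name: variant.get(name) for name in LOG_FIELDS}


# A variant whose coordinates are missing, as after a failed harmonisation.
variant = {
    "rsID": "a_weird_rsid",
    "chr_name": None,
    "chr_position": None,
    "effect_allele": "G",
    "hm_source": "Unknown",
    "is_complex": False,
}
log = to_variant_log(variant)
print(log)  # {'hm_source': 'Unknown', 'is_complex': False}
```

Because the missing coordinates are never copied, nothing downstream needs to validate them again.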


.. py:class:: VariantType


   Complex alleles are usually haplotypes/diplotypes and the gametic phase must be known to apply them accurately.

   See PGS Catalog Curation Guidelines: Appendix A – Special Cases for more information

   (Not supported by the calculator.)


   .. py:attribute:: APOE_ALLELE
      :value: 'APOE_allele'


   .. py:attribute:: CYP_ALLELE
      :value: 'CYP_allele'


   .. py:attribute:: HLA_AA
      :value: 'HLA_AA'


   .. py:attribute:: HLA_ALLELE
      :value: 'HLA_allele'


   .. py:attribute:: HLA_SEROTYPE
      :value: 'HLA_serotype'