calc.lib.legacy.principalcomponents
===================================

.. py:module:: calc.lib.legacy.principalcomponents


Attributes
----------

.. autoapisummary::

   calc.lib.legacy.principalcomponents.logger


Classes
-------

.. autoapisummary::

   calc.lib.legacy.principalcomponents.PopulationType
   calc.lib.legacy.principalcomponents.PrincipalComponents


Module Contents
---------------

.. py:class:: PopulationType(*args, **kwds)


   Polygenic scores (PGS) can be calculated on a reference panel or a target population.

   This enum mostly helps to disambiguate instances of :class:`PrincipalComponents`.


   .. py:attribute:: REFERENCE
      :value: 'reference'


   .. py:attribute:: TARGET
      :value: 'target'
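
   As a minimal sketch of how the two members behave, assuming ``PopulationType`` is a standard :class:`enum.Enum`. The stand-in class below only mirrors the documented values and is not the library's own definition:

   .. code-block:: python

      import enum


      class PopulationType(enum.Enum):
          # Stand-in that mirrors the documented members and values.
          REFERENCE = "reference"
          TARGET = "target"


      # Members can be looked up by value, e.g. when parsing a configuration string.
      ref = PopulationType("reference")
      print(ref is PopulationType.REFERENCE)

      # Iteration yields the members in definition order.
      print([member.value for member in PopulationType])

   Passing the enum member rather than a bare string makes it explicit which kind of population a :class:`PrincipalComponents` instance holds.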


.. py:class:: PrincipalComponents(pcs_path, dataset, pop_type, psam_path=None, related_path=None, **kwargs)

   This class represents principal component analysis (PCA) data calculated by ``fraposa-pgsc``.

   PCA data may come from a reference population or a target population. Samples in a target population have been projected onto the PCA space of the reference population.

   >>> from ._config import Config
   >>> related_path = Config.ROOT_DIR / "tests" / "legacy" / "data" / "ref.king.cutoff.id"
   >>> psam_path = Config.ROOT_DIR / "tests" / "legacy" / "data" / "ref.psam"
   >>> ref_pc = PrincipalComponents(pcs_path=[Config.ROOT_DIR / "tests" / "legacy" / "data" / "ref.pcs"], dataset="reference", psam_path=psam_path, related_path=related_path, pop_type=PopulationType.REFERENCE)
   >>> ref_pc
   PrincipalComponents(dataset='reference', pop_type=PopulationType.REFERENCE, pcs_path=[PosixPath('.../ref.pcs')], psam_path=PosixPath('.../ref.psam'))
   >>> ref_pc.df.to_dict()
   {'PC1': {('reference', 'HG00096', 'HG00096'): -23.8212, ('reference', 'HG00097', 'HG00097'): -24.8106, ...
   >>> target_pcs = PrincipalComponents(pcs_path=Config.ROOT_DIR / "tests" / "legacy" / "data" / "target.pcs", dataset="target", pop_type=PopulationType.TARGET)
   >>> target_pcs
   PrincipalComponents(dataset='target', pop_type=PopulationType.TARGET, pcs_path=[PosixPath('.../target.pcs')], psam_path=None)
   >>> target_pcs.df.to_dict()
   {'PC1': {('target', 'HGDP00001', 'HGDP00001'): -18.5135, ('target', 'HGDP00003', 'HGDP00003'): -18.8314, ...


   .. py:attribute:: dataset


   .. py:property:: df

      A pandas dataframe that contains PCA data.

      For a reference population, the dataframe also contains population label columns loaded from sample information files.

      :raises ValueError: If the reference population consists of fewer than 100 samples
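
      The size check can be pictured with the following sketch. ``check_reference_size`` and ``MIN_REFERENCE_SAMPLES`` are hypothetical names used only for illustration; the library's actual implementation may differ.

      .. code-block:: python

         MIN_REFERENCE_SAMPLES = 100  # threshold taken from the documented error condition


         def check_reference_size(n_samples, is_reference):
             """Raise ValueError when a reference population is too small."""
             if is_reference and n_samples < MIN_REFERENCE_SAMPLES:
                 raise ValueError(
                     f"Reference population has {n_samples} samples; "
                     f"at least {MIN_REFERENCE_SAMPLES} are required"
                 )
             return n_samples


         # Target populations are not subject to the check in this sketch.
         check_reference_size(10, is_reference=False)

         # A reference population with exactly 100 samples passes.
         check_reference_size(100, is_reference=True)

      In this sketch the check applies only to reference data, matching the documented error condition.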


   .. py:property:: max_pcs

      The maximum number of PCs used in calculations


   .. py:property:: npcs_norm

      Number of PCs used for population normalization (default = 4)


   .. py:property:: npcs_popcomp

      Number of PCs used for population comparison (default = 5)
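
      How the PC counts translate into dataframe columns can be sketched as follows. This assumes that ``max_pcs`` is the larger of ``npcs_norm`` and ``npcs_popcomp``, which this page does not state, and that PC columns are named ``PC1``, ``PC2``, and so on, as in the class example. ``pc_columns`` is a hypothetical helper.

      .. code-block:: python

         NPCS_NORM = 4     # documented default for population normalization
         NPCS_POPCOMP = 5  # documented default for population comparison


         def pc_columns(n_pcs):
             """Return the column names PC1..PCn for the first n_pcs components."""
             return [f"PC{i}" for i in range(1, n_pcs + 1)]


         # Assumption: enough PCs must be loaded to serve both calculations.
         max_pcs = max(NPCS_NORM, NPCS_POPCOMP)

         norm_cols = pc_columns(NPCS_NORM)
         popcomp_cols = pc_columns(NPCS_POPCOMP)
         print(max_pcs, norm_cols, popcomp_cols)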


   .. py:property:: pop_type

      See :class:`PopulationType`


   .. py:property:: poplabel

      The group label used to assign target samples to the reference population group they most resemble, e.g. SAS, EUR, or AFR.


   .. py:property:: psam_path

      Path to a plink2 sample information file for the reference population


   .. py:property:: related_path

      Path to a plink2 kinship cutoff file.

      Related reference samples are removed from the analysis.
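
      Removing related samples can be sketched with the standard library alone. The file layout (a ``#IID`` header followed by one sample ID per line) and the reading of the file as a list of samples to exclude are both assumptions. ``read_ids`` and ``drop_related`` are hypothetical helpers, not part of this module.

      .. code-block:: python

         import io


         def read_ids(handle):
             """Collect sample IDs, skipping blank lines and '#' header lines."""
             ids = set()
             for line in handle:
                 line = line.strip()
                 if line and not line.startswith("#"):
                     # Assumption: the last whitespace-separated field is the sample ID.
                     ids.add(line.split()[-1])
             return ids


         def drop_related(sample_ids, related_ids):
             """Keep only samples that are not listed as related, preserving order."""
             return [s for s in sample_ids if s not in related_ids]


         # In-memory stand-in for a cutoff file; a real file would be opened from disk.
         cutoff_file = io.StringIO("#IID\nHG00097\n")
         related = read_ids(cutoff_file)
         kept = drop_related(["HG00096", "HG00097"], related)
         print(kept)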


.. py:data:: logger