calc.lib.legacy.principalcomponents
===================================

.. py:module:: calc.lib.legacy.principalcomponents


Attributes
----------

.. autoapisummary::

   calc.lib.legacy.principalcomponents.logger


Classes
-------

.. autoapisummary::

   calc.lib.legacy.principalcomponents.PopulationType
   calc.lib.legacy.principalcomponents.PrincipalComponents


Module Contents
---------------

.. py:class:: PopulationType(*args, **kwds)

   PGS can be calculated on a reference panel or target population.

   This enum mostly helps to disambiguate instances of :class:`PrincipalComponents`.

   .. py:attribute:: REFERENCE
      :value: 'reference'

   .. py:attribute:: TARGET
      :value: 'target'


.. py:class:: PrincipalComponents(pcs_path, dataset, pop_type, psam_path=None, related_path=None, **kwargs)

   This class represents principal components analysis (PCA) data calculated by ``fraposa-pgsc``.

   PCA data may come from a reference population or a target population.

   Target populations have been projected onto the reference population.

   >>> from ._config import Config
   >>> related_path = Config.ROOT_DIR / "tests" / "legacy" / "data" / "ref.king.cutoff.id"
   >>> psam_path = Config.ROOT_DIR / "tests" / "legacy" / "data" / "ref.psam"
   >>> ref_pc = PrincipalComponents(pcs_path=[Config.ROOT_DIR / "tests" / "legacy" / "data" / "ref.pcs"], dataset="reference", psam_path=psam_path, related_path=related_path, pop_type=PopulationType.REFERENCE)
   >>> ref_pc
   PrincipalComponents(dataset='reference', pop_type=PopulationType.REFERENCE, pcs_path=[PosixPath('.../ref.pcs')], psam_path=PosixPath('.../ref.psam'))
   >>> ref_pc.df.to_dict()
   {'PC1': {('reference', 'HG00096', 'HG00096'): -23.8212, ('reference', 'HG00097', 'HG00097'): -24.8106, ...
   >>> target_pcs = PrincipalComponents(pcs_path=Config.ROOT_DIR / "tests" / "legacy" / "data" / "target.pcs", dataset="target", pop_type=PopulationType.TARGET)
   >>> target_pcs
   PrincipalComponents(dataset='target', pop_type=PopulationType.TARGET, pcs_path=[PosixPath('.../target.pcs')], psam_path=None)
   >>> target_pcs.df.to_dict()
   {'PC1': {('target', 'HGDP00001', 'HGDP00001'): -18.5135, ('target', 'HGDP00003', 'HGDP00003'): -18.8314, ...


   .. py:attribute:: dataset


   .. py:property:: df

      A pandas dataframe that contains PCA data. Reference data also contains population label columns loaded from sample information files.

      :raises ValueError: If the reference population consists of fewer than 100 samples


   .. py:property:: max_pcs

      The maximum number of PCs used in calculations


   .. py:property:: npcs_norm

      Number of PCs used for population normalization (default = 4)


   .. py:property:: npcs_popcomp

      Number of PCs used for population comparison (default = 5)


   .. py:property:: pop_type

      See :class:`PopulationType`


   .. py:property:: poplabel

      The group label used to assign target samples that are similar to reference population groups, e.g. SAS/EUR/AFR


   .. py:property:: psam_path

      Path to a plink2 sample information file for the reference population


   .. py:property:: related_path

      Path to a plink2 kinship cutoff file

      Related reference samples are removed from analysis


.. py:data:: logger
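
The ``df`` and ``related_path`` entries above state two rules for reference data: related samples are removed from analysis, and a ``ValueError`` is raised when the reference population has fewer than 100 samples. The following is a minimal sketch of how those two rules could combine, not the module's actual implementation. The function name, the constant, and the sample IDs are hypothetical, and the order of the two steps (filter first, then check size) is an assumption.

```python
# Hypothetical sketch: none of these names come from
# calc.lib.legacy.principalcomponents.
MIN_REFERENCE_SAMPLES = 100  # threshold stated in the ``df`` documentation


def filter_reference_samples(sample_ids, related_ids, min_samples=MIN_REFERENCE_SAMPLES):
    """Drop related samples, then check that enough reference samples remain.

    Assumption: the size check runs after related samples are removed;
    the real class may apply these steps in a different order.
    """
    related = set(related_ids)
    kept = [sample for sample in sample_ids if sample not in related]
    if len(kept) < min_samples:
        raise ValueError(
            f"Reference population has {len(kept)} samples; "
            f"at least {min_samples} are required"
        )
    return kept


# Usage with made-up IDs: 120 samples, 5 of them flagged as related
samples = [f"S{i:03d}" for i in range(120)]
related = samples[:5]
kept = filter_reference_samples(samples, related)
print(len(kept))
```

With these made-up inputs, 115 samples remain, so no error is raised. Passing fewer than 100 unrelated samples raises ``ValueError``, mirroring the behaviour documented for ``df``.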