calc.lib.legacy.principalcomponents¶
Attributes¶
logger |
Classes¶
PopulationType | PGS can be calculated on a reference panel or target population.
PrincipalComponents | This class represents principal components analysis (PCA) data calculated by fraposa-pgsc.
Module Contents¶
- class calc.lib.legacy.principalcomponents.PopulationType(*args, **kwds)¶
PGS can be calculated on a reference panel or target population.
This enum mostly helps to disambiguate instances of PrincipalComponents.
- REFERENCE = 'reference'¶
- TARGET = 'target'¶
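As a minimal sketch of how such an enum can disambiguate instances, the standalone snippet below re-declares an equivalent enum rather than importing the real module. The `describe` helper is hypothetical and exists only for illustration.

```python
from enum import Enum


class PopulationType(Enum):
    """Standalone stand-in that mirrors the two documented members."""

    REFERENCE = "reference"
    TARGET = "target"


def describe(pop_type: PopulationType) -> str:
    """Hypothetical helper: choose a description from the population type."""
    if pop_type is PopulationType.REFERENCE:
        return "reference panel"
    return "target population (projected onto the reference)"


# Members can be looked up from their string values
assert PopulationType("target") is PopulationType.TARGET
print(describe(PopulationType.REFERENCE))
```

Comparing members with `is` works because each enum member is a singleton.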
- class calc.lib.legacy.principalcomponents.PrincipalComponents(pcs_path, dataset, pop_type, psam_path=None, related_path=None, **kwargs)¶
This class represents principal components analysis (PCA) data calculated by fraposa-pgsc.
PCA data may come from a reference population or a target population. Target populations have been projected onto the reference population.
>>> from ._config import Config
>>> related_path = Config.ROOT_DIR / "tests" / "legacy" / "data" / "ref.king.cutoff.id"
>>> psam_path = Config.ROOT_DIR / "tests" / "legacy" / "data" / "ref.psam"
>>> ref_pc = PrincipalComponents(pcs_path=[Config.ROOT_DIR / "tests" / "legacy" / "data" / "ref.pcs"], dataset="reference", psam_path=psam_path, related_path=related_path, pop_type=PopulationType.REFERENCE)
>>> ref_pc
PrincipalComponents(dataset='reference', pop_type=PopulationType.REFERENCE, pcs_path=[PosixPath('.../ref.pcs')], psam_path=PosixPath('.../ref.psam'))
>>> ref_pc.df.to_dict()
{'PC1': {('reference', 'HG00096', 'HG00096'): -23.8212, ('reference', 'HG00097', 'HG00097'): -24.8106, ...
>>> target_pcs = PrincipalComponents(pcs_path=Config.ROOT_DIR / "tests" / "legacy" / "data" / "target.pcs", dataset="target", pop_type=PopulationType.TARGET)
>>> target_pcs
PrincipalComponents(dataset='target', pop_type=PopulationType.TARGET, pcs_path=[PosixPath('.../target.pcs')], psam_path=None)
>>> target_pcs.df.to_dict()
{'PC1': {('target', 'HGDP00001', 'HGDP00001'): -18.5135, ('target', 'HGDP00003', 'HGDP00003'): -18.8314, ...
- dataset¶
- property df¶
A pandas dataframe that contains PCA data.
Reference data also contains population label columns loaded from sample information files.
- Raises:
ValueError – If the reference population consists of fewer than 100 samples
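The documented ValueError implies a minimum size check on the reference panel. The snippet below is only a sketch of such a guard: `check_reference_size` and `MIN_REFERENCE_SAMPLES` are hypothetical names, and the real property may perform the check differently.

```python
MIN_REFERENCE_SAMPLES = 100  # threshold taken from the documented ValueError


def check_reference_size(sample_ids):
    """Hypothetical guard: reject reference panels with fewer than 100 samples."""
    n_samples = len(sample_ids)
    if n_samples < MIN_REFERENCE_SAMPLES:
        raise ValueError(
            f"Reference population has {n_samples} samples, "
            f"need at least {MIN_REFERENCE_SAMPLES}"
        )
    return n_samples


# A caller can catch the error and report it instead of crashing
try:
    check_reference_size(["HG00096", "HG00097"])
except ValueError as err:
    print(f"rejected: {err}")
```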
- property max_pcs¶
The maximum number of PCs used in calculations
- property npcs_norm¶
Number of PCs used for population normalization (default = 4)
- property npcs_popcomp¶
Number of PCs used for population comparison (default = 5)
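One plausible reading of these three properties is that max_pcs is the larger of the two per-step counts. The sketch below assumes exactly that; `PcSettings` is a hypothetical container, not part of the documented API, and the PC1, PC2, ... column naming follows the doctest output above.

```python
from dataclasses import dataclass


@dataclass
class PcSettings:
    """Hypothetical container for the documented PC counts."""

    npcs_popcomp: int = 5  # documented default for population comparison
    npcs_norm: int = 4  # documented default for population normalization

    @property
    def max_pcs(self) -> int:
        # Assumption: the largest PC count that any step needs
        return max(self.npcs_popcomp, self.npcs_norm)


settings = PcSettings()
# Column names for every PC that could be needed downstream
pc_columns = [f"PC{i}" for i in range(1, settings.max_pcs + 1)]
print(pc_columns)
```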
- property pop_type¶
See PopulationType.
- property poplabel¶
The group label used to assign target samples that are similar to reference population groups, e.g. SAS/EUR/AFR
- property psam_path¶
Path to a plink2 sample information file for the reference population.
- property related_path¶
Path to a plink2 kinship cutoff file.
Related reference samples are removed from analysis.
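Removing related samples can be sketched with plain Python. This is an illustration under stated assumptions, not the class's actual implementation: it assumes the cutoff file lists the samples to drop, that header lines start with `#`, and that the sample ID is the last whitespace-separated field. Both function names are hypothetical.

```python
def read_related_ids(lines):
    """Collect sample IDs from cutoff-file lines, skipping '#' header lines.

    Assumption: the sample ID is the last whitespace-separated field.
    """
    ids = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ids.add(line.split()[-1])
    return ids


def drop_related(sample_ids, related_ids):
    """Keep only samples not flagged as related, preserving input order."""
    return [s for s in sample_ids if s not in related_ids]


cutoff_lines = ["#IID", "HG00097"]  # made-up file contents for illustration
samples = ["HG00096", "HG00097", "HG00099"]
unrelated = drop_related(samples, read_related_ids(cutoff_lines))
print(unrelated)
```

Using a set for the related IDs keeps each membership test constant-time, which matters for panels with thousands of samples.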
- calc.lib.legacy.principalcomponents.logger¶