calc.lib.legacy.polygenicscore¶
Attributes¶
Classes¶
AdjustArguments | Arguments that control genetic similarity estimation and PGS adjustment
AdjustResults | Results returned by AggregatedPGS.adjust()
AggregatedPGS | A PGS that's been aggregated, melted, and probably contains samples from a reference panel and a target population.
PolygenicScore | Represents the output of plink2 --score written to a file
Module Contents¶
- class calc.lib.legacy.polygenicscore.AdjustArguments¶
Arguments that control genetic similarity estimation and PGS adjustment
>>> AdjustArguments(method_compare="Mahalanobis", pThreshold=None, method_normalization=("empirical", "mean"))
AdjustArguments(method_compare='Mahalanobis', pThreshold=None, method_normalization=('empirical', 'mean'))
- method_compare: str = 'RandomForest'¶
- method_normalization: tuple[str, Ellipsis] = ('empirical', 'mean', 'mean+var')¶
- pThreshold: float | None = None¶
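The documented defaults and the override shown in the example above can be mirrored with a small stand-in. This is only a sketch: `AdjustArgumentsSketch` is a hypothetical name, and modelling the real class as a dataclass is an assumption based on its repr, not something this page states.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


# Hypothetical stand-in mirroring the documented fields and defaults.
# The real class is calc.lib.legacy.polygenicscore.AdjustArguments.
@dataclass
class AdjustArgumentsSketch:
    method_compare: str = "RandomForest"
    pThreshold: Optional[float] = None
    method_normalization: Tuple[str, ...] = ("empirical", "mean", "mean+var")


# No arguments: every field falls back to its documented default.
defaults = AdjustArgumentsSketch()

# Override only what differs, as in the documented example.
custom = AdjustArgumentsSketch(
    method_compare="Mahalanobis",
    method_normalization=("empirical", "mean"),
)
```

Leaving `pThreshold` unset keeps it at `None` in both instances.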
- class calc.lib.legacy.polygenicscore.AdjustResults¶
Results returned by AggregatedPGS.adjust()
- write(directory)¶
Write model, PGS, and PCA data to a directory
- model_meta: dict¶
- models: pandas.DataFrame¶
- pca: pandas.DataFrame¶
- pgs: pandas.DataFrame¶
- scorecols: list[str]¶
- target_label: str¶
- class calc.lib.legacy.polygenicscore.AggregatedPGS(*, target_name, df=None, path=None)¶
A PGS that’s been aggregated, melted, and probably contains samples from a reference panel and a target population.
The most useful method in this class adjusts PGS based on genetic ancestry similarity estimation.
>>> from ._config import Config
>>> score_path = Config.ROOT_DIR / "tests" / "legacy" / "data" / "aggregated_scores.txt.gz"
>>> AggregatedPGS(path=score_path, target_name="hgdp")
AggregatedPGS(path=PosixPath('.../aggregated_scores.txt.gz'))
- adjust(*, ref_pc, target_pc, adjust_arguments=None)¶
Adjust a PGS based on genetic ancestry similarity estimations.
- Returns: AdjustResults
>>> from ._config import Config
>>> from .principalcomponents import PrincipalComponents
>>> related_path = Config.ROOT_DIR / "tests" / "legacy" / "data" / "ref.king.cutoff.id"
>>> ref_pc = PrincipalComponents(pcs_path=[Config.ROOT_DIR / "tests" / "legacy" / "data" / "ref.pcs"], dataset="reference", psam_path=Config.ROOT_DIR / "tests" / "legacy" / "data" / "ref.psam", pop_type=PopulationType.REFERENCE, related_path=related_path)
>>> target_pcs = PrincipalComponents(pcs_path=Config.ROOT_DIR / "tests" / "legacy" / "data" / "target.pcs", dataset="target", pop_type=PopulationType.TARGET)
>>> score_path = Config.ROOT_DIR / "tests" / "legacy" / "data" / "aggregated_scores.txt"
>>> results = AggregatedPGS(path=score_path, target_name="hgdp").adjust(ref_pc=ref_pc, target_pc=target_pcs)
>>> results.pgs.to_dict().keys()
dict_keys(['SUM|PGS001229_hmPOS_GRCh38', 'percentile_MostSimilarPop|PGS001229_hmPOS_GRCh38', 'Z_MostSimilarPop|PGS001229_hmPOS_GRCh38', ...
>>> results.models
{'dist_empirical': {'PGS001229_hmPOS_GRCh38': {'EUR': {'percentiles': array([-1.04069000e+01, -7.94665080e+00, ...
Write the adjusted results to a directory:
>>> import tempfile, os
>>> dout = tempfile.mkdtemp()
>>> results.write(directory=dout)
>>> sorted(os.listdir(dout))
['target_info.json.gz', 'target_pgs.txt.gz', 'target_popsimilarity.txt.gz']
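The `percentile_MostSimilarPop` and `Z_MostSimilarPop` columns suggest that a target score is placed within the score distribution of the most similar reference population. The sketch below illustrates that general idea only. It is not the package's implementation: the function names and reference values are hypothetical, and the actual adjustment may fit and apply its models differently.

```python
from bisect import bisect_right
from statistics import mean, stdev

# Hypothetical PGS sums for reference samples from a single population.
reference_scores = sorted([-1.2, -0.4, 0.0, 0.3, 0.9, 1.6])


def empirical_percentile(score, sorted_reference):
    """Percentage of reference scores that are <= score."""
    rank = bisect_right(sorted_reference, score)
    return 100.0 * rank / len(sorted_reference)


def z_score(score, reference):
    """Standardise a score with the reference mean and sample std. dev."""
    return (score - mean(reference)) / stdev(reference)


target_score = 0.3  # hypothetical target sample
percentile = empirical_percentile(target_score, reference_scores)
z = z_score(target_score, reference_scores)
```

Both quantities depend entirely on which reference population is chosen, which is why the similarity estimation step comes first.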
- property df¶
- property path¶
- property target_name¶
- class calc.lib.legacy.polygenicscore.PolygenicScore(*, path=None, df=None, sampleset=None)¶
Represents the output of plink2 --score written to a file
>>> from ._config import Config
>>> import reprlib
>>> score1 = Config.ROOT_DIR / "tests" / "legacy" / "data" / "cineca_22_additive_0.sscore.zst"
>>> pgs1 = PolygenicScore(sampleset="test", path=score1)
>>> pgs1
PolygenicScore(sampleset='test', path=PosixPath('.../cineca_22_additive_0.sscore.zst'))
>>> pgs2 = PolygenicScore(sampleset="test", path=score1)
>>> reprlib.repr(pgs1.read().to_dict())
"{'DENOM': {('test', 'HG00096', 'HG00096'): 1564, ... 'PGS001229_22_SUM': {('test', 'HG00096', 'HG00096'): 0.54502, ...
It’s often helpful to combine PGS that were split per chromosome or by effect type:
>>> aggregated_score = pgs1 + pgs2
>>> aggregated_score
PolygenicScore(sampleset='test', path='(in-memory)')
Once a score has been fully aggregated it can be helpful to recalculate an average:
>>> aggregated_score.average()
>>> aggregated_score.df
                                    PGS       SUM  DENOM       AVG
sampleset FID     IID
test      HG00096 HG00096  PGS001229_22  1.090040   3128  0.000348
          HG00097 HG00097  PGS001229_22  1.348802   3128  0.000431
...
Scores can be written to a TSV file:
>>> import tempfile, os
>>> outd = tempfile.mkdtemp()
>>> aggregated_score.write(str(outd))
>>> os.listdir(outd)
['aggregated_scores.txt.gz']
With support for splitting output files by sampleset:
>>> splitoutd = tempfile.mkdtemp()
>>> aggregated_score.write(splitoutd, split=True)
>>> sorted(os.listdir(splitoutd), key = lambda x: x.split("_")[0])
['test_pgs.txt.gz']
If a sampleset can’t be inferred from the arguments or the path, a TypeError is raised:
>>> PolygenicScore()
Traceback (most recent call last):
...
TypeError: Missing sampleset
- average()¶
Update the dataframe with a recalculated average.
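The example values in the class description are consistent with AVG = SUM / DENOM: adding a score to itself doubles SUM (0.54502 to 1.09004) and DENOM (1564 to 3128), and 1.09004 / 3128 is approximately 0.000348. The sketch below reproduces that arithmetic with plain dictionaries. The helper names are hypothetical, and restricting the sum to samples present in both inputs is an assumption; the real class works on a pandas dataframe.

```python
# Per-sample records keyed by (sampleset, FID, IID). The values are taken
# from the documented example for sample HG00096.
part_a = {("test", "HG00096", "HG00096"): {"SUM": 0.54502, "DENOM": 1564}}
part_b = {("test", "HG00096", "HG00096"): {"SUM": 0.54502, "DENOM": 1564}}


def combine(left, right):
    """Add SUM and DENOM for samples present in both inputs (assumption)."""
    return {
        key: {
            "SUM": left[key]["SUM"] + right[key]["SUM"],
            "DENOM": left[key]["DENOM"] + right[key]["DENOM"],
        }
        for key in left.keys() & right.keys()
    }


def recalculate_average(scores):
    """Set AVG = SUM / DENOM in place for every sample."""
    for record in scores.values():
        record["AVG"] = record["SUM"] / record["DENOM"]


aggregated = combine(part_a, part_b)
recalculate_average(aggregated)
```

Recalculating after aggregation matters because the average of a sum cannot be recovered by adding or averaging the per-file AVG columns when denominators differ.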
- melt()¶
Update the dataframe with a melted version (wide format to long format)
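To make the wide-to-long reshaping concrete, the sketch below turns one row with a `<score>_SUM` column per score into one row per sample and score. It uses plain Python rather than pandas, and the helper name, the `_SUM` suffix rule, and the second score column are assumptions for illustration only.

```python
# One wide row per sample. PGS001229_22_SUM and DENOM come from the
# documented example; EXAMPLE_SCORE_SUM is a made-up second score.
wide_rows = [
    {
        "sampleset": "test",
        "IID": "HG00096",
        "DENOM": 1564,
        "PGS001229_22_SUM": 0.54502,
        "EXAMPLE_SCORE_SUM": 1.0,
    },
]


def melt_scores(rows, id_cols=("sampleset", "IID"), suffix="_SUM"):
    """Return one long-format row per (sample, score) pair."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if not column.endswith(suffix):
                continue  # skip identifier and DENOM columns
            long_row = {name: row[name] for name in id_cols}
            long_row["PGS"] = column[: -len(suffix)]  # strip the suffix
            long_row["SUM"] = value
            long_row["DENOM"] = row["DENOM"]
            long_rows.append(long_row)
    return long_rows


long_rows = melt_scores(wide_rows)
```

In long format every score shares the same `PGS`, `SUM`, and `DENOM` columns, which matches the layout shown after aggregation.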
- read()¶
Eagerly load a PGS into a pandas dataframe
The FID column can be missing from the input data:
>>> from ._config import Config
>>> from xopen import xopen
>>> score1 = Config.ROOT_DIR / "tests" / "legacy" / "data" / "cineca_22_additive_0.sscore.zst"
>>> with xopen(score1) as f:
...     f.readline().split()
['#IID', 'ALLELE_CT', 'DENOM', 'NAMED_ALLELE_DOSAGE_SUM', 'PGS001229_22_AVG', 'PGS001229_22_SUM']
Then FID is set to IID:
>>> PolygenicScore(sampleset="test", path=score1).read()
                           DENOM  PGS001229_22_SUM
sampleset FID     IID
test      HG00096 HG00096   1564          0.545020
...
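The FID fallback can be sketched without the package. The example below parses a trimmed, tab-separated header (only `#IID`, `DENOM`, and `PGS001229_22_SUM` are kept from the documented file) and copies IID into FID when no FID column exists. `read_scores` is a hypothetical helper, not the method documented here.

```python
import csv
import io

# Trimmed stand-ins for score files, with and without an FID column.
without_fid = "#IID\tDENOM\tPGS001229_22_SUM\nHG00096\t1564\t0.54502\n"
with_fid = "#FID\tIID\tDENOM\tPGS001229_22_SUM\nFAM1\tHG00096\t1564\t0.54502\n"


def read_scores(handle, sampleset):
    """Parse a tab-separated score file, defaulting FID to IID."""
    rows = []
    for raw in csv.DictReader(handle, delimiter="\t"):
        # The first header field carries a leading '#'; drop it.
        row = {key.lstrip("#"): value for key, value in raw.items()}
        row.setdefault("FID", row["IID"])  # only fills FID when absent
        row["sampleset"] = sampleset
        rows.append(row)
    return rows


filled = read_scores(io.StringIO(without_fid), sampleset="test")
kept = read_scores(io.StringIO(with_fid), sampleset="test")
```

Using `setdefault` leaves an existing FID untouched, so files that do carry family IDs keep them.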
- write(outdir, split=False)¶
Write PGS to a compressed TSV
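A minimal sketch of writing gzip-compressed TSV output, either as one file or split by sampleset, using only the standard library. The file names follow the documented examples (`aggregated_scores.txt.gz` and `<sampleset>_pgs.txt.gz`); `write_scores` and the column layout are hypothetical, and the real method writes from a pandas dataframe.

```python
import csv
import gzip
import os
import tempfile
from collections import defaultdict


def write_scores(rows, outdir, split=False):
    """Write rows to gzip-compressed TSV; return the file names written."""
    if split:
        groups = defaultdict(list)
        for row in rows:
            groups[row["sampleset"]].append(row)
        targets = {f"{name}_pgs.txt.gz": group for name, group in groups.items()}
    else:
        targets = {"aggregated_scores.txt.gz": rows}
    fieldnames = list(rows[0])
    for filename, group in targets.items():
        path = os.path.join(outdir, filename)
        with gzip.open(path, "wt", newline="") as handle:
            writer = csv.DictWriter(
                handle, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
            )
            writer.writeheader()
            writer.writerows(group)
    return sorted(targets)


rows = [{"sampleset": "test", "IID": "HG00096", "PGS": "PGS001229_22", "SUM": 1.09004}]

with tempfile.TemporaryDirectory() as outdir:
    single = write_scores(rows, outdir)
    with gzip.open(os.path.join(outdir, single[0]), "rt") as handle:
        header = handle.readline().rstrip("\n")

with tempfile.TemporaryDirectory() as splitdir:
    per_sampleset = write_scores(rows, splitdir, split=True)
```

Splitting by sampleset keeps reference and target samples in separate files when both are present in the same aggregated score.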
- property df¶
- property path¶
- calc.lib.legacy.polygenicscore.logger¶