calc.lib.legacy.polygenicscore
==============================

.. py:module:: calc.lib.legacy.polygenicscore


Attributes
----------

.. autoapisummary::

   calc.lib.legacy.polygenicscore.logger


Classes
-------

.. autoapisummary::

   calc.lib.legacy.polygenicscore.AdjustArguments
   calc.lib.legacy.polygenicscore.AdjustResults
   calc.lib.legacy.polygenicscore.AggregatedPGS
   calc.lib.legacy.polygenicscore.PolygenicScore


Module Contents
---------------

.. py:class:: AdjustArguments

   Arguments that control genetic similarity estimation and PGS adjustment

   >>> AdjustArguments(method_compare="Mahalanobis", pThreshold=None, method_normalization=("empirical", "mean"))
   AdjustArguments(method_compare='Mahalanobis', pThreshold=None, method_normalization=('empirical', 'mean'))


   .. py:attribute:: method_compare
      :type:  str
      :value: 'RandomForest'


   .. py:attribute:: method_normalization
      :type:  tuple[str, Ellipsis]
      :value: ('empirical', 'mean', 'mean+var')


   .. py:attribute:: pThreshold
      :type:  float | None
      :value: None


.. py:class:: AdjustResults

   Results returned by :class:`AggregatedPGS.adjust()`


   .. py:method:: write(directory)

      Write model, PGS, and PCA data to a directory



   .. py:attribute:: model_meta
      :type:  dict


   .. py:attribute:: models
      :type:  pandas.DataFrame


   .. py:attribute:: pca
      :type:  pandas.DataFrame


   .. py:attribute:: pgs
      :type:  pandas.DataFrame


   .. py:attribute:: scorecols
      :type:  list[str]


   .. py:attribute:: target_label
      :type:  str


.. py:class:: AggregatedPGS(*, target_name, df=None, path=None)

   A PGS that's been aggregated and melted, and that typically contains samples from
   both a reference panel and a target population.

   The most useful method in this class adjusts PGS based on :func:`genetic ancestry similarity estimation `.

   >>> from ._config import Config
   >>> score_path = Config.ROOT_DIR / "tests" / "legacy" / "data" / "aggregated_scores.txt.gz"
   >>> AggregatedPGS(path=score_path, target_name="hgdp")
   AggregatedPGS(path=PosixPath('.../aggregated_scores.txt.gz'))


   .. py:method:: adjust(*, ref_pc, target_pc, adjust_arguments=None)

      Adjust a PGS based on genetic ancestry similarity estimations.

      :returns: :class:`AdjustResults`

      >>> from ._config import Config
      >>> from .principalcomponents import PrincipalComponents
      >>> related_path = Config.ROOT_DIR / "tests" / "legacy" / "data" / "ref.king.cutoff.id"
      >>> ref_pc = PrincipalComponents(pcs_path=[Config.ROOT_DIR / "tests" / "legacy" / "data" / "ref.pcs"], dataset="reference", psam_path=Config.ROOT_DIR / "tests" / "legacy" / "data" / "ref.psam", pop_type=PopulationType.REFERENCE, related_path=related_path)
      >>> target_pcs = PrincipalComponents(pcs_path=Config.ROOT_DIR / "tests" / "legacy" / "data" / "target.pcs", dataset="target", pop_type=PopulationType.TARGET)
      >>> score_path = Config.ROOT_DIR / "tests" / "legacy" / "data" / "aggregated_scores.txt"
      >>> results = AggregatedPGS(path=score_path, target_name="hgdp").adjust(ref_pc=ref_pc, target_pc=target_pcs)
      >>> results.pgs.to_dict().keys()
      dict_keys(['SUM|PGS001229_hmPOS_GRCh38', 'percentile_MostSimilarPop|PGS001229_hmPOS_GRCh38', 'Z_MostSimilarPop|PGS001229_hmPOS_GRCh38', ...
      >>> results.models
      {'dist_empirical': {'PGS001229_hmPOS_GRCh38': {'EUR': {'percentiles': array([-1.04069000e+01, -7.94665080e+00, ...

      Write the adjusted results to a directory:

      >>> import tempfile, os
      >>> dout = tempfile.mkdtemp()
      >>> results.write(directory=dout)
      >>> sorted(os.listdir(dout))
      ['target_info.json.gz', 'target_pgs.txt.gz', 'target_popsimilarity.txt.gz']



   .. py:property:: df


   .. py:property:: path


   .. py:property:: target_name


.. py:class:: PolygenicScore(*, path=None, df=None, sampleset=None)

   Represents the output of ``plink2 --score`` written to a file

   >>> from ._config import Config
   >>> import reprlib
   >>> score1 = Config.ROOT_DIR / "tests" / "legacy" / "data" / "cineca_22_additive_0.sscore.zst"
   >>> pgs1 = PolygenicScore(sampleset="test", path=score1)  # doctest: +ELLIPSIS
   >>> pgs1
   PolygenicScore(sampleset='test', path=PosixPath('.../cineca_22_additive_0.sscore.zst'))
   >>> pgs2 = PolygenicScore(sampleset="test", path=score1)
   >>> reprlib.repr(pgs1.read().to_dict())  # doctest: +ELLIPSIS
   "{'DENOM': {('test', 'HG00096', 'HG00096'): 1564, ... 'PGS001229_22_SUM': {('test', 'HG00096', 'HG00096'): 0.54502, ...

   It's often helpful to combine PGS that were split per chromosome or by effect type:

   >>> aggregated_score = pgs1 + pgs2
   >>> aggregated_score  # doctest: +ELLIPSIS
   PolygenicScore(sampleset='test', path='(in-memory)')

   Once a score has been fully aggregated it can be helpful to recalculate an average:

   >>> aggregated_score.average()
   >>> aggregated_score.df  # doctest: +ELLIPSIS,+NORMALIZE_WHITESPACE
                                       PGS       SUM  DENOM       AVG
   sampleset FID     IID
   test      HG00096 HG00096  PGS001229_22  1.090040   3128  0.000348
             HG00097 HG00097  PGS001229_22  1.348802   3128  0.000431
   ...

   Scores can be written to a TSV file:

   >>> import tempfile, os
   >>> outd = tempfile.mkdtemp()
   >>> aggregated_score.write(str(outd))
   >>> os.listdir(outd)
   ['aggregated_scores.txt.gz']

   Output files can optionally be split by sampleset:

   >>> splitoutd = tempfile.mkdtemp()
   >>> aggregated_score.write(splitoutd, split=True)
   >>> sorted(os.listdir(splitoutd), key=lambda x: x.split("_")[0])
   ['test_pgs.txt.gz']

   If a sampleset can't be inferred from the arguments or the path, a ``TypeError`` is raised:

   >>> PolygenicScore()
   Traceback (most recent call last):
   ...
   TypeError: Missing sampleset


   .. py:method:: average()

      Update the dataframe with a recalculated average.



   .. py:method:: melt()

      Update the dataframe with a melted version (wide format to long format)



   .. py:method:: read()

      Eagerly load a PGS into a pandas dataframe

      The FID column can be missing from the input data:

      >>> from ._config import Config
      >>> from xopen import xopen
      >>> score1 = Config.ROOT_DIR / "tests" / "legacy" / "data" / "cineca_22_additive_0.sscore.zst"
      >>> with xopen(score1) as f:
      ...     f.readline().split()
      ['#IID', 'ALLELE_CT', 'DENOM', 'NAMED_ALLELE_DOSAGE_SUM', 'PGS001229_22_AVG', 'PGS001229_22_SUM']

      In that case FID is set to IID:

      >>> PolygenicScore(sampleset="test", path=score1).read()  # doctest: +ELLIPSIS,+NORMALIZE_WHITESPACE
                                 DENOM  PGS001229_22_SUM
      sampleset FID     IID
      test      HG00096 HG00096   1564          0.545020
      ...



   .. py:method:: write(outdir, split=False)

      Write PGS to a compressed TSV



   .. py:property:: df


   .. py:property:: path


.. py:data:: logger
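
The aggregation and average recalculation described for ``PolygenicScore`` can be sketched with plain pandas. This is a minimal illustration under stated assumptions, not the library's implementation: the column names ``PGS_SUM`` and ``PGS_AVG`` and all values are hypothetical. The only behaviour taken from the doctests is that ``SUM`` and ``DENOM`` add across inputs and that the recalculated average is consistent with ``SUM / DENOM`` (1.090040 / 3128 ≈ 0.000348).

```python
import pandas as pd

# Hypothetical per-chromosome scores, indexed like the documented output
# (sampleset, FID, IID). All values are invented for illustration.
index = pd.MultiIndex.from_tuples(
    [("test", "HG00096", "HG00096"), ("test", "HG00097", "HG00097")],
    names=["sampleset", "FID", "IID"],
)
chrom_a = pd.DataFrame({"DENOM": [1564, 1564], "PGS_SUM": [0.5, 0.25]}, index=index)
chrom_b = pd.DataFrame({"DENOM": [1564, 1564], "PGS_SUM": [0.5, 0.5]}, index=index)

# Aggregate: sums and denominators add element-wise for matching samples.
# fill_value=0 keeps samples that appear in only one of the inputs.
aggregated = chrom_a.add(chrom_b, fill_value=0)

# Recalculate the average from the aggregated totals.
aggregated["PGS_AVG"] = aggregated["PGS_SUM"] / aggregated["DENOM"]

print(aggregated)
```

Recomputing the average only after all parts have been summed avoids averaging per-chromosome averages, which would weight each part equally regardless of its denominator.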