match.lib.scoringfileframe

Attributes

logger

Classes

ScoringFileFrame

Like pgscatalog.core.NormalisedScoringFile, but backed by the polars dataframe library

Functions

match_variants(score_df, target_df, target)

Get all match candidates for a VariantFrame dataframe and ScoringFileFrame dataframe

Module Contents

class match.lib.scoringfileframe.ScoringFileFrame(paths, chrom=None, cleanup=True, tmpdir=None)

Like pgscatalog.core.NormalisedScoringFile, but backed by the polars dataframe library

Instantiated with the path(s) to a pgscatalog.core.NormalisedScoringFile written to disk. This is a long-format (melted) CSV file containing normalised variant data (i.e. the output of the combine scorefiles application):

>>> from ._config import Config
>>> path = Config.ROOT_DIR.parent / "pgscatalog.core" / "tests" / "data" / "combined.txt.gz"
>>> x = ScoringFileFrame(path)
>>> x
ScoringFileFrame([NormalisedScoringFile('.../combined.txt.gz')])

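You must enter the context manager to prepare the polars dataframe. On entry, the scoring files are written to temporary Arrow IPC files (listed in arrowpaths) and a lazy polars dataframe is returned. The IPC files are deleted when the with block exits: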

>>> import os
>>> with x as arrow:
...     assert all(os.path.exists(p) for p in x.arrowpaths)
...     arrow.collect().shape
(154, 11)
>>> assert not any(os.path.exists(p) for p in x.arrowpaths)  # all cleaned up
>>> from .variantframe import VariantFrame
>>> path = Config.ROOT_DIR.parent / "pgscatalog.core" / "tests" / "data" / "hapnest.bim"
>>> target = VariantFrame(path, dataset="hapnest")
>>> with target as target_df, x as score_df:
...     match_variants(score_df=score_df, target_df=target_df, target=target)
MatchResult(dataset=hapnest, matchresult=[<LazyFrame ...
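The doctest above relies on the context manager creating temporary files on entry and removing them on exit. Below is a minimal, stdlib-only sketch of that lifecycle. TempFileLifecycle is a hypothetical class, not part of this module. It writes empty placeholder files rather than real Arrow IPC data, and its cleanup and tmpdir arguments are only assumed to behave like the parameters of the same name in the ScoringFileFrame signature.

```python
import os
import shutil
import tempfile


class TempFileLifecycle:
    """Hypothetical sketch of a context manager that creates temporary files
    on entry and deletes them on exit when cleanup=True."""

    def __init__(self, names, cleanup=True, tmpdir=None):
        self.names = list(names)
        self.cleanup = cleanup
        self.tmpdir = tmpdir  # parent directory for temp files; None = system default
        self.paths = None
        self._workdir = None

    def __enter__(self):
        # Create a fresh working directory (under tmpdir if one was given)
        self._workdir = tempfile.mkdtemp(dir=self.tmpdir)
        self.paths = []
        for name in self.names:
            path = os.path.join(self._workdir, f"{name}.ipc")
            # Placeholder: a real implementation would write Arrow IPC data here
            with open(path, "wb"):
                pass
            self.paths.append(path)
        return self.paths

    def __exit__(self, exc_type, exc, tb):
        if self.cleanup:
            # Remove the working directory and every file inside it
            shutil.rmtree(self._workdir, ignore_errors=True)
        return False  # do not suppress exceptions raised in the with block


frame = TempFileLifecycle(["combined"])
with frame as paths:
    existed_inside = all(os.path.exists(p) for p in paths)
existed_after = any(os.path.exists(p) for p in frame.paths)
```

Because __exit__ returns False, an exception raised inside the with block still propagates. The temporary files are still removed, so a failure does not leave files behind.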

save_ipc(destination)

Save the dataframe prepared by the context manager to an Arrow IPC file.

The context manager deletes its temporary IPC files on exit. Use this method to persist the data beyond the with block.

arrowpaths = None
chrom = None

match.lib.scoringfileframe.match_variants(score_df, target_df, target)

Get all match candidates for a VariantFrame dataframe and ScoringFileFrame dataframe

Returns a MatchResult for the target dataset, holding the candidate matches as polars LazyFrames.

match.lib.scoringfileframe.logger