2025-01-17 14:33:21 -0500
New work led by Seon-Kyeong Jang leveraging PLMs to improve rare variant association testing.
Leveraging protein language models to identify complex trait associations with previously inaccessible classes of functional rare variants
Protein language models (PLMs) provide variant effect predictions for previously underexplored classes of rare variants in exome sequencing studies. Here we present novel approaches for leveraging the unique properties of PLMs to test for associations between complex traits and rare variants. First, we develop an allelic series-based regression test for isoform-specific variants and discover ~22% more significant associations than standard tests. Furthermore, 17 gene-trait pairs show significantly larger effect sizes in non-canonical than in canonical transcripts. Next, we search for Evolutionary Plausible Variants (EPVs), variants to which PLMs assign positive scores and which sit at the opposite end of the spectrum from annotated deleterious variants. We find that EPVs make up a small percentage of missense variants (0.45%) and that, consistent with differential selection pressures, their allele frequencies are significantly higher than those of non-EPV and synonymous variants (p < 2.2e-16). We additionally identify eight associations with EPVs, including novel protective associations with LDL and bone mineral density. Our results show how applying PLMs to exome data expands the universe of gene-trait association mapping and interpretation.
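To make the EPV definition concrete, here is a minimal sketch of the flagging step, assuming each missense variant carries a single PLM score for which a positive value means the model finds the variant at least as plausible as the reference residue. The records, the `split_epvs` helper, and the zero threshold as a default argument are illustrative assumptions, not the pipeline used in the paper. The toy numbers are made up and do not reproduce the reported 0.45% or the reported p-value.

```python
from statistics import median

# Toy, made-up records: (variant id, PLM score, allele frequency).
# These values are hypothetical and only illustrate the bookkeeping.
variants = [
    ("v1", -7.2, 1e-5),
    ("v2", -3.1, 2e-5),
    ("v3", 0.8, 4e-4),
    ("v4", -0.4, 5e-5),
    ("v5", 1.5, 9e-4),
]


def split_epvs(records, threshold=0.0):
    """Split records into EPVs (score > threshold) and non-EPVs.

    The threshold of 0 mirrors the "positive scores" wording of the
    abstract; the exact score definition is an assumption here.
    """
    epvs = [r for r in records if r[1] > threshold]
    non_epvs = [r for r in records if r[1] <= threshold]
    return epvs, non_epvs


epvs, non_epvs = split_epvs(variants)

# Fraction of variants flagged as EPVs in the toy set.
epv_fraction = len(epvs) / len(variants)

# Descriptive comparison of allele frequencies between the two groups.
median_af_epv = median(r[2] for r in epvs)
median_af_non = median(r[2] for r in non_epvs)

print(f"EPV fraction: {epv_fraction:.2f}")
print(f"median AF (EPV): {median_af_epv:.2e}")
print(f"median AF (non-EPV): {median_af_non:.2e}")
```

The sketch only compares medians descriptively. A real analysis would apply a formal two-sample test to the allele frequency distributions, and the abstract does not say which test was used.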