
Association

At a glance

The association module turns variant-level GWAS signals into gene-level associations, and then (optionally) extends the analysis to:

  • tissues / cell types (gene specificity–driven inference)
  • drugs (drug perturbation–driven inference)
  • spatiality (spatial transcriptomics–driven inference)

Typical command skeleton

java -jar kggsum.jar assoc \
  --ref-gty-file <reference genotype> refG=<hg19|hg38> \
  --sum-file <GWAS summary> cp12Cols=<...> pbsCols=<...> refG=<hg19|hg38> \
  --output <output_prefix> \
  [options]

Start with your analysis goal, then go to the corresponding option section in options.md.

Inputs you typically provide

For all assoc runs, you must provide:

  • --sum-file (GWAS summary statistics) and the column mapping (cp12Cols=..., pbsCols=...)
  • --ref-gty-file (reference genotypes for LD)
  • --output (output prefix)

In addition, your goal determines which extra inputs you need:

  • Genes / Heritability (GATES / ECS / EHE): choose either --gene-model-database (gene definitions) or --xqtl-file (xQTL-style mapping)
  • Tissues / CellTypes (DESE), Drugs (pDESE), Spatiality (sDESE), Temporal (tDESE): provide --gene-score-file (tissue/cell/drug/spatial-temporal profiles)
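
The required inputs listed above can be checked before launching a run. The following is a minimal sketch of a dry-run helper, not part of KGGSum: the function name `build_assoc_cmd` and its argument order are hypothetical, and it assumes the same `refG` value applies to both the reference genotypes and the summary file. It uses only the flags shown in this document, and it prints the command instead of executing it.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with KGGSum): print the assoc command
# for the given inputs without running it.
# Assumption: one refG value is used for both --ref-gty-file and --sum-file.
build_assoc_cmd() {
  ref_gty=$1     # value for --ref-gty-file
  sum_file=$2    # value for --sum-file
  col_map=$3     # column mapping string, e.g. 'cp12Cols=<...> pbsCols=<...>'
  out=$4         # value for --output (output prefix)
  ref_genome=$5  # hg19 or hg38

  # The three required inputs and the column mapping must all be non-empty.
  for v in "$ref_gty" "$sum_file" "$col_map" "$out"; do
    if [ -z "$v" ]; then
      echo "error: missing required input" >&2
      return 1
    fi
  done

  # Only the two genome builds named in the command skeleton are accepted.
  case $ref_genome in
    hg19|hg38) ;;
    *) echo "error: refG must be hg19 or hg38" >&2; return 1 ;;
  esac

  printf 'java -jar kggsum.jar assoc --ref-gty-file %s refG=%s --sum-file %s %s refG=%s --output %s\n' \
    "$ref_gty" "$ref_genome" "$sum_file" "$col_map" "$ref_genome" "$out"
}
```

For example, `build_assoc_cmd ref.vcf.gz gwas.tsv 'cp12Cols=<...> pbsCols=<...>' out/run hg38` prints the assembled command once the placeholders are replaced with your own column names; the file names here are illustrative only. Goal-specific inputs such as --gene-score-file are not handled and would be appended by hand.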

Outputs you typically scan

  • GeneBasedAssociationTask/genes.hg38.assoc.txt: gene-level p-values (and conditional gene-based results when applicable)
  • GeneBasedConditionalAssociationTask/$scoreFileName.enrichment.txt: the main tissue/cell/drug/spatial/age ranking table when using --gene-score-file

Quick interpretation mindset

Scan the results from the top level down:

  1. Start from gene-level results (genes.hg38.assoc.txt or eH2 when enabled)
  2. Then switch to enrichment rankings (Adjusted(p) / Median(IQR)SigVsAll) for the final shortlist

About

The association module in KGGSum offers various functions to link genes, cell types, gene networks, and even drugs to phenotypes using GWAS signals from variants. A common step across all analyses involves aggregating association signals from multiple variants to derive gene-level associations using GATES (Li et al., 2011) and ECS (Li et al., 2019). Advanced association analyses build upon these gene-level associations. The rationale behind each type of association analysis is detailed in the respective option descriptions and relevant publications.

Main Workflow of the Association Module

  1. Generation: Extract variant coordinates and frequencies from the VCF or GTB file to create a root variant set for further analysis.
  2. Annotation: Annotate the root variant set with gene features or xQTLs.
  3. Append: Integrate GWAS variants and their summary statistics into the annotated root variant set.
  4. Gene-based association: Conduct gene-based association analyses based on the gene feature annotations.
  5. Advanced association: Conduct any additional association analyses that were requested (e.g., via --gene-score-file), building on the gene-level associations.

assoc

Basic Usage

java -jar kggsum.jar assoc --ref-gty-file <input1> --sum-file <input2> --output <output> [options]

By default, the program performs gene-based association tests using GATES and ECS. Additional association analyses in this module can be initiated by specifying relevant input options (e.g., --gene-score-file).