Association
At a glance
The association module turns variant-level GWAS signals into gene-level associations and can then extend the analysis to:
- tissues / cell types (DESE; gene specificity-driven inference)
- drugs (pDESE; drug perturbation-driven inference)
- spatiality (sDESE; spatial transcriptomics-driven inference)
- temporal / age profiles (tDESE)
Typical command skeleton
```bash
java -jar kggsum.jar assoc \
  --ref-gty-file <reference genotype> refG=<hg19|hg38> \
  --sum-file <GWAS summary> cp12Cols=<...> pbsCols=<...> refG=<hg19|hg38> \
  --output <output_prefix> \
  [options]
```
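Because every run repeats the same required flags, it can help to assemble the command once and inspect it before launching. The helper below is a minimal dry-run sketch, assuming a POSIX shell; the function name `build_assoc_cmd` and all file names are hypothetical, the column mappings are passed through exactly as you supply them, and `[options]` is left out. It only checks that each `refG` value is one of the two builds the skeleton names.

```bash
#!/bin/sh
# Hypothetical dry-run helper (not part of KGGSum): print the assoc command
# from the skeleton above without running it. The cp12Cols/pbsCols mappings
# are passed through verbatim; take them from your own summary-statistics file.
build_assoc_cmd() {
  if [ "$#" -ne 7 ]; then
    echo "usage: build_assoc_cmd REF_GTY REF_BUILD SUM_FILE CP12_COLS PBS_COLS SUM_BUILD OUT_PREFIX" >&2
    return 2
  fi
  local ref_gty="$1" ref_build="$2" sum_file="$3" cp12="$4" pbs="$5" sum_build="$6" out="$7"
  local b
  # Both refG values must be one of the builds named in the skeleton.
  for b in "$ref_build" "$sum_build"; do
    case "$b" in
      hg19|hg38) ;;
      *) echo "refG must be hg19 or hg38, got: $b" >&2; return 1 ;;
    esac
  done
  printf 'java -jar kggsum.jar assoc --ref-gty-file %s refG=%s --sum-file %s cp12Cols=%s pbsCols=%s refG=%s --output %s\n' \
    "$ref_gty" "$ref_build" "$sum_file" "$cp12" "$pbs" "$sum_build" "$out"
}
```

For example, `build_assoc_cmd ref.vcf.gz hg38 gwas.tsv.gz "<your cp12Cols>" "<your pbsCols>" hg38 results/run1` prints the full command line, which you can then review and paste, adding any optional flags.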
What to read next
Start from your analysis goal, then jump to the corresponding option section in options.md.
Inputs you typically provide
For all assoc runs, you must provide:
- --sum-file (GWAS summary statistics) with its column mapping (cp12Cols=..., pbsCols=...)
- --ref-gty-file (reference genotypes for LD)
- --output (output prefix)
In addition, your goal determines which extra inputs you need:
- Genes / Heritability (GATES / ECS / EHE): provide either --gene-model-database (gene definitions) or --xqtl-file (xQTL-style mapping).
- Tissues / CellTypes (DESE), Drugs (pDESE), Spatiality (sDESE), Temporal (tDESE): provide --gene-score-file (tissue/cell/drug/spatial-temporal profiles).
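The goal-to-input rule above is a simple lookup, sketched here as a shell `case`. The function name `extra_input_flag` and the goal labels (`genes-model`, `genes-xqtl`, `dese`, and so on) are hypothetical shorthands for the bullets; the flags themselves are the ones listed above. Whether DESE-family runs also need a gene definition input is not stated here, so check options.md before relying on this mapping alone.

```bash
#!/bin/sh
# Hypothetical helper: print the goal-specific input flag described above.
# Goal labels are illustrative shorthands, not KGGSum keywords.
extra_input_flag() {
  local goal="$1" file="$2"
  case "$goal" in
    genes-model) printf '%s %s\n' --gene-model-database "$file" ;;
    genes-xqtl)  printf '%s %s\n' --xqtl-file "$file" ;;
    dese|pdese|sdese|tdese) printf '%s %s\n' --gene-score-file "$file" ;;
    *) echo "unknown goal: $goal" >&2; return 1 ;;
  esac
}
```

The `printf '%s %s\n'` form keeps flags that begin with `--` out of the format string, so `printf` never mistakes them for its own options.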
Outputs you typically scan
- GeneBasedAssociationTask/genes.hg38.assoc.txt: gene-level p-values (plus conditional gene-based results when applicable)
- GeneBasedConditionalAssociationTask/$scoreFileName.enrichment.txt: the main tissue/cell/drug/spatial/age ranking table when --gene-score-file is used
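After a run finishes, a quick existence check on these two files catches failed or partial runs early. This is a sketch under stated assumptions: the function name `check_outputs` is hypothetical, both task folders are assumed to sit directly under the output location (passed as `OUT_DIR`), the gene file name is taken literally from the list above (including `hg38`), and `SCORE_NAME` stands for whatever KGGSum substitutes for `$scoreFileName` in your run.

```bash
#!/bin/sh
# Hypothetical check (not part of KGGSum): report whether the output files
# listed above exist and are non-empty. Assumes both task folders sit directly
# under OUT_DIR; SCORE_NAME stands for whatever KGGSum writes as $scoreFileName.
check_outputs() {
  local out_dir="$1" score_name="${2:-}"
  local status=0 f
  # Reuse the positional parameters as a portable list of expected paths.
  set -- "$out_dir/GeneBasedAssociationTask/genes.hg38.assoc.txt"
  if [ -n "$score_name" ]; then
    set -- "$@" "$out_dir/GeneBasedConditionalAssociationTask/$score_name.enrichment.txt"
  fi
  for f in "$@"; do
    if [ -s "$f" ]; then
      echo "found:   $f"
    else
      echo "missing: $f"
      status=1
    fi
  done
  return "$status"
}
```

Omit the second argument for a gene-only run; pass it when --gene-score-file was used, so the enrichment table is checked as well.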
Quick interpretation mindset
Scan the results from the top level down:
- Start from the gene-level results (genes.hg38.assoc.txt, or eH2 when enabled).
- Then switch to the enrichment rankings (Adjusted(p) / Median(IQR)SigVsAll) to build the final shortlist.
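Building the shortlist usually means sorting a ranking table by one column. The sketch below assumes the table is tab-delimited with a single header row and that the column header exactly matches the name you pass (for example `Adjusted(p)`); the function name `top_by_column` is hypothetical. It sorts ascending, which suits p-value columns; for other columns, confirm which direction means "more significant" before using it.

```bash
#!/bin/sh
# Print the header plus the N rows with the smallest values in a named column.
# Assumes a tab-delimited table with a single header row; this helper is
# illustrative and not part of KGGSum.
top_by_column() {
  local file="$1" col="$2" n="${3:-10}"
  local tab idx
  tab=$(printf '\t')
  # Locate the 1-based index of the column whose header exactly matches $col.
  idx=$(head -n 1 "$file" | tr '\t' '\n' | grep -n -x -F -e "$col" | head -n 1 | cut -d: -f1)
  if [ -z "$idx" ]; then
    echo "column not found: $col" >&2
    return 1
  fi
  head -n 1 "$file"
  # sort -g compares general numbers, so values such as 2e-5 sort correctly.
  tail -n +2 "$file" | sort -t "$tab" -k "$idx,$idx" -g | head -n "$n"
}
```

A call such as `top_by_column "<enrichment table>" 'Adjusted(p)' 20` prints the header and the 20 lowest adjusted p-values. `-g` is used instead of `-n` because p-values are often written in scientific notation.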
About
The association module in KGGSum offers various functions to link genes, cell types, gene networks, and even drugs to phenotypes using GWAS signals from variants. A common step across all analyses involves aggregating association signals from multiple variants to derive gene-level associations using GATES (Li et al., 2011) and ECS (Li et al., 2019). Advanced association analyses build upon these gene-level associations. The rationale behind each type of association analysis is detailed in the respective option descriptions and relevant publications.
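The aggregation step can be made concrete with the GATES statistic as published by Li et al. (2011). This is the formula from that paper as we understand it; how KGGSum estimates the effective numbers internally may differ in detail. For a gene with $m$ variants whose p-values are sorted as $p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}$:

$$
P_{\mathrm{GATES}} = \min_{1 \le j \le m} \frac{m_e \, p_{(j)}}{m_{e(j)}}
$$

Here $m_e$ is the effective number of independent p-values among all $m$ variants, and $m_{e(j)}$ is the effective number among the $j$ most significant ones. Both are estimated from LD between the variants, which is why reference genotypes are required. If all variants are independent, then $m_e = m$ and $m_{e(j)} = j$, and the statistic reduces to the Simes test, $\min_j m\,p_{(j)}/j$.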
Main Workflow of the Association Module
- Generation: Extract variant coordinates and frequencies from the VCF or GTB file to create a root variant set for further analysis.
- Annotation: Annotate the root variant set with gene features or xQTLs.
- Append: Integrate GWAS variants and their summary statistics into the annotated root variant set.
- Gene-based association: Conduct gene-based association analyses based on the gene feature annotations.
- Advanced association: Conduct further analyses (e.g., DESE, pDESE, sDESE) that build upon the gene-level association results.
- …: Additional association analyses, as specified by the input options.

Basic Usage
By default, the program performs gene-based association tests using GATES and ECS. Additional association analyses in this module can be initiated by specifying relevant input options (e.g., --gene-score-file).