Causation inference

At a glance

The causation module provides Mendelian randomization (MR) methods to infer causal effects on phenotypes (outcomes) from:

  • genes (e.g., eQTL-mediated MR)
  • lifestyle/exposures
  • microbes

It also supports robust screening of multiple exposure/outcome pairs.

Typical command skeleton

java -jar kggsum.jar causal \
  --ref-gty-file <reference genotype> refG=<hg19|hg38> \
  --sum-file <GWAS summary> cp12Cols=<...> pbsCols=<...> refG=<hg19|hg38> \
  --output <output_prefix> \
  [options]

Pick a causation strategy in the module options:

  • EMIC (--emic-mr) for eQTL-mediated causal gene inference
  • PCMR (--variant-based-mr) for phenotype-to-phenotype MR with pleiotropy-aware clustering

Inputs you typically provide

For all causal runs, you must provide:

  • --ref-gty-file: reference genotypes used to compute LD
  • --sum-file: GWAS summary statistics (with correct cp12Cols=... and pbsCols=... mapping)
  • --output: output prefix/folder

In addition, your chosen strategy determines extra inputs:

  • EMIC (causal genes):
      • --emic-mr to enable EMIC
      • --xqtl-file to link SNPs to gene expression (xQTL-style effects)
  • PCMR (phenotype-to-phenotype MR):
      • --variant-based-mr to enable PCMR mode (set causalPair, IV/LD thresholds, etc.)
      • --sum-file entries for at least exposure and outcome (direction specified by causalPair)
  • Optional: --exclude-complementary-allele
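
The input requirements above can be written down as a small pre-flight check. The sketch below is a hypothetical helper, not part of KGGSum: the function name `missing_flags` and the strategy keys `"EMIC"` and `"PCMR"` are assumptions made for illustration. It only tests whether the flags named above appear as separate tokens in an argument list; it does not validate their values or sub-options.

```python
# Hypothetical pre-flight check; not part of KGGSum.
# Flags are taken from the lists above; everything else is an assumption.
REQUIRED_ALWAYS = ("--ref-gty-file", "--sum-file", "--output")
REQUIRED_BY_STRATEGY = {
    "EMIC": ("--emic-mr", "--xqtl-file"),
    "PCMR": ("--variant-based-mr",),
}


def missing_flags(argv, strategy):
    """Return the required flags that do not appear in argv, in listed order."""
    if strategy not in REQUIRED_BY_STRATEGY:
        raise ValueError(f"unknown strategy: {strategy!r}")
    required = REQUIRED_ALWAYS + REQUIRED_BY_STRATEGY[strategy]
    present = set(argv)
    return [flag for flag in required if flag not in present]


# Placeholder values mirror the command skeleton; they are not real paths.
argv = [
    "causal",
    "--ref-gty-file", "<reference genotype>",
    "--sum-file", "<GWAS summary>",
    "--output", "<output_prefix>",
    "--emic-mr",
]
print(missing_flags(argv, "EMIC"))  # ['--xqtl-file']
```

Running such a check before launching a long job catches a forgotten strategy-specific input early.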

Outputs you typically scan

  • EMIC: GeneBasedCausationTask/genes.hg38.emic.tsv
  • PCMR: MendelianRandomization.summary.tsv (main summary)
  • PCMR: IVScatterPlots.pdf (visual effect scatter)
  • PCMR: PCMRTask/variants.hg38.tsv.gz (per-IV statistics)

Quick interpretation mindset

Scan order is strategy-dependent:

  • EMIC:
      • Prioritize genes with strong statistical support (::P)
      • Interpret direction from the sign of ::Effect
  • PCMR:
      • Check correlated horizontal pleiotropy first (see TwoCategoryPCMR_P_Plei in MendelianRandomization.summary.tsv)
      • Then read the preferred model p-value (OneCategoryPCMR_P or TwoCategoryPCMR_P_CausalEval)
      • Finally confirm effect direction/size consistency with conventional MR methods (IVW/Egger/Median/MBE/RIVW/JCWC columns)
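
The PCMR scan order above can be sketched as a small triage routine over the summary table. This is an illustration under stated assumptions, not KGGSum's own logic: the 0.05 threshold is an arbitrary choice, the rule "read the two-category p-value when the pleiotropy test is significant, otherwise the one-category p-value" is one possible reading of "preferred model", and the file is assumed to be tab-separated with a header row and numeric values in the three columns named above. The function names are hypothetical.

```python
import csv
import io

ALPHA = 0.05  # assumed threshold for illustration; not a KGGSum default


def triage_pcmr_row(row, alpha=ALPHA):
    """Apply the scan order to one row of the summary table.

    Assumption: a significant pleiotropy test means the two-category
    causal-evaluation p-value is read; otherwise the one-category p-value.
    """
    p_plei = float(row["TwoCategoryPCMR_P_Plei"])
    pleiotropy = p_plei < alpha
    column = "TwoCategoryPCMR_P_CausalEval" if pleiotropy else "OneCategoryPCMR_P"
    p_model = float(row[column])
    return {
        "pleiotropy": pleiotropy,
        "model_column": column,
        "p": p_model,
        "significant": p_model < alpha,
    }


def triage_pcmr_table(handle, alpha=ALPHA):
    """Triage every row of a tab-separated summary table read from handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [triage_pcmr_row(row, alpha) for row in reader]


# Synthetic two-row table for demonstration only; these are not real results.
demo = io.StringIO(
    "TwoCategoryPCMR_P_Plei\tOneCategoryPCMR_P\tTwoCategoryPCMR_P_CausalEval\n"
    "0.001\t0.2\t0.01\n"
    "0.6\t0.03\t0.5\n"
)
for result in triage_pcmr_table(demo):
    print(result["model_column"], result["p"], result["significant"])
```

The third step, comparing direction and size against the conventional MR columns, is left to manual inspection here because the exact column names for those methods are not given above.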

About

The causation module provides advanced Mendelian randomization methods for inferring causation from genes, lifestyle, or microbes (as exposures) to phenotypes (as outcomes). It enables rapid inference screening of tens of exposures and outcomes simultaneously.

Main Workflow of the Causation Module

  1. Generation: Extract variant coordinates and frequencies from the VCF or GBC file to create a root variant set for further analysis.
  2. Annotation: Annotate the root variant set with gene features or xQTLs.
  3. Append: Integrate GWAS variants and their summary statistics into the annotated root variant set.
  4. LD Clumping: Select significant variants of exposures as IVs, and remove redundant IVs according to LD.
  5. Gene Sticking: Link IVs to potential target genes according to LD. This step is intentionally rough, so some targets may not be truly causal.
  6. MR analysis: Infer causality using the chosen MR method (e.g., EMIC or PCMR).
  7. Additional analysis (as specified).
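
To make the LD clumping step concrete, here is a generic greedy clumping sketch. It is not KGGSum's implementation: the function names, the p-value threshold of 5e-8, and the r² threshold of 0.1 are illustrative assumptions, and LD is supplied through a toy lookup rather than computed from reference genotypes. The idea is to keep the most significant exposure variants as IVs and drop any candidate that is in high LD with an IV already kept.

```python
def make_r2_lookup(pairs):
    """Build a symmetric r^2 lookup from {(id_a, id_b): r2}; unknown pairs are 0.0."""
    table = {frozenset(key): value for key, value in pairs.items()}
    return lambda a, b: table.get(frozenset((a, b)), 0.0)


def ld_clump(variants, r2, p_threshold=5e-8, r2_threshold=0.1):
    """Greedy LD clumping (generic sketch; thresholds are assumptions).

    variants: iterable of (variant_id, p_value) for the exposure.
    r2: function (id_a, id_b) -> LD r^2 between two variants.
    Returns the IDs of retained instruments, most significant first.
    """
    # Step 1: keep only significant variants, ordered by ascending p-value.
    candidates = sorted(
        (v for v in variants if v[1] < p_threshold), key=lambda v: v[1]
    )
    kept = []
    # Step 2: accept a candidate only if it is not in high LD with any kept IV.
    for variant_id, _p in candidates:
        if all(r2(variant_id, other) < r2_threshold for other in kept):
            kept.append(variant_id)
    return kept


# Toy input with made-up IDs and values, for demonstration only.
variants = [("rs_a", 1e-12), ("rs_b", 1e-9), ("rs_c", 1e-10), ("rs_d", 1e-3)]
r2 = make_r2_lookup({
    ("rs_a", "rs_b"): 0.8,
    ("rs_a", "rs_c"): 0.01,
    ("rs_b", "rs_c"): 0.02,
})
print(ld_clump(variants, r2))  # ['rs_a', 'rs_c']
```

In this toy run `rs_d` fails the significance filter and `rs_b` is removed because it is in high LD with the more significant `rs_a`.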

Basic Usage

java -jar kggsum.jar causal --sum-file <input1> --ref-gty-file <input2> --output <output> [options]