Causation inference¶
At a glance¶
The causation module provides Mendelian randomization (MR) methods to infer causal effects from:
- genes (e.g., eQTL-mediated MR)
- lifestyle/exposures
- microbes
to phenotypes (outcomes), including robust screening of multiple exposure/outcome pairs.
Typical command skeleton¶
java -jar kggsum.jar causal \
--ref-gty-file <reference genotype> refG=<hg19|hg38> \
--sum-file <GWAS summary> cp12Cols=<...> pbsCols=<...> refG=<hg19|hg38> \
--output <output_prefix> \
[options]
What to read next¶
Pick a causation strategy in the module options:
- EMIC (--emic-mr) for eQTL-mediated causal gene inference
- PCMR (--variant-based-mr) for phenotype-to-phenotype MR with pleiotropy-aware clustering
Inputs you typically provide¶
For all causal runs, you must provide:
- --ref-gty-file: reference genotypes used to compute LD
- --sum-file: GWAS summary statistics (with correct cp12Cols=... and pbsCols=... mapping)
- --output: output prefix/folder
In addition, your chosen strategy determines extra inputs:
- EMIC (causal genes):
  - --emic-mr to enable EMIC
  - --xqtl-file to link SNPs to gene expression (xQTL-style effects)
- PCMR (phenotype-to-phenotype MR):
  - --variant-based-mr to enable PCMR mode (set causalPair, IV/LD thresholds, etc.)
  - --sum-file entries for at least exposure and outcome (direction specified by causalPair)
  - Optional: --exclude-complementary-allele
Outputs you typically scan¶
- EMIC: GeneBasedCausationTask/genes.hg38.emic.tsv
- PCMR: MendelianRandomization.summary.tsv (main summary)
- PCMR: IVScatterPlots.pdf (visual effect scatter)
- PCMR: PCMRTask/variants.hg38.tsv.gz (per-IV statistics)
Quick interpretation mindset¶
Scan order is strategy-dependent:
- EMIC:
  - Prioritize genes with strong statistical support (::P)
  - Interpret direction from the sign of ::Effect
- PCMR:
  - Check correlated horizontal pleiotropy first (see TwoCategoryPCMR_P_Plei in MendelianRandomization.summary.tsv)
  - Then read the preferred model p-value (OneCategoryPCMR_P or TwoCategoryPCMR_P_CausalEval)
  - Finally confirm effect direction/size consistency with conventional MR methods (IVW/Egger/Median/MBE/RIVW/JCWC columns)
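The PCMR scan order can be scripted once the summary table is loaded. The following is a minimal Python sketch, not part of KGGSum. The column names TwoCategoryPCMR_P_Plei, OneCategoryPCMR_P and TwoCategoryPCMR_P_CausalEval come from this page; everything else is an assumption made for illustration: the 0.05 threshold, the rule that the two-category model is preferred when the pleiotropy test is significant, the effect column names IVW_Effect and Egger_Effect, and the demo values, which are made up.

```python
import csv
import io

ALPHA = 0.05  # assumed threshold; the documentation does not prescribe one


def signs_agree(effects):
    """Return True when all non-zero effect estimates share one sign."""
    nonzero = [e for e in effects if e != 0]
    return all(e > 0 for e in nonzero) or all(e < 0 for e in nonzero)


def triage_pcmr(rows, effect_columns, alpha=ALPHA):
    """Apply the three-step scan order to rows of the PCMR summary table."""
    report = []
    for row in rows:
        # Step 1: correlated horizontal pleiotropy test.
        pleiotropy = float(row["TwoCategoryPCMR_P_Plei"]) < alpha
        # Step 2 (assumption): prefer the two-category model when
        # pleiotropy is detected, otherwise the one-category model.
        p_column = (
            "TwoCategoryPCMR_P_CausalEval" if pleiotropy else "OneCategoryPCMR_P"
        )
        p_causal = float(row[p_column])
        # Step 3: compare effect signs across conventional MR methods.
        effects = [float(row[c]) for c in effect_columns]
        report.append({
            "pleiotropy": pleiotropy,
            "p_column": p_column,
            "p_causal": p_causal,
            "significant": p_causal < alpha,
            "consistent_direction": signs_agree(effects),
        })
    return report


def read_tsv(text):
    """Parse tab-separated text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Made-up demo table; effect column names are hypothetical placeholders.
DEMO = (
    "TwoCategoryPCMR_P_Plei\tOneCategoryPCMR_P\t"
    "TwoCategoryPCMR_P_CausalEval\tIVW_Effect\tEgger_Effect\n"
    "0.50\t0.001\t0.20\t0.12\t0.10\n"
    "0.01\t0.30\t0.004\t0.08\t-0.02\n"
)

for item in triage_pcmr(read_tsv(DEMO), ["IVW_Effect", "Egger_Effect"]):
    print(item)
```

For a real run, replace DEMO with the contents of MendelianRandomization.summary.tsv and pass the effect column names as they appear in that file's header.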
About¶
The causation module provides advanced Mendelian randomization methods for inferring causation from genes, lifestyle, or microbes (as exposures) to phenotypes (as outcomes). It enables rapid inference screening of tens of exposures and outcomes simultaneously.
Main Workflow of the Causation Module¶
- Generation: Extract variant coordinates and frequencies from the VCF or GBC file to create a root variant set for further analysis.
- Annotation: Annotate the root variant set with gene features or xQTLs.
- Append: Integrate GWAS variants and their summary statistics into the annotated root variant set.
- LD Clumping: Select variants significantly associated with the exposure as instrumental variables (IVs), then remove redundant IVs according to LD.
- Gene Sticking: Link IVs to potential target genes according to LD. This step is intentionally rough, so some targets may not be truly causal.
- MR analysis: Infer causality using the chosen MR method (e.g., EMIC or PCMR).
- …: (Additional analysis as specified).
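To make the LD Clumping step concrete, here is a minimal Python sketch of generic greedy clumping. It illustrates the general technique only and is not KGGSum's implementation. The function name ld_clump, the input layout, the thresholds (p < 5e-8, r² < 0.1) and the demo values are all assumptions chosen for the example.

```python
def ld_clump(p_values, r2, p_threshold=5e-8, r2_threshold=0.1):
    """Greedy clumping: keep the most significant variant, drop its LD partners.

    p_values: dict mapping variant id -> exposure association p-value
    r2: dict mapping frozenset({id_a, id_b}) -> pairwise LD r^2
        (pairs that are missing are treated as r^2 = 0)
    """
    # Candidate IVs: significant variants, most significant first.
    candidates = sorted(
        (v for v, p in p_values.items() if p < p_threshold),
        key=lambda v: p_values[v],
    )
    kept = []
    for variant in candidates:
        # Keep a variant only if it is in low LD with every IV kept so far.
        if all(r2.get(frozenset((variant, k)), 0.0) < r2_threshold for k in kept):
            kept.append(variant)
    return kept


# Made-up demo data: rs2 is in high LD with rs1, rs4 is not significant.
demo_p = {"rs1": 1e-12, "rs2": 3e-10, "rs3": 2e-9, "rs4": 0.02}
demo_r2 = {
    frozenset(("rs1", "rs2")): 0.8,
    frozenset(("rs1", "rs3")): 0.01,
    frozenset(("rs2", "rs3")): 0.05,
}

print(ld_clump(demo_p, demo_r2))
```

Processing candidates in order of significance means that, within each LD block, the strongest instrument is the one that survives.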
