Genes

EMIC

EMIC (Effective-median-based Mendelian randomization framework for Inferring the Causal genes of complex phenotypes) infers the causal effects of gene expression on a complex phenotype from dependent expression quantitative trait loci (eQTLs), using a robust median-based Mendelian randomization (MR) approach. The effective-median method addresses the high false-positive rate that existing MR methods show when instrumental variables are correlated or when the approximated linkage disequilibrium (LD) is noisy. EMIC can further perform a pleiotropy fine-mapping analysis to remove possible false-positive estimates (Jiang et al. 2022).

Citations

[1] Lin Jiang, Lin Miao, Guorong Yi, Xiangyi Li, Chao Xue, Mulin Jun Li, Hailiang Huang and Miaoxin Li. Powerful and robust inference of causal genes of complex phenotypes with dependent expression quantitative loci by a novel median-based Mendelian randomization. Am J Hum Genet. 2022 May 5;109(5):838-856. PubMed

Options

The tutorial command is:

java -Xmx4g -jar ../kggsum.jar \
   causal \
   --xqtl-file https://idc.biosino.org/pmglab/resource/kgg/kggsum/datasets/gtex/eqtl/hg38/Brain_Frontal_Cortex_BA9_eur_v8_tmm_p01.gene.hg38.cov.eqtl.tsv.gz \
              refG=hg38 \
   --sum-file ./scz_gwas_eur_chr1.tsv.gz \
              cp12Cols=CHR,BP,A1,A2 \
              pbsCols=P,OR,SE \
              betaType=2 \
              prevalence=0.01 \
   --ref-gty-file ./1kg_hg19_eur_chr1.vcf.gz \
              refG=hg19 \
   --threads 18 \
   --output ./test/ba9_scz_causal
Flag Description Default
causal Triggers causal inference. By default, EMIC is run to infer the causal genes of phenotypes. -
--xqtl-file Specifies a file containing SNP effects on gene or transcript expression. The file should be a text table in which each row is a single SNP and columns are delimited by tabs or spaces. This is a combination parameter; see the description of --xqtl-file for details. -
... ... ...

Output

The numeric results of EMIC are saved in GeneBasedCausationTask/genes.hg38.emic.tsv. There are nine columns in the file:

Header Description
SymbolID The gene symbol.
ChromosomeID The chromosome of the gene.
Start The coordinate of the first SNP.
End The coordinate of the last SNP.
ExpressionID The gene (expression) ID.
::IVNum The number of IVs within the gene.
::Effect The estimated causal effect.
::SE The standard error of the estimated causal effect.
::P The EMIC p-value for the causality test.
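The EMIC table can be post-processed with standard command-line tools. The sketch below uses a hypothetical helper, emic_bonferroni_hits, to print genes whose EMIC p-value passes a Bonferroni threshold (alpha divided by the number of genes with a numeric p-value). It assumes the file is tab-delimited with a single header row, a SymbolID column, and exactly one column whose name ends in "::P" (for example, a trait-prefixed name such as scz::P); check these assumptions against your own output file.

```shell
# Hypothetical helper: list genes whose EMIC p-value passes a Bonferroni cutoff.
# Assumptions: tab-delimited file, one header row, a "SymbolID" column, and
# exactly one column whose name ends in "::P".
emic_bonferroni_hits() {
  local file=$1 alpha=${2:-0.05}
  awk -F'\t' -v alpha="$alpha" '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /::P$/) pcol = i          # EMIC p-value column
        if ($i == "SymbolID") gcol = i     # gene symbol column
      }
      if (!pcol || !gcol) { print "required columns not found" > "/dev/stderr"; bad = 1; exit 1 }
      next
    }
    # Only rows with a numeric p-value count as tests (NA rows are skipped).
    $pcol ~ /^[0-9.eE+-]+$/ { n++; gene[n] = $gcol; p[n] = $pcol + 0; ptxt[n] = $pcol }
    END {
      if (bad || n == 0) exit
      cut = alpha / n                      # Bonferroni threshold
      for (k = 1; k <= n; k++)
        if (p[k] <= cut) print gene[k] "\t" ptxt[k]
    }' "$file"
}
```

For example, emic_bonferroni_hits <output>/GeneBasedCausationTask/genes.hg38.emic.tsv 0.05, assuming the task directory is created under the path given to --output.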

Phenotypes

PCMR

PCMR (Pleiotropic Clustering model for MR analysis) is a tool for analyzing GWAS summary statistics (provided via --sum-file) to infer causal relationships between phenotypes. It is designed to tackle correlated horizontal pleiotropy, a common challenge in Mendelian Randomization (MR) studies.

By extending the zero modal pleiotropy assumption (ZEMPA), PCMR improves causal inference even when a high proportion of variants are pleiotropic. It addresses the difficulty of distinguishing correlated pleiotropic effects from true causal effects by treating them together as a single “correlated HVP effect” and modeling it with a Gaussian mixture model. This allows PCMR to classify instrumental variables (IVs) into categories, including the category that carries the causal effect.

PCMR also includes a pleiotropy test that detects correlated horizontal pleiotropy and improves causal inference in these scenarios. This makes it a powerful tool for evaluating the causal effects of exposures, such as phenotypes or gene expression, on complex phenotypes (Tang et al., 2025).

Citations

[1] Bin Tang, Nan Lin, Junhao Liang, Guorong Yi, Liubin Zhang, Wenjie Peng, Chao Xue, Hui Jiang, Miaoxin Li. Leveraging Pleiotropic Clustering to Address High Proportion Correlated Horizontal Pleiotropy in Mendelian Randomization Studies. Nat Commun. 2025 Mar 21;16(1):2817. PubMed

Options

This analysis takes the GWAS summary statistics of the phenotypes given by the --sum-file options as input and outputs causal-effect estimates and p-values between those phenotypes. The following command is an example:

java -Xmx4g -jar ../kggsum.jar \
  causal \
   --pcmr 1T2,2T1 \
   --sum-file ./smoking_chr1.tsv.gz \
              cp12Cols=CHR,POS,A1,A2 \
              pbsCols=Pval,Beta,SE \
              prevalence=0.05   \
              betaType=1   \
   --sum-file ./scz_gwas_eur_chr1.tsv.gz \
              cp12Cols=CHR,BP,A1,A2 \
              pbsCols=P,OR,SE \
              betaType=2 \
              prevalence=0.01 \
   --ref-gty-file ./1kg_hg19_eur_chr1.vcf.gz \
   --threads 10 \
   --output ./test/smk_scz_pcmr \
   --exclude-complementary-allele
Format Description Default
--pcmr Triggers the PCMR analysis. This is a combination parameter with the following options:
causalPair: Defines the direction of causal inference, referring to traits by the order of their --sum-file options. For example, 1T2 infers causation from the phenotype(s) in the first --sum-file to the phenotype(s) in the second --sum-file, and 1T2,2T1 tests both directions.
effIVPCut: Sets the p-value threshold for selecting instrumental variables.
effIVPCorrect: Sets the multiple-testing correction applied to the p-values used to select instrumental variables. Three methods are available: fixed (no correction), bonf (Bonferroni correction), and bhfdr (Benjamini-Hochberg FDR).
ldPruneCut: Sets the r² threshold for LD clumping.
initIVPCut: Sets the p-value threshold for selecting the instrumental variables used to model uncorrelated pleiotropic effects.
ldStickCut: Sets the LD r² threshold for clustering genes whose SNPs are in LD with an instrumental variable.
Format:
--pcmr causalPair=[pair(s)] effIVPCut=[p-value] effIVPCorrect=[fixed] ldPruneCut=[r²] initIVPCut=[p-value] ldStickCut=[r²]

Example:
--pcmr 1T2,2T1 5E-8 fixed 0.1 0.5 0.8
causalPair=1T2,2T1
effIVPCut=5E-8
effIVPCorrect=fixed
ldPruneCut=0.1
initIVPCut=0.5
ldStickCut=0.8
--exclude-complementary-allele If specified, variants with complementary alleles (e.g., A/T and C/G) are excluded from the analysis. -
The remaining options are the same as those for association analyses.
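effIVPCorrect names three standard ways to turn effIVPCut into an IV-selection rule. As an illustration only (KGGSum's internal code may differ, for example in whether the comparison is < or <=), the sketch below applies each rule to a list of p-values, treating effIVPCut as the significance level alpha. The helper name select_ivs is hypothetical, and it expects one p-value per line on standard input.

```shell
# Hypothetical helper: select IV p-values (one per line on stdin) under one of
# the three effIVPCorrect methods. Prints the selected p-values.
select_ivs() {
  local method=$1 alpha=$2
  sort -g | awk -v method="$method" -v alpha="$alpha" '
    NF { m++; p[m] = $1 + 0; txt[m] = $1 }        # p-values arrive in ascending order
    END {
      if (m == 0) exit
      if (method == "fixed") cut = alpha            # no correction
      else if (method == "bonf") cut = alpha / m    # Bonferroni
      else if (method == "bhfdr") {                 # Benjamini-Hochberg step-up
        cut = -1                                    # nothing passes by default
        for (k = 1; k <= m; k++) if (p[k] <= k * alpha / m) cut = p[k]
      } else { print "unknown method: " method > "/dev/stderr"; exit 2 }
      for (k = 1; k <= m; k++) if (p[k] <= cut) print txt[k]
    }'
}
```

Benjamini-Hochberg always selects at least as many IVs as Bonferroni and at most as many as the uncorrected rule, because it compares the k-th smallest of m p-values with k × alpha / m, a threshold that lies between alpha / m and alpha.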

Output

At the end of the PCMR analysis, the results are summarized in a file named MendelianRandomization.summary.tsv, and the main causal-inference results are also printed to the screen. Here is an example:

2024-11-19 21:15:04 Clustering (2 categories) phi: [0.29485845139124117,0.3508103656075827] 
2024-11-19 21:15:04 Heterogeneity test by P_plei-test for correlated horizontal pleiotropy: 0.86885
2024-11-19 21:16:17 Correlated horizontal pleiotropy may be absent (P_plei-tes >= 0.20), and the estimate causal effect is: 

- By the one-category model of PCMR: 
  - The causal effect(SE): 0.323(0.0523); OR: 1.38(1.25-1.53) 
   - PCMR's causality evaluation p-value: 6.56e-10

- By Inverse-Variance Weighted MedianMR: 
  - The causal effect(SE): 0.321(0.0634); OR: 1.38(1.22-1.56) 
  - Median-based causality evaluation p-value: 4.03e-07

Here, PCMR's heterogeneity test fails to reject the null hypothesis of no correlated horizontal pleiotropy ($P_{\text{plei-test}} \geq 0.20$). The causal effect is therefore more appropriately inferred by the one-category PCMR model and by the conventional inverse-variance weighted median MR.

The output columns of MendelianRandomization.summary.tsv are explained below. They include results from PCMR (Pleiotropic Clustering model for Mendelian Randomization) and, for comparison, from several conventional Mendelian Randomization (MR) methods.


General Columns

These columns provide the basic setup of the analysis:

  • Exposure: The variable tested as the potential cause. Depending on the input data specified via --sum-file in KGGSum, this could be a phenotype, a gene expression profile, a lifestyle factor, or a microbial abundance. For example, it might represent smoking behavior in a phenotype-to-phenotype analysis or a gene’s expression level in a gene-to-phenotype analysis.
  • Outcome: The variable tested as the effect, typically a phenotype or disease of interest (e.g., schizophrenia). This is specified in the second --sum-file when using the --pcmr option with a causal direction like 1T2 (first file to second file).
  • IVNum: The number of instrumental variables (IVs) used in the analysis. IVs are genetic variants significantly associated with the exposure (based on a p-value threshold) that are assumed to affect the outcome only through the exposure. The number depends on the effIVPCut parameter of the --pcmr option (default: 5E-8).
  • PCut: The p-value cutoff used to select IVs from the exposure GWAS summary statistics. This threshold filters variants with a significant association with the exposure (e.g., effIVPCut=5E-8 in the PCMR options).
  • FDRCut: The False Discovery Rate cutoff, an alternative method to control for multiple testing when selecting IVs. This may be applied if effIVPCorrect=bhfdr is specified in the --pcmr option, adjusting the p-value threshold using the Benjamini-Hochberg procedure.

Two-Category PCMR Model Results

The Two-Category PCMR model divides IVs into two groups to account for correlated horizontal pleiotropy (where IVs affect the outcome through pathways other than the exposure). These columns report the results:

  • TwoCategoryPCMR_CorrelatedHorizontalPleiotropy: A binary indicator (e.g., Absent/Present) showing whether correlated horizontal pleiotropy is detected. This is based on a statistical test (see P_Plei below). If "Present," the two-category model is preferred for causal inference.
  • TwoCategoryPCMR_P_Plei: The p-value from the pleiotropy test. A small p-value (e.g., < 0.2) suggests significant correlated horizontal pleiotropy, meaning some IVs influence the outcome independently of the exposure.
  • TwoCategoryPCMR_P_CausalEval: The p-value for assessing the causal effect in the two-category PCMR model. A small p-value (e.g., < 0.05) suggests that the group of instrumental variables (IVs) carrying the dominant causal effect can be confidently identified, supporting a significant causal link between the exposure and the outcome. Conversely, a non-significant p-value does not necessarily imply the absence of a causal relationship; it may instead indicate that the two IV groups (one tied to the true causal effect and the other to pleiotropic effects) are too similar in size to determine which one reflects the causal pathway. For further details, see the PCMR paper.
  • TwoCategoryPCMR_C1_Effect(SE): The estimated causal effect (beta coefficient) and its standard error (SE) for the first category of IVs. This represents the effect size of the exposure on the outcome for this group.
  • TwoCategoryPCMR_C1_OR(95CI): The odds ratio (OR) and its 95% confidence interval (CI) for the first category. For binary outcomes, this indicates the change in odds of the outcome per unit change in the exposure (e.g., 1.38 means a 38% increase).
  • TwoCategoryPCMR_C2_Effect(SE): The estimated causal effect and standard error for the second category of IVs.
  • TwoCategoryPCMR_C2_OR(95CI): The odds ratio and 95% CI for the second category.
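The OR(95CI) columns can be reproduced from the corresponding beta and SE, assuming they are Wald-type intervals: OR = exp(beta) with bounds exp(beta ± 1.96 × SE). This assumption agrees with the console output shown earlier, where an effect of 0.323 with SE 0.0523 is reported as OR 1.38 (1.25-1.53). beta_to_or is a hypothetical helper name.

```shell
# Hypothetical helper: convert a causal effect (log-odds scale) and its SE
# into an odds ratio with a Wald-type 95% confidence interval.
beta_to_or() {
  awk -v b="$1" -v se="$2" 'BEGIN {
    z = 1.96                                   # two-sided 95% normal quantile
    printf "%.2f (%.2f-%.2f)\n", exp(b), exp(b - z * se), exp(b + z * se)
  }'
}
```

For example, beta_to_or 0.321 0.0634 prints 1.38 (1.22-1.56), matching the inverse-variance weighted median MR line of the same log.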

One-Category PCMR Model Results

If no significant pleiotropy is detected (i.e., P_Plei ≥ 0.2), the One-Category PCMR model is used, assuming all IVs reflect a single causal pathway:

  • OneCategoryPCMR_P: The p-value for the causal effect in the one-category model. A small value suggests a significant causal relationship.
  • OneCategoryPCMR_Beta(SE): The estimated causal effect (beta) and its standard error.
  • OneCategoryPCMR_OR(95CI): The odds ratio and 95% CI for the causal effect.

Conventional MR Methods

KGGSum also provides results from standard MR methods for comparison, each with its own assumptions and robustness to pleiotropy:

Inverse-Variance Weighted (IVW) MR

  • IVW_MR_P: P-value for the causal effect using the IVW method, which combines IV effects weighted by their precision.
  • IVW_MR_Beta(SE): Estimated causal effect and standard error.
  • IVW_MR_OR(95CI): Odds ratio and 95% CI.
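A minimal sketch of the IVW point estimate, assuming the common fixed-effect form with first-order weights: beta_IVW = Σ(bx·by/se_y²) / Σ(bx²/se_y²) and SE = 1/sqrt(Σ bx²/se_y²), where bx and by are the SNP-exposure and SNP-outcome effects and se_y is the SE of by. KGGSum may use a different variant (for example, random-effects standard errors), so treat this only as an illustration. ivw_estimate is a hypothetical helper that reads one IV per line: bx, by, se_y.

```shell
# Hypothetical helper: fixed-effect IVW estimate from lines "bx by se_y".
ivw_estimate() {
  awk '
    NF >= 3 {
      bx = $1; by = $2; sy = $3
      w = 1 / (sy * sy)                        # first-order inverse-variance weight
      num += bx * by * w
      den += bx * bx * w
    }
    END {
      if (den == 0) exit 1                     # no usable IVs
      printf "%.4f\t%.4f\n", num / den, sqrt(1 / den)   # beta, SE
    }'
}
```

With two IVs whose Wald ratios are both 0.5 and whose weights are equal (0.1 0.05 0.01 and 0.2 0.1 0.02), the sketch prints an estimate of 0.5000 with SE 0.0707.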

Egger MR

  • EGGER_MR_P: P-value for the causal effect, robust to pleiotropy but less powerful than IVW.
  • EGGER_MR_Beta(SE): Estimated causal effect and standard error.
  • EGGER_MR_OR(95CI): Odds ratio and 95% CI.
  • EGGER_MR_Intercept(SE): The intercept from Egger regression, indicating pleiotropy if significantly different from zero.
  • EGGER_MR_InterceptP: P-value for the intercept; a small value suggests pleiotropy.
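MR-Egger can be sketched in the same way: orient every IV so that its SNP-exposure effect is positive, then fit a weighted linear regression of the SNP-outcome effects on the SNP-exposure effects with an intercept, using weights 1/se_y². The slope is the causal estimate, and the intercept is the pleiotropy term reported as EGGER_MR_Intercept. Standard errors and p-values are omitted; egger_estimate is a hypothetical helper, and KGGSum's implementation may differ.

```shell
# Hypothetical helper: MR-Egger slope and intercept from lines "bx by se_y",
# via weighted least squares with weights 1/se_y^2.
egger_estimate() {
  awk '
    NF >= 3 {
      bx = $1; by = $2; sy = $3
      if (bx < 0) { bx = -bx; by = -by }       # orient IVs so the SNP-exposure effect is positive
      w = 1 / (sy * sy)                        # inverse-variance weight
      sw += w; swx += w * bx; swy += w * by
      swxx += w * bx * bx; swxy += w * bx * by
    }
    END {
      d = sw * swxx - swx * swx
      if (d == 0) exit 1                       # degenerate: slope not identifiable
      slope = (sw * swxy - swx * swy) / d
      intercept = (swy - slope * swx) / sw
      printf "%.4f\t%.4f\n", slope, intercept  # causal estimate, pleiotropy intercept
    }'
}
```

If every oriented IV lies exactly on the line by = 0.01 + 0.5 × bx, the sketch returns a slope of 0.5000 and an intercept of 0.0100.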

Median MR

  • Median_MR_P: P-value using the median-based method, robust to outliers.
  • Median_MR_Beta(SE): Estimated causal effect and standard error.
  • Median_MR_OR(95CI): Odds ratio and 95% CI.
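The median-based estimate can be illustrated with the standard weighted median estimator (Bowden et al., 2016): compute each IV's Wald ratio by/bx, weight it by bx²/se_y², sort the ratios, and interpolate the point at which the standardized cumulative weight reaches 0.5. This is a sketch under those assumptions; it omits the bootstrap SE, and KGGSum's Median MR (and EMIC's effective-median method) may differ in detail. weighted_median_mr is a hypothetical helper that reads lines of bx, by, se_y.

```shell
# Hypothetical helper: weighted median of Wald ratios from lines "bx by se_y".
weighted_median_mr() {
  awk 'NF >= 3 && $1 != 0 {
         # Wald ratio and its first-order inverse-variance weight
         printf "%.17g %.17g\n", $2 / $1, ($1 * $1) / ($3 * $3)
       }' |
  sort -g -k1,1 |
  awk '
    { b[++n] = $1; w[n] = $2; tot += $2 }
    END {
      if (n == 0) exit 1
      cum = 0
      for (j = 1; j <= n; j++) {
        cum += w[j]
        p[j] = (cum - w[j] / 2) / tot          # standardized cumulative weight
      }
      if (0.5 <= p[1]) { printf "%.4f\n", b[1]; exit }
      for (j = 1; j < n; j++)
        if (p[j] < 0.5 && 0.5 <= p[j + 1]) {  # linear interpolation at 0.5
          printf "%.4f\n", b[j] + (b[j + 1] - b[j]) * (0.5 - p[j]) / (p[j + 1] - p[j])
          exit
        }
      printf "%.4f\n", b[n]
    }'
}
```

With ratios 0.2, 0.5, and 0.9 carrying weights 1, 1, and 4 (input lines 0.1 0.02 0.1, 0.1 0.05 0.1, and 0.2 0.18 0.1), the estimate is pulled toward the heavily weighted IV and the sketch prints 0.7400.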

Mode-Based Estimate (MBE) MR

  • MBE_MR_P: P-value for the mode-based method, focusing on the most common effect size.
  • MBE_MR_Beta(SE): Estimated causal effect and standard error.
  • MBE_MR_OR(95CI): Odds ratio and 95% CI.

Robust IVW (RIVW) MR

  • RIVW_MR_P: P-value for a rerandomized robust IVW method, MR.Rerand, adjusting for the winner’s curse.
  • RIVW_MR_Beta(SE): Estimated causal effect and standard error.
  • RIVW_MR_OR(95CI): Odds ratio and 95% CI.

JCWC MR

  • JCWC_MR_P: P-value for a new, unpublished MR method that adjusts for the winner’s curse and false-positive IVs.
  • JCWC_MR_Beta(SE): Estimated causal effect and standard error.
  • JCWC_MR_OR(95CI): Odds ratio and 95% CI.

How to Interpret the Output

  1. Check for Pleiotropy: Start with TwoCategoryPCMR_P_Plei. If it’s ≥ 0.2, pleiotropy is not significant, and the OneCategoryPCMR results are more appropriate. If < 0.2, use the TwoCategoryPCMR results.
  2. Causal Effect Significance: Look at the p-values (P_CausalEval for two-category, OneCategoryPCMR_P for one-category, or conventional MR p-values). A value < 0.05 suggests a significant causal effect.
  3. Effect Size and Direction: The beta (effect size) and OR (odds ratio) indicate the strength and direction of the causal relationship. A positive beta (OR > 1) means the exposure increases the outcome (or its odds); a negative beta (OR < 1) means it decreases it.
  4. Consistency Across Methods: Compare PCMR results with conventional MR methods (IVW, Egger, etc.). Consistent estimates across methods strengthen confidence in the causal inference.
  5. Example Interpretation: If TwoCategoryPCMR_P_Plei = 0.869 (no evidence of correlated horizontal pleiotropy), OneCategoryPCMR_P = 6.56e-10, and OneCategoryPCMR_OR = 1.38 (1.25-1.53), the results suggest a significant causal effect in which a unit increase in the exposure raises the odds of the outcome by about 38%, with no evidence that correlated pleiotropy complicates the analysis.
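Steps 1 and 2 can be applied to every row of MendelianRandomization.summary.tsv at once. The sketch below assumes a tab-delimited file with a header row containing the column names described above; report_pcmr is a hypothetical helper, and the 0.2 cutoff is the one given in step 1.

```shell
# Hypothetical helper: for each exposure-outcome row, choose the PCMR model
# indicated by TwoCategoryPCMR_P_Plei (cutoff 0.2) and print its p-value.
report_pcmr() {
  awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      n = split("Exposure Outcome TwoCategoryPCMR_P_Plei TwoCategoryPCMR_P_CausalEval OneCategoryPCMR_P", need, " ")
      for (k = 1; k <= n; k++)
        if (!(need[k] in col)) { print "missing column: " need[k] > "/dev/stderr"; exit 1 }
      next
    }
    {
      plei = $(col["TwoCategoryPCMR_P_Plei"])
      if (plei !~ /^[0-9.eE+-]+$/) { model = "NA"; p = "NA" }   # test result not available
      else if (plei + 0 >= 0.2) { model = "OneCategoryPCMR"; p = $(col["OneCategoryPCMR_P"]) }
      else { model = "TwoCategoryPCMR"; p = $(col["TwoCategoryPCMR_P_CausalEval"]) }
      print $(col["Exposure"]) "\t" $(col["Outcome"]) "\t" model "\t" p
    }' "$1"
}
```

Each output line holds the exposure, the outcome, the selected model, and that model's causal-evaluation p-value, which can then be compared against the conventional MR p-values as in step 4.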

In addition, the summary statistics of the IVs used for MR are saved in variants.hg38.tsv.gz in the PCMRTask subdirectory. The file has thirteen columns:

Header Description
CHROM The chromosome of the IV
POS The coordinate of the IV with the lowest GWAS p-value
REF The reference allele
ALT The alternative allele
MarkFeatureGene The gene to which the SNP is annotated
MarkGeneFeature The gene feature annotated to the SNP
[exposure]::P The p-value of the SNP's association with the exposure
[exposure]::Beta The effect of the SNP on the exposure
[exposure]::SE The standard error of the SNP's effect on the exposure
[outcome]::P The p-value of the SNP's association with the outcome
[outcome]::Beta The effect of the SNP on the outcome
[outcome]::SE The standard error of the SNP's effect on the outcome
Class The IV category assigned by PCMR (1, 2, 3, etc.)
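To see how many IVs PCMR placed in each category, the Class column can be tallied. This sketch assumes variants.hg38.tsv.gz is a gzip-compressed, tab-delimited file with a header row that contains a Class column; count_iv_classes is a hypothetical helper.

```shell
# Hypothetical helper: count IVs per PCMR category in variants.hg38.tsv.gz.
count_iv_classes() {
  gzip -dc < "$1" | awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "Class") c = i
      if (!c) { print "no Class column found" > "/dev/stderr"; exit 1 }
      next
    }
    { n[$c]++ }                                # tally IVs per category
    END { for (k in n) print k "\t" n[k] }' | sort -k1,1n
}
```

For example, count_iv_classes <output>/PCMRTask/variants.hg38.tsv.gz prints one line per category with its IV count, assuming the PCMRTask subdirectory is created under the path given to --output.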

A graphical summary is also saved as IVScatterPlots.pdf in the same directory. The horizontal axis shows the effect of each SNP on the exposure, and the vertical axis shows its effect on the outcome. Each point is a SNP selected by PCMR, drawn with the confidence interval of its effect size. Colors distinguish the IV categories identified by PCMR, and the slope of each line gives the causal effect of the exposure on the outcome estimated from the SNP category of the same color.

Some intermediate files are also written to the output directory (e.g., Actinobacteria_AN_pcmr). They are briefly described in the following table.

File Description
ConvertVCF2GTBTask/EUR.hg19.gtb The input VCF file converted to GTB format
GenerateRootVariantSetTask/variants.annot.hg38.gtb An hg38 GTB file with genotype data removed and annotation information added
IncorporateVariants2RootVariantSetTask/{exposure_file}.gtb The exposure summary-statistics file in GTB format. By default, all coordinates are converted to hg38.
IncorporateVariants2RootVariantSetTask/{outcome_file}.gtb The outcome summary-statistics file in GTB format. By default, all coordinates are converted to hg38.
IncorporateVariants2RootVariantSetTask/variants.annot.hg38.gtb The SNPs selected by p-value in the exposure and outcome files, in GTB format
GeneFeatureAnnotationTask/variants.annot.hg38.gtb The variants in IncorporateVariants2RootVariantSetTask/variants.annot.hg38.gtb with annotations added
LDGeneStickingTask/variants.annot.hg38.gtb The loci annotated to the nearest gene region
LDPruningTask/variants.annot.hg38.5.0E-8.gtb, LDPruningTask/variants.annot.hg38.0.5.gtb The files containing the SNPs retained after LD clumping

Microbes

Infer causality from microbes to phenotypes by MR methods

We provide GWAS summary statistics of microbes to enable users to identify causal microbes associated with phenotypes using Mendelian Randomization (MR) methods. The advanced MR method, PCMR, which is more robust to correlated horizontal pleiotropy (see details above), is applied as the primary analysis. Simultaneously, the presence of correlated horizontal pleiotropy is assessed. If no significant correlated horizontal pleiotropy is detected, a conventional IVW-based MR method is subsequently performed to evaluate the significance of the estimated causal effects.

Citations

[1] Bin Tang, Nan Lin, Junhao Liang, Guorong Yi, Liubin Zhang, Wenjie Peng, Chao Xue, Hui Jiang, Miaoxin Li. Leveraging Pleiotropic Clustering to Address High Proportion Correlated Horizontal Pleiotropy in Mendelian Randomization Studies. Nat Commun. 2025 Mar 21;16(1):2817. PubMed

Options

This analysis takes the GWAS summary statistics of microbial abundances (exposures) and of a phenotype (outcome) as input and outputs causal-effect estimates and p-values. The following command is an example:

java -Xmx4g -jar ../kggsum.jar \
  causal \
   --pcmr 1T2 \
          effIVPCut=1E-3 \
   --sum-file './microbiome/mibiogen/k__*.tsv.gz' \
              cp12Cols=CHR,BP,A1,A2 \
              pbsCols=P,BETA,SE \
              sep=TAB \
              refG=hg19 \
              betaType=0 \
   --sum-file ./scz_gwas_eur_chr1.tsv.gz \
              cp12Cols=CHR,BP,A1,A2 \
              pbsCols=P,OR,SE \
              betaType=2 \
              prevalence=0.01 \
   --ref-gty-file ./1kg_hg19_eur_chr1.vcf.gz \
   --threads 10 \
   --output ./test/microb_scz_causal \
   --exclude-complementary-allele
Format Description Default
--pcmr Triggers the PCMR analysis. This is a parameter set with multiple sub-options, as described above.

Example:
--pcmr 1T2,2T1 5E-5 fixed 0.1 0.5 0.8
causalPair=1T2,2T1
effIVPCut=5E-5
effIVPCorrect=fixed
ldPruneCut=0.1
initIVPCut=0.5
ldStickCut=0.8
.. ... -
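Both command examples pass --exclude-complementary-allele. The rule it describes (dropping A/T and C/G variants, whose strand cannot be resolved from the alleles alone) can be sketched as an awk filter. drop_complementary is a hypothetical helper, not KGGSum's own code; it assumes whitespace-delimited rows with A1 and A2 in columns 3 and 4 by default, as in cp12Cols=CHR,BP,A1,A2.

```shell
# Hypothetical helper: drop strand-ambiguous (A/T, C/G) variants from a table.
# Arguments: column numbers of the two alleles (default 3 and 4).
drop_complementary() {
  local a1=${1:-3} a2=${2:-4}
  awk -v a1="$a1" -v a2="$a2" '
    { pair = toupper($a1 $a2) }               # e.g. "AT", "CG"; a header gives "A1A2"
    pair !~ /^(AT|TA|CG|GC)$/                 # print every other row unchanged
  '
}
```

The header row passes through unchanged because its allele fields do not form a complementary pair.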

Output

The output files are the same as those of the phenotype-to-phenotype causal analysis. The causal-inference results are summarized in MendelianRandomization.summary.tsv.
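Following the strategy described above for microbes (PCMR first, and IVW-based MR when no correlated horizontal pleiotropy is detected), a hypothetical helper can list the microbes that pass the pleiotropy check, ordered by IVW_MR_P. It assumes the summary file is tab-delimited with Exposure, TwoCategoryPCMR_P_Plei, and IVW_MR_P columns, and it uses the P_Plei ≥ 0.2 rule from the phenotype section.

```shell
# Hypothetical helper: list exposures without evidence of correlated horizontal
# pleiotropy (TwoCategoryPCMR_P_Plei >= 0.2), sorted by IVW_MR_P.
rank_microbes_ivw() {
  awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("Exposure" in col) || !("TwoCategoryPCMR_P_Plei" in col) || !("IVW_MR_P" in col)) {
        print "required columns missing" > "/dev/stderr"; exit 1
      }
      next
    }
    # Non-numeric P_Plei values convert to 0 and are therefore excluded.
    $(col["TwoCategoryPCMR_P_Plei"]) + 0 >= 0.2 {
      print $(col["Exposure"]) "\t" $(col["IVW_MR_P"])
    }' "$1" | sort -t "$(printf '\t')" -k2,2g
}
```

Rows with P_Plei < 0.2 are left out because, under the rule above, their two-category PCMR results should be consulted instead.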

Copyright © MiaoXin Li, all rights reserved. Last modified: 2025-04-05 03:51:21
