KGG: A systematic biological Knowledge-based mining system for Genome-wide Genetic studies
KGG 2.5 Demo Video
KGG Application (If you have any question about KGG, please email:
Type File Version
MS Windows / Mac OS X / Linux KGG4 4.1
User Manual User Manual.pdf 4.1
Sample Data -
Source Code 4.0

Phased genotypes of SNPs to account for linkage disequilibrium
Type File
1000 Genomes Project Go

Hints for large GWAS dataset (around or over 2.5 million SNPs)
1. Set the Java heap size to more than 8GB before starting KGG, via Tools->Set System Memory.
2. Only EXPORT the small set of genes or SNPs you are interested in.
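To confirm how much heap a Java process can actually use, the standard `Runtime.maxMemory()` call reports the limit the JVM was started with. The following is a minimal sketch; the class name `HeapCheck` is hypothetical and is not part of KGG, and it is only an assumption that the limit it reports corresponds to the value set through Tools->Set System Memory.

```java
public class HeapCheck {
    /** Returns the maximum heap the running JVM will attempt to use, in gigabytes. */
    static double maxHeapGb() {
        return Runtime.getRuntime().maxMemory() / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // For GWAS datasets of around 2.5 million SNPs the hints above suggest more than 8GB.
        System.out.printf("Max JVM heap: %.2f GB%n", maxHeapGb());
    }
}
```

When a Java program is launched from the command line, the maximum heap is normally set with the JVM option `-Xmx` (for example `-Xmx10g`).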

KGG is free of charge. All materials on the website are provided without any warranty. Please use them at your own risk.

Update all download links in KGG.
Add an analysis package for estimating the driver tissues of complex phenotypes based on summary statistics.
Update publication information of gene-based and conditional gene-based association by ECS.
Fixed a bug in downloading gene symbols from HGNC database.
Fixed bugs in LOG messages.
1. Fixed a bug in the interface of building analysis genome.
2. Update the GENCODE database to version 25.
Release KGG4.0, in which three powerful tests are added: a gene-based association test, a conditional gene-based association test, and a gene-set based association test.
Implement a multivariate association test (Trait-based Association Test that uses Extended Simes procedure, TATES) for SNPs inside genes when conducting gene-based association analysis.
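For orientation, the classical Simes procedure that the extended version builds on can be sketched as follows. This is the plain Simes combination, min over j of m * p_(j) / j for sorted p-values, and not the extended procedure used by TATES or KGG, which replaces the counts m and j with effective numbers of independent tests. The class name `SimesSketch` is hypothetical.

```java
import java.util.Arrays;

public class SimesSketch {
    /**
     * Plain Simes combined p-value: sort the p-values in ascending order and
     * return the minimum over ranks j of m * p_(j) / j, capped at 1.
     */
    static double simes(double[] pValues) {
        double[] p = pValues.clone();   // do not modify the caller's array
        Arrays.sort(p);
        int m = p.length;
        double best = 1.0;
        for (int j = 1; j <= m; j++) {
            best = Math.min(best, m * p[j - 1] / j);
        }
        return best;
    }
}
```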
Improve the set-based power estimation module.
1. Update user manual from KGG3.0 to KGG3.5
2. Allow users to select genes according to the gene groups by HGNC (
Provide a link to newly compiled 1000 Genomes Project phased genotype datasets to account for linkage disequilibrium.
Refine some graphic interfaces to view gene- and pathway-based results.
Release KGG3.5 today. Compared with KGG3.0, KGG3.5 has several new features:
1. 100 times faster than KGG3 or earlier versions when building an analysis genome with around 10 million SNPs.
2. Add a function to calculate the power of set-based tests.
3. Exclude SNPs without LD information from set-based tests; such SNPs inflated the type I error rate in previous versions.
1. Fixed some minor bugs.
2. Added a new function to replace old gene symbols with the latest ones according to the HGNC database (thanks to Attila Pulay for reporting the problem).
1. Allow users to exclude SNPs without LD information for gene-based association test.
2. Improve the gene-pair based association test for large gene-pair sets.
Fixed bug in gene-based LD plotting with re-used LD data.
Fixed a minor bug in the Benjamini & Hochberg (1995) FDR test and added the Benjamini & Yekutieli (2001) FDR test.
Update PPI-based modules to be Interaction-based modules, which can read multiple gene-pair files at a time. A gene pair can be defined according to protein interaction, co-expression or other biological events.
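As a reminder of how the two FDR procedures differ, the sketch below computes adjusted p-values for both. It follows the textbook definitions (Benjamini & Hochberg multiply the p-value of rank k by m / k and enforce monotonicity; Benjamini & Yekutieli additionally multiply by the harmonic sum 1 + 1/2 + ... + 1/m). It is not KGG's own code, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

public class FdrAdjust {
    /**
     * Returns FDR-adjusted p-values in the original input order.
     * yekutieli = false gives Benjamini & Hochberg (1995);
     * yekutieli = true gives Benjamini & Yekutieli (2001).
     */
    static double[] adjust(double[] p, boolean yekutieli) {
        int m = p.length;
        Integer[] order = new Integer[m];
        for (int i = 0; i < m; i++) order[i] = i;
        // Indices sorted by ascending p-value.
        Arrays.sort(order, Comparator.comparingDouble(i -> p[i]));

        // BY correction factor: harmonic sum of 1/i for i = 1..m (1 for BH).
        double c = 1.0;
        if (yekutieli) {
            c = 0.0;
            for (int i = 1; i <= m; i++) c += 1.0 / i;
        }

        double[] adjusted = new double[m];
        double running = 1.0;
        // Walk from the largest p-value down so adjusted values stay monotone.
        for (int rank = m; rank >= 1; rank--) {
            int idx = order[rank - 1];
            running = Math.min(running, p[idx] * m * c / rank);
            adjusted[idx] = running;
        }
        return adjusted;
    }
}
```

The Benjamini & Yekutieli adjustment is never smaller than the Benjamini & Hochberg one, which is the price paid for remaining valid under arbitrary dependence between tests.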
1. Update PPI-based association module to Interaction-based association module.
2. Separate multivariate gene-based association from gene-based association.
Release KGG3.0 today. Compared with KGG2.5, KGG3 has several new features:
1. A more user-friendly interface based on NetBeans modules;
2. A new function to conduct multi-phenotype gene-based association analysis;
3. A new algorithm to compress LD data;
4. When building an analysis genome, you can use GENCODE to map SNPs onto genes and filter SNPs by imputation quality scores;
5. A new plotting function for SNPs in a gene region;
6. A new function to automatically remove overlapping genes for gene set-based analysis.
Release KGG3 beta version!!!
Add a function to use the VCF format MACH Haplotypes for LD calculation when building analysis genome by positions and conducting pathway-set and PPI based association analysis.
1. Refined the approximation of the functions of GATES and HYST to combine p values of multiple blocks.
2. Fixed a small bug in mapping SNPs onto genes in which a tiny fraction of genes might have multiple identical SNPs.
3. Developed a new algorithm to detect the heterogeneity between a pair of PPI genes which could exclude the genes with redundant association signals.
1. Add a function to weight gene-based p-values by gene network topological properties for multiple testing.
2. Mark the key SNPs by GATES in the full annotation of gene-based association analysis.
Fixed a bug in "Build analysis genome by position" function in which the "Extended gene region" option did not work.
1. Refine the algorithm and procedure for pathway gene set-based association test;
2. Integrate the latest MsigDB gene set into KGG.
1. Added a hybrid approach (GATES + Scaled chi-square test) for gene-based association, which is more powerful in many situations than GATES and Scaled chi-square test;
2. Developed a novel protein-protein interaction (PPI) based association test which accounts for LD between genes and importance weights for PPI genes.
Add a function to use publicly available haplotype data ( to extract LD information when building analysis genome by position.
1. Add a function to build analysis genome according to variants' physical positions, which is suitable for SNPs without established RSIDs.
2. Refine the gene-based LD plotting function.
Use MD5 checking to make sure the downloaded resources data are complete.
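An MD5 completeness check of this kind can be sketched with the JDK's `java.security.MessageDigest`. This is an illustration under assumptions, not KGG's own code: the class name `Md5Check`, its method names, and the idea that the expected checksum arrives as a hexadecimal string are all hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Check {
    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    private static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    /** MD5 of an in-memory byte array, as lowercase hex. */
    static String md5Hex(byte[] data) {
        return toHex(newMd5().digest(data));
    }

    /** MD5 of a file, streamed in chunks so large resource files fit in memory. */
    static String md5Hex(Path file) throws IOException {
        MessageDigest md = newMd5();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) md.update(buf, 0, n);
        }
        return toHex(md.digest());
    }

    /** True when the computed checksum equals the published one. */
    static boolean matches(String actualHex, String expectedHex) {
        return actualHex.equalsIgnoreCase(expectedHex.trim());
    }
}
```

A download would be treated as complete only when `matches(md5Hex(file), expected)` returns true; otherwise it should be fetched again.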
1. View linkage disequilibrium (LD) pattern of SNPs within a gene.
2. Refine Quantile - Quantile plots and Manhattan plots.
1. Update the resource data to NCBI Build 37.1 (hg19); the number of SNPs has doubled.
2. A more stable and faster technique to speed up downloading of resources.
Refine the data-structure to process large datasets (millions of SNPs) on ordinary computers with RAM less than 1.5GB.
Add a sample dataset to the KGG package and website.
Use the curated canonical pathways by GSEA ( for pathway analysis.
Update KGG1 to KGG2. Please see the improvements of KGG2 in the user manual of KGG2.
Type File
MS Windows / Mac OS X / Linux
User Manual User Manual3.0.pdf
Java Source Codes
MS Windows / Mac OS X / Linux
User Manual User Manual3.5.pdf
Java Source Codes

Miao-xin Li, Zhongshan School of Medicine, Sun Yat-sen University & Centre for Genomic Sciences. All rights reserved.