KGG: A systematic biological Knowledge-based mining system for Genome-wide Genetic studies
 
KGG 2.5 Demo Video
 
KGG Application (If you have any questions about KGG, please email: limx54@163.com)
Type                            File              Version
MS Windows / Mac OS X / Linux   KGG4              4.1
User Manual                     User Manual.pdf   4.1
Sample Data                     KGGSample.zip     -
Source Code                     KGG4.0.src.zip    4.0

Phased genotypes of SNPs to account for linkage disequilibrium
Type                            File
1000 Genomes Project            Go


Hints for large GWAS datasets (around or over 2.5 million SNPs)
1. Set the maximum Java heap size for KGG to more than 8 GB via Tools->Set System Memory.
2. EXPORT only the small set of genes or SNPs you are interested in.
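
To confirm how much heap a running JVM actually has, the standard Runtime.maxMemory() call can be queried. The following is a minimal sketch only: the class HeapCheck and the constant REQUIRED_BYTES are illustrative names and are not part of KGG, and the 8 GB threshold is simply taken from hint 1 above.

```java
// Minimal sketch: report the maximum heap available to the running JVM.
// HeapCheck and REQUIRED_BYTES are hypothetical names, not part of KGG.
class HeapCheck {
    // 8 GB, the threshold suggested in hint 1
    static final long REQUIRED_BYTES = 8L * 1024 * 1024 * 1024;

    // Maximum amount of memory the JVM will attempt to use, in bytes
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // True if the given heap limit meets the suggested threshold
    static boolean isEnough(long maxBytes) {
        return maxBytes >= REQUIRED_BYTES;
    }

    public static void main(String[] args) {
        long max = maxHeapBytes();
        System.out.printf("Max heap: %.1f GB%n", max / (1024.0 * 1024 * 1024));
        if (!isEnough(max)) {
            System.out.println("Heap is below 8 GB; consider raising it before loading a large dataset.");
        }
    }
}
```

The value reported by the JVM can be slightly lower than the configured limit, so a result just under 8 GB does not necessarily mean the setting was ignored.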


Disclaimer:
KGG is free of charge. All materials on the website are provided without any warranty. Please use them at your own risk.

Updated:
03/23/2021
Update all download links in KGG.
07/01/2019
Add an analysis package for estimating the driver tissues of complex phenotypes based on summary statistics.
09/08/2018
Update publication information of gene-based and conditional gene-based association by ECS.
10/18/2017
Fixed a bug in downloading gene symbols from the HGNC database.
05/01/2017
Fixed bugs in LOG messages.
10/26/2016
1. Fixed a bug in the interface of building analysis genome.
2. Update the GENCODE database to version 25.
02/04/2016
Release KGG4.0, in which three powerful tests are added: a gene-based association test, a conditional gene-based association test, and a gene-set based association test.
05/04/2015
Implement a multivariate association test (Trait-based Association Test that uses Extended Simes procedure, TATES) for SNPs inside genes when conducting gene-based association analysis.
02/13/2015
Improve the set-based power estimation module.
12/01/2014
1. Update user manual from KGG3.0 to KGG3.5
2. Allow users to select genes according to the gene groups by HGNC (http://www.genenames.org/)
10/23/2014
Provide a link to newly compiled 1000 Genomes Project phased genotypes datasets to account for linkage disequilibrium.
10/10/2014
Refine some graphic interfaces to view gene- and pathway-based results.
08/28/2014
Release KGG3.5 today. Compared to KGG3.0, KGG3.5 has several new features.
1. 100 times faster than KGG3 or earlier versions when building an analysis genome with around 10 million SNPs.
2. Add a function to calculate the power of set-based tests.
3. Exclude SNPs without LD information from set-based tests; such SNPs inflated the type I error rate in previous versions.
08/10/2014
1. Fixed some minor bugs.
2. Added a new function to replace old gene symbols with the latest ones according to the HGNC database, http://www.genenames.org/. (Thanks to Attila Pulay for reporting the problem.)
08/04/2014
1. Allow users to exclude SNPs without LD information for gene-based association test.
2. Improve the gene-pair based association test for large gene-pair sets.
07/28/2014
Fixed a bug in gene-based LD plotting with re-used LD data.
07/10/2014
Fixed a minor bug in the Benjamini & Hochberg (1995) FDR procedure and added the Benjamini & Yekutieli (2001) FDR procedure.
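
For readers unfamiliar with the two FDR procedures named in this entry, the sketch below computes adjusted p-values in the usual step-up form: each sorted p-value is scaled by m/rank, with Benjamini & Yekutieli additionally multiplying by the harmonic sum 1 + 1/2 + ... + 1/m. This is a textbook illustration, not KGG's own code; the class FdrAdjust and its method names are hypothetical.

```java
import java.util.Arrays;

// Minimal sketch of BH (1995) and BY (2001) adjusted p-values.
// FdrAdjust is a hypothetical name, not a KGG class.
class FdrAdjust {
    // Returns adjusted p-values in the same order as the input.
    // dependencyCorrection = false gives Benjamini & Hochberg,
    // dependencyCorrection = true gives Benjamini & Yekutieli.
    static double[] adjust(double[] p, boolean dependencyCorrection) {
        int m = p.length;
        Integer[] order = new Integer[m];
        for (int i = 0; i < m; i++) {
            order[i] = i;
        }
        // Sort indices by ascending p-value
        Arrays.sort(order, (a, b) -> Double.compare(p[a], p[b]));

        // BY multiplies by the harmonic sum c(m); BH uses c = 1
        double c = 1.0;
        if (dependencyCorrection) {
            c = 0.0;
            for (int k = 1; k <= m; k++) {
                c += 1.0 / k;
            }
        }

        double[] adjusted = new double[m];
        double runningMin = 1.0; // also caps adjusted values at 1
        // Walk from the largest p-value down, enforcing monotonicity
        for (int rank = m; rank >= 1; rank--) {
            int idx = order[rank - 1];
            double value = p[idx] * m * c / rank;
            runningMin = Math.min(runningMin, value);
            adjusted[idx] = runningMin;
        }
        return adjusted;
    }
}
```

A call such as FdrAdjust.adjust(pValues, false) returns BH-adjusted values; passing true applies the more conservative BY correction, which remains valid under arbitrary dependence between tests.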
06/17/2014
Update PPI-based modules to Interaction-based modules, which can read multiple gene-pair files at a time. A gene pair can be defined according to protein interaction, co-expression or other biological events.
06/05/2014
1. Update PPI-based association module to Interaction-based association module.
2. Separate multivariate gene-based association from gene-based association.
04/24/2014
Release KGG3.0 today. Compared to KGG2.5, KGG3 has several new features.
1. A more user-friendly interface based on NetBeans modules;
2. A new function to conduct multi-phenotype gene-based association analysis;
3. A new algorithm to compress LD data;
4. When building analysis genome, you can use GENCODE to map SNPs onto genes and filter SNPs by imputation quality scores;
5. A new plotting function for SNPs in a gene region;
6. A new function to automatically remove overlapping genes for gene set-based analysis.
10/18/2013
Release KGG3 beta version.
04/01/2013
Add a function to use the VCF format MACH Haplotypes for LD calculation when building analysis genome by positions and conducting pathway-set and PPI-based association analysis.
03/12/2013
1. Refined the approximation of the functions of GATES and HYST to combine p values of multiple blocks.
2. Fixed a small bug in mapping SNPs onto genes in which a tiny fraction of genes might have multiple identical SNPs.
3. Developed a new algorithm to detect the heterogeneity between a pair of PPI genes which could exclude the genes with redundant association signals.
12/05/2012
1. Add a function to weight gene-based p-values by gene network topological properties for multiple testing.
2. Mark the key SNPs by GATES in the full annotation of gene-based association analysis.
11/04/2012
Fixed a bug in "Build analysis genome by position" function in which the "Extended gene region" option did not work.
10/31/2012
1. Refine the algorithm and procedure for pathway gene set-based association test;
2. Integrate the latest MSigDB gene sets into KGG.
02/03/2012
1. Added a hybrid approach (GATES + Scaled chi-square test) for gene-based association, which is more powerful in many situations than GATES and Scaled chi-square test;
2. Developed a novel protein-protein interaction (PPI) based association test which accounts for LD between genes and importance weights for PPI genes.
12/08/2011
Add a function to use publicly available Haplotype data (http://www.sph.umich.edu/csg/abecasis/MACH/download/) to extract LD information when building analysis genome by Position.
09/02/2011
1. Add a function to build analysis genome according to variants' physical positions, which is suitable for SNPs without established RSIDs.
2. Refine the gene-based LD plotting function.
08/22/2011
Use MD5 checking to make sure the downloaded resource data are complete.
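
The idea behind this check can be sketched with the standard java.security.MessageDigest API: hash the downloaded bytes and compare the result with a published checksum. This is an illustrative sketch only; the class Md5Check and its methods are hypothetical names, and how KGG itself stores or retrieves the expected checksums is not described on this page.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Minimal sketch of an MD5 completeness check for downloaded data.
// Md5Check is a hypothetical name, not a KGG class.
class Md5Check {
    // Returns the MD5 digest of the data as a lowercase hex string
    static String md5Hex(byte[] data) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform
            throw new IllegalStateException(e);
        }
        byte[] digest = md.digest(data);
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // True if the data hashes to the expected checksum (case-insensitive)
    static boolean matches(byte[] data, String expectedHex) {
        return md5Hex(data).equalsIgnoreCase(expectedHex.trim());
    }
}
```

A mismatch indicates a truncated or corrupted download, in which case the file should be fetched again. MD5 is adequate for detecting accidental corruption but should not be relied on as protection against deliberate tampering.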
04/20/2011
1. View linkage disequilibrium (LD) pattern of SNPs within a gene.
2. Refine Quantile - Quantile plots and Manhattan plots.
04/06/2011
1. Update the resource data to NCBI Build 37.1 (hg19); the number of SNPs has doubled.
2. A more stable and faster technique to speed up downloading of resources.
03/30/2011
Refine the data-structure to process large datasets (millions of SNPs) on ordinary computers with RAM less than 1.5GB.
01/29/2011
Add a sample dataset to the KGG package and website.
   
01/18/2011
Use the curated canonical pathways by GSEA (http://www.broadinstitute.org/gsea/index.jsp) for pathway analysis.
   
07/14/2010
Update KGG1 to KGG2. Please see the improvements of KGG2 in the user manual of KGG2.
   
Archive:
  
Type                            File
MS Windows / Mac OS X / Linux   KGG3.0.zip
User Manual                     User Manual3.0.pdf
Java Source Codes               KGG3.0.src.zip
MS Windows / Mac OS X / Linux   KGG3.5.zip
User Manual                     User Manual3.5.pdf
Java Source Codes               KGG3.5.src.zip

Miao-xin Li, Zhongshan School of Medicine, Sun Yat-sen University & Centre for Genomic Sciences. All rights reserved.