A short tutorial on KGGSeq for annotating and prioritizing exome sequence variants

Miaoxin Li (limx54@163.com)

 

Reference: https://pmglab.top/kggseq/doc10/UserManual.html

Input data:

1.      A Variant Call Format (VCF) file (a simulated data set):

examples/rare.disease.hg19.vcf

2.      A linkage pedigree file:

 examples/rare.disease.ped.txt
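
Before running kggseq, it can help to sanity-check the pedigree file. The sketch below assumes the conventional linkage layout of six whitespace-separated columns (family ID, individual ID, father ID, mother ID, sex, phenotype); the example file may differ in detail. The function name check_ped and the demo trio are made up for illustration and are not part of kggseq.

```shell
# Illustrative helper (not part of kggseq): report lines with fewer than six columns.
check_ped() {
  awk 'BEGIN { bad = 0 }
       NF > 0 && NF < 6 {
         printf "line %d: expected at least 6 columns, found %d\n", NR, NF
         bad = 1
       }
       END { exit bad }' "$1"
}

# Demo on a made-up trio: two unaffected parents and one affected child.
tmp=$(mktemp)
printf 'F1 father 0 0 1 1\nF1 mother 0 0 2 1\nF1 child father mother 1 2\n' > "$tmp"
if check_ped "$tmp"; then echo "pedigree file looks well-formed"; fi
rm -f "$tmp"
```

To check the tutorial's own file, call check_ped examples/rare.disease.ped.txt from the kggseq directory.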

 

Purpose: Identify candidate sequence variants that may cause schizophrenia


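Run the commands below one step at a time and inspect the output of each. Every command writes to the same output prefix (--out test1), so a later run is expected to overwrite the earlier results; change the prefix if you want to keep them.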

1.      Filter by genetic feature and inheritance model (compound-heterozygosity or recessive)

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --double-hit-gene-trio-filter

// With quality-control (QC) thresholds on sequencing and genotype quality applied:
java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --double-hit-gene-trio-filter --seq-qual 50 --seq-mq 20 --seq-sb -10 --gty-qual 20 --gty-dp 8
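
To get a rough idea of how strict a QC threshold is before running kggseq, you can count the VCF records that meet it. The sketch below assumes that --seq-qual is compared against the QUAL column (column 6) of the VCF file; that mapping is an assumption here, and the other QC options are not modelled. The function name count_qual_pass is made up for illustration.

```shell
# Illustrative helper (not part of kggseq): count VCF records with QUAL >= a threshold.
# Usage: count_qual_pass FILE.vcf MIN_QUAL
count_qual_pass() {
  awk -F '\t' -v min="$2" '
    /^#/ { next }                                # skip meta and header lines
    $6 != "." && ($6 + 0) >= (min + 0) { n++ }   # QUAL present and high enough
    END { print n + 0 }' "$1"
}

# Run on the tutorial data only if the file is present.
if [ -f examples/rare.disease.hg19.vcf ]; then
  count_qual_pass examples/rare.disease.hg19.vcf 50
fi
```

This only previews the effect of one threshold; the counts reported by kggseq itself remain the reference.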

2.      Annotate sequence variants with the RefGene and GENCODE gene models:

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --double-hit-gene-trio-filter --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6

3.      Filter out sequence variants that are common in reference databases

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --double-hit-gene-trio-filter --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6 --db-filter 1kg201204,dbsnp137,ESP6500AA,ESP6500EA --rare-allele-freq 0.03

 

4.      Prioritize sequence variants by disease-causing prediction

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --double-hit-gene-trio-filter --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6 --db-filter 1kg201204,dbsnp137,ESP6500AA,ESP6500EA --rare-allele-freq 0.03 --db-score dbnsfp --mendel-causing-predict best --filter-nondisease-variant

 

5.      Prioritize sequence variants by alternative splicing, structural variation, OMIM annotation, mouse phenotype, zebrafish phenotype, and developmental disorders

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --double-hit-gene-trio-filter --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6 --db-filter 1kg201204,dbsnp137,ESP6500AA,ESP6500EA --rare-allele-freq 0.03 --db-score dbnsfp --mendel-causing-predict best --filter-nondisease-variant --scsnv-annot --dgv-cnv-annot --omim-annot --mouse-pheno --zebrafish-pheno --ddd-annot

 

6.      Prioritize sequence variants by candidate genes with protein interaction information

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --double-hit-gene-trio-filter --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6 --db-filter 1kg201204,dbsnp137,ESP6500AA,ESP6500EA --rare-allele-freq 0.03 --db-score dbnsfp --mendel-causing-predict best --filter-nondisease-variant --scsnv-annot --dgv-cnv-annot --omim-annot --mouse-pheno --zebrafish-pheno --ddd-annot --candi-list LSM1,NRGN,SYNE1 --ppi-annot string --ppi-depth 1

 

7.      Prioritize sequence variants by candidate genes with pathway information

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --double-hit-gene-trio-filter --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6 --db-filter 1kg201204,dbsnp137,ESP6500AA,ESP6500EA --rare-allele-freq 0.03 --db-score dbnsfp --mendel-causing-predict best --filter-nondisease-variant --scsnv-annot --dgv-cnv-annot --omim-annot --mouse-pheno --zebrafish-pheno --ddd-annot --candi-list LSM1,NRGN,SYNE1 --ppi-annot string --ppi-depth 1 --pathway-annot cura

8.      Predict the pathogenicity of genes carrying candidate sequence variants by functional prediction and phenotype mining

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --double-hit-gene-trio-filter --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6 --db-filter 1kg201204,dbsnp137,ESP6500AA,ESP6500EA --rare-allele-freq 0.03 --db-score dbnsfp --mendel-causing-predict best --filter-nondisease-variant --scsnv-annot --dgv-cnv-annot --omim-annot --mouse-pheno --zebrafish-pheno --ddd-annot --candi-list LSM1,NRGN,SYNE1 --ppi-annot string --ppi-depth 1 --pathway-annot cura --patho-gene-predict --phenotype-term Schizophrenia --phenolyzer-prediction

9.      Prioritize sequence variants by PubMed literature mining

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --double-hit-gene-trio-filter --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6 --db-filter 1kg201204,dbsnp137,ESP6500AA,ESP6500EA --rare-allele-freq 0.03 --db-score dbnsfp --mendel-causing-predict best --filter-nondisease-variant --scsnv-annot --dgv-cnv-annot --omim-annot --mouse-pheno --zebrafish-pheno --ddd-annot --candi-list LSM1,NRGN,SYNE1 --ppi-annot string --ppi-depth 1 --pathway-annot cura --patho-gene-predict --phenotype-term Schizophrenia --phenolyzer-prediction --pubmed-mining Schizophrenia
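
Each of steps 2 to 9 repeats the previous command and appends a few options. A small helper can make that structure explicit and save retyping. The sketch below only prints the command for a given step; the function names step_opts and kggseq_cmd are made up for illustration, and every option is copied from the commands above.

```shell
# Illustrative helper (not part of kggseq): print the cumulative command for a step (1-9).
BASE="java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --double-hit-gene-trio-filter"

# Options introduced at each step, copied from the tutorial commands.
step_opts() {
  case "$1" in
    2) printf '%s\n' "--db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6" ;;
    3) printf '%s\n' "--db-filter 1kg201204,dbsnp137,ESP6500AA,ESP6500EA --rare-allele-freq 0.03" ;;
    4) printf '%s\n' "--db-score dbnsfp --mendel-causing-predict best --filter-nondisease-variant" ;;
    5) printf '%s\n' "--scsnv-annot --dgv-cnv-annot --omim-annot --mouse-pheno --zebrafish-pheno --ddd-annot" ;;
    6) printf '%s\n' "--candi-list LSM1,NRGN,SYNE1 --ppi-annot string --ppi-depth 1" ;;
    7) printf '%s\n' "--pathway-annot cura" ;;
    8) printf '%s\n' "--patho-gene-predict --phenotype-term Schizophrenia --phenolyzer-prediction" ;;
    9) printf '%s\n' "--pubmed-mining Schizophrenia" ;;
  esac
}

# Start from the step 1 command and append the options of steps 2..N.
kggseq_cmd() {
  cmd="$BASE"
  i=2
  while [ "$i" -le "$1" ]; do
    cmd="$cmd $(step_opts "$i")"
    i=$((i + 1))
  done
  printf '%s\n' "$cmd"
}

kggseq_cmd 3   # prints the step 3 command without running it
```

To actually run a step, pass the printed command to the shell, for example sh -c "$(kggseq_cmd 3)".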

 

 

Others

10.   Export the data as PLINK binary files

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --o-plink-bed