A short tutorial on using KGGSeq to annotate and prioritize exome sequencing variants

Miaoxin Li (mxli@hku.hk)

 

Reference: http://statgenpro.psychiatry.hku.hk/limx/kggseq/doc/UserManual.html

Input data:

1.       A Variant Call Format (VCF) file (a simulated data set):

examples/rare.disease.hg19.vcf

2.       A linkage pedigree file:

 examples/rare.disease.ped.txt
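Before running any command, it may help to confirm that the pedigree file is readable and looks like a linkage-format file. The following is a minimal sketch and not part of KGGSeq: the function name check_ped is hypothetical, and it assumes the conventional linkage layout in which every non-empty line has at least six whitespace-separated columns (family ID, individual ID, father ID, mother ID, sex, phenotype).

```shell
#!/bin/sh
# Hypothetical helper (not part of KGGSeq): sanity-check a linkage pedigree file.
# Assumption: every non-empty line has at least six whitespace-separated columns.
check_ped() {
    if [ ! -r "$1" ]; then
        echo "cannot read pedigree file: $1" >&2
        return 1
    fi
    # awk stops at the first non-empty line with fewer than six columns
    # and exits with status 1; otherwise it exits with status 0.
    awk 'NF > 0 && NF < 6 {
             print "line " NR " has only " NF " columns"
             bad = 1
             exit
         }
         END { exit bad }' "$1" >&2
}
```

For the example data this could be called as check_ped examples/rare.disease.ped.txt; a non-zero exit status suggests the file should be inspected before it is passed to --ped-file.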

 

Purpose: Identify candidate sequence variants that may cause schizophrenia under a double-hit model (i.e., compound heterozygosity or a recessive model).


Run the commands below one step at a time and inspect the output of each. Later commands reuse most of the options from earlier steps and add new ones.

1.       Filter by genetic feature and inheritance model (compound-heterozygosity or recessive)

java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt  --out test1 --excel --double-hit-gene-trio-filter

The same command with quality control (QC) thresholds added:
java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt  --out test1 --excel --double-hit-gene-trio-filter --seq-qual 50 --seq-mq 20 --seq-sb -10 --gty-qual 20 --gty-dp 8
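Since the same QC thresholds recur in every later command, one option is to keep them in a shell variable. This is a minimal sketch rather than anything KGGSeq requires: the names QC, DRY_RUN and run are illustrative, and only flags that already appear in this tutorial are used. With DRY_RUN=1 the command line is printed instead of executed.

```shell
#!/bin/sh
# QC thresholds copied from the command above; the variable name is illustrative.
QC="--seq-qual 50 --seq-mq 20 --seq-sb -10 --gty-qual 20 --gty-dp 8"
DRY_RUN=1   # set to 0 to actually run KGGSeq

# Print the command when DRY_RUN is 1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $QC is deliberately unquoted so that it splits into separate arguments.
run java -jar kggseq.jar \
    --vcf-file examples/rare.disease.hg19.vcf \
    --ped-file examples/rare.disease.ped.txt \
    --out test1 --excel --double-hit-gene-trio-filter $QC
```

Keeping the thresholds in one place means a change to, say, --gty-dp only has to be made once when the later steps are rerun.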

2.       Annotate sequence variants with RefGene gene features

java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt  --out test1 --excel --double-hit-gene-trio-filter --seq-qual 50 --seq-mq 20 --seq-sb -10 --gty-qual 20 --gty-dp 8 --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6

3.       Filter out common sequence variants using public variant databases

java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt  --out test1 --excel --double-hit-gene-trio-filter --seq-qual 50 --seq-mq 20 --seq-sb -10 --gty-qual 20 --gty-dp 8 --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter 1kg201204,dbsnp137,ESP6500AA,ESP6500EA --rare-allele-freq 0.01

 

4.       Prioritize sequence variants by disease-causing prediction

java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt  --out test1 --excel --double-hit-gene-trio-filter --seq-qual 50 --seq-mq 20 --seq-sb -10 --gty-qual 20 --gty-dp 8 --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter 1kg201204,dbsnp137,ESP6500AA,ESP6500EA --rare-allele-freq 0.02 --db-score dbnsfp --mendel-causing-predict all --filter-nondisease-variant

 

5.       Prioritize sequence variants by other genomic and OMIM annotations

java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt  --out test1 --excel --double-hit-gene-trio-filter --seq-qual 50 --seq-mq 20 --seq-sb -10 --gty-qual 20 --gty-dp 8 --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter 1kg201204,dbsnp137,ESP6500AA,ESP6500EA --rare-allele-freq 0.02 --db-score dbnsfp --mendel-causing-predict all --filter-nondisease-variant --genome-annot --omim-annot

 

6.       Prioritize sequence variants using candidate genes and protein interaction information

java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt  --out test1 --excel --double-hit-gene-trio-filter --seq-qual 50 --seq-mq 20 --seq-sb -10 --gty-qual 20 --gty-dp 8 --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter 1kg201204,dbsnp137,ESP6500AA,ESP6500EA --rare-allele-freq 0.02 --db-score dbnsfp --mendel-causing-predict all --filter-nondisease-variant --genome-annot --candi-list LSM1,NRGN,SYNE1 --ppi-annot string --ppi-depth 1
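The value of --candi-list in the command above is a comma-separated list of gene symbols. If the candidate genes are kept in a text file with one symbol per line, the list can be generated rather than typed. A minimal sketch, assuming such a file exists; the function name genes_to_list and the file name candidate_genes.txt are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn a file with one gene symbol per line
# into a comma-separated list such as LSM1,NRGN,SYNE1.
genes_to_list() {
    # Drop blank lines, then join the remaining lines with commas.
    grep -v '^[[:space:]]*$' "$1" | paste -sd, -
}
```

The option could then be written as --candi-list "$(genes_to_list candidate_genes.txt)" in place of the literal list.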

 

7.       Prioritize sequence variants using candidate genes and pathway information

java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt  --out test1 --excel --double-hit-gene-trio-filter --seq-qual 50 --seq-mq 20 --seq-sb -10 --gty-qual 20 --gty-dp 8 --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter 1kg201204,dbsnp137,ESP6500AA,ESP6500EA --rare-allele-freq 0.02 --db-score dbnsfp --mendel-causing-predict all --filter-nondisease-variant --genome-annot --candi-list LSM1,NRGN,SYNE1 --ppi-annot string --ppi-depth 1 --pathway-annot cura

 

8.       Prioritize sequence variants by mining PubMed for the disease term

java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt  --out test1 --excel --double-hit-gene-trio-filter --seq-qual 50 --seq-mq 20 --seq-sb -10 --gty-qual 20 --gty-dp 8 --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter 1kg201204,dbsnp137,ESP6500AA,ESP6500EA --rare-allele-freq 0.02 --db-score dbnsfp --mendel-causing-predict all --filter-nondisease-variant --genome-annot --candi-list LSM1,NRGN,SYNE1 --ppi-annot string --ppi-depth 1 --pathway-annot cura --pubmed-mining Schizophrenia
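The final command is long enough to be hard to read on one line. The sketch below rebuilds it group by group, with one comment per step of this tutorial, and prints the result instead of running it. It follows the step 8 command exactly, so it uses --rare-allele-freq 0.02 and does not include --omim-annot. The variable names OUT and CMD_LINE are illustrative; every command in this tutorial writes to the prefix test1, so changing OUT is one way to keep the results of different runs apart.

```shell
#!/bin/sh
OUT=test1   # output prefix; illustrative variable name

# Inputs and output format
set -- java -jar kggseq.jar \
    --vcf-file examples/rare.disease.hg19.vcf \
    --ped-file examples/rare.disease.ped.txt \
    --out "$OUT" --excel
# Step 1: inheritance model and QC thresholds
set -- "$@" --double-hit-gene-trio-filter \
    --seq-qual 50 --seq-mq 20 --seq-sb -10 --gty-qual 20 --gty-dp 8
# Step 2: gene feature annotation
set -- "$@" --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6
# Step 3: common variant filter
set -- "$@" --db-filter 1kg201204,dbsnp137,ESP6500AA,ESP6500EA --rare-allele-freq 0.02
# Step 4: disease-causing prediction
set -- "$@" --db-score dbnsfp --mendel-causing-predict all --filter-nondisease-variant
# Step 5: genomic annotation
set -- "$@" --genome-annot
# Step 6: candidate genes and protein interactions
set -- "$@" --candi-list LSM1,NRGN,SYNE1 --ppi-annot string --ppi-depth 1
# Step 7: pathway annotation
set -- "$@" --pathway-annot cura
# Step 8: PubMed mining
set -- "$@" --pubmed-mining Schizophrenia

# Print the assembled command; replace the next two lines with "$@" to execute it.
CMD_LINE="$*"
printf '%s\n' "$CMD_LINE"
```

Commenting out one of the set lines is a quick way to see how the output changes when a single step is left out.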