A short tutorial on KGGSeq for annotating and prioritizing exome sequence variants (for KGGSeq V1.0+)

Miaoxin Li (limx54@163.com)

 

Reference: https://pmglab.top/kggseq/doc10/UserManual.html

Input data:

1.      A Variant Call Format (VCF) file (a fabricated data set for educational purposes):

examples/rare.disease.hg19.vcf

2.      A linkage pedigree file:

 examples/rare.disease.ped.txt
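
Before running kggseq, it can help to confirm that both input files are readable and look sensible. The following is a minimal sketch, not part of kggseq itself. The function names `count_variants` and `count_affected` are made up for illustration, and the pedigree check assumes the standard linkage layout in which column 6 holds the phenotype and the value 2 marks an affected individual.

```shell
#!/bin/sh
# Sanity checks on the tutorial inputs (illustrative helpers, not kggseq features).

# Count variant records in a VCF: every line that is not a '#' header line.
count_variants() {
    grep -vc '^#' "$1"
}

# Count affected individuals in a linkage pedigree file.
# Assumption: phenotype is in column 6 and 2 means affected.
count_affected() {
    awk '$6 == 2 { n++ } END { print n + 0 }' "$1"
}

vcf=./examples/rare.disease.hg19.vcf
ped=./examples/rare.disease.ped.txt

if [ -f "$vcf" ] && [ -f "$ped" ]; then
    echo "variants: $(count_variants "$vcf")"
    echo "affected: $(count_affected "$ped")"
else
    echo "input files not found; run this from the directory that contains examples/" >&2
fi
```

If the affected count is zero, the phenotype coding in the pedigree file probably differs from the assumption above and should be checked by eye.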

 

Purpose: Identify candidate sequence variants that may cause arthrogryposis.


Run the commands step by step to see what happens at each stage.

1.       Filter by genetic feature and inheritance model (recessive)

java -Xmx4g -jar kggseq.jar --vcf-file ./examples/rare.disease.hg19.vcf --ped-file ./examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6

To impose QC explicitly (QC is performed by default anyway):

java -Xmx4g -jar kggseq.jar --vcf-file ./examples/rare.disease.hg19.vcf --ped-file ./examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6 --gty-dp 8

2.       Annotate sequence variants with RefGene

java -Xmx4g -jar kggseq.jar --vcf-file ./examples/rare.disease.hg19.vcf --ped-file ./examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6  --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 

3.       Filter out common sequence variants by allele frequency

java -Xmx4g -jar kggseq.jar --vcf-file ./examples/rare.disease.hg19.vcf --ped-file ./examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6 --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter exac,ehr --rare-allele-freq 0.01
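
To see how sensitive the result is to the frequency cutoff, the step 3 command can be repeated with different `--rare-allele-freq` values. The sketch below only prints the commands. The value 0.01 is the one used in this tutorial; the other two cutoffs, the function name `maf_sweep`, and the `maf_*` output prefixes are illustrative assumptions.

```shell
#!/bin/sh
# Print the step 3 command once per allele-frequency cutoff (dry run only).
# 0.01 comes from the tutorial; 0.005, 0.05 and the maf_* prefixes are illustrative.
maf_sweep() {
    for maf in 0.005 0.01 0.05; do
        echo "java -Xmx4g -jar kggseq.jar --vcf-file ./examples/rare.disease.hg19.vcf --ped-file ./examples/rare.disease.ped.txt --out maf_$maf --excel --genotype-filter 1,2,6 --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter exac,ehr --rare-allele-freq $maf"
    done
}

maf_sweep
```

Each printed line can be pasted into the terminal; a separate output prefix per cutoff keeps the runs from overwriting one another.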

 

4.       Filter out sequence variants predicted to be neutral (not disease-causing)

java -Xmx4g -jar kggseq.jar --vcf-file ./examples/rare.disease.hg19.vcf --ped-file ./examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6  --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter exac,ehr --rare-allele-freq 0.01 --db-score dbnsfp --mendel-causing-predict best --filter-nondisease-variant

5.       Filter out sequence variants in super-duplicate regions, which are often error-prone

java -Xmx4g -jar kggseq.jar --vcf-file ./examples/rare.disease.hg19.vcf --ped-file ./examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6 --gty-dp 8 --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter exac,ehr --rare-allele-freq 0.01 --db-score dbnsfp --mendel-causing-predict best --filter-nondisease-variant --superdup-filter

7.       Prioritize sequence variants by other genomic and OMIM annotations

java -Xmx4g -jar kggseq.jar --vcf-file ./examples/rare.disease.hg19.vcf --ped-file ./examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6 --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter exac,ehr --rare-allele-freq 0.01 --db-score dbnsfp --mendel-causing-predict best --filter-nondisease-variant --superdup-filter --genome-annot --omim-annot

 

8.       Prioritize sequence variants by candidate genes with protein interaction information

java -Xmx4g -jar kggseq.jar --vcf-file ./examples/rare.disease.hg19.vcf --ped-file ./examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6  --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter exac,ehr --rare-allele-freq 0.01 --db-score dbnsfp --mendel-causing-predict best --filter-nondisease-variant --superdup-filter --genome-annot --omim-annot --candi-list ECEL1,MYBPC1,TNNI2,TNNT3,TPM2 --ppi-annot string --ppi-depth 1

 

9.       Prioritize sequence variants by candidate genes with pathway information

java -Xmx4g -jar kggseq.jar --vcf-file ./examples/rare.disease.hg19.vcf --ped-file ./examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6  --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter exac,ehr --rare-allele-freq 0.01 --db-score dbnsfp --mendel-causing-predict best --filter-nondisease-variant --superdup-filter --genome-annot --omim-annot --candi-list ECEL1,MYBPC1,TNNI2,TNNT3,TPM2 --ppi-annot string --ppi-depth 1 --pathway-annot cura

 

10.   Prioritize sequence variants by PubMed literature mining

java -Xmx4g -jar kggseq.jar --vcf-file ./examples/rare.disease.hg19.vcf --ped-file ./examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6  --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter exac,ehr --rare-allele-freq 0.01 --db-score dbnsfp --mendel-causing-predict best --filter-nondisease-variant --superdup-filter --genome-annot --omim-annot --candi-list ECEL1,MYBPC1,TNNI2,TNNT3,TPM2 --ppi-annot string --ppi-depth 1 --pathway-annot cura --phenotype-term Arthrogryposis,Arthrogryposis+multiplex+congenital --pubmed-mining
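
Each command above is the previous one with a few more options appended. The following sketch makes that structure explicit by assembling the step 10 command piece by piece. It uses only options that appear in the steps above; the function name `build_cmd` and the dry-run printing are illustrative, and the step numbers in the comments follow the numbering used in this tutorial.

```shell
#!/bin/sh
# Assemble the step 10 command incrementally; each line adds what one step introduced.
build_cmd() {
    set -- java -Xmx4g -jar kggseq.jar \
        --vcf-file ./examples/rare.disease.hg19.vcf \
        --ped-file ./examples/rare.disease.ped.txt \
        --out test1 --excel
    set -- "$@" --genotype-filter 1,2,6                                    # step 1
    set -- "$@" --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6          # step 2
    set -- "$@" --db-filter exac,ehr --rare-allele-freq 0.01               # step 3
    set -- "$@" --db-score dbnsfp --mendel-causing-predict best \
        --filter-nondisease-variant                                        # step 4
    set -- "$@" --superdup-filter                                          # step 5
    set -- "$@" --genome-annot --omim-annot                                # step 7
    set -- "$@" --candi-list ECEL1,MYBPC1,TNNI2,TNNT3,TPM2 \
        --ppi-annot string --ppi-depth 1                                   # step 8
    set -- "$@" --pathway-annot cura                                       # step 9
    set -- "$@" --phenotype-term Arthrogryposis,Arthrogryposis+multiplex+congenital \
        --pubmed-mining                                                    # step 10
    # Dry run: print the assembled command on one line instead of executing it.
    echo "$*"
}

build_cmd
```

Commenting out one of the `set --` lines shows the effect of skipping that filter or annotation while leaving the rest of the command unchanged.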

 

 

Other output formats

1.      Output as KGGSeq binary files

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --gty-dp 8 --o-ked

2.      Output as PLINK binary files

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --gty-dp 8 --o-plink-bed

3.      Output as ANNOVAR input files

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --gty-dp 8 --o-annovar

4.      Output as VCF files

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --gty-dp 8 --o-vcf
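
The four export commands differ only in their last option. As a small convenience sketch, the loop below prints all four so they can be reviewed or pasted together. The function name `export_cmds` is illustrative, and the commands are printed rather than run.

```shell
#!/bin/sh
# Print one export command per output option used in this section (dry run only).
export_cmds() {
    for flag in --o-ked --o-plink-bed --o-annovar --o-vcf; do
        echo "java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --gty-dp 8 $flag"
    done
}

export_cmds
```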