A short tutorial on kggseq for annotating and prioritizing sequence variants in cancer samples

Miaoxin Li (mxli@hku.hk)

 

Reference: http://grass.cgs.hku.hk/limx/kggseq/doc10/UserManual.html

Input data:

1.      A summary file of somatic variants in breast cancer (compiled from the Tumor Portal, http://www.tumorportal.org/tumor_types?ttype=BRCA):

examples/hg19_breast.txt

Note: For somatic mutations, variant calls in Variant Call Format (VCF) are preferable to a summary file, because only VCF input supports the quality-control filtering in step 1.
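Because the example file is passed to kggseq with --annovar-file, it is presumably in ANNOVAR's input layout, where the first five whitespace-separated columns are chromosome, start position, end position, reference allele and alternative allele. A quick sanity check before running the steps can catch a mis-formatted file early. This is a minimal sketch assuming that layout; the function name check_annovar_input is hypothetical, blank lines and lines starting with # are skipped, and any other header line will be reported as a problem.

```shell
check_annovar_input() {
  # $1: variant file assumed to be in ANNOVAR input layout:
  #     chr  start  end  ref  alt  [extra columns...]
  # Prints each data line whose layout looks wrong; returns 1 if any were found.
  awk '
    /^#/ || NF == 0 { next }
    NF < 5 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $3 + 0 < $2 + 0 {
      printf "line %d: unexpected layout: %s\n", NR, $0
      bad++
    }
    END { exit (bad > 0) }
  ' "$1"
}
```

For example, `check_annovar_input examples/hg19_breast.txt` prints nothing and returns 0 when every data line passes.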

Purpose: Identify cancer-driver somatic mutations, genes and pathways in breast cancer.


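Run the commands below one at a time and inspect the output after each step. From step 2 onward, each command repeats the previous one and adds new options.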

1.      Filter by quality control (QC) and genetic features (works only for VCF data; this step is skipped here because the example data are not in VCF format)

java -Xmx3g -jar kggseq.jar --vcf-file XXX.vcf --ped-file XXX.ped.txt --indiv-pair NonTumor.1:Tumor.1,NonTumor.2:Tumor.2 --out test1 --excel --seq-qual 50.0 --gty-qual 20.0 --gty-sec-pl 50 --gty-dp 8 --gty-af-ref 0.05 --gty-af-het 0.25 --gty-af-alt 0.5 --gty-somat-p 0.05 --genotype-filter 8
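The --indiv-pair value above is a comma-separated list of NonTumor:Tumor sample pairs. With many tumor/normal pairs, writing it by hand is error-prone. The hypothetical helper below (not part of kggseq) builds the string from alternating normal and tumor IDs, which are assumed to match the individual IDs used in the VCF and PED files.

```shell
make_indiv_pair() {
  # Arguments: normal1 tumor1 normal2 tumor2 ... (alternating sample IDs).
  # Prints "normal1:tumor1,normal2:tumor2,..." for --indiv-pair.
  if [ "$#" -eq 0 ] || [ $(( $# % 2 )) -ne 0 ]; then
    echo "make_indiv_pair: expected an even, non-zero number of sample IDs" >&2
    return 1
  fi
  local pairs=""
  while [ "$#" -gt 0 ]; do
    pairs="${pairs:+$pairs,}$1:$2"   # add a comma only between pairs
    shift 2
  done
  printf '%s\n' "$pairs"
}

# The pairs used in step 1:
make_indiv_pair NonTumor.1 Tumor.1 NonTumor.2 Tumor.2
```

In the command it would be used as `--indiv-pair "$(make_indiv_pair NonTumor.1 Tumor.1 NonTumor.2 Tumor.2)"` in place of the literal string.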



2.      Annotate sequence variants with RefGene and GENCODE gene models:

java -Xmx3g -jar kggseq.jar --annovar-file examples/hg19_breast.txt --out test1 --excel --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6,7

3.      Predict cancer-driver somatic mutations and genes

java -Xmx3g -jar kggseq.jar --annovar-file examples/hg19_breast.txt --out test1 --excel --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6,7 --db-score dbnsfp --cancer-driver-predict --filter-nondisease-variant

4.      Test gene-based mutation rates of non-synonymous somatic mutations and draw a Q-Q plot (--qqplot)

java -Xmx3g -jar kggseq.jar --annovar-file examples/hg19_breast.txt --out test1 --excel --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6,7 --db-score dbnsfp --cancer-driver-predict --filter-nondisease-variant --gene-mutation-rate-test --qqplot
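A Q-Q plot of p-values compares the observed p-values, sorted from most to least significant, with the values expected if every gene followed the null (uniform) distribution, both on a -log10 scale. Points rising above the diagonal at the significant end suggest departures from the null, such as genes carrying more somatic mutations than expected. The sketch below computes such points from a plain file of p-values, for example ones copied from the step 4 results. It only illustrates the idea and is not kggseq's plotting code; the function name qq_points, the i/(n+1) expected quantiles and the one-p-value-per-line input are assumptions.

```shell
qq_points() {
  # $1: file with one numeric p-value per line.
  # Prints "expected observed" pairs on the -log10 scale, most significant first.
  # Expected quantile of the i-th smallest of n p-values is taken as i/(n+1).
  LC_ALL=C sort -g "$1" | awk '
    $1 > 0 && $1 <= 1 { p[++n] = $1 }   # keep valid, non-zero p-values
    END {
      for (i = 1; i <= n; i++)
        printf "%.3f %.3f\n", -log(i / (n + 1)) / log(10), -log(p[i]) / log(10)
    }'
}
```

The two output columns can be fed to any plotting tool, with a y = x reference line drawn for comparison.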

 

5.      Test whether a gene set ranks higher in the list of all genes (ordered by the gene-based mutation rate test in step 4) than would be expected by chance

java -Xmx3g -jar kggseq.jar --annovar-file examples/hg19_breast.txt --out test1 --excel --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6,7 --db-score dbnsfp --cancer-driver-predict --filter-nondisease-variant --gene-mutation-rate-test --qqplot --geneset-enrichment-test --geneset-db cura
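To make "ranked higher than expected by chance" concrete, the sketch below computes a Wilcoxon rank-sum (Mann-Whitney) z-score on a ranked gene list: it sums the ranks of the gene-set members and compares that sum with its expectation n(N+1)/2 and variance n(N-n)(N+1)/12 under random placement, for n set genes among N ranked genes. This illustrates one rank-based enrichment test and is not necessarily the test kggseq implements; the function name rank_enrichment_z and the input format (one gene per line, most significant first) are assumptions. A negative z means the set's genes sit nearer the top of the list than chance would predict.

```shell
rank_enrichment_z() {
  # $1: ranked gene list, one symbol per line, most significant first
  #     (a gene's rank is its position among non-blank lines).
  # $2: comma-separated gene set, e.g. "PTEN,TP53".
  # Prints the rank-sum z-score (normal approximation, no ties).
  # Returns 1 and prints NA if no set gene is in the list or all genes are.
  awk -v geneset="$2" '
    BEGIN {
      k = split(geneset, members, ",")
      for (i = 1; i <= k; i++) in_set[members[i]] = 1
    }
    NF == 0 { next }
    {
      N++                                            # rank of this gene
      if (($1 in in_set) && !seen[$1]++) { n++; W += N }
    }
    END {
      if (n == 0 || n == N) { print "NA"; exit 1 }
      E = n * (N + 1) / 2                            # expected rank sum
      V = n * (N - n) * (N + 1) / 12                 # its variance
      printf "%.3f\n", (W - E) / sqrt(V)
    }
  ' "$1"
}
```

With very small gene sets the normal approximation is rough, so an exact or permutation p-value would be more appropriate there.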

 

6.      Annotate sequence variants with COSMIC somatic-mutation and OMIM information

java -Xmx3g -jar kggseq.jar --annovar-file examples/hg19_breast.txt --out test1 --excel --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6,7 --db-score dbnsfp --cancer-driver-predict --filter-nondisease-variant --gene-mutation-rate-test --qqplot --geneset-enrichment-test --geneset-db cura --cosmic-annot --omim-annot

 

7.      Prioritize sequence variants using candidate genes, protein-protein interaction information (STRING) and known biological pathways/gene sets

java -Xmx3g -jar kggseq.jar --annovar-file examples/hg19_breast.txt --out test1 --excel --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6,7 --db-score dbnsfp --cancer-driver-predict --filter-nondisease-variant --gene-mutation-rate-test --qqplot --geneset-enrichment-test --geneset-db cura --cosmic-annot --omim-annot --candi-list NKX3,PTEN,TP53 --ppi-annot string --ppi-depth 1 --geneset-annot cura
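--candi-list takes the candidate genes as a comma-separated list. When the candidates are kept in a text file with one gene symbol per line, a small helper can turn that file into the required string while dropping blank lines and duplicates. The function name candi_list_from_file is hypothetical.

```shell
candi_list_from_file() {
  # $1: text file with one gene symbol per line (only the first column is used).
  # Skips blank lines, keeps the first occurrence of each symbol,
  # and joins the symbols with commas for --candi-list.
  awk 'NF && !seen[$1]++ { print $1 }' "$1" | paste -s -d, -
}
```

It would be used as `--candi-list "$(candi_list_from_file my_candidates.txt)"`, where my_candidates.txt is a hypothetical file name.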

 

8.      Prioritize sequence variants using Phenolyzer predictions and mouse and zebrafish phenotypes

java -Xmx3g -jar kggseq.jar --annovar-file examples/hg19_breast.txt --out test1 --excel --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6,7 --db-score dbnsfp --cancer-driver-predict --filter-nondisease-variant --gene-mutation-rate-test --qqplot --geneset-enrichment-test --geneset-db cura --cosmic-annot --omim-annot --candi-list NKX3,PTEN,TP53 --ppi-annot string --ppi-depth 1 --geneset-annot cura --mouse-pheno --zebrafish-pheno --phenolyzer-prediction --phenotype-term breast+cancer
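Steps 2 to 8 each repeat the previous command and append a few options, and every step writes to the same --out prefix (test1). The bash sketch below rebuilds exactly those commands from one array of options per step, which makes each step's additions explicit and avoids copy-paste mistakes. The array names, the run function and the DRY_RUN switch are hypothetical conveniences, not kggseq features; with the default DRY_RUN=1 the script only prints the commands.

```shell
#!/usr/bin/env bash
# Rebuild the cumulative commands of steps 2-8 from per-step option arrays.
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.
set -e                      # stop at the first failing step
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

base=(java -Xmx3g -jar kggseq.jar --annovar-file examples/hg19_breast.txt --out test1 --excel)
step2=(--db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6,7)
step3=(--db-score dbnsfp --cancer-driver-predict --filter-nondisease-variant)
step4=(--gene-mutation-rate-test --qqplot)
step5=(--geneset-enrichment-test --geneset-db cura)
step6=(--cosmic-annot --omim-annot)
step7=(--candi-list NKX3,PTEN,TP53 --ppi-annot string --ppi-depth 1 --geneset-annot cura)
step8=(--mouse-pheno --zebrafish-pheno --phenolyzer-prediction --phenotype-term breast+cancer)

cmd=("${base[@]}" "${step2[@]}"); run "${cmd[@]}"   # step 2
cmd+=("${step3[@]}");             run "${cmd[@]}"   # step 3
cmd+=("${step4[@]}");             run "${cmd[@]}"   # step 4
cmd+=("${step5[@]}");             run "${cmd[@]}"   # step 5
cmd+=("${step6[@]}");             run "${cmd[@]}"   # step 6
cmd+=("${step7[@]}");             run "${cmd[@]}"   # step 7
cmd+=("${step8[@]}");             run "${cmd[@]}"   # step 8
```

Setting DRY_RUN=0 runs kggseq for each step from the current directory, as the commands above do. Because every step reuses test1, each run overwrites the previous results; to keep them all, the --out value in base could be changed per step.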