Export as VCF Format

Use the following command to export a GTB file to VCF format from the command line:

java -jar gbc.jar gtb2vcf <input> [output] [options]
  • If output is not specified, the result is written to standard output (the terminal).
  • If output ends with the extension .gz or .bgz, the output file is compressed with bgzip (compression level 5); otherwise, it is written as plain text.
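The two rules above determine where the exported records go. Below is a minimal Java sketch of that decision. It assumes only the behaviour stated in the bullets. The class `OutputTarget` and its `classify` method are hypothetical illustrations and are not part of the GBC API.

```java
/** Hypothetical sketch of the gtb2vcf output rules described above (not GBC source code). */
final class OutputTarget {
    /** Where the exported VCF records end up. */
    enum Kind { STDOUT, BGZIP, PLAIN_TEXT }

    private OutputTarget() {}

    /** Classifies the optional [output] argument by its presence and extension. */
    static Kind classify(String output) {
        // No output path: records are written to standard output.
        if (output == null || output.isEmpty()) {
            return Kind.STDOUT;
        }
        // A .gz or .bgz extension selects bgzip compression (level 5 in gtb2vcf).
        if (output.endsWith(".gz") || output.endsWith(".bgz")) {
            return Kind.BGZIP;
        }
        // Any other path is written as plain-text VCF.
        return Kind.PLAIN_TEXT;
    }
}
```

For instance, `OutputTarget.classify("./assoc.hg19.vcf.gz")` returns `BGZIP`, and `OutputTarget.classify(null)` returns `STDOUT`.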

Example

Use GBC to export the example file https://pmglab.top/gbc/download/assoc.hg19.gtb to a local VCF file, applying the following filters:

  • Exclude variants whose alternate allele frequency (AF) is outside the range [0.05, 0.95].
  • Exclude variants whose alternative allele number is less than 3.
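Both filters apply together, so a variant is exported only if neither condition excludes it. Below is a minimal sketch of that logic. It assumes each variant's AF and allele number have already been computed. `ExampleFilter` and `keep` are hypothetical names and are not part of GBC. Following the `[0.05, 0.95]` notation, the bounds are treated as inclusive.

```java
/** Hypothetical illustration of the two example filters (not GBC source code). */
final class ExampleFilter {
    private ExampleFilter() {}

    /**
     * Returns true if a variant survives both filters.
     *
     * @param af        alternate allele frequency of the variant
     * @param alleleNum allele number of the variant, as counted by --allele-num
     */
    static boolean keep(double af, int alleleNum) {
        // Filter 1: exclude variants whose AF is outside the closed range [0.05, 0.95].
        boolean afInRange = af >= 0.05 && af <= 0.95;
        // Filter 2: exclude variants whose allele number is below 3.
        boolean enoughAlleles = alleleNum >= 3;
        // A variant is exported only if no filter excludes it.
        return afInRange && enoughAlleles;
    }
}
```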

The commands to complete this task are as follows:

# Run directly in the terminal
java -jar gbc.jar gtb2vcf https://pmglab.top/gbc/download/assoc.hg19.gtb ./assoc.hg19.vcf.gz \
                          --seq-af 0.05-0.95 --allele-num 3-

# Run using Docker
docker run -v `pwd`:`pwd` -w `pwd` --rm -it -m 4g gbc \
gtb2vcf https://pmglab.top/gbc/download/assoc.hg19.gtb ./assoc.hg19.vcf.gz \
        --seq-af 0.05-0.95 --allele-num 3-

Program Options

Usage: gtb2vcf <input> [output] [options]
Java-API: edu.sysu.pmglab.gbc.toolkit.VCFExporter
About: Decompress and export variants from *.gtb (genotype block format) to 
       *.vcf (variant call format) or *.vcf.gz.
Options:
  --chromosome  Specify the chromosome tags file. e.g., identify 'X, chrX, 
                CHRX, ChrX' as '(int) 22' chromosome.
                format: --chromosome <string>
  --threads,-t  Set the number of threads.
                default: 4
                format: --threads <int>
Subset Selection Options:
  --field,-f           Select the specified fields from the *.gtb file to the 
                       output file.
                       default: META,GENOTYPE
                       format: --field <string>,<string>,... 
                       ([META/ID/QUAL/FILTER/INFO/GENOTYPE/ALL/NONE] 
                       (ignoreCase)) 
  --subject,-s         Retrieve the genotypes of the specified subject (by 
                       subject names). Subject names can be stored in a file 
                       with comma-separated format, and pass in via '-s @file'.
                       format: --subject <string>,<string>,...
  --subject-range,-sr  Retrieve the genotypes of the specified subject (by 
                       intervals of subject index).
                       format: --subject-range <minIndex>-<maxIndex> (>= 0)
  --subject-index,-si  Retrieve the genotypes of the specified subject (by 
                       subject indexes).
                       format: --subject-index <index1>,<index2>,... (>= 0)
  --pos,-p             Retrieve the variants by the specified coordinates of 
                       variant. 
                       format: --pos <chr>:<pos>,<pos>,... ... (>= 1)
  --pos-range,-pr      Retrieve the variants by the specified coordinate 
                       intervals of variant.
                       format: --pos-range <chr>:<minPos>-<maxPos>,... (>= 1)
  --index-range,-ir    Retrieve the variants by the line-index of variant.
                       format: --index-range <minIndex>-<maxIndex> (>= 0)
Edit Subject Name Options:
  --rename-subject         Reset subject names for *.gtb directly. Pairs of 
                           subject name can be stored in a file with 
                           comma-separated format, and pass in via '-s @file'.
                           format: --rename-subject <old>=<new>,<old>=<new>,...
  --rename-subject-prefix  Use the format `[prefix][number][suffix]` to reset 
                           the subject names.
                           format: --rename-subject-prefix <string>
  --rename-subject-suffix  Use the format `[prefix][number][suffix]` to reset 
                           the subject names.
                           format: --rename-subject-suffix <string>
  --rename-subject-begin   Use the format `[prefix][number][suffix]` to reset 
                           the subject names.
                           format: --rename-subject-begin <int>
Edit Meta Options:
  --add-meta           Add the specified metas to the output file.
                       format: --add-meta <key>=<value> <key>=<value> ...
  --auto-meta          Automatically adds the contig metas specifying the 
                       reference genome version 
                       (https://www.ncbi.nlm.nih.gov/grc/human/data). Contig 
                       metas are required in many software analyses (to 
                       describe chromosome tags), such as BCFTools.
                       format: --auto-meta <string> ([hg18/hg19/hg38])
  --rm-meta            Remove all meta information.
  --rm-duplicate-meta  Remove duplicate meta information.
Quality Control Options:
  --allele-num       Exclude variants with the alternative allele number per 
                     variant out of the range [minAlleleNum, maxAlleleNum].
                     format: --allele-num <minAlleleNum>-<maxAlleleNum> (0 ~ 
                     255) 
  --seq-qual         Exclude variants with the minimal overall sequencing 
                     quality score (Phred Quality Score) per variant < minQual.
                     format: --seq-qual <minQual> (>= 0.0)
  --seq-fs           Exclude variants with the overall strand bias Phred-scaled 
                     p-value (using Fisher's exact test) per variant > maxFs.
                     format: --seq-fs <maxFs> (>= 0.0)
  --seq-mq           Exclude variants with the minimal overall mapping quality 
                     score (Mapping Quality Score) per variant < minMq.
                     format: --seq-mq <minMq> (>= 0.0)
  --seq-info         Exclude variants with the information (i.e., INFO in VCF) 
                     field contain or do not contain (starts with ^) the 
                     specified strings.
                     format: --seq-info <string> <string> ...
  --seq-ac           Exclude variants with the alternate allele count (AC) per 
                     variant out of the range [minAc, maxAc].
                     format: --seq-ac <minAc>-<maxAc> (>= 0)
  --seq-af           Exclude variants with the alternate allele frequency (AF) 
                     per variant out of the range [minAf, maxAf].
                     format: --seq-af <minAf>-<maxAf> (0.0 ~ 1.0)
  --seq-an           Exclude variants with the non-missing allele number (AN) 
                     per variant out of the range [minAn, maxAn].
                     format: --seq-an <minAn>-<maxAn> (>= 0)
  --field-condition  Extract variants by the values of the specified 
                     supplementary fields. For comparable fields, the 
                     'condition' format is 'minValue-maxValue'; for other 
                     formats, the 'condition' is multiple optional values 
                     separated by ','.
                     format: --field-condition <field>=<condition> 
                     <field>=<condition> ...
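Several quality-control options take a `<min>-<max>` interval, and the example earlier uses `--allele-num 3-` with no upper value. The sketch below shows one way such a value could be parsed and applied. It assumes that an empty side of the dash leaves that bound open. The help text confirms this only implicitly, through the `3-` usage, and never for the lower bound. `RangeArg` is a hypothetical helper, not GBC's own argument parser.

```java
/** Hypothetical parser for "<min>-<max>" option values (not GBC source code). */
final class RangeArg {
    final Double min; // null = no lower bound (assumption)
    final Double max; // null = no upper bound, as in "3-"

    private RangeArg(Double min, Double max) {
        this.min = min;
        this.max = max;
    }

    /** Parses values such as "0.05-0.95" or "3-"; bounds are non-negative per the help text. */
    static RangeArg parse(String text) {
        int dash = text.indexOf('-');
        if (dash < 0) {
            throw new IllegalArgumentException("expected <min>-<max>: " + text);
        }
        String lo = text.substring(0, dash).trim();
        String hi = text.substring(dash + 1).trim();
        Double min = lo.isEmpty() ? null : Double.valueOf(lo);
        Double max = hi.isEmpty() ? null : Double.valueOf(hi);
        if (min != null && max != null && min > max) {
            throw new IllegalArgumentException("min > max: " + text);
        }
        return new RangeArg(min, max);
    }

    /** True if value lies in the closed interval [min, max], treating null bounds as open. */
    boolean contains(double value) {
        return (min == null || value >= min) && (max == null || value <= max);
    }
}
```

For example, `RangeArg.parse("3-")` yields an open-ended range that accepts 3 and any larger value. Both bounds are inclusive, matching the `[min, max]` notation used in the option descriptions.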

API Toolkit

The Java API class for converting GTB files to VCF files is edu.sysu.pmglab.gbc.toolkit.VCFExporter. An example of its usage is as follows:

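VCFExporter.of("https://pmglab.top/gbc/download/assoc.hg19.gtb")
        .addVariantFilter(new GTBFilter()
                .filterByAF(new Interval<>(0.05f, 0.95f))      // AF in [0.05, 0.95], like --seq-af 0.05-0.95
                .filterByAlleleNum(new Interval<>(3, null)))   // allele number >= 3 (null = no upper bound), like --allele-num 3-
        .submit();                                             // run the export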
Copyright © Liubin Zhang, all rights reserved. Last modified: 2023-04-10 18:26:20
