Export as VCF Format
Use the following command to export a GTB file to VCF format from the command line:
java -jar gbc.jar gtb2vcf <input> [output] [options]
- When `output` is not set, the output is written to the terminal via standard output.
- When `output` ends with the extension `.gz` or `.bgz`, the output file is compressed with bgzip (level 5); otherwise the output is written as plain text.
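The output rules above can be sketched as a small decision function. This is only an illustration of the documented behaviour, not GBC's own code: the class `OutputTarget`, the enum `Mode`, and the method `modeFor` are hypothetical names, and treating the extension match as case-sensitive is an assumption.

```java
/** Hypothetical helper (not part of GBC) that mirrors the documented output rules. */
final class OutputTarget {
    enum Mode { STDOUT, BGZIP, TEXT }

    static Mode modeFor(String output) {
        // No output argument: records are written to standard output.
        if (output == null || output.isEmpty()) {
            return Mode.STDOUT;
        }
        // A .gz or .bgz extension selects bgzip compression (level 5 per the text above).
        // Assumption: the match is case-sensitive.
        if (output.endsWith(".gz") || output.endsWith(".bgz")) {
            return Mode.BGZIP;
        }
        // Any other name is written as plain text.
        return Mode.TEXT;
    }

    public static void main(String[] args) {
        System.out.println(modeFor(null));                  // STDOUT
        System.out.println(modeFor("./assoc.hg19.vcf.gz")); // BGZIP
        System.out.println(modeFor("./assoc.hg19.vcf"));    // TEXT
    }
}
```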
[!NOTE|label:Example|style:callout]
Use GBC to export the example file https://pmglab.top/gbc/download/assoc.hg19.gtb to a local VCF file with the following filters:
- Exclude variants whose alternate allele frequency (AF) is outside the range [0.05, 0.95].
- Exclude variants whose alternative allele number is less than 3.
The following commands complete the task:
# Run directly in the terminal
java -jar gbc.jar gtb2vcf https://pmglab.top/gbc/download/assoc.hg19.gtb ./assoc.hg19.vcf.gz \
  --seq-af 0.05-0.95 --allele-num 3-

# Run it using docker
docker run -v `pwd`:`pwd` -w `pwd` --rm -it -m 4g gbc \
  gtb2vcf https://pmglab.top/gbc/download/assoc.hg19.gtb ./assoc.hg19.vcf.gz \
  --seq-af 0.05-0.95 --allele-num 3-
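The range arguments in the commands above (`0.05-0.95` and the open-ended `3-`) follow a `<min>-<max>` pattern with inclusive bounds. The sketch below shows one way such a string could be interpreted. It is not GBC's parser: `RangeArg` is a hypothetical class, accepting a missing lower bound (`-<max>`) is an assumption, and it does not handle negative numbers or scientific notation.

```java
/** Hypothetical range holder (not part of GBC); a null bound means "unbounded on that side". */
final class RangeArg {
    final Double min;
    final Double max;

    RangeArg(Double min, Double max) {
        this.min = min;
        this.max = max;
    }

    /** Parses "<min>-<max>" or "<min>-"; "-<max>" is accepted as an assumption. */
    static RangeArg parse(String text) {
        int dash = text.indexOf('-');
        if (dash < 0) {
            throw new IllegalArgumentException("expected <min>-<max>: " + text);
        }
        String lo = text.substring(0, dash).trim();
        String hi = text.substring(dash + 1).trim();
        return new RangeArg(lo.isEmpty() ? null : Double.valueOf(lo),
                            hi.isEmpty() ? null : Double.valueOf(hi));
    }

    /** Inclusive on both ends, matching the [min, max] notation in the option descriptions. */
    boolean contains(double value) {
        return (min == null || value >= min) && (max == null || value <= max);
    }
}
```

With this sketch, `RangeArg.parse("3-")` yields a range with no upper bound, which corresponds to `new Interval<>(3, null)` in the Java API example.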
Program Options
Usage: gtb2vcf <input> [output] [options]
Java-API: edu.sysu.pmglab.gbc.toolkit.VCFExporter
About: Decompress and export variants from *.gtb (genotype block format) to
*.vcf (variant call format) or *.vcf.gz.
Options:
--chromosome Specify the chromosome tags file. e.g., identify 'X, chrX,
CHRX, ChrX' as '(int) 22' chromosome.
format: --chromosome <string>
--threads,-t Set the number of threads.
default: 4
format: --threads <int>
Subset Selection Options:
--field,-f Select the specified fields from the *.gtb file to the
output file.
default: META,GENOTYPE
format: --field <string>,<string>,...
([META/ID/QUAL/FILTER/INFO/GENOTYPE/ALL/NONE]
(ignoreCase))
--subject,-s Retrieve the genotypes of the specified subject (by
subject names). Subject names can be stored in a file
with comma-separated format, and pass in via '-s @file'.
format: --subject <string>,<string>,...
--subject-range,-sr Retrieve the genotypes of the specified subject (by
intervals of subject index).
format: --subject-range <minIndex>-<maxIndex> (>= 0)
--subject-index,-si Retrieve the genotypes of the specified subject (by
subject indexes).
format: --subject-index <index1>,<index2>,... (>= 0)
--pos,-p Retrieve the variants by the specified coordinates of
variant.
format: --pos <chr>:<pos>,<pos>,... ... (>= 1)
--pos-range,-pr Retrieve the variants by the specified coordinate
intervals of variant.
format: --pos-range <chr>:<minPos>-<maxPos>,... (>= 1)
--index-range,-ir Retrieve the variants by the line-index of variant.
format: --index-range <minIndex>-<maxIndex> (>= 0)
Edit Subject Name Options:
--rename-subject Reset subject names for *.gtb directly. Pairs of
subject name can be stored in a file with
comma-separated format, and pass in via '-s @file'.
format: --rename-subject <old>=<new>,<old>=<new>,...
--rename-subject-prefix Use the format `[prefix][number][suffix]` to reset
the subject names.
format: --rename-subject-prefix <string>
--rename-subject-suffix Use the format `[prefix][number][suffix]` to reset
the subject names.
format: --rename-subject-suffix <string>
--rename-subject-begin Use the format `[prefix][number][suffix]` to reset
the subject names.
format: --rename-subject-begin <int>
Edit Meta Options:
--add-meta Add the specified metas to the output file.
format: --add-meta <key>=<value> <key>=<value> ...
--auto-meta Automatically adds the contig metas specifying the
reference genome version
(https://www.ncbi.nlm.nih.gov/grc/human/data). Contig
metas are required in many software analyses (to
describe chromosome tags), such as BCFTools.
format: --auto-meta <string> ([hg18/hg19/hg38])
--rm-meta Remove all meta information.
--rm-duplicate-meta Remove duplicate meta information.
Quality Control Options:
--allele-num Exclude variants with the alternative allele number per
variant out of the range [minAlleleNum, maxAlleleNum].
format: --allele-num <minAlleleNum>-<maxAlleleNum> (0 ~
255)
--seq-qual Exclude variants with the minimal overall sequencing
quality score (Phred Quality Score) per variant < minQual.
format: --seq-qual <minQual> (>= 0.0)
--seq-fs Exclude variants with the overall strand bias Phred-scaled
p-value (using Fisher's exact test) per variant > maxFs.
format: --seq-fs <maxFs> (>= 0.0)
--seq-mq Exclude variants with the minimal overall mapping quality
score (Mapping Quality Score) per variant < minMq.
format: --seq-mq <minMq> (>= 0.0)
--seq-info Exclude variants with the information (i.e., INFO in VCF)
field contain or do not contain (starts with ^) the
specified strings.
format: --seq-info <string> <string> ...
--seq-ac Exclude variants with the alternate allele count (AC) per
variant out of the range [minAc, maxAc].
format: --seq-ac <minAc>-<maxAc> (>= 0)
--seq-af Exclude variants with the alternate allele frequency (AF)
per variant out of the range [minAf, maxAf].
format: --seq-af <minAf>-<maxAf> (0.0 ~ 1.0)
--seq-an Exclude variants with the non-missing allele number (AN)
per variant out of the range [minAn, maxAn].
format: --seq-an <minAn>-<maxAn> (>= 0)
--field-condition Extract variants by the values of the specified
supplementary fields. For comparable fields, the
'condition' format is 'minValue-maxValue'; for other
formats, the 'condition' is multiple optional values
separated by ','.
format: --field-condition <field>=<condition>
<field>=<condition> ...
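The `--seq-ac`, `--seq-an`, and `--seq-af` filters listed above act on related per-variant quantities. The sketch below assumes the usual VCF convention that AF = AC / AN, where AC is the alternate allele count and AN is the number of non-missing alleles. It is not GBC's implementation: `AlleleStats` and its methods are hypothetical, and returning an AF of 0 when AN is 0 is an assumption.

```java
/** Hypothetical helper (not part of GBC) illustrating an AF range check. */
final class AlleleStats {
    /** AF = AC / AN; returns 0 when no alleles were called (assumption). */
    static double alleleFrequency(int ac, int an) {
        return an == 0 ? 0.0 : (double) ac / an;
    }

    /** Keeps a variant only if its AF lies inside the inclusive range [minAf, maxAf]. */
    static boolean passesAf(int ac, int an, double minAf, double maxAf) {
        double af = alleleFrequency(ac, an);
        return af >= minAf && af <= maxAf;
    }

    public static void main(String[] args) {
        // 12 alternate alleles among 200 called alleles: AF = 0.06, so the variant is kept.
        System.out.println(passesAf(12, 200, 0.05, 0.95)); // true
        // 2 alternate alleles among 200 called alleles: AF = 0.01, so the variant is excluded.
        System.out.println(passesAf(2, 200, 0.05, 0.95));  // false
    }
}
```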
API Toolkit
The API class for converting GTB files to VCF files is `edu.sysu.pmglab.gbc.toolkit.VCFExporter`, the path listed under Java-API in the usage text above. Example usage:
// Keep variants with AF in [0.05, 0.95] and alternative allele number >= 3
VCFExporter.of("https://pmglab.top/gbc/download/assoc.hg19.gtb")
        .addVariantFilter(new GTBFilter()
                .filterByAF(new Interval<>(0.05f, 0.95f))
                .filterByAlleleNum(new Interval<>(3, null)))
        .submit();