Export as TSV Format
The following command exports a GTB file to TSV format from the command line:
java -jar gbc.jar gtb2tsv <input> [output] [options]
- When output is not set, the records are written to the terminal via standard output.
- When output ends with .gz or .bgz (as the file extension), the output file is compressed using bgzip (level: 5); otherwise the output is plain text.
GBC provides a customizable TSV output format that makes it easy to convert GTB files into various row-oriented text formats. When using this feature, the output does not contain genotypes, but it does contain the alternate allele count (AC), the non-missing allele number (AN), and the alternate allele frequency (AF). To customize the output format, use --field to select the output fields and --rename-field to rename them.
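The relationship between the three summary columns can be illustrated with a small sketch. It assumes GBC follows the usual VCF convention, in which AF is the alternate allele count divided by the number of non-missing alleles; the class name, method name, and sample numbers below are hypothetical and are not part of GBC.

```java
import java.util.Locale;

public class AlleleStats {
    /**
     * Allele frequency under the VCF convention: AF = AC / AN.
     * Returns NaN when no allele was called (AN == 0), since the ratio is undefined.
     */
    static double alleleFrequency(int ac, int an) {
        if (an <= 0) {
            return Double.NaN;
        }
        return (double) ac / an;
    }

    public static void main(String[] args) {
        // Hypothetical variant: 100 diploid samples, 4 of them with missing genotypes,
        // and 12 alternate alleles observed among the remaining 96 samples.
        int an = 2 * (100 - 4); // 192 non-missing alleles
        int ac = 12;
        double af = alleleFrequency(ac, an);
        // Prints: AC=12 AN=192 AF=0.0625
        System.out.println(String.format(Locale.ROOT, "AC=%d AN=%d AF=%.4f", ac, an, af));
    }
}
```

Because missing genotypes are excluded from AN, two variants with the same AC can have different AF values.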
[!NOTE|label:Example|style:callout]
Use GBC to export the example file https://pmglab.top/gbc/download/assoc.hg19.gtb locally and set the following parameters:
- Exclude variants whose alternate allele frequency (AF) is outside the range [0.05, 0.95].
- Exclude variants whose alternative allele number is less than 3.
The following commands complete the task:
# Run directly in the terminal
java -jar gbc.jar gtb2tsv https://pmglab.top/gbc/download/assoc.hg19.gtb \
  --seq-af 0.05-0.95 --allele-num 3-

# Run it using docker
docker run -v `pwd`:`pwd` -w `pwd` --rm -it -m 4g gbc \
  gtb2tsv https://pmglab.top/gbc/download/assoc.hg19.gtb \
  --seq-af 0.05-0.95 --allele-num 3-
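The range arguments used here (0.05-0.95 and the open-ended 3-) can be read as closed intervals in which an omitted bound means "unbounded". The sketch below models that reading only; it is not GBC's parser, and the class RangeFilterSketch and its nested Range type are hypothetical names introduced for illustration. Treating an empty left side as unbounded is also an assumption, since only the "min-" form appears in this document.

```java
public class RangeFilterSketch {
    /** Closed interval [min, max]; a null bound means that side is unbounded. */
    static final class Range {
        final Double min;
        final Double max;

        Range(Double min, Double max) {
            this.min = min;
            this.max = max;
        }

        /** Parses "min-max" or "min-" (non-negative, non-scientific numbers only). */
        static Range parse(String text) {
            int dash = text.indexOf('-');
            if (dash < 0) {
                throw new IllegalArgumentException("expected <min>-<max>: " + text);
            }
            String lo = text.substring(0, dash).trim();
            String hi = text.substring(dash + 1).trim();
            return new Range(lo.isEmpty() ? null : Double.valueOf(lo),
                             hi.isEmpty() ? null : Double.valueOf(hi));
        }

        /** A record is kept when its value lies inside the interval, bounds included. */
        boolean contains(double value) {
            return (min == null || value >= min) && (max == null || value <= max);
        }
    }

    public static void main(String[] args) {
        Range af = Range.parse("0.05-0.95");
        Range alleleNum = Range.parse("3-");
        System.out.println(af.contains(0.05));     // true: bounds are inclusive
        System.out.println(af.contains(0.96));     // false: above the upper bound
        System.out.println(alleleNum.contains(2)); // false: below the lower bound
        System.out.println(alleleNum.contains(3)); // true: no upper bound is set
    }
}
```

This mirrors the Java API form `new Interval<>(3, null)`, where a null upper bound leaves the range open.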
Program Options
Usage: gtb2tsv <input> [output] [options]
Java-API: edu.sysu.pmglab.gbc.toolkit.TSVExporter
About: Decompress and export records from *.gtb (genotype block format) to
       *.vcf (variant call format) or *.vcf.gz.
Options:
  --chromosome         Specify the chromosome tags file. e.g., identify 'X, chrX,
                       CHRX, ChrX' as '(int) 22' chromosome.
                       format: --chromosome <string>
  --threads,-t         Set the number of threads.
                       default: 4
                       format: --threads <int>
Subset Selection Options:
  --field,-f           Select the specified fields from the *.gtb file to the
                       output file (all fields by default).
                       format: --field <string>,<string>,...
  --no-gt              Do not load genotypes.
  --pos,-p             Retrieve the records by the specified coordinates of
                       variant.
                       format: --pos <chr>:<pos>,<pos>,... ... (>= 1)
  --pos-range,-pr      Retrieve the records by the specified coordinate intervals
                       of variant.
                       format: --pos-range <chr>:<minPos>-<maxPos>,... (>= 1)
  --index-range,-ir    Retrieve the records by the line-index of variant.
                       format: --index-range <minIndex>-<maxIndex> (>= 0)
  --rename-field       Reset field names for *.gtb directly.
                       format: --rename-field <old>=<new> ...
Edit Meta Options:
  --add-meta           Add the specified metas to the output file.
                       format: --add-meta <key>=<value> <key>=<value> ...
  --rm-meta            Remove all meta information.
  --rm-duplicate-meta  Remove duplicate meta information.
Quality Control Options:
  --allele-num         Exclude records with the alternative allele number per
                       variant out of the range [minAlleleNum, maxAlleleNum].
                       format: --allele-num <minAlleleNum>-<maxAlleleNum> (0 ~ 255)
  --seq-qual           Exclude records with the minimal overall sequencing
                       quality score (Phred Quality Score) per variant < minQual.
                       format: --seq-qual <minQual> (>= 0.0)
  --seq-fs             Exclude records with the overall strand bias Phred-scaled
                       p-value (using Fisher's exact test) per variant > maxFs.
                       format: --seq-fs <maxFs> (>= 0.0)
  --seq-mq             Exclude records with the minimal overall mapping quality
                       score (Mapping Quality Score) per variant < minMq.
                       format: --seq-mq <minMq> (>= 0.0)
  --seq-info           Exclude records with the information (i.e., INFO in VCF)
                       field contain or do not contain (starts with ^) the
                       specified strings.
                       format: --seq-info <string> <string> ...
  --seq-ac             Exclude records with the alternate allele count (AC) per
                       variant out of the range [minAc, maxAc].
                       format: --seq-ac <minAc>-<maxAc> (>= 0)
  --seq-af             Exclude records with the alternate allele frequency (AF)
                       per variant out of the range [minAf, maxAf].
                       format: --seq-af <minAf>-<maxAf> (0.0 ~ 1.0)
  --seq-an             Exclude records with the non-missing allele number (AN)
                       per variant out of the range [minAn, maxAn].
                       format: --seq-an <minAn>-<maxAn> (>= 0)
  --field-condition    Extract records by the values of the specified
                       supplementary fields. For comparable fields, the
                       'condition' format is 'minValue-maxValue'; for other
                       formats, the 'condition' is multiple optional values
                       separated by ','.
                       format: --field-condition <field>=<condition> <field>=<condition> ...
API Toolkit
The API class for converting GTB files to TSV files is edu.sysu.pmglab.gbc.toolkit.TSVExporter (the path reported by the Java-API line of the program help). An example of its usage is as follows:
// Keep variants with 0.05 <= AF <= 0.95 and an allele number >= 3 (no upper bound)
TSVExporter.of("https://pmglab.top/gbc/download/assoc.hg19.gtb")
        .addVariantFilter(new GTBFilter()
                .filterByAF(new Interval<>(0.05f, 0.95f))
                .filterByAlleleNum(new Interval<>(3, null)))
        .submit();