Export as GTB Format
The following command edits a GTB file and writes the result in GTB format:
java -jar gbc.jar edit <input> [output] [options]
When `output` is not set, the output file overwrites the original file. If the input is a remote file, the output file is saved in the current local working directory.
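Because omitting the output path overwrites the input, it can be safer to always pass an explicit output file. The following is a minimal sketch of that habit. The file names are hypothetical, and the guard that falls back to printing the command (instead of running it) is only there so the snippet is harmless when `gbc.jar` or the input file is not present.

```shell
INPUT=./assoc.hg19.gtb        # hypothetical local input file
OUTPUT=./assoc.sorted.gtb     # hypothetical output path, distinct from the input

if [ -f gbc.jar ] && [ -f "$INPUT" ]; then
  # Explicit output path: the input file is left untouched.
  java -jar gbc.jar edit "$INPUT" "$OUTPUT" --sort
else
  # Dry run when the jar or the input file is not available.
  echo "java -jar gbc.jar edit $INPUT $OUTPUT --sort"
fi
```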
[!NOTE|label:Example|style:callout]
Use GBC to export the example file https://pmglab.top/gbc/download/assoc.hg19.gtb locally with the following settings:
- Exclude variants with the alternate allele frequency (AF) per variant out of the range [0.05, 0.95].
- Exclude variants with the alternative allele number < 3 per variant.
- Lift over the variants from hg19 to hg38, and store the original coordinates with the hg19_ prefix.
- Sort the variants by coordinates, and store the original pointer of each variant.
The command line instructions to complete this task are as follows:
# Run directly in the terminal
java -jar gbc.jar edit https://pmglab.top/gbc/download/assoc.hg19.gtb ./assoc.hg38.gtb \
  --seq-af 0.05-0.95 --allele-num 3- \
  --liftover hg19ToHg38 --liftover-field hg19_CHROM,hg19_POS \
  --sort --pointer Origin_Pointer

# Run it using docker
docker run -v `pwd`:`pwd` -w `pwd` --rm -it -m 4g gbc \
  edit https://pmglab.top/gbc/download/assoc.hg19.gtb ./assoc.hg38.gtb \
  --seq-af 0.05-0.95 --allele-num 3- \
  --liftover hg19ToHg38 --liftover-field hg19_CHROM,hg19_POS \
  --sort --pointer Origin_Pointer
Program Options
Usage: edit <input> [output] [options]
Java-API: edu.sysu.pmglab.gbc.toolkit.GTBExporter
About: Edit *.gtb file and export as *.gtb format (including liftOver, sort,
subset, filter, etc.). If no output file is specified, the program will
overwrite the original file.
Options:
--chromosome Specify the chromosome tags file. e.g., identify 'X, chrX,
CHRX, ChrX' as '(int) 22' chromosome.
format: --chromosome <string>
--threads,-t Set the number of threads.
default: 4
format: --threads <int>
Subset Selection Options:
--field,-f Select the specified fields from the *.gtb file to the
output file (all fields by default).
format: --field <string>,<string>,...
--no-gt Do not load and export genotypes.
--subject,-s Retrieve the genotypes of the specified subject (by
subject names). Subject names can be stored in a file
with comma-separated format, and pass in via '-s @file'.
format: --subject <string>,<string>,...
--subject-range,-sr Retrieve the genotypes of the specified subject (by
intervals of subject index).
format: --subject-range <minIndex>-<maxIndex> (>= 0)
--subject-index,-si Retrieve the genotypes of the specified subject (by
subject indexes).
format: --subject-index <index1>,<index2>,... (>= 0)
--pos,-p Retrieve the variants by the specified coordinates of
variant.
format: --pos <chr>:<pos>,<pos>,... ... (>= 1)
--pos-range,-pr Retrieve the variants by the specified coordinate
intervals of variant.
format: --pos-range <chr>:<minPos>-<maxPos>,... (>= 1)
--index-range,-ir Retrieve the variants by the line-index of variant.
format: --index-range <minIndex>-<maxIndex> (>= 0)
Edit Subject/Field Name Options:
--rename-subject Reset subject names for *.gtb directly. Pairs of
subject name can be stored in a file with
comma-separated format, and pass in via '-s @file'.
format: --rename-subject <old>=<new>,<old>=<new>,...
--rename-subject-prefix Use the format `[prefix][number][suffix]` to reset
the subject names.
format: --rename-subject-prefix <string>
--rename-subject-suffix Use the format `[prefix][number][suffix]` to reset
the subject names.
format: --rename-subject-suffix <string>
--rename-subject-begin Use the format `[prefix][number][suffix]` to reset
the subject names.
format: --rename-subject-begin <int>
--rename-field Reset field names for *.gtb directly.
format: --rename-field <old>=<new> ...
Edit Meta Options:
--add-meta Add the specified metas to the output file.
format: --add-meta <key>=<value> <key>=<value> ...
--rm-meta Remove all meta information.
--rm-duplicate-meta Remove duplicate meta information.
LiftOver, Sort and Normalized Options:
--liftover Lift over variants from one reference genome version to
another (chain files are downloaded from
http://hgdownload.cse.ucsc.edu/goldenPath/<version>/liftOver).
format: --liftover <string>
([hg19ToHg38/hg38ToHg19/hg18ToHg19/hg18ToHg38]
(ignoreCase))
--liftover-field Store original coordinate (CHROM, POS) of variants. This
parameter is usually used to associate the variants after
liftOver to the original variants (e.g.,
hg19_CHROM,hg19_POS).
format: --liftover-field <CHROM>,<POS>
--pointer Store original pointer of variants. This parameter is
usually used to associate the variants after liftOver or
normalizing to the original variants.
format: --pointer <string>
--sort Sort the variants by coordinate fields (CHROM, POS).
--normalize,-n Normalized the variants, including convert multiallelic
variant to biallelic variant and correct redundant suffixes
of REF and ALT bases.
Quality Control Options:
--allele-num Exclude variants with the alternative allele number per
variant out of the range [minAlleleNum, maxAlleleNum].
format: --allele-num <minAlleleNum>-<maxAlleleNum> (0 ~
255)
--seq-ac Exclude variants with the alternate allele count (AC) per
variant out of the range [minAc, maxAc].
format: --seq-ac <minAc>-<maxAc> (>= 0)
--seq-af Exclude variants with the alternate allele frequency (AF)
per variant out of the range [minAf, maxAf].
format: --seq-af <minAf>-<maxAf> (0.0 ~ 1.0)
--seq-an Exclude variants with the non-missing allele number (AN)
per variant out of the range [minAn, maxAn].
format: --seq-an <minAn>-<maxAn> (>= 0)
--field-condition Extract variants by the values of the specified
supplementary fields. For comparable fields, the
'condition' format is 'minValue-maxValue'; for other
formats, the 'condition' is multiple optional values
separated by ','.
format: --field-condition <field>=<condition>
<field>=<condition> ...
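The options listed above can be combined in one call. The sketch below shows a few combinations that use only flags from the help text. The file names, the chromosome label `1`, the prefix `S_`, and the `gbc` helper function are all assumptions made for illustration. The helper prints the command instead of running it when the jar is not found.

```shell
# Hypothetical helper: runs GBC if the jar is found, otherwise prints the command (dry run).
gbc() {
  jar="${GBC_JAR:-gbc.jar}"
  if [ -f "$jar" ]; then
    java -jar "$jar" "$@"
  else
    echo "java -jar $jar $*"
  fi
}

# Keep subjects with indexes 0-99 and variants at positions 1000000-2000000.
# The chromosome label "1" is an assumption; use the label stored in your file.
gbc edit ./input.gtb ./subset.gtb \
  --subject-range 0-99 \
  --pos-range 1:1000000-2000000

# Rename subjects with the [prefix][number][suffix] pattern, numbering from 1.
gbc edit ./input.gtb ./renamed.gtb \
  --rename-subject-prefix S_ \
  --rename-subject-begin 1

# Drop genotypes and keep only variants with AF in [0.05, 0.95].
gbc edit ./input.gtb ./sites.gtb --no-gt --seq-af 0.05-0.95
```

Each call writes to a separate output file, so the input is never overwritten.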
API Toolkit
The API tool for editing GTB files is edu.sysu.pmglab.gbc.GTBExporter. An example of its usage is as follows:
GTBExporter.of("https://pmglab.top/gbc/download/assoc.hg19.gtb")
.addVariantFilter(new GTBFilter().filterByAF(new Interval<>(0.05f, 0.95f)).filterByAlleleNum(new Interval<>(3, null)))
.submit();
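For comparison, the same two filters can be written as a command-line call. Judging from the example earlier in this section, `filterByAF` appears to correspond to `--seq-af` and `filterByAlleleNum` to `--allele-num`, but that mapping is an inference rather than something stated by the tool. The output file name is hypothetical, and the snippet only prints the command when `gbc.jar` is not in the current directory.

```shell
GTB_URL="https://pmglab.top/gbc/download/assoc.hg19.gtb"
OUT="./assoc.filtered.gtb"   # hypothetical output name

# Same two filters as the Java snippet: AF within [0.05, 0.95] and allele number >= 3.
FILTERS="--seq-af 0.05-0.95 --allele-num 3-"

if [ -f gbc.jar ]; then
  # $FILTERS is left unquoted on purpose so it splits into separate arguments.
  java -jar gbc.jar edit "$GTB_URL" "$OUT" $FILTERS
else
  # Dry run when the jar is not available.
  echo "java -jar gbc.jar edit $GTB_URL $OUT $FILTERS"
fi
```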