Export as GTB Format

The following command edits a GTB file and exports the result in GTB format from the command line:

java -jar gbc.jar edit <input> [output] [options]

If [output] is omitted, the result overwrites the input file. If the input is a remote file (a URL), the output is saved in the current local working directory.
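Because an omitted output path means the input is overwritten in place, it can be worth keeping a copy first. The following is a minimal sketch, not an official workflow: the local file name `./assoc.hg19.gtb` and the `.bak` suffix are assumptions for illustration, and `--sort` is used only as an example of an edit option.

```shell
# Hypothetical local file name; adjust to your data.
input=./assoc.hg19.gtb
backup="${input}.bak"

# Keep a copy first, because `edit` without [output] overwrites the input.
# Nothing is run if the input file does not exist.
if [ -f "$input" ]; then
    cp "$input" "$backup" && java -jar gbc.jar edit "$input" --sort
fi
```

If the edit turns out to be unwanted, the original can be restored by copying the backup over the edited file.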

[!NOTE|label:Example|style:callout]

Use GBC to export the example file https://pmglab.top/gbc/download/assoc.hg19.gtb locally and set the following parameters:

  • Exclude variants whose alternate allele frequency (AF) is outside the range [0.05, 0.95].
  • Exclude variants with an alternative allele number < 3.
  • Lift over the variants from hg19 to hg38, and store the original coordinates with the hg19_ prefix.
  • Sort the variants by coordinate, and store the original pointer of each variant.

The command line instructions to complete this task are as follows:

# Run directly in the terminal
java -jar gbc.jar edit https://pmglab.top/gbc/download/assoc.hg19.gtb ./assoc.hg38.gtb \
                       --seq-af 0.05-0.95 --allele-num 3- \
                       --liftover hg19ToHg38 --liftover-field hg19_CHROM,hg19_POS \
                       --sort --pointer Origin_Pointer

# Run it using docker
docker run -v `pwd`:`pwd` -w `pwd` --rm -it -m 4g gbc \
edit https://pmglab.top/gbc/download/assoc.hg19.gtb ./assoc.hg38.gtb \
     --seq-af 0.05-0.95 --allele-num 3- \
     --liftover hg19ToHg38 --liftover-field hg19_CHROM,hg19_POS \
     --sort --pointer Origin_Pointer
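To make the `--seq-af 0.05-0.95` threshold concrete, the sketch below checks whether a single variant would pass a closed-interval AF filter. It assumes AF is computed as AC / AN (alternate allele count over non-missing allele number, as these terms are described under Quality Control Options). The helper name `af_in_range` is hypothetical and not part of GBC, and treating AN = 0 as a failure is an assumption of this sketch.

```shell
# af_in_range AC AN MIN MAX  (hypothetical helper, not part of GBC)
# Succeeds (exit status 0) when AC/AN lies in the closed interval [MIN, MAX].
af_in_range() {
    awk -v ac="$1" -v an="$2" -v lo="$3" -v hi="$4" 'BEGIN {
        if (an <= 0) exit 1              # assumption: no called alleles fails
        af = ac / an
        exit !(af >= lo && af <= hi)     # 0 = inside the range, 1 = outside
    }'
}

af_in_range 12 200 0.05 0.95 && echo "keep"       # AF = 0.06, inside the range
af_in_range 4 200 0.05 0.95 || echo "exclude"     # AF = 0.02, below the range
```

This only illustrates the range semantics; GBC applies the filter internally and its handling of edge cases may differ.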

Program Options

Usage: edit <input> [output] [options]
Java-API: edu.sysu.pmglab.gbc.toolkit.GTBExporter
About: Edit *.gtb file and export as *.gtb format (including liftOver, sort,
       subset, filter, etc.). If no output file is specified, the program will 
       overwrite the original file.
Options:
  --chromosome  Specify the chromosome tags file. e.g., identify 'X, chrX, 
                CHRX, ChrX' as '(int) 22' chromosome.
                format: --chromosome <string>
  --threads,-t  Set the number of threads.
                default: 4
                format: --threads <int>
Subset Selection Options:
  --field,-f           Select the specified fields from the *.gtb file to the 
                       output file (all fields by default).
                       format: --field <string>,<string>,...
  --no-gt              Do not load and export genotypes.
  --subject,-s         Retrieve the genotypes of the specified subject (by 
                       subject names). Subject names can be stored in a file 
                       with comma-separated format, and pass in via '-s @file'.
                       format: --subject <string>,<string>,...
  --subject-range,-sr  Retrieve the genotypes of the specified subject (by 
                       intervals of subject index).
                       format: --subject-range <minIndex>-<maxIndex> (>= 0)
  --subject-index,-si  Retrieve the genotypes of the specified subject (by 
                       subject indexes).
                       format: --subject-index <index1>,<index2>,... (>= 0)
  --pos,-p             Retrieve the variants by the specified coordinates of 
                       variant. 
                       format: --pos <chr>:<pos>,<pos>,... ... (>= 1)
  --pos-range,-pr      Retrieve the variants by the specified coordinate 
                       intervals of variant.
                       format: --pos-range <chr>:<minPos>-<maxPos>,... (>= 1)
  --index-range,-ir    Retrieve the variants by the line-index of variant.
                       format: --index-range <minIndex>-<maxIndex> (>= 0)
Edit Subject/Field Name Options:
  --rename-subject         Reset subject names for *.gtb directly. Pairs of 
                           subject name can be stored in a file with 
                           comma-separated format, and pass in via '-s @file'.
                           format: --rename-subject <old>=<new>,<old>=<new>,...
  --rename-subject-prefix  Use the format `[prefix][number][suffix]` to reset 
                           the subject names.
                           format: --rename-subject-prefix <string>
  --rename-subject-suffix  Use the format `[prefix][number][suffix]` to reset 
                           the subject names.
                           format: --rename-subject-suffix <string>
  --rename-subject-begin   Use the format `[prefix][number][suffix]` to reset 
                           the subject names.
                           format: --rename-subject-begin <int>
  --rename-field           Reset field names for *.gtb directly.
                           format: --rename-field <old>=<new> ...
Edit Meta Options:
  --add-meta           Add the specified metas to the output file.
                       format: --add-meta <key>=<value> <key>=<value> ...
  --rm-meta            Remove all meta information.
  --rm-duplicate-meta  Remove duplicate meta information.
LiftOver, Sort and Normalized Options:
  --liftover        Lift over variants from one reference genome version to 
                    another (chain files are downloaded from 
                    http://hgdownload.cse.ucsc.edu/goldenPath/<version>/liftOver). 
                    format: --liftover <string> 
                    ([hg19ToHg38/hg38ToHg19/hg18ToHg19/hg18ToHg38] 
                    (ignoreCase)) 
  --liftover-field  Store original coordinate (CHROM, POS) of variants. This 
                    parameter is usually used to associate the variants after 
                    liftOver to the original variants (e.g., 
                    hg19_CHROM,hg19_POS). 
                    format: --liftover-field <CHROM>,<POS>
  --pointer         Store original pointer of variants. This parameter is 
                    usually used to associate the variants after liftOver or 
                    normalizing to the original variants.
                    format: --pointer <string>
  --sort            Sort the variants by coordinate fields (CHROM, POS).
  --normalize,-n    Normalized the variants, including convert multiallelic 
                    variant to biallelic variant and correct redundant suffixes 
                    of REF and ALT bases.
Quality Control Options:
  --allele-num       Exclude variants with the alternative allele number per 
                     variant out of the range [minAlleleNum, maxAlleleNum].
                     format: --allele-num <minAlleleNum>-<maxAlleleNum> (0 ~ 
                     255) 
  --seq-ac           Exclude variants with the alternate allele count (AC) per 
                     variant out of the range [minAc, maxAc].
                     format: --seq-ac <minAc>-<maxAc> (>= 0)
  --seq-af           Exclude variants with the alternate allele frequency (AF) 
                     per variant out of the range [minAf, maxAf].
                     format: --seq-af <minAf>-<maxAf> (0.0 ~ 1.0)
  --seq-an           Exclude variants with the non-missing allele number (AN) 
                     per variant out of the range [minAn, maxAn].
                     format: --seq-an <minAn>-<maxAn> (>= 0)
  --field-condition  Extract variants by the values of the specified 
                     supplementary fields. For comparable fields, the 
                     'condition' format is 'minValue-maxValue'; for other 
                     formats, the 'condition' is multiple optional values 
                     separated by ','.
                     format: --field-condition <field>=<condition> 
                     <field>=<condition> ...
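
As an illustration of the Subset Selection Options, the sketch below exports a slice of the variants by line-index and drops the genotypes. The file names are hypothetical, and the `gbc` wrapper function is not part of GBC: it is an assumption of this sketch that prints the command instead of running it when `gbc.jar` is not in the current directory. The help text does not say whether the upper bound of `--index-range` is inclusive, so check the variant count of the result.

```shell
# Hypothetical wrapper: run GBC if gbc.jar is in the current directory,
# otherwise print the command that would have been run.
gbc() {
    if [ -f gbc.jar ]; then
        java -jar gbc.jar "$@"
    else
        echo "java -jar gbc.jar $*"
    fi
}

# Export the variants in line-index range 0-999, without genotypes.
gbc edit ./assoc.hg19.gtb ./assoc.head.gtb --index-range 0-999 --no-gt
```

Writing to a separate output file keeps the original GTB file untouched.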

API Toolkit

The API tool for editing GTB files is edu.sysu.pmglab.gbc.GTBExporter. An example of its usage is as follows:

// Keep variants with AF in [0.05, 0.95] and an alternative allele number >= 3.
GTBExporter.of("https://pmglab.top/gbc/download/assoc.hg19.gtb")
        .addVariantFilter(new GTBFilter()
                .filterByAF(new Interval<>(0.05f, 0.95f))
                .filterByAlleleNum(new Interval<>(3, null)))
        .submit();

Copyright © Liubin Zhang. All rights reserved. Last modified time: 2023-04-10 18:26:17