Split GTB

Splitting a GTB file into multiple subfiles is a common requirement when a single GTB file is very large, because smaller files are easier to debug and to transfer. A typical example is the whole-exome genotype data of UKBB, in which the genotypes on chromosome 1 are split into 97 subfiles. Use the following command to split a GTB file into multiple smaller subfiles:

java -jar gbc.jar split <input> [outputDir] [options]

The resulting subfiles can be rejoined with GTBConcat (the concat mode).


[!NOTE|label:Example|style:callout]

Use GBC to split the example file https://pmglab.top/gbc/download/rare.disease.hg19.gtb into multiple subfiles according to the chromosome tag of the variants.

# Run directly in the terminal
java -jar gbc.jar split https://pmglab.top/gbc/download/rare.disease.hg19.gtb

# Run it using docker
docker run -v `pwd`:`pwd` -w `pwd` --rm -it -m 4g gbc \
split https://pmglab.top/gbc/download/rare.disease.hg19.gtb

Program Options

Usage: split <input> [outputDir] [options]
Java-API: edu.sysu.pmglab.gbc.toolkit.GTBSplitter
About: Split a single *.gtb file into multiple sub-files (e.g., split by chromosome or variant index).
Options:
  --chromosome  Specify the chromosome tags file. e.g., identify 'X, chrX, 
                CHRX, ChrX' as '(int) 22' chromosome.
                format: --chromosome <file>
  --threads,-t  Set the number of threads.
                default: 4
                format: --threads <int>
  --by          Split input file by chromosome-level or variant-level into 
                multiple sub-files, which can be rejoined by the concat mode.
                default: chromosome
                format: '--by chromosome [tag],[tag],...' or '--by variant 
                [int]' 
Subset Selection Options:
  --pos,-p           Retrieve the variants by the specified coordinates of 
                     variant. 
                     format: --pos <chr>:<pos>,<pos>,... ... (>= 1)
  --pos-range,-pr    Retrieve the variants by the specified coordinate 
                     intervals of variant.
                     format: --pos-range <chr>:<minPos>-<maxPos>,... (>= 1)
  --index-range,-ir  Retrieve the variants by the line-index of variant.
                     format: --index-range <minIndex>-<maxIndex> (>= 0)
  --allele-num       Exclude variants with the alternative allele number per 
                     variant out of the range [minAlleleNum, maxAlleleNum].
                     format: --allele-num <minAlleleNum>-<maxAlleleNum> (0 ~ 
                     255) 
  --seq-ac           Exclude variants with the alternate allele count (AC) per 
                     variant out of the range [minAc, maxAc].
                     format: --seq-ac <minAc>-<maxAc> (>= 0)
  --seq-af           Exclude variants with the alternate allele frequency (AF) 
                     per variant out of the range [minAf, maxAf].
                     format: --seq-af <minAf>-<maxAf> (0.0 ~ 1.0)
  --seq-an           Exclude variants with the non-missing allele number (AN) 
                     per variant out of the range [minAn, maxAn].
                     format: --seq-an <minAn>-<maxAn> (>= 0)
  --field-condition  Extract variants by the values of the specified 
                     supplementary fields. For comparable fields, the 
                     'condition' format is 'minValue-maxValue'; for other 
                     formats, the 'condition' is multiple optional values 
                     separated by ','.
                     format: --field-condition <field>=<condition> 
                     <field>=<condition> ...
Edit Meta Options:
  --add-meta  Add the specified metas to the output file.
              format: --add-meta <key>=<value> <key>=<value> ...
  --rm-meta   Remove all meta information.
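
The options above can be combined in a single call. The following is a minimal dry-run sketch, not an official recipe: it only assembles and prints a command that splits by chromosome while keeping variants whose alternate allele frequency lies in a chosen range. The local input file name, the output directory name, the AF range, and the thread count are all illustrative assumptions; only the flags themselves come from the usage text above.

```shell
#!/bin/sh
# Dry-run sketch: build a split command from flags listed in the usage text.
# Assumptions (not from the original page): a local copy of the example file,
# the output directory name, and the chosen AF range and thread count.
INPUT="rare.disease.hg19.gtb"
OUT_DIR="rare.disease.hg19.split"

# Split by chromosome (the documented default), keep variants with AF in
# [0.0, 0.05], and use 8 threads instead of the default 4.
CMD="java -jar gbc.jar split $INPUT $OUT_DIR --by chromosome --seq-af 0.0-0.05 --threads 8"

# Print the command for review; replace `echo` with `eval` to actually run it.
echo "$CMD"
```

Printing the command first makes it easy to check the argument order (`<input>`, then `[outputDir]`, then options) before starting a long-running job on a large file.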

API Toolkit

The API tool for splitting GTB files is edu.sysu.pmglab.gbc.GTBSplitter. Example usage is as follows:

GTBSplitter.of("https://pmglab.top/gbc/download/rare.disease.hg19.gtb")
        .splitByChromosome(null);

Copyright © Liubin Zhang, all rights reserved. Last modified time: 2023-04-10 13:13:41
