Export as TSV Format

Use the following command to export a GTB file to TSV format from the command line:

java -jar gbc.jar gtb2tsv <input> [output] [options]
  • When output is not set, the result is written to standard output (the terminal).
  • When output ends with the extension .gz or .bgz, the output file is compressed with bgzip (level 5); otherwise it is written as plain text.
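
As a rough illustration of the two rules above, the following shell sketch classifies an output argument the way the rules describe. The helper name output_mode is hypothetical and is not part of GBC; it only mirrors the documented behaviour.

```shell
# Hypothetical helper (not part of GBC): report how an output argument
# would be handled according to the two rules listed above.
output_mode() {
  case "$1" in
    "")          echo "stdout" ;;  # no output argument: standard output
    *.gz|*.bgz)  echo "bgzip"  ;;  # compressed with bgzip
    *)           echo "text"   ;;  # any other name: plain text
  esac
}

output_mode ""
output_mode assoc.tsv
output_mode assoc.tsv.gz
```

The three calls cover the three cases in turn: no output argument, a plain file name, and a .gz file name.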

GBC provides a customizable TSV output format that makes it easy to convert GTB files into various row-oriented text formats. With this feature the output does not contain genotypes, but it does contain the allele count (AC), effective allele count (AN), and allele frequency (AF). To customize the output, use --field to select the output fields and --rename-field to rename them.
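
The sketch below shows how --field and --rename-field might be laid out on the command line, following the formats given in the program options (--field <string>,<string>,... and --rename-field <old>=<new> ...). It only prints the command so that it can be inspected without gbc.jar. The wrapper name gbc_dry_run, the use of AC, AN and AF as field names, the new name ALT_FREQ, and the local file names are all assumptions made for illustration.

```shell
# Hypothetical dry-run wrapper: prints the gtb2tsv command instead of running it,
# so the argument layout can be checked without gbc.jar at hand.
gbc_dry_run() {
  echo "java -jar gbc.jar gtb2tsv $*"
}

# AC, AN and AF as field names, and ALT_FREQ as the new name, are assumptions;
# check the fields actually present in the GTB file before using them.
gbc_dry_run assoc.hg19.gtb assoc.tsv.gz --field AC,AN,AF --rename-field AF=ALT_FREQ
```

Dropping the wrapper and running the printed command directly would perform the actual export, provided the field names exist in the input file.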

Example

Use GBC to export the example file https://pmglab.top/gbc/download/assoc.hg19.gtb to TSV with the following filters:

  • Exclude variants whose alternate allele frequency (AF) falls outside the range [0.05, 0.95].
  • Exclude variants whose alternate allele number is less than 3.

The following commands complete this task:

# Run directly in the terminal
java -jar gbc.jar gtb2tsv https://pmglab.top/gbc/download/assoc.hg19.gtb \
                          --seq-af 0.05-0.95 --allele-num 3-

# Run with Docker
docker run -v `pwd`:`pwd` -w `pwd` --rm -it -m 4g gbc \
gtb2tsv https://pmglab.top/gbc/download/assoc.hg19.gtb \
        --seq-af 0.05-0.95 --allele-num 3-
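
The range arguments in these commands (0.05-0.95 and the open-ended 3-) can be read as closed intervals. The sketch below is a hypothetical helper, in_range, written for illustration and not taken from GBC. It assumes that an omitted bound means no limit on that side, as the 3- in the example suggests.

```shell
# Hypothetical helper (not GBC code): succeed when VALUE lies inside RANGE,
# where RANGE is "<min>-<max>" and either bound may be left empty.
in_range() {
  value=$1
  range=$2
  min=${range%%-*}   # text before the first '-'
  max=${range#*-}    # text after the first '-'
  awk -v v="$value" -v lo="$min" -v hi="$max" 'BEGIN {
    # An empty bound is treated as "no limit" (assumption).
    ok = (lo == "" || v + 0 >= lo + 0) && (hi == "" || v + 0 <= hi + 0)
    exit ok ? 0 : 1
  }'
}

in_range 0.30 0.05-0.95 && echo "AF 0.30 is kept"
in_range 0.01 0.05-0.95 || echo "AF 0.01 is excluded"
in_range 2 3-           || echo "allele number 2 is excluded"
```

Under these assumptions a variant passes only if every range check succeeds, which matches the two exclusion rules stated for the example.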

Program Options

Usage: gtb2tsv <input> [output] [options]
Java-API: edu.sysu.pmglab.gbc.toolkit.TSVExporter
About: Decompress and export records from *.gtb (genotype block format) to 
       *.vcf (variant call format) or *.vcf.gz.
Options:
  --chromosome  Specify the chromosome tags file. e.g., identify 'X, chrX, 
                CHRX, ChrX' as '(int) 22' chromosome.
                format: --chromosome <string>
  --threads,-t  Set the number of threads.
                default: 4
                format: --threads <int>
Subset Selection Options:
  --field,-f         Select the specified fields from the *.gtb file to the 
                     output file (all fields by default).
                     format: --field <string>,<string>,...
  --no-gt            Do not load genotypes.
  --pos,-p           Retrieve the records by the specified coordinates of 
                     variant. 
                     format: --pos <chr>:<pos>,<pos>,... ... (>= 1)
  --pos-range,-pr    Retrieve the records by the specified coordinate intervals 
                     of variant.
                     format: --pos-range <chr>:<minPos>-<maxPos>,... (>= 1)
  --index-range,-ir  Retrieve the records by the line-index of variant.
                     format: --index-range <minIndex>-<maxIndex> (>= 0)
  --rename-field     Reset field names for *.gtb directly.
                     format: --rename-field <old>=<new> ...
Edit Meta Options:
  --add-meta           Add the specified metas to the output file.
                       format: --add-meta <key>=<value> <key>=<value> ...
  --rm-meta            Remove all meta information.
  --rm-duplicate-meta  Remove duplicate meta information.
Quality Control Options:
  --allele-num       Exclude records with the alternative allele number per 
                     variant out of the range [minAlleleNum, maxAlleleNum].
                     format: --allele-num <minAlleleNum>-<maxAlleleNum> (0 ~ 
                     255) 
  --seq-qual         Exclude records with the minimal overall sequencing 
                     quality score (Phred Quality Score) per variant < minQual.
                     format: --seq-qual <minQual> (>= 0.0)
  --seq-fs           Exclude records with the overall strand bias Phred-scaled 
                     p-value (using Fisher's exact test) per variant > maxFs.
                     format: --seq-fs <maxFs> (>= 0.0)
  --seq-mq           Exclude records with the minimal overall mapping quality 
                     score (Mapping Quality Score) per variant < minMq.
                     format: --seq-mq <minMq> (>= 0.0)
  --seq-info         Exclude records with the information (i.e., INFO in VCF) 
                     field contain or do not contain (starts with ^) the 
                     specified strings.
                     format: --seq-info <string> <string> ...
  --seq-ac           Exclude records with the alternate allele count (AC) per 
                     variant out of the range [minAc, maxAc].
                     format: --seq-ac <minAc>-<maxAc> (>= 0)
  --seq-af           Exclude records with the alternate allele frequency (AF) 
                     per variant out of the range [minAf, maxAf].
                     format: --seq-af <minAf>-<maxAf> (0.0 ~ 1.0)
  --seq-an           Exclude records with the non-missing allele number (AN) 
                     per variant out of the range [minAn, maxAn].
                     format: --seq-an <minAn>-<maxAn> (>= 0)
  --field-condition  Extract records by the values of the specified 
                     supplementary fields. For comparable fields, the 
                     'condition' format is 'minValue-maxValue'; for other 
                     formats, the 'condition' is multiple optional values 
                     separated by ','.
                     format: --field-condition <field>=<condition> 
                     <field>=<condition> ...
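
The --seq-ac, --seq-an and --seq-af filters act on related quantities: under the usual definition, AF is AC divided by AN. The following sketch recomputes that ratio from a TSV read on standard input as a sanity check. It assumes a header row with columns literally named AC and AN, which may differ from the header that gtb2tsv actually writes. The helper name recompute_af and the AF_CHECK column are hypothetical, and the sample rows are made up.

```shell
# Hypothetical helper: append AC/AN as an extra column to a TSV read from stdin.
# Assumes header columns named exactly "AC" and "AN" (an assumption about the output).
recompute_af() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 {
      # Locate the AC and AN columns by header name.
      for (i = 1; i <= NF; i++) {
        if ($i == "AC") ac = i
        if ($i == "AN") an = i
      }
      if (!ac || !an) exit 1   # required columns are missing
      print $0, "AF_CHECK"
      next
    }
    {
      # Avoid division by zero when no alleles were called.
      af = ($an > 0) ? $ac / $an : "NA"
      print $0, af
    }'
}

# Inline sample data for illustration only (not real gtb2tsv output).
printf 'CHROM\tPOS\tAC\tAN\n1\t100\t5\t20\n1\t200\t0\t0\n' | recompute_af
```

Because the output goes to standard output when no output file is given, the same helper could in principle be placed at the end of a pipe after gtb2tsv, once the real column names are known.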

API Toolkit

The API tool for converting GTB files to TSV files is edu.sysu.pmglab.gbc.TSVExporter. An example of its usage follows:

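// Keep variants with AF in [0.05, 0.95] and alternate allele number >= 3 (null: no upper bound, as in "3-").
TSVExporter.of("https://pmglab.top/gbc/download/assoc.hg19.gtb")
        .addVariantFilter(new GTBFilter()
                .filterByAF(new Interval<>(0.05f, 0.95f))
                .filterByAlleleNum(new Interval<>(3, null)))
        .submit();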
Copyright © Liubin Zhang, all rights reserved. Last modified time: 2023-04-10 18:26:18
