Sort GTB by Coordinates
GTB files are typically ordered by coordinates. However, a liftover, or sorting by an annotated field (e.g., by the pathogenic potential of variants), can leave a GTB file coordinate-disordered. Use the following command to sort GTB files by coordinates:
java -jar gbc.jar sort <input> [output] [options]
If output is not set, the output file overwrites the original file. If the input file is hosted at a remote site, the output file is saved in the current local working directory.
GTB defines "ordered" in a weak sense: variants on the same chromosome must be sorted and stored contiguously. Many algorithms require ordered GTB or VCF files. For example, when calculating LD coefficients on an unordered GTB or VCF file, collecting the variants that fall within the window takes a long time.
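The weak ordering described above can be illustrated with a small check. This is a minimal sketch and not part of GBC: the `WeakOrderCheck` class and its `Variant` type are hypothetical helpers, and it assumes that "weakly ordered" places no constraint on the order in which the chromosomes themselves appear.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class WeakOrderCheck {
    /** A variant reduced to its coordinate fields (hypothetical helper type). */
    static final class Variant {
        final String chrom;
        final int pos;

        Variant(String chrom, int pos) {
            this.chrom = chrom;
            this.pos = pos;
        }
    }

    /**
     * Returns true if every chromosome forms one contiguous block and
     * positions never decrease inside a block.
     */
    static boolean isWeaklyOrdered(List<Variant> variants) {
        Set<String> finished = new HashSet<>(); // chromosomes whose block has ended
        String current = null;
        int lastPos = Integer.MIN_VALUE;
        for (Variant v : variants) {
            if (!v.chrom.equals(current)) {
                // A new block starts here; the previous one is closed.
                if (current != null) {
                    finished.add(current);
                }
                if (finished.contains(v.chrom)) {
                    return false; // this chromosome's variants are split apart
                }
                current = v.chrom;
                lastPos = v.pos;
            } else {
                if (v.pos < lastPos) {
                    return false; // positions go backwards within the block
                }
                lastPos = v.pos;
            }
        }
        return true;
    }
}
```

Under this reading, a file that lists chr2 before chr1 still counts as ordered, while a file that returns to chr1 after visiting chr2 does not.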
Use GBC to sort the example file https://pmglab.top/gbc/download/assoc.unorder.hg38.gtb by the coordinates of the variants (this file was lifted over from hg19 to hg38 without being sorted by coordinates):
# Run directly in the terminal
java -jar gbc.jar sort https://pmglab.top/gbc/download/assoc.unorder.hg38.gtb

# Run it using docker
docker run -v `pwd`:`pwd` -w `pwd` --rm -it -m 4g gbc \
sort https://pmglab.top/gbc/download/assoc.unorder.hg38.gtb
Usage: sort <input> [output] [options]
Java-API: edu.sysu.pmglab.gbc.toolkit.GTBSorter
About: Sort the variants in *.gtb by coordinate fields (CHROM, POS).
Options:
  --chromosome    Specify the chromosome tags file.
                  e.g., identify 'X, chrX, CHRX, ChrX' as '(int) 22' chromosome.
                  format: --chromosome <string>
  --threads,-t    Set the number of threads.
                  default: 4
                  format: --threads <int>
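The usage above can also be driven from a Java program by launching the command-line tool as a child process. The following is a minimal sketch that uses only the standard library and does not call GBC's own Java API. The `GbcSortCommand` class is a hypothetical helper, and it assumes that `java` is on the PATH and that `gbc.jar` sits in the working directory.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class GbcSortCommand {
    /**
     * Builds the argument list for: java -jar gbc.jar sort <input> [output] --threads <int>
     * Pass null as output to omit it, in which case the original file is overwritten.
     */
    static List<String> build(String input, String output, int threads) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add("gbc.jar"); // assumed to be in the working directory
        cmd.add("sort");
        cmd.add(input);
        if (output != null) {
            cmd.add(output);
        }
        cmd.add("--threads");
        cmd.add(String.valueOf(threads));
        return cmd;
    }

    /** Runs the command, forwarding its console output, and returns the exit code. */
    static int run(List<String> cmd) throws IOException, InterruptedException {
        return new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

For example, `GbcSortCommand.run(GbcSortCommand.build("https://pmglab.top/gbc/download/assoc.unorder.hg38.gtb", "sorted.gtb", 8))` would sort the example file with 8 threads, where `sorted.gtb` is an illustrative output name.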
The API tool for sorting GTB files is edu.sysu.pmglab.gbc.GTBSorter, and an example of its use is as follows: