Build GTB for BED

PLINK is one of the most popular toolsets for genome-wide association analysis, focusing on the analysis of genotype and phenotype data. The PED and BED formats introduced by PLINK are major file formats for storing genotype data, designed to accelerate large-scale association analysis and statistical applications. GBC implements conversion between the SNP-major PLINK-BED format and the GTB format, and follows the basic specification of the BED file format. The input consists of three files that share a prefix: <input>.bed, <input>.bim, and <input>.fam. The following command builds a GTB archive from a PLINK-BED fileset:

java -jar gbc.jar bed2gtb <input> [output] [options]

PLINK focuses on genotype computation, and its storage and access performance for genotype data is comparatively poor. GBC focuses on fast access, efficient storage, and integrated management of genotype data, and it supports fast bidirectional conversion between the GTB and BED formats. GBC is therefore well suited to extending the functionality of the BED format.
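
Because bed2gtb takes a file prefix rather than a single file, it can be useful to confirm that all three companion files exist and that the .bed file is SNP-major before starting a conversion. The following is a minimal sketch of such a check, not part of GBC: the class BedInputCheck and its methods are hypothetical names introduced here, and the header bytes are those of the PLINK 1 binary format (0x6c 0x1b, followed by 0x01 for SNP-major). It assumes Java 11 or later for InputStream.readNBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a GBC class): sanity-checks a PLINK-BED fileset.
final class BedInputCheck {
    // PLINK 1 binary header: two magic bytes, then 0x01 for SNP-major layout.
    static final byte[] SNP_MAJOR_HEADER = {0x6c, 0x1b, 0x01};

    // Returns the companion files that are missing for the given prefix.
    static List<Path> missingFiles(String prefix) {
        List<Path> missing = new ArrayList<>();
        for (String ext : new String[] {".bed", ".bim", ".fam"}) {
            Path p = Paths.get(prefix + ext);
            if (!Files.isRegularFile(p)) {
                missing.add(p);
            }
        }
        return missing;
    }

    // True if the given leading bytes match the SNP-major BED header.
    static boolean hasSnpMajorHeader(byte[] firstBytes) {
        if (firstBytes.length < SNP_MAJOR_HEADER.length) {
            return false;
        }
        for (int i = 0; i < SNP_MAJOR_HEADER.length; i++) {
            if (firstBytes[i] != SNP_MAJOR_HEADER[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads the first three bytes of a .bed file and checks the header.
    static boolean isSnpMajorBed(Path bed) {
        try (InputStream in = Files.newInputStream(bed)) {
            return hasSnpMajorHeader(in.readNBytes(3));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For the example below, one would call BedInputCheck.missingFiles("./assoc.hg19") and expect an empty list once PLINK has written the fileset.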


First, use PLINK to build a BED archive for the example file:

# Download the data file
wget -O assoc.hg19.vcf.gz

# Convert VCF to BED via PLINK
plink --vcf ./assoc.hg19.vcf.gz --make-bed --out ./assoc.hg19

Then use GBC to convert the BED fileset to GTB format, lifting the coordinates over from hg19 to hg38:

# Run directly in the terminal
java -jar gbc.jar bed2gtb ./assoc.hg19 ./assoc.hg38.gtb \
                    --liftover hg19ToHg38

# Run it using docker
docker run -v `pwd`:`pwd` -w `pwd` --rm -it -m 4g gbc \
bed2gtb ./assoc.hg19 ./assoc.hg38.gtb \
  --liftover hg19ToHg38

Program Options

Usage: bed2gtb <input> [output] [options]
Java-API: edu.sysu.pmglab.gbc.toolkit.bed.BEDReader
About: Compress and build *.gtb (genotype block format) for *.bed (PLINK binary
       biallelic genotype table).
  --chromosome       Specify the chromosome tags file. e.g., identify 'X, chrX, 
                     CHRX, ChrX' as '(int) 22' chromosome.
                     format: --chromosome <file>
  --threads,-t       Set the number of threads.
                     default: 4
                     format: --threads <int> (1 ~ 10)
  --add-meta         Add the specified metas to the output file.
                     format: --add-meta <key>=<value> <key>=<value> ...
  --liftover         Lift over variants from one reference genome version to 
                     another (chain files are downloaded from 
                     format: --liftover <string> 
  --index-range,-ir  Retrieve the variants by the line-index of variant.
                     format: --index-range <minIndex>-<maxIndex> (>= 0)

API Toolkit

The toolkit that provides read and write support for the BED format resides in the package edu.sysu.pmglab.gbc.toolkit.bed. Within it, BEDGenotypes implements the mapping in both directions between BED genotypes and GTB genotypes. Two classes read BED files: BEDReader and SeekableBEDReader. The former reads the BED file directly, while the latter supports parallel reading and fast coordinate-based retrieval.
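
To make the BED side of that mapping concrete, the sketch below decodes the 2-bit genotype codes of one variant block as defined by the PLINK 1 binary format: 00 is homozygous for the first .bim allele, 01 is missing, 10 is heterozygous, and 11 is homozygous for the second allele, with the first sample stored in the lowest-order bits of each byte. This is an illustration only. BedGenotypeCodes is a hypothetical class, it is not the implementation of BEDGenotypes, and the GTB encoding on the other side of the mapping is not shown.

```java
// Hypothetical helper (not GBC's BEDGenotypes): decodes PLINK-BED 2-bit codes.
final class BedGenotypeCodes {
    static final int HOM_A1 = 0b00;   // homozygous for the first allele in .bim
    static final int MISSING = 0b01;  // missing genotype
    static final int HET = 0b10;      // heterozygous
    static final int HOM_A2 = 0b11;   // homozygous for the second allele in .bim

    // Decodes one variant block into one 2-bit code per sample.
    // Four samples are packed per byte, lowest-order bit pair first.
    static int[] decode(byte[] block, int sampleCount) {
        int[] codes = new int[sampleCount];
        for (int i = 0; i < sampleCount; i++) {
            int packed = block[i / 4] & 0xFF;          // treat the byte as unsigned
            codes[i] = (packed >> ((i % 4) * 2)) & 0b11;
        }
        return codes;
    }

    // Renders a 2-bit code as a readable genotype label.
    static String describe(int code) {
        switch (code) {
            case HOM_A1: return "A1/A1";
            case MISSING: return "./.";
            case HET: return "A1/A2";
            case HOM_A2: return "A2/A2";
            default: throw new IllegalArgumentException("not a 2-bit code: " + code);
        }
    }
}
```

For instance, the single byte 0xE4 (binary 11100100) decodes to the codes 0, 1, 2, 3 for four samples, covering each of the four possible states once.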

When SeekableBEDReader is used, the program automatically builds a GTB file from the .bim file internally, which lets the BED file support some of GBC's fast-access functions. The following example uses SeekableBEDReader to read a BED file:

SeekableBEDReader bedReader = new SeekableBEDReader("");  

// Jump to the 100th line;   

// Print the information of this locus
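
Independently of how SeekableBEDReader is implemented, the SNP-major layout itself permits jumping to a variant by index, because every variant occupies a fixed-size block of ceil(N / 4) bytes after the 3-byte header, where N is the number of samples in the .fam file. The sketch below computes that offset and reads one block with the standard java.nio API. BedOffsets and its methods are hypothetical names used for illustration and are not part of GBC; the variant index is assumed to be zero-based.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper (not a GBC class): random access into a SNP-major .bed file.
final class BedOffsets {
    static final int HEADER_BYTES = 3; // two magic bytes plus the mode byte

    // Each variant stores 2 bits per sample, padded up to a whole byte.
    static long bytesPerVariant(int sampleCount) {
        return (sampleCount + 3L) / 4;
    }

    // Byte offset of the block for a zero-based variant index.
    static long variantOffset(long variantIndex, int sampleCount) {
        return HEADER_BYTES + variantIndex * bytesPerVariant(sampleCount);
    }

    // Seeks to the variant's block and reads it in full.
    static byte[] readVariantBlock(Path bed, long variantIndex, int sampleCount)
            throws IOException {
        ByteBuffer buf = ByteBuffer.allocate((int) bytesPerVariant(sampleCount));
        try (FileChannel ch = FileChannel.open(bed, StandardOpenOption.READ)) {
            ch.position(variantOffset(variantIndex, sampleCount));
            while (buf.hasRemaining()) {
                if (ch.read(buf) < 0) {
                    throw new EOFException("variant index beyond end of file");
                }
            }
        }
        return buf.array();
    }
}
```

The sample count comes from the number of lines in the .fam file, and the returned block holds the packed 2-bit genotype codes for that variant.
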
Copyright © Liubin Zhang, all rights reserved. Last modified time: 2023-04-18 21:45:54
