Export as BED Format
PLINK cannot convert VCF files to BED format concurrently. To export a GTB file to BED format, run the following command:

```bash
java -jar gbc.jar gtb2bed <input> [output] [options]
```
When `output` is not specified, the files `<input>.bed`, `<input>.bim` and `<input>.fam` are generated automatically from the input file `<input>.gtb`. If the input file is a remote file, the output files are stored in the current local working directory.
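The default naming rule can be sketched in Java as follows. `BedOutputNames` is a hypothetical helper written only for illustration and is not part of GBC. It assumes that a remote input is recognized by an `http://` or `https://` prefix and that only the file name of a remote input is reused, resolved against the current working directory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of GBC) mirroring the documented default rule:
 * <input>.gtb -> <input>.bed, <input>.bim, <input>.fam.
 * Assumption: remote inputs are http(s) URLs and only their file name is kept.
 */
class BedOutputNames {
    static List<Path> defaultOutputs(String input) {
        String lower = input.toLowerCase();
        boolean remote = lower.startsWith("http://") || lower.startsWith("https://");

        // For a URL, keep only the last path segment (the file name).
        String prefix = remote ? input.substring(input.lastIndexOf('/') + 1) : input;

        // Strip the ".gtb" extension to obtain <input>.
        if (prefix.toLowerCase().endsWith(".gtb")) {
            prefix = prefix.substring(0, prefix.length() - ".gtb".length());
        }

        // Remote inputs are written to the current local working directory.
        Path dir = remote ? Paths.get("").toAbsolutePath() : null;
        List<Path> outputs = new ArrayList<>();
        for (String ext : new String[] {".bed", ".bim", ".fam"}) {
            Path p = Paths.get(prefix + ext);
            outputs.add(dir == null ? p : dir.resolve(p));
        }
        return outputs;
    }

    public static void main(String[] args) {
        // Prints <cwd>/assoc.hg19.bed, <cwd>/assoc.hg19.bim and <cwd>/assoc.hg19.fam
        defaultOutputs("https://pmglab.top/gbc/download/assoc.hg19.gtb")
                .forEach(System.out::println);
    }
}
```

A local input such as `data/assoc.hg19.gtb` keeps its directory, giving `data/assoc.hg19.bed`, `data/assoc.hg19.bim` and `data/assoc.hg19.fam`.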
> [!NOTE|label:Example|style:callout]
> Here are the commands to use GBC to output `https://pmglab.top/gbc/download/assoc.hg19.gtb` in BED format:
>
> ```bash
> # Run directly in the terminal
> java -jar gbc.jar gtb2bed https://pmglab.top/gbc/download/assoc.hg19.gtb
>
> # Run it using docker
> docker run -v `pwd`:`pwd` -w `pwd` --rm -it -m 4g gbc \
>   gtb2bed https://pmglab.top/gbc/download/assoc.hg19.gtb
> ```
Please note that the rules GBC uses to convert GTB files to BED files differ from those PLINK uses to convert VCF files to BED files:

- GBC faithfully designates REF as A1 in the `.bim` file, instead of assigning A1 based on allele frequency.
- For multiallelic variants, PLINK retains only the two alleles with the highest allele frequencies, designating them as A2 and A1, respectively. In contrast, GBC splits each multiallelic site into multiple biallelic sites, designating REF as A1 and the corresponding ALT as A2 for each. If the GTB file was built from a BED file (using `bed2gtb`), GBC uses the A1 and A2 alleles of that BED file as REF and ALT, respectively, so the exported BED file keeps the allele assignment of the original PLINK BED file.
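The splitting rule for multiallelic sites can be illustrated with a short Java sketch. It is not GBC's implementation: `MultiallelicSplit` and `BimRecord` are hypothetical names, and the sketch assumes each split record reuses the original variant ID and writes 0 as the genetic position, which GBC may handle differently. It only shows the documented allele assignment, A1 = REF and A2 = one ALT per record.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch (not GBC's code): split one multiallelic site into
 * biallelic .bim records with A1 = REF and A2 = each ALT in turn.
 * Assumption: every split record reuses the original variant ID.
 */
class MultiallelicSplit {
    /** One .bim line: chromosome, variant ID, genetic position, bp position, A1, A2. */
    record BimRecord(String chrom, String id, long pos, String a1, String a2) {
        String toBimLine() {
            // The genetic position is unknown here, so it is written as 0.
            return String.join("\t", chrom, id, "0", Long.toString(pos), a1, a2);
        }
    }

    /** alts is a comma-separated ALT list, as in a VCF ALT column (e.g. "C,T"). */
    static List<BimRecord> split(String chrom, String id, long pos, String ref, String alts) {
        List<BimRecord> records = new ArrayList<>();
        for (String alt : alts.split(",")) {
            records.add(new BimRecord(chrom, id, pos, ref, alt));
        }
        return records;
    }

    public static void main(String[] args) {
        // REF=A with ALT=C,T yields two biallelic records: A/C and A/T.
        split("1", "rs1", 12345, "A", "C,T").forEach(r -> System.out.println(r.toBimLine()));
    }
}
```

PLINK, by contrast, would keep a single biallelic record for such a site, chosen by allele frequency.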
Program Options
```
Usage: gtb2bed <input> [output] [options]
Java-API: edu.sysu.pmglab.gbc.toolkit.bed.BEDWriter
About: Decompress and export variants from *.gtb (genotype block format) to
       *.bed (PLINK binary biallelic genotype table).
Options:
  --chromosome          Specify the chromosome tags file. e.g., identify 'X, chrX,
                        CHRX, ChrX' as '(int) 22' chromosome.
                        format: --chromosome <string>
  --threads,-t          Set the number of threads.
                        default: 4
                        format: --threads <int> (1 ~ 10)
Subset Selection Options:
  --subject,-s          Retrieve the genotypes of the specified subject (by
                        subject names). Subject names can be stored in a file
                        with comma-separated format, and pass in via '-s @file'.
                        format: --subject <string>,<string>,...
  --subject-range,-sr   Retrieve the genotypes of the specified subject (by
                        intervals of subject index).
                        format: --subject-range <minIndex>-<maxIndex> (>= 0)
  --subject-index,-si   Retrieve the genotypes of the specified subject (by
                        subject indexes).
                        format: --subject-index <index1>,<index2>,... (>= 0)
  --pos,-p              Retrieve the variants by the specified coordinates of
                        variant.
                        format: --pos <chr>:<pos>,<pos>,... ... (>= 1)
  --pos-range,-pr       Retrieve the variants by the specified coordinate
                        intervals of variant.
                        format: --pos-range <chr>:<minPos>-<maxPos>,... (>= 1)
  --index-range,-ir     Retrieve the variants by the line-index of variant.
                        format: --index-range <minIndex>-<maxIndex> (>= 0)
Quality Control Options:
  --allele-num          Exclude variants with the alternative allele number per
                        variant out of the range [minAlleleNum, maxAlleleNum].
                        format: --allele-num <minAlleleNum>-<maxAlleleNum> (0 ~
                        255)
  --seq-ac              Exclude variants with the alternate allele count (AC) per
                        variant out of the range [minAc, maxAc].
                        format: --seq-ac <minAc>-<maxAc> (>= 0)
  --seq-af              Exclude variants with the alternate allele frequency (AF)
                        per variant out of the range [minAf, maxAf].
                        format: --seq-af <minAf>-<maxAf> (0.0 ~ 1.0)
  --seq-an              Exclude variants with the non-missing allele number (AN)
                        per variant out of the range [minAn, maxAn].
                        format: --seq-an <minAn>-<maxAn> (>= 0)
  --field-condition     Extract variants by the values of the specified
                        supplementary fields. For comparable fields, the
                        'condition' format is 'minValue-maxValue'; for other
                        formats, the 'condition' is multiple optional values
                        separated by ','.
                        format: --field-condition <field>=<condition>
                        <field>=<condition> ...
```
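The `--seq-ac`, `--seq-an` and `--seq-af` options filter on per-variant statistics. The Java sketch below shows how these quantities are commonly computed for one biallelic variant of diploid samples; it is an illustration, not GBC's code. `SeqQc` is a hypothetical class, genotypes are assumed to be coded as ALT allele counts (0, 1, 2) with -1 for missing, the range is assumed to be inclusive as the `[minAf, maxAf]` notation suggests, and excluding variants with AN = 0 is an assumption as well.

```java
/**
 * Hypothetical sketch of the statistics behind --seq-ac, --seq-an and --seq-af.
 * Assumption: diploid genotypes coded as ALT allele counts (0, 1, 2), -1 = missing.
 */
class SeqQc {
    /** AN: number of non-missing alleles (two per called diploid genotype). */
    static int an(int[] altCounts) {
        int n = 0;
        for (int g : altCounts) {
            if (g >= 0) n += 2;
        }
        return n;
    }

    /** AC: number of ALT alleles among the called genotypes. */
    static int ac(int[] altCounts) {
        int sum = 0;
        for (int g : altCounts) {
            if (g >= 0) sum += g;
        }
        return sum;
    }

    /** AF = AC / AN; undefined (NaN) when every genotype is missing. */
    static double af(int[] altCounts) {
        int an = an(altCounts);
        return an == 0 ? Double.NaN : (double) ac(altCounts) / an;
    }

    /** Keep the variant only if AF lies in the closed interval [minAf, maxAf]. */
    static boolean passSeqAf(int[] altCounts, double minAf, double maxAf) {
        double af = af(altCounts);
        return !Double.isNaN(af) && af >= minAf && af <= maxAf;
    }

    public static void main(String[] args) {
        int[] genotypes = {0, 1, 2, -1}; // REF/REF, REF/ALT, ALT/ALT, missing
        // prints AC=3 AN=6 AF=0.5
        System.out.println("AC=" + ac(genotypes) + " AN=" + an(genotypes) + " AF=" + af(genotypes));
    }
}
```

Under these assumptions, `--seq-af 0.05-0.95` would keep this example variant, because its AF of 0.5 falls inside the interval.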
API Toolkit
The read/write support for the BED format is located in the package `edu.sysu.pmglab.gbc.toolkit.bed`, where `BEDGenotypes` maps between BED genotypes and GTB genotypes. BED files can be created with either `BEDWriter` or `BEDPartWriter`. Because GTB supports splitting the entire file into equal parts by variant count for parallel processing, the same functionality is also implemented for BED (i.e., `BEDPartWriter`). The `.fam` file can be generated with `BEDWriter.generateFam(Individual[] individuals, String fileName)` or `BEDWriter.generateFam(String[] individualNames, String fileName)`.
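The genotype mapping performed by `BEDGenotypes` targets the PLINK 1 binary layout. The standalone Java sketch below illustrates that layout; it is not GBC's code, and `BedEncodingSketch` with all its methods is hypothetical. It assumes A1 = REF and A2 = ALT, as described above, and genotypes coded as ALT allele counts (0, 1, 2) with -1 for missing. Because every variant block has the same size, the byte offset of any variant can be computed directly (`blockOffset`), which is one reason a BED file lends itself to being written in independent parts; whether `BEDPartWriter` relies on this is not confirmed here. The `famLine` helper is also hypothetical: the FID, parent, sex and phenotype values that `generateFam` actually writes are not documented in this section, so the defaults below are an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

/**
 * Sketch of the PLINK 1 .bed encoding (not GBC's BEDGenotypes code).
 * Assumptions: A1 = REF, A2 = ALT; genotypes are ALT counts 0/1/2, -1 = missing.
 */
class BedEncodingSketch {
    /** Magic bytes 0x6C 0x1B, then 0x01 for variant-major (SNP-major) order. */
    static final byte[] HEADER = {0x6C, 0x1B, 0x01};

    /** 2-bit PLINK code for one genotype. */
    static int code(int altCount) {
        switch (altCount) {
            case 0:  return 0b00; // homozygous A1 (REF/REF)
            case 1:  return 0b10; // heterozygous
            case 2:  return 0b11; // homozygous A2 (ALT/ALT)
            default: return 0b01; // missing
        }
    }

    /** Packs one variant: four samples per byte, first sample in the lowest two bits. */
    static byte[] packVariant(int[] altCounts) {
        byte[] block = new byte[(altCounts.length + 3) / 4]; // unused bits stay 0
        for (int i = 0; i < altCounts.length; i++) {
            block[i / 4] |= (byte) (code(altCounts[i]) << ((i % 4) * 2));
        }
        return block;
    }

    /** Header followed by one fixed-size block per variant. */
    static byte[] encodeBed(List<int[]> variants) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(HEADER, 0, HEADER.length);
        for (int[] variant : variants) {
            byte[] block = packVariant(variant);
            out.write(block, 0, block.length);
        }
        return out.toByteArray();
    }

    /** Byte offset of a variant's block; every block is ceil(nSamples / 4) bytes. */
    static long blockOffset(long variantIndex, int nSamples) {
        return HEADER.length + variantIndex * ((nSamples + 3) / 4);
    }

    /** Hypothetical .fam line: FID = IID = name, unknown parents and sex, missing phenotype. */
    static String famLine(String name) {
        return String.join(" ", name, name, "0", "0", "0", "-9");
    }

    public static void main(String[] args) {
        byte[] bed = encodeBed(List.of(new int[] {0, 1, 2, -1}));
        StringBuilder hex = new StringBuilder();
        for (byte b : bed) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim()); // prints 6c 1b 01 78
    }
}
```

For four samples with genotypes REF/REF, REF/ALT, ALT/ALT and missing, the codes 00, 10, 11 and 01 are packed from the lowest bits upward into the single byte `0x78`.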