Chromosome Tag Declaration
Most tools mark chromosomes by name strings (e.g., "chr1") or by integer indices. GBC instead declares chromosomes through the global static class Chromosome. This declaration mechanism is designed to be extended according to research needs, for example by adding properties to chromosomes, adding chromosome aliases, or defining different chromosome naming rules for different species.
GBC has a built-in chromosome tag declaration rule for the human genome (see the table below). Under this rule, integers are interpreted as chromosome indices, assigned in table order from 0 (chr1) to 25 (chrUn). All tags on the same row, together with that row's index, identify the same entity: they resolve to a single Chromosome object, i.e., the same Java object in memory.
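To make the "same object" rule concrete, here is a minimal, self-contained sketch of how such a registry could behave. The class ChromosomeRegistryDemo and its methods are hypothetical illustrations, not part of the GBC API. It only models two points from the text: every tag on a row resolves to one shared object, and integers are looked up as row indices rather than as names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the tag-to-object rule; NOT the GBC Chromosome API.
public class ChromosomeRegistryDemo {
    // One instance per declaration row.
    static final class Chrom {
        final int index;
        final String simpleName;
        final String fullName;

        Chrom(int index, String simpleName, String fullName) {
            this.index = index;
            this.simpleName = simpleName;
            this.fullName = fullName;
        }
    }

    private final List<Chrom> byIndex = new ArrayList<>();
    private final Map<String, Chrom> byTag = new HashMap<>();

    // Declares one row; its index is the row order, starting at 0.
    Chrom declare(String simpleName, String fullName, String... alternativeNames) {
        Chrom c = new Chrom(byIndex.size(), simpleName, fullName);
        byIndex.add(c);
        byTag.put(simpleName, c);
        byTag.put(fullName, c);
        for (String alias : alternativeNames) {
            byTag.put(alias, c);
        }
        return c;
    }

    // Strings are looked up as tags (so "1" is the simpleName of chr1) ...
    Chrom get(String tag) {
        return byTag.get(tag);
    }

    // ... while integers are looked up as indices (so 1 is chr2).
    Chrom get(int index) {
        return byIndex.get(index);
    }

    public static void main(String[] args) {
        ChromosomeRegistryDemo registry = new ChromosomeRegistryDemo();
        registry.declare("1", "chr1", "Chr1", "CHR1", "NC_000001.11");
        registry.declare("2", "chr2", "Chr2", "CHR2", "NC_000002.12");

        System.out.println(registry.get("CHR1") == registry.get("NC_000001.11")); // true
        System.out.println(registry.get(0) == registry.get("1"));               // true
        System.out.println(registry.get(1).fullName);                           // chr2
    }
}
```

Using reference equality (==) in the demo is deliberate: the point of the rule is object identity, not merely equal names.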
In the chromosome tag declaration file, simpleName and fullName are mandatory fields, and alternativeNames (a comma-separated list) is optional. Other chromosome-level properties (e.g., the length of the chromosome, or whether it is a sex chromosome) can be stored by appending further TAB-delimited columns. These additional properties are read in Java with chromosome.getProperty(String fieldName).
#simpleName | fullName | alternativeNames |
---|---|---|
1 | chr1 | Chr1,CHR1,CM000663.1,CM000663.2,NC_000001.10,NC_000001.11 |
2 | chr2 | Chr2,CHR2,CM000664.1,CM000664.2,NC_000002.11,NC_000002.12 |
3 | chr3 | Chr3,CHR3,CM000665.1,CM000665.2,NC_000003.11,NC_000003.12 |
4 | chr4 | Chr4,CHR4,CM000666.1,CM000666.2,NC_000004.11,NC_000004.12 |
5 | chr5 | Chr5,CHR5,CM000667.1,CM000667.2,NC_000005.9,NC_000005.10 |
6 | chr6 | Chr6,CHR6,CM000668.1,CM000668.2,NC_000006.11,NC_000006.12 |
7 | chr7 | Chr7,CHR7,CM000669.1,CM000669.2,NC_000007.13,NC_000007.14 |
8 | chr8 | Chr8,CHR8,CM000670.1,CM000670.2,NC_000008.10,NC_000008.11 |
9 | chr9 | Chr9,CHR9,CM000671.1,CM000671.2,NC_000009.11,NC_000009.12 |
10 | chr10 | Chr10,CHR10,CM000672.1,CM000672.2,NC_000010.10,NC_000010.11 |
11 | chr11 | Chr11,CHR11,CM000673.1,CM000673.2,NC_000011.9,NC_000011.10 |
12 | chr12 | Chr12,CHR12,CM000674.1,CM000674.2,NC_000012.11,NC_000012.12 |
13 | chr13 | Chr13,CHR13,CM000675.1,CM000675.2,NC_000013.10,NC_000013.11 |
14 | chr14 | Chr14,CHR14,CM000676.1,CM000676.2,NC_000014.8,NC_000014.9 |
15 | chr15 | Chr15,CHR15,CM000677.1,CM000677.2,NC_000015.9,NC_000015.10 |
16 | chr16 | Chr16,CHR16,CM000678.1,CM000678.2,NC_000016.9,NC_000016.10 |
17 | chr17 | Chr17,CHR17,CM000679.1,CM000679.2,NC_000017.10,NC_000017.11 |
18 | chr18 | Chr18,CHR18,CM000680.1,CM000680.2,NC_000018.9,NC_000018.10 |
19 | chr19 | Chr19,CHR19,CM000681.1,CM000681.2,NC_000019.9,NC_000019.10 |
20 | chr20 | Chr20,CHR20,CM000682.1,CM000682.2,NC_000020.10,NC_000020.11 |
21 | chr21 | Chr21,CHR21,CM000683.1,CM000683.2,NC_000021.8,NC_000021.9 |
22 | chr22 | Chr22,CHR22,CM000684.1,CM000684.2,NC_000022.10,NC_000022.11 |
X | chrX | ChrX,CHRX,x,chrx,Chrx,CHRx,CM000685.1,CM000685.2,NC_000023.10,NC_000023.11 |
Y | chrY | ChrY,CHRY,y,chry,Chry,CHRy,CM000686.1,CM000686.2,NC_000024.9,NC_000024.10 |
M | chrM | MT,chrMT,ChrM,CHRM,ChrMT,CHRMT,m,chrm,mt,chrmt,CHRm,Chrmt,NC_001807.4,J01415.2,NC_012920.1 |
Un | chrUn | ChrUn,CHRUn |
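The layout described above (two mandatory columns, an optional comma-separated alias column, then extra TAB-delimited property columns) can be illustrated with a small reader. This is a sketch under stated assumptions, not GBC's own loader: the class DeclarationParser is hypothetical, and it assumes that the names in a "#" header line serve as the property keys for the extra columns.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader for the text declaration format; NOT GBC's own loader.
public class DeclarationParser {
    record Row(String simpleName, String fullName, List<String> alternativeNames,
               Map<String, String> properties) {}

    static List<Row> parse(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        String[] header = null;
        for (String line : lines) {
            if (line.isBlank()) continue;
            String[] cols = line.split("\t", -1);
            if (line.startsWith("#")) {
                // Header line, e.g. "#simpleName<TAB>fullName<TAB>alternativeNames<TAB>length"
                cols[0] = cols[0].substring(1);
                header = cols;
                continue;
            }
            if (cols.length < 2 || cols[0].isEmpty() || cols[1].isEmpty()) {
                throw new IllegalArgumentException("simpleName and fullName are mandatory: " + line);
            }
            List<String> alternatives = (cols.length > 2 && !cols[2].isEmpty())
                    ? Arrays.asList(cols[2].split(","))
                    : List.of();
            // Columns after the third are kept as properties keyed by their header name (assumption).
            Map<String, String> properties = new LinkedHashMap<>();
            for (int i = 3; i < cols.length; i++) {
                String key = (header != null && i < header.length) ? header[i] : "column" + (i + 1);
                properties.put(key, cols[i]);
            }
            rows.add(new Row(cols[0], cols[1], alternatives, properties));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = args.length > 0
                ? Files.readAllLines(Path.of(args[0]), StandardCharsets.UTF_8)
                : List.of("#simpleName\tfullName\talternativeNames\tlength",
                          "1\tchr1\tChr1,CHR1\t249250621",
                          "Un\tchrUn\tChrUn,CHRUn");
        for (Row row : parse(lines)) {
            // Default input prints "chr1 [Chr1, CHR1] {length=249250621}" then "chrUn [ChrUn, CHRUn] {}"
            System.out.println(row.fullName() + " " + row.alternativeNames() + " " + row.properties());
        }
    }
}
```

The value 249250621 is the GRCh37 length of chr1, the same figure used in the API example later in this section.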
Create Chromosome Tag Declaration File
For non-human genomes, GBC requires a separate chromosome tag declaration file. Taking the dog genome as an example, first download a VCF file from the Dog10K website:
wget -c -O dogGenomeSnp.vcf.gz ftp://download.big.ac.cn/dogsd/dog10k/variations/58indiv.unifiedgenotyper.recalibrated_95.5_filtered.pass_snp.vcf.gz -t 0 -T 60
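Before writing a declaration file, it helps to check which chromosome names the VCF actually uses, because every CHROM value should be covered by some tag. The sketch below (hypothetical class VcfContigs) streams the compressed file and collects the distinct values of the first column. It assumes the file is gzip or bgzip compressed; bgzip output is a series of gzip members, which java.util.zip.GZIPInputStream reads in sequence. It scans the whole file, so it is slow on very large VCFs.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Stream;
import java.util.zip.GZIPInputStream;

// Lists the distinct CHROM values used in a VCF, in order of first appearance.
public class VcfContigs {
    static Set<String> distinctChromosomes(Stream<String> lines) {
        Set<String> names = new LinkedHashSet<>();
        lines.filter(line -> !line.isEmpty() && !line.startsWith("#")) // skip "##" meta lines and the "#CHROM" header
             .forEach(line -> {
                 int tab = line.indexOf('\t');
                 names.add(tab < 0 ? line : line.substring(0, tab)); // CHROM is the first column
             });
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java VcfContigs.java dogGenomeSnp.vcf.gz
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new FileInputStream(args[0])), StandardCharsets.UTF_8))) {
            System.out.println(distinctChromosomes(reader.lines()));
        }
    }
}
```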
This VCF file contains chromosomes chr1 to chr38 and chrX. Prepare the corresponding chromosome tag declaration file in text format, then build its binary form with the following commands:
# Download the dog chromosome tag declaration file (text format)
wget https://pmglab.top/gbc/download/dog-chromosome.txt
# Build the binary chromosome tag declaration file, running directly in the terminal
java -jar gbc.jar chromosome --build dog-chromosome.txt
# Build the binary chromosome tag declaration file, running with Docker
docker run -v `pwd`:`pwd` -w `pwd` --rm -it -m 4g gbc \
chromosome --build dog-chromosome.txt
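The downloaded dog-chromosome.txt is the file intended for this walkthrough. If a declaration file has to be written by hand for another species, a generator like the hypothetical DogChromosomeFile below could produce one. Its naming pattern (simpleName "1", fullName "chr1", aliases "Chr1,CHR1") simply mirrors the human table and is an assumption, so its contents may differ from the official download.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical generator for a text declaration file covering chr1..chr38 and chrX.
public class DogChromosomeFile {
    static List<String> rows() {
        List<String> simpleNames = new ArrayList<>();
        for (int i = 1; i <= 38; i++) {
            simpleNames.add(Integer.toString(i));
        }
        simpleNames.add("X");

        List<String> lines = new ArrayList<>();
        lines.add("#simpleName\tfullName\talternativeNames");
        for (String s : simpleNames) {
            // TAB-separated: simpleName, fullName, comma-separated aliases
            lines.add(s + "\tchr" + s + "\tChr" + s + ",CHR" + s);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Output name is illustrative; it avoids overwriting the downloaded dog-chromosome.txt.
        Path out = Path.of(args.length > 0 ? args[0] : "dog-chromosome-generated.txt");
        // Join with "\n" so the file uses Unix line endings on every platform.
        Files.writeString(out, String.join("\n", rows()) + "\n", StandardCharsets.UTF_8);
    }
}
```

The generated text file would then go through the same build step, e.g. java -jar gbc.jar chromosome --build dog-chromosome-generated.txt.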
Build the GTB archive for this VCF file, passing the binary declaration file with --chromosome:
# Run directly in the terminal
java -jar gbc.jar vcf2gtb ./dogGenomeSnp.vcf.gz --chromosome ./dog-chromosome.txt.ccf
# Run it using docker
docker run -v `pwd`:`pwd` -w `pwd` --rm -it -m 4g gbc \
vcf2gtb ./dogGenomeSnp.vcf.gz --chromosome ./dog-chromosome.txt.ccf
Build a coordinate index for this GTB file:
# Run directly in the terminal
java -jar gbc.jar index ./dogGenomeSnp.gtb
# Run it using docker
docker run -v `pwd`:`pwd` -w `pwd` --rm -it -m 4g gbc \
index ./dogGenomeSnp.gtb
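The three terminal steps (build the declaration file, convert with vcf2gtb, build the index) can also be chained from Java. This is a minimal sketch using only the commands and options shown in this section, with vcf2gtb invoked as a top-level subcommand as in the Docker form. The class GbcPipeline is hypothetical, and it assumes gbc.jar is in the working directory and that vcf2gtb writes dogGenomeSnp.gtb next to the input, as the index command implies.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical wrapper that runs the GBC commands from this section in order.
public class GbcPipeline {
    static List<List<String>> commands(String vcf, String chromosomeTxt) {
        String ccf = chromosomeTxt + ".ccf"; // binary file name as used in the vcf2gtb step
        String gtb = vcf.replaceFirst("\\.vcf(\\.gz)?$", "") + ".gtb"; // assumed output name of vcf2gtb
        return List.of(
                List.of("java", "-jar", "gbc.jar", "chromosome", "--build", chromosomeTxt),
                List.of("java", "-jar", "gbc.jar", "vcf2gtb", vcf, "--chromosome", ccf),
                List.of("java", "-jar", "gbc.jar", "index", gtb));
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        for (List<String> cmd : commands("./dogGenomeSnp.vcf.gz", "./dog-chromosome.txt")) {
            // inheritIO() forwards GBC's console output to this terminal
            Process process = new ProcessBuilder(cmd).inheritIO().start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IllegalStateException("Command failed (" + exitCode + "): " + String.join(" ", cmd));
            }
        }
    }
}
```

Stopping at the first non-zero exit code keeps a failed build from feeding a missing .ccf file into vcf2gtb.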
The help message of the chromosome command, which also summarizes the text format of the declaration file, is shown below:
Program Options
Usage: chromosome [options]
About: Construct the chromosome tags file. The chromosome tags file is a
tab-separated text format with three columns. The first column
contains simple names of chromosomes (simpleName), the second column
contains full names of chromosomes (fullName), and the third column
contains alternative names (separated by comma). In the same row,
all names link to the same chromosome object.
Options:
--build Build the chromosome tags file (this file is in a binary format).
format: --build <file> (Exists,File)
--view View the chromosome tags file.
format: --view <file> (Exists,File)
---------------------Chromosome Tags File Format Example----------------------
#simpleName fullName alternativeNames
1 chr1 Chr1,CHR1,CM000663.1,CM000663.2,NC_000001.10,NC_000001.11
2 chr2 Chr2,CHR2,CM000664.1,CM000664.2,NC_000002.11,NC_000002.12
3 chr3 Chr3,CHR3,CM000665.1,CM000665.2,NC_000003.11,NC_000003.12
4 chr4 Chr4,CHR4,CM000666.1,CM000666.2,NC_000004.11,NC_000004.12
5 chr5 Chr5,CHR5,CM000667.1,CM000667.2,NC_000005.9,NC_000005.10
6 chr6 Chr6,CHR6,CM000668.1,CM000668.2,NC_000006.11,NC_000006.12
7 chr7 Chr7,CHR7,CM000669.1,CM000669.2,NC_000007.13,NC_000007.14
8 chr8 Chr8,CHR8,CM000670.1,CM000670.2,NC_000008.10,NC_000008.11
9 chr9 Chr9,CHR9,CM000671.1,CM000671.2,NC_000009.11,NC_000009.12
10 chr10 Chr10,CHR10,CM000672.1,CM000672.2,NC_000010.10,NC_000010.11
11 chr11 Chr11,CHR11,CM000673.1,CM000673.2,NC_000011.9,NC_000011.10
12 chr12 Chr12,CHR12,CM000674.1,CM000674.2,NC_000012.11,NC_000012.12
13 chr13 Chr13,CHR13,CM000675.1,CM000675.2,NC_000013.10,NC_000013.11
14 chr14 Chr14,CHR14,CM000676.1,CM000676.2,NC_000014.8,NC_000014.9
15 chr15 Chr15,CHR15,CM000677.1,CM000677.2,NC_000015.9,NC_000015.10
16 chr16 Chr16,CHR16,CM000678.1,CM000678.2,NC_000016.9,NC_000016.10
17 chr17 Chr17,CHR17,CM000679.1,CM000679.2,NC_000017.10,NC_000017.11
18 chr18 Chr18,CHR18,CM000680.1,CM000680.2,NC_000018.9,NC_000018.10
19 chr19 Chr19,CHR19,CM000681.1,CM000681.2,NC_000019.9,NC_000019.10
20 chr20 Chr20,CHR20,CM000682.1,CM000682.2,NC_000020.10,NC_000020.11
21 chr21 Chr21,CHR21,CM000683.1,CM000683.2,NC_000021.8,NC_000021.9
22 chr22 Chr22,CHR22,CM000684.1,CM000684.2,NC_000022.10,NC_000022.11
X chrX ChrX,CHRX,x,chrx,Chrx,CHRx,CM000685.1,CM000685.2,NC_000023.10,NC_000023.11
Y chrY ChrY,CHRY,y,chry,Chry,CHRy,CM000686.1,CM000686.2,NC_000024.9,NC_000024.10
M chrM MT,chrMT,ChrM,CHRM,ChrMT,CHRMT,m,chrm,mt,chrmt,CHRm,Chrmt,NC_001807.4,J01415.2,NC_012920.1
Un chrUn ChrUn,CHRUn
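The help text states that all names in a row link to the same chromosome object, which implies that a name should not appear on two different rows. Whether GBC itself rejects such a file is not stated here, so the check below is only a hypothetical pre-flight step (DeclarationCheck is not part of GBC) that reports such conflicts before running chromosome --build.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pre-check: reports tags that are declared on more than one row.
public class DeclarationCheck {
    static List<String> findDuplicateTags(List<String> lines) {
        Map<String, Integer> declaredOn = new HashMap<>(); // tag -> 1-based line of first declaration
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isBlank() || line.startsWith("#")) continue;
            String[] cols = line.split("\t", -1);
            List<String> tags = new ArrayList<>();
            for (int c = 0; c < Math.min(cols.length, 2); c++) {
                tags.add(cols[c]); // simpleName, fullName
            }
            if (cols.length > 2 && !cols[2].isEmpty()) {
                tags.addAll(List.of(cols[2].split(","))); // alternativeNames
            }
            int lineNo = i + 1;
            for (String tag : tags) {
                Integer first = declaredOn.putIfAbsent(tag, lineNo);
                // Repeats within the same row are harmless: they name the same chromosome.
                if (first != null && first != lineNo) {
                    problems.add("'" + tag + "' on line " + lineNo + " was already declared on line " + first);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "#simpleName\tfullName\talternativeNames",
                "1\tchr1\tChr1,CHR1",
                "2\tchr2\tChr2,chr1");
        // prints: 'chr1' on line 3 was already declared on line 2
        findDuplicateTags(lines).forEach(System.out::println);
    }
}
```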
API Toolkit
The API for managing chromosome tags is the class edu.sysu.pmglab.gbc.variant.Chromosome, which supports creating and modifying chromosome tags and their properties directly.
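import edu.sysu.pmglab.gbc.variant.Chromosome;

// Remove the existing chromosome tags (e.g., the built-in human declarations)
Chromosome.clear();
// Create a chromosome tag with simpleName "1" and fullName "chr1", and register it
Chromosome chromosome1 = Chromosome.addChromosome(new Chromosome("1", "chr1"));
// Attach a property to the chromosome (here: the GRCh37 length of chr1 in bp)
chromosome1.setProperty("length", 249250621);
// Read the property back by its field name
chromosome1.getProperty("length");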