Chromosome Tag Declaration

Most tools identify chromosomes by name (a string such as "chr1") or by index (an integer). GBC instead declares chromosomes through the global static class Chromosome. This design lets users extend the chromosome tag declaration to fit their research needs, for example by adding properties to chromosomes, adding chromosome aliases, or defining chromosome name identification rules for different species.

GBC has a built-in chromosome tag declaration rule for the human genome (see the table below). Under this rule, every integer is interpreted as a chromosome index, with indices 0 to 24 associated with chr1 to chrUn in that order. Within a single row, every chromosome tag and the row's index identify the same entity, which is represented by one chromosome object (a single Java object in memory, with the same address).
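The "one row, one object" rule can be illustrated with a minimal, self-contained sketch. The classes ChromosomeTag and ChromosomeTagRegistry below are hypothetical stand-ins written for this example only; they are not GBC's Chromosome class and do not reflect its internals. The sketch assumes that indices are assigned in row order starting from 0, and it keeps string names and integer indices in separate lookups, so the string "1" (a simpleName) and the integer 1 (an index) are not confused.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is NOT GBC's Chromosome class.
final class ChromosomeTag {
    final int index;
    final String simpleName;
    final String fullName;

    ChromosomeTag(int index, String simpleName, String fullName) {
        this.index = index;
        this.simpleName = simpleName;
        this.fullName = fullName;
    }
}

// Hypothetical registry: every name on a row, plus the row's index, maps to one shared object.
final class ChromosomeTagRegistry {
    private final Map<String, ChromosomeTag> byName = new HashMap<>();
    private final Map<Integer, ChromosomeTag> byIndex = new HashMap<>();

    // Register one declaration row; the index is the row's position (assumed to start at 0).
    ChromosomeTag register(String simpleName, String fullName, String... alternativeNames) {
        ChromosomeTag tag = new ChromosomeTag(byIndex.size(), simpleName, fullName);
        byIndex.put(tag.index, tag);
        byName.put(simpleName, tag);
        byName.put(fullName, tag);
        for (String alias : alternativeNames) {
            byName.put(alias, tag);
        }
        return tag;
    }

    // Look up by any declared name; returns null if the name was never declared.
    ChromosomeTag get(String name) {
        return byName.get(name);
    }

    // Look up by index; returns null if the index is out of range.
    ChromosomeTag get(int index) {
        return byIndex.get(index);
    }

    public static void main(String[] args) {
        ChromosomeTagRegistry registry = new ChromosomeTagRegistry();
        registry.register("1", "chr1", "Chr1", "CHR1");
        registry.register("2", "chr2", "Chr2", "CHR2");

        // A full name, an alias, and the index all resolve to the same object in memory.
        System.out.println(registry.get("chr1") == registry.get("CHR1")); // true
        System.out.println(registry.get("1") == registry.get(0));         // true
    }
}
```

Comparing with == rather than equals() is deliberate here: the point is reference identity, not just equal field values.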

In the chromosome tag declaration file, simpleName and fullName are mandatory fields and alternativeNames is an optional field. Other chromosome-level properties (e.g., the length of the chromosome, or whether it is a sex chromosome) can be stored by adding new TAB-delimited columns. In Java, these additional fields are read with chromosome.getProperty(String fieldName).

#simpleName fullName alternativeNames
1 chr1 Chr1,CHR1,CM000663.1,CM000663.2,NC_000001.10,NC_000001.11
2 chr2 Chr2,CHR2,CM000664.1,CM000664.2,NC_000002.11,NC_000002.12
3 chr3 Chr3,CHR3,CM000665.1,CM000665.2,NC_000003.11,NC_000003.12
4 chr4 Chr4,CHR4,CM000666.1,CM000666.2,NC_000004.11,NC_000004.12
5 chr5 Chr5,CHR5,CM000667.1,CM000667.2,NC_000005.9,NC_000005.10
6 chr6 Chr6,CHR6,CM000668.1,CM000668.2,NC_000006.11,NC_000006.12
7 chr7 Chr7,CHR7,CM000669.1,CM000669.2,NC_000007.13,NC_000007.14
8 chr8 Chr8,CHR8,CM000670.1,CM000670.2,NC_000008.10,NC_000008.11
9 chr9 Chr9,CHR9,CM000671.1,CM000671.2,NC_000009.11,NC_000009.12
10 chr10 Chr10,CHR10,CM000672.1,CM000672.2,NC_000010.10,NC_000010.11
11 chr11 Chr11,CHR11,CM000673.1,CM000673.2,NC_000011.9,NC_000011.10
12 chr12 Chr12,CHR12,CM000674.1,CM000674.2,NC_000012.11,NC_000012.12
13 chr13 Chr13,CHR13,CM000675.1,CM000675.2,NC_000013.10,NC_000013.11
14 chr14 Chr14,CHR14,CM000676.1,CM000676.2,NC_000014.8,NC_000014.9
15 chr15 Chr15,CHR15,CM000677.1,CM000677.2,NC_000015.9,NC_000015.10
16 chr16 Chr16,CHR16,CM000678.1,CM000678.2,NC_000016.9,NC_000016.10
17 chr17 Chr17,CHR17,CM000679.1,CM000679.2,NC_000017.10,NC_000017.11
18 chr18 Chr18,CHR18,CM000680.1,CM000680.2,NC_000018.9,NC_000018.10
19 chr19 Chr19,CHR19,CM000681.1,CM000681.2,NC_000019.9,NC_000019.10
20 chr20 Chr20,CHR20,CM000682.1,CM000682.2,NC_000020.10,NC_000020.11
21 chr21 Chr21,CHR21,CM000683.1,CM000683.2,NC_000021.8,NC_000021.9
22 chr22 Chr22,CHR22,CM000684.1,CM000684.2,NC_000022.10,NC_000022.11
X chrX ChrX,CHRX,x,chrx,Chrx,CHRx,CM000685.1,CM000685.2,NC_000023.10,NC_000023.11
Y chrY ChrY,CHRY,y,chry,Chry,CHRy,CM000686.1,CM000686.2,NC_000024.9,NC_000024.10
M chrM MT,chrMT,ChrM,CHRM,ChrMT,CHRMT,m,chrm,mt,chrmt,CHRm,Chrmt,NC_001807.4,J01415.2,NC_012920.1
Un chrUn ChrUn,CHRUn
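
To make the file layout concrete, here is a minimal sketch of how one TAB-delimited row could be split into its fields. ChromosomeDeclarationParser and its Row type are hypothetical names used for illustration; GBC's own reader may behave differently. The sketch assumes that alternativeNames stays in the third column and that any extra property columns follow it, and it keeps property values as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for illustration only; GBC's own file reader may differ.
final class ChromosomeDeclarationParser {

    static final class Row {
        final String simpleName;
        final String fullName;
        final List<String> alternativeNames;
        final Map<String, String> properties; // extra columns, keyed by header name

        Row(String simpleName, String fullName,
            List<String> alternativeNames, Map<String, String> properties) {
            this.simpleName = simpleName;
            this.fullName = fullName;
            this.alternativeNames = alternativeNames;
            this.properties = properties;
        }
    }

    // The header line starts with '#'; strip it and split on TAB to get the column names.
    static String[] parseHeader(String headerLine) {
        String line = headerLine.startsWith("#") ? headerLine.substring(1) : headerLine;
        return line.split("\t", -1);
    }

    // Split one data row; the first two columns are mandatory, everything else is optional.
    static Row parseRow(String[] header, String line) {
        String[] cells = line.split("\t", -1);
        if (cells.length < 2) {
            throw new IllegalArgumentException("simpleName and fullName are mandatory: " + line);
        }
        List<String> aliases = new ArrayList<>();
        if (cells.length > 2 && !cells[2].isEmpty()) {
            aliases.addAll(Arrays.asList(cells[2].split(",")));
        }
        // Assumption: property columns start at the fourth column.
        Map<String, String> properties = new LinkedHashMap<>();
        for (int i = 3; i < cells.length && i < header.length; i++) {
            properties.put(header[i], cells[i]);
        }
        return new Row(cells[0], cells[1], aliases, properties);
    }

    public static void main(String[] args) {
        String[] header = parseHeader("#simpleName\tfullName\talternativeNames\tlength");
        Row row = parseRow(header, "1\tchr1\tChr1,CHR1\t249250621");
        System.out.println(row.fullName + " aliases=" + row.alternativeNames
                + " length=" + row.properties.get("length"));
    }
}
```

The "length" column in the usage example is an illustrative extra field; any header name could be used in its place.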

Create Chromosome Tag Declaration File

For non-human genomes, GBC requires a different chromosome tag declaration file. Taking the dog genome as an example, first download the VCF file from the dog10K website:

wget -c -O dogGenomeSnp.vcf.gz ftp://download.big.ac.cn/dogsd/dog10k/variations/58indiv.unifiedgenotyper.recalibrated_95.5_filtered.pass_snp.vcf.gz -t 0 -T 60

This VCF file contains chromosomes chr1~chr38 and chrX. Prepare the corresponding chromosome tag declaration file in text format, then build its binary format with the following commands:

# Download the dog's chromosome tag declaration file (text format)
wget https://pmglab.top/gbc/download/dog-chromosome.txt

# Build the chromosome tag declaration file, run it directly in the terminal
java -jar gbc.jar chromosome --build dog-chromosome.txt

# Build the chromosome tag declaration file, and run it with docker
docker run -v `pwd`:`pwd` -w `pwd` --rm -it -m 4g gbc \
chromosome --build dog-chromosome.txt
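
If a ready-made text file is not available for a species, the text format can also be generated programmatically. The following is a minimal sketch for a genome with autosomes chr1~chr38 plus chrX. The class name DogChromosomeDeclaration, the output file name dog-chromosome-sketch.txt, and the chosen aliases are assumptions made for illustration; the contents of the downloadable dog-chromosome.txt may differ.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical generator for illustration; the alias choices are assumptions.
final class DogChromosomeDeclaration {

    // Build the header, one row per autosome (1..autosomeCount), and one row for X.
    static List<String> buildLines(int autosomeCount) {
        List<String> lines = new ArrayList<>();
        lines.add("#simpleName\tfullName\talternativeNames");
        for (int i = 1; i <= autosomeCount; i++) {
            lines.add(i + "\tchr" + i + "\tChr" + i + ",CHR" + i);
        }
        lines.add("X\tchrX\tChrX,CHRX,x,chrx");
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Written to a separate (hypothetical) file name so the downloaded file is not overwritten.
        Files.write(Paths.get("dog-chromosome-sketch.txt"), buildLines(38), StandardCharsets.UTF_8);
    }
}
```

The generated text file could then be passed to the chromosome --build command in the same way as dog-chromosome.txt.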

Build the GTB archive for this VCF file:

# Run directly in the terminal
java -jar gbc.jar vcf2gtb ./dogGenomeSnp.vcf.gz --chromosome ./dog-chromosome.txt.ccf

# Run it using docker
docker run -v `pwd`:`pwd` -w `pwd` --rm -it -m 4g gbc \
vcf2gtb ./dogGenomeSnp.vcf.gz --chromosome ./dog-chromosome.txt.ccf

Build a coordinate index for this GTB file:

# Run directly in the terminal
java -jar gbc.jar index ./dogGenomeSnp.gtb

# Run it using docker
docker run -v `pwd`:`pwd` -w `pwd` --rm -it -m 4g gbc \
index ./dogGenomeSnp.gtb

The terminal outputs the following message:

[Image: dogGenomeSnpIndexer]

Program Options

Usage: chromosome [options]
About: Construct the chromosome tags file. The chromosome tags file is a 
       tab-separated text format with three columns. The first column 
       contains simple names of chromosomes (simpleName), the second column 
       contains full names of chromosomes (fullName), and the third column 
       contains alternative names (separated by comma). In the same row, 
       all names link to the same chromosome object.
Options:
  --build  Build the chromosome tags file (this file is in a binary format).
           format: --build <file> (Exists,File)
  --view   View the chromosome tags file.
           format: --view <file> (Exists,File)

---------------------Chromosome Tags File Format Example----------------------
#simpleName    fullName    alternativeNames
1    chr1    Chr1,CHR1,CM000663.1,CM000663.2,NC_000001.10,NC_000001.11
2    chr2    Chr2,CHR2,CM000664.1,CM000664.2,NC_000002.11,NC_000002.12
3    chr3    Chr3,CHR3,CM000665.1,CM000665.2,NC_000003.11,NC_000003.12
4    chr4    Chr4,CHR4,CM000666.1,CM000666.2,NC_000004.11,NC_000004.12
5    chr5    Chr5,CHR5,CM000667.1,CM000667.2,NC_000005.9,NC_000005.10
6    chr6    Chr6,CHR6,CM000668.1,CM000668.2,NC_000006.11,NC_000006.12
7    chr7    Chr7,CHR7,CM000669.1,CM000669.2,NC_000007.13,NC_000007.14
8    chr8    Chr8,CHR8,CM000670.1,CM000670.2,NC_000008.10,NC_000008.11
9    chr9    Chr9,CHR9,CM000671.1,CM000671.2,NC_000009.11,NC_000009.12
10    chr10    Chr10,CHR10,CM000672.1,CM000672.2,NC_000010.10,NC_000010.11
11    chr11    Chr11,CHR11,CM000673.1,CM000673.2,NC_000011.9,NC_000011.10
12    chr12    Chr12,CHR12,CM000674.1,CM000674.2,NC_000012.11,NC_000012.12
13    chr13    Chr13,CHR13,CM000675.1,CM000675.2,NC_000013.10,NC_000013.11
14    chr14    Chr14,CHR14,CM000676.1,CM000676.2,NC_000014.8,NC_000014.9
15    chr15    Chr15,CHR15,CM000677.1,CM000677.2,NC_000015.9,NC_000015.10
16    chr16    Chr16,CHR16,CM000678.1,CM000678.2,NC_000016.9,NC_000016.10
17    chr17    Chr17,CHR17,CM000679.1,CM000679.2,NC_000017.10,NC_000017.11
18    chr18    Chr18,CHR18,CM000680.1,CM000680.2,NC_000018.9,NC_000018.10
19    chr19    Chr19,CHR19,CM000681.1,CM000681.2,NC_000019.9,NC_000019.10
20    chr20    Chr20,CHR20,CM000682.1,CM000682.2,NC_000020.10,NC_000020.11
21    chr21    Chr21,CHR21,CM000683.1,CM000683.2,NC_000021.8,NC_000021.9
22    chr22    Chr22,CHR22,CM000684.1,CM000684.2,NC_000022.10,NC_000022.11
X    chrX    ChrX,CHRX,x,chrx,Chrx,CHRx,CM000685.1,CM000685.2,NC_000023.10,NC_000023.11
Y    chrY    ChrY,CHRY,y,chry,Chry,CHRy,CM000686.1,CM000686.2,NC_000024.9,NC_000024.10
M    chrM    MT,chrMT,ChrM,CHRM,ChrMT,CHRMT,m,chrm,mt,chrmt,CHRm,Chrmt,NC_001807.4,J01415.2,NC_012920.1
Un    chrUn    ChrUn,CHRUn

API Toolkit

The API for managing chromosome tags is the class edu.sysu.pmglab.gbc.variant.Chromosome, which supports directly creating and modifying chromosome tags and their properties.

import edu.sysu.pmglab.gbc.variant.Chromosome;

// Clear the original chromosome tags
Chromosome.clear();

// Create a new chromosome tag
Chromosome chromosome1 = Chromosome.addChromosome(new Chromosome("1", "chr1"));

// Set the chromosome's properties
chromosome1.setProperty("length", 249250621);

// Get the chromosome's properties
chromosome1.getProperty("length");
Copyright © Liubin Zhang, all rights reserved. Last modified time: 2023-04-10 18:11:45
