Merge Multiple GTB

Merging creates a superset of variant calls across multiple individuals (e.g., combining the variants of the different populations in 1000GP3); records that share the same coordinates are merged with each other. Because merging non-genotype information across files is complex (e.g., updating statistical fields or combining annotation information), the GBC command-line tool merges genotypes only. Use the following command to merge the genotypes of GTB files:

java -jar gbc.jar merge <input> <input> ... -o <output> [options]

The merged file can be split again with GTBExporter. When more than two files are given, GBC merges them two at a time; the merge order is chosen with a min-heap weighted by sample size, which lets it handle sample sets of arbitrary size.
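
The pairwise strategy can be illustrated with a minimal sketch. It is not GBC's implementation: the class `PairwiseMergePlan`, the file names, and the sample counts are hypothetical, and it only assumes that the two inputs with the fewest samples are merged first and the result is pushed back onto the heap.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class PairwiseMergePlan {
    /** A file or intermediate merge result, weighted by its sample count. */
    static final class Node {
        final String name;
        final int samples;
        Node(String name, int samples) { this.name = name; this.samples = samples; }
    }

    /** Returns the sequence of two-file merges chosen by a min-heap on sample count. */
    public static List<String> plan(Map<String, Integer> sampleCounts) {
        // Smallest sample count first; ties are broken by name to keep the order deterministic.
        PriorityQueue<Node> heap = new PriorityQueue<>(
                Comparator.comparingInt((Node n) -> n.samples)
                          .thenComparing((Node n) -> n.name));
        for (Map.Entry<String, Integer> e : sampleCounts.entrySet()) {
            heap.add(new Node(e.getKey(), e.getValue()));
        }

        List<String> steps = new ArrayList<>();
        while (heap.size() > 1) {
            Node a = heap.poll();  // smallest remaining input
            Node b = heap.poll();  // second smallest
            Node merged = new Node("(" + a.name + "+" + b.name + ")", a.samples + b.samples);
            steps.add(a.name + " + " + b.name + " -> " + merged.samples + " samples");
            heap.add(merged);      // the intermediate result competes with the other inputs
        }
        return steps;
    }

    public static void main(String[] args) {
        // Hypothetical sample counts, for illustration only.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("a.gtb", 40);
        counts.put("b.gtb", 10);
        counts.put("c.gtb", 30);
        counts.put("d.gtb", 20);
        plan(counts).forEach(System.out::println);
    }
}
```

Merging the smallest inputs first keeps the intermediate files small for as long as possible, which is the same idea used when building a Huffman tree.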

Example

Merging all Y-chromosome genotypes of 1000GP3 using GBC:

# Download the data file
wget https://pmglab.top/gbc/download/1000GP3.hg19.chrY/afr.gtb \
     https://pmglab.top/gbc/download/1000GP3.hg19.chrY/amr.gtb \
     https://pmglab.top/gbc/download/1000GP3.hg19.chrY/eas.gtb \
     https://pmglab.top/gbc/download/1000GP3.hg19.chrY/eur.gtb \
     https://pmglab.top/gbc/download/1000GP3.hg19.chrY/sas.gtb

# Run directly in the terminal
java -jar gbc.jar merge ./afr.gtb ./amr.gtb ./eas.gtb ./eur.gtb ./sas.gtb \
                        -o ./1000GP3.chrY.gtb

# Run it using docker
docker run -v "$(pwd)":"$(pwd)" -w "$(pwd)" --rm -it -m 4g gbc \
merge ./afr.gtb ./amr.gtb ./eas.gtb ./eur.gtb ./sas.gtb \
      -o ./1000GP3.chrY.gtb

Program Options

Usage: merge <input> <input> ... -o <output> [options]
Java-API: edu.sysu.pmglab.gbc.toolkit.GTBMerger
About: Merge genotypes of individuals in multiple *.gtb into a single *.gtb. 
       Merge means creating a superset of variant calls across multiple
       individuals.
Options:
  *--output,-o  Set the output file.
                format: --output <file>
  --chromosome  Specify the chromosome tags file. e.g., identify 'X, chrX, 
                CHRX, ChrX' as '(int) 22' chromosome.
                format: --chromosome <string>
  --threads,-t  Set the number of threads.
                default: 4
                format: --threads <int> (>= 1)
  --method,-m   Method for handling coordinates in different files (union,
                intersection or alignment). Missing genotypes are replaced by
                '.'.
                default: alignment
                format: --method <string> ([union/intersection/alignment] 
                (ignoreCase)) 
  --no-gt       Do not load and store genotypes.
  --add-meta    Add the specified metas to the output file.
                format: --add-meta <key>=<value> <key>=<value> ...
  --rm-meta     Remove all meta information.
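
To make the --method option more concrete, the following sketch shows how union and intersection could treat the coordinates of two inputs. It is an illustration under assumptions, not GBC code: the class `CoordinateMergeSketch` and its `merge` method are hypothetical, each input is reduced to one sample on one chromosome, union is assumed to keep every position seen in either file, and intersection is assumed to keep only positions present in both. The alignment method is not covered here.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class CoordinateMergeSketch {
    /** Placeholder written when a file has no genotype at a position. */
    static final String MISSING = ".";

    /**
     * Merges two position -> genotype tables (one sample per table).
     * With union == true every position from either table is kept;
     * otherwise only positions present in both tables are kept.
     */
    public static SortedMap<Integer, String[]> merge(Map<Integer, String> left,
                                                     Map<Integer, String> right,
                                                     boolean union) {
        SortedSet<Integer> positions = new TreeSet<>(left.keySet());
        if (union) {
            positions.addAll(right.keySet());
        } else {
            positions.retainAll(right.keySet());
        }

        SortedMap<Integer, String[]> merged = new TreeMap<>();
        for (int pos : positions) {
            merged.put(pos, new String[] {
                left.getOrDefault(pos, MISSING),
                right.getOrDefault(pos, MISSING)
            });
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical genotypes, for illustration only.
        Map<Integer, String> left = Map.of(100, "0|0", 200, "0|1");
        Map<Integer, String> right = Map.of(200, "1|1", 300, "0|1");

        merge(left, right, true).forEach((pos, gts) ->
                System.out.println("union        " + pos + "\t" + String.join("\t", gts)));
        merge(left, right, false).forEach((pos, gts) ->
                System.out.println("intersection " + pos + "\t" + String.join("\t", gts)));
    }
}
```

In this sketch the '.' placeholder can only appear under union, because intersection discards any position that one of the inputs lacks.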

API Toolkit

The API class for merging GTB files is edu.sysu.pmglab.gbc.toolkit.GTBMerger. An example of its use:

GTBMerger.of("./afr.gtb", "./amr.gtb")
        .setMergeOperator(GTBMerger.MergeOperator.UNION)
        .submit();

In GTBMerger, non-genotype fields can be merged by declaring the additional field names and types with addField and supplying the new field values with addValueConverter.

Copyright © Liubin Zhang, all rights reserved. Last modified: 2023-04-10 13:11:41
