LD Calculator

GBC integrates a fast LD calculation method based on GTB. The LD coefficients between variants are calculated using the following command:

java -jar gbc.jar ld <input> [output] [options]

When no output file is specified, GBC compresses the output with bgzip (level 5) by default to minimize its size. GBC parallelizes the calculation when the input file contains multiple chromosomes (e.g., a single file holding genotypes for the entire genome); otherwise, parallelization applies only when the final output is in TXT or BGZIP format.

Note that LD calculation is only available for coordinate-ordered GTB files. For GTB files that are not coordinate-ordered, sort them with GTBSorter first.


Use GBC-LDCalculator to calculate the LD coefficients for 1000GP3-EAS-chr4:

# Download the data file
wget https://pmglab.top/gbc/download/1kg.phase3.v5.shapeit2.eas.hg19.chr4.gtb

# Run directly in the terminal
java -jar gbc.jar ld 1kg.phase3.v5.shapeit2.eas.hg19.chr4.gtb 

# Run it using docker
docker run -v `pwd`:`pwd` -w `pwd` --rm -it -m 4g gbc \
ld 1kg.phase3.v5.shapeit2.eas.hg19.chr4.gtb

Program Options

Usage: ld <input> [output] [options]
Java-API: edu.sysu.pmglab.gbc.linkagedisequilibrium.LDCalculator
About: Calculate pairwise the linkage disequilibrium or genotypic correlation. 
       The GBC-LDCalculator performs linkage disequilibrium calculations for 
       biallelic variants, which is a common processing strategy. For 
       multi-allelic variants, GBC-LDCalculator considers all ALTs as one 
       allele (i.e., non-REF alleles) for calculation. For multiple variants 
       with the same coordinates, GBC selects the variant with the maximum
       MAF for calculation and discards the others.
  --chromosome  Specify the chromosome tags file. e.g., identify 'X, chrX, 
                CHRX, ChrX' as '(int) 22' chromosome.
                format: --chromosome <file>
  --threads,-t  Set the number of threads.
                default: 4
                format: --threads <int> (>= 1)
LD Calculation Options:
  --hap-ld         Calculate pairwise the linkage disequilibrium.
  --geno-ld        Calculate pairwise the genotypic correlation.
  --window-bp,-bp  The maximum number of physical bases between the variants 
                   being calculated for LD.
                   default: 10000
                   format: --window-bp <int> (>= 10)
  --min-r2         Exclude pairs with R2 values less than --min-r2.
                   default: 0.2
                   format: --min-r2 <float> (0.0 ~ 1.0)
  --maf            Exclude variants with the minor allele frequency (MAF) per 
                   variant < maf.
                   default: 0.05
                   format: --maf <float> (1.0E-6 ~ 0.5)
  --range,-r       Calculate the LD by specified position range.
                   format: --range <chromosome>:<minPos>-<maxPos> (>= 1)
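
The interaction of --maf, --window-bp and --min-r2 can be illustrated with a small, self-contained Java sketch. This is not GBC's implementation: LdWindowSketch, Site and reportedPairs are hypothetical names, the haplotype r2 formula is the textbook one, and treating a distance exactly equal to the window as inside the window is an assumption. The early exit from the inner loop relies on the variants being sorted by coordinate, which is why ordered input matters.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical names, not part of the GBC API.
final class LdWindowSketch {
    /** A biallelic site: position plus one allele per haplotype (0 = REF, 1 = non-REF). */
    static final class Site {
        final int pos;
        final int[] hap;

        Site(int pos, int[] hap) {
            this.pos = pos;
            this.hap = hap;
        }

        /** Minor allele frequency: the smaller of the REF and non-REF frequencies. */
        double maf() {
            double alt = 0;
            for (int a : hap) alt += a;
            double p = alt / hap.length;
            return Math.min(p, 1 - p);
        }
    }

    /** Textbook haplotype r2 = D^2 / (pA(1-pA) pB(1-pB)), with D = pAB - pA*pB. */
    static double r2(Site a, Site b) {
        int n = a.hap.length;
        double cA = 0, cB = 0, cAB = 0;
        for (int i = 0; i < n; i++) {
            cA += a.hap[i];
            cB += b.hap[i];
            cAB += a.hap[i] * b.hap[i];
        }
        double pA = cA / n, pB = cB / n, d = cAB / n - pA * pB;
        double denom = pA * (1 - pA) * pB * (1 - pB);
        return denom > 0 ? d * d / denom : Double.NaN;
    }

    /** Returns "posA-posB" labels for pairs passing all filters; sites must be sorted by position. */
    static List<String> reportedPairs(List<Site> sites, int windowBp, double minR2, double maf) {
        List<Site> kept = new ArrayList<>();
        for (Site s : sites) {
            if (s.maf() >= maf) kept.add(s);          // --maf: drop variants with MAF < maf
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < kept.size(); i++) {
            for (int j = i + 1; j < kept.size(); j++) {
                Site a = kept.get(i), b = kept.get(j);
                if (b.pos - a.pos > windowBp) break;  // --window-bp: later sites are even farther away
                if (r2(a, b) >= minR2) {              // --min-r2: keep only sufficiently correlated pairs
                    out.add(a.pos + "-" + b.pos);
                }
            }
        }
        return out;
    }
}
```

With the default values (window 10000, min-r2 0.2, maf 0.05), two perfectly correlated sites 50 bp apart are reported, while the same pair 49,900 bp apart is skipped unless the window is widened.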

API Toolkit

The API class for performing LD calculations on GTB files is edu.sysu.pmglab.gbc.linkagedisequilibrium.LDCalculator. The two LD calculation methods are implemented in HaplotypeLD and GenotypeLD. An example of usage is as follows:

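// Open the GTB file (a remote file in this example)
GTBReader reader = new GTBReader("https://pmglab.top/gbc/download/assoc.hg19.gtb");
// Read the first two variants in file order
Variant variant1 = reader.read();
Variant variant2 = reader.read();
// Calculate the LD between the two variants using the GenotypeLD method
IRecord record = variant1.calculateLD(variant2, GenotypeLD.INSTANCE);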
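
To make concrete what the two methods measure, here is a minimal standalone sketch of the standard definitions. It does not use or reproduce the GBC classes: LdMeasureSketch and its method names are hypothetical, and the formulas are the textbook haplotype r2 and the squared Pearson correlation of genotype dosages, which is assumed (not confirmed by this page) to correspond to the "genotypic correlation". The collapse step mirrors the documented handling of multi-allelic variants, where all ALT alleles are treated as one non-REF allele.

```java
// Illustrative only: hypothetical names, not part of the GBC API.
final class LdMeasureSketch {
    /** Treat every ALT allele index (1, 2, ...) as a single non-REF allele (1); REF stays 0. */
    static int[] collapseToBiallelic(int[] alleles) {
        int[] out = new int[alleles.length];
        for (int i = 0; i < alleles.length; i++) {
            out[i] = alleles[i] == 0 ? 0 : 1;
        }
        return out;
    }

    /** Squared Pearson correlation of two equally long integer vectors; NaN if either is constant. */
    static double squaredCorrelation(int[] x, int[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("inputs must be non-empty and of equal length");
        }
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i];
            sy += y[i];
            sxx += (double) x[i] * x[i];
            syy += (double) y[i] * y[i];
            sxy += (double) x[i] * y[i];
        }
        double mx = sx / n, my = sy / n;
        double cov = sxy / n - mx * my;
        double varX = sxx / n - mx * mx;
        double varY = syy / n - my * my;
        return (varX > 0 && varY > 0) ? cov * cov / (varX * varY) : Double.NaN;
    }

    /**
     * Haplotype r2 from phased 0/1 alleles (one entry per haplotype).
     * For 0/1 vectors the squared correlation equals D^2 / (pA(1-pA) pB(1-pB)).
     */
    static double haplotypeR2(int[] hapA, int[] hapB) {
        return squaredCorrelation(hapA, hapB);
    }

    /** Genotype r2 from unphased dosages (0, 1 or 2 non-REF alleles per sample). */
    static double genotypeR2(int[] dosageA, int[] dosageB) {
        return squaredCorrelation(dosageA, dosageB);
    }
}
```

For instance, haplotypeR2 on the phased alleles {0, 0, 1, 1} and {0, 1, 1, 1} gives D = 0.125 and r2 = 1/3, while a monomorphic variant yields NaN because its allele frequency variance is zero.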
Copyright © Liubin Zhang, all rights reserved. Last modified time: 2023-04-10 14:04:23
