LD Calculator
GBC integrates a fast LD calculation method based on GTB. LD coefficients between variants are calculated with the following command:
java -jar gbc.jar ld <input> [output] [options]
If no output file is specified, GBC compresses the output with bgzip (level 5) by default to minimize its size. GBC parallelizes across chromosomes when the input file contains several of them (e.g., a single file holding genotypes for the entire genome); otherwise, parallelization applies only when the final export is in TXT or BGZIP format.
Note that LD calculations are only available for coordinate-ordered GTBs. For GTBs that are not coordinate-ordered, first sort them with GTBSorter.
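The ordering requirement fits naturally with a distance-limited pair scan such as the one configured by --window-bp. The following is only a minimal sketch of that idea, not GBC's implementation: the class WindowPairs and the method pairsWithin are hypothetical names, and treating the window bound as inclusive is an assumption. It shows that, once positions are sorted, the inner loop can stop at the first variant outside the window.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of GBC.
class WindowPairs {
    // Returns index pairs (i, j), i < j, whose positions differ by at most windowBp.
    // Assumes sortedPos is in ascending order (one chromosome).
    static List<int[]> pairsWithin(int[] sortedPos, int windowBp) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < sortedPos.length; i++) {
            for (int j = i + 1; j < sortedPos.length; j++) {
                // Sorted input: every later variant is even farther away, so stop here.
                if (sortedPos[j] - sortedPos[i] > windowBp) {
                    break;
                }
                pairs.add(new int[]{i, j});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        int[] pos = {100, 5_000, 9_000, 25_000};
        for (int[] p : pairsWithin(pos, 10_000)) {
            System.out.println(pos[p[0]] + " - " + pos[p[1]]);
        }
    }
}
```

With unsorted positions the early break would skip valid pairs, so every pair would have to be compared.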
[!NOTE|label:Example|style:callout]
Use GBC-LDCalculator to calculate the LD coefficients for 1000GP3-EAS-chr4:
# Download the data file
wget https://pmglab.top/gbc/download/1kg.phase3.v5.shapeit2.eas.hg19.chr4.gtb

# Run directly in the terminal
java -jar gbc.jar ld 1kg.phase3.v5.shapeit2.eas.hg19.chr4.gtb

# Run it using docker
docker run -v `pwd`:`pwd` -w `pwd` --rm -it -m 4g gbc \
ld 1kg.phase3.v5.shapeit2.eas.hg19.chr4.gtb
Program Options
Usage: ld <input> [output] [options]
Java-API: edu.sysu.pmglab.gbc.linkagedisequilibrium.LDCalculator
About: Calculate pairwise the linkage disequilibrium or genotypic correlation.
The GBC-LDCalculator performs linkage disequilibrium calculations for
biallelic variants, which is a common processing strategy. For
multi-allelic variants, GBC-LDCalculator considers all ALTs as one
allele (i.e., non-REF alleles) for calculation. For multiple variants
with the same coordinates, GBC selects the variant with the maximum
MAF for calculation and discards the others.
Options:
  --chromosome      Specify the chromosome tags file. e.g., identify 'X, chrX,
                    CHRX, ChrX' as '(int) 22' chromosome.
                    format: --chromosome <file>
  --threads,-t      Set the number of threads.
                    default: 4
                    format: --threads <int> (>= 1)
LD Calculation Options:
  --hap-ld          Calculate pairwise the linkage disequilibrium.
  --geno-ld         Calculate pairwise the genotypic correlation.
  --window-bp,-bp   The maximum number of physical bases between the variants
                    being calculated for LD.
                    default: 10000
                    format: --window-bp <int> (>= 10)
  --min-r2          Exclude pairs with R2 values less than --min-r2.
                    default: 0.2
                    format: --min-r2 <float> (0.0 ~ 1.0)
  --maf             Exclude variants with the minor allele frequency (MAF) per
                    variant < maf.
                    default: 0.05
                    format: --maf <float> (1.0E-6 ~ 0.5)
  --range,-r        Calculate the LD by specified position range.
                    format: --range <chromosome>:<minPos>-<maxPos> (>= 1)
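The variant handling described in the About text (collapsing all ALTs into one non-REF allele, keeping the highest-MAF variant among those sharing a coordinate, and dropping variants below --maf) can be sketched as follows. This is an illustration under stated assumptions, not GBC's code: VariantFilterSketch, Site, maf and select are hypothetical names, allele codes are assumed to be 0 for REF and 1..n for ALTs with no missing values, and ties between equal MAFs are resolved by keeping the first variant.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of GBC.
class VariantFilterSketch {
    // Minimal stand-in for a variant: a position plus one allele code per haplotype
    // (0 = REF, 1..n = index of the ALT allele). Assumes at least one haplotype.
    static final class Site {
        final int pos;
        final int[] alleles;

        Site(int pos, int[] alleles) {
            this.pos = pos;
            this.alleles = alleles;
        }
    }

    // Treats every ALT as a single non-REF allele and returns the minor allele frequency.
    static double maf(int[] alleles) {
        int nonRef = 0;
        for (int a : alleles) {
            if (a != 0) {
                nonRef++;
            }
        }
        double f = (double) nonRef / alleles.length;
        return Math.min(f, 1.0 - f);
    }

    // Input must be sorted by position. Keeps one site per position (the one with
    // the highest MAF), then removes sites whose MAF is below minMaf.
    static List<Site> select(List<Site> sorted, double minMaf) {
        List<Site> kept = new ArrayList<>();
        for (Site s : sorted) {
            int last = kept.size() - 1;
            if (last >= 0 && kept.get(last).pos == s.pos) {
                if (maf(s.alleles) > maf(kept.get(last).alleles)) {
                    kept.set(last, s);
                }
            } else {
                kept.add(s);
            }
        }
        kept.removeIf(s -> maf(s.alleles) < minMaf);
        return kept;
    }
}
```

For example, calling select with minMaf set to 0.05 mirrors the documented default of --maf. A monomorphic site has a MAF of 0 and is always removed.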
API Toolkit
The API tool for performing LD calculations on GTB files is edu.sysu.pmglab.gbc.linkagedisequilibrium.LDCalculator. The two LD calculation methods are implemented in HaplotypeLD and GenotypeLD. An example of usage is as follows:
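// Open the GTB file
GTBReader reader = new GTBReader("https://pmglab.top/gbc/download/assoc.hg19.gtb");
// Read the first two variants
Variant variant1 = reader.read();
Variant variant2 = reader.read();
// Compute the LD between the two variants with the genotype-based method
IRecord record = variant1.calculateLD(variant2, GenotypeLD.INSTANCE);
// Release the reader
reader.close();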
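For readers unfamiliar with the two measures, the sketch below shows the textbook definitions usually meant by haplotype-based r² and genotype-based r². It is not the code behind HaplotypeLD or GenotypeLD, whose internals are not described here. LdMeasuresSketch and its methods are hypothetical names, and the inputs are assumed to be biallelic, free of missing values, and polymorphic (a monomorphic variant makes the denominator zero).

```java
// Hypothetical sketch of the standard definitions, not part of GBC.
class LdMeasuresSketch {
    // Haplotype-based r^2. a[i] and b[i] are the alleles of two variants on
    // phased haplotype i, coded 0 (REF) or 1 (non-REF).
    static double haplotypeR2(int[] a, int[] b) {
        int n = a.length;
        double pA = 0, pB = 0, pAB = 0;
        for (int i = 0; i < n; i++) {
            pA += a[i];
            pB += b[i];
            pAB += a[i] * b[i];
        }
        pA /= n;
        pB /= n;
        pAB /= n;
        double d = pAB - pA * pB; // disequilibrium coefficient D
        return d * d / (pA * (1 - pA) * pB * (1 - pB));
    }

    // Genotype-based r^2: squared Pearson correlation between the non-REF
    // allele counts (0, 1 or 2) of two variants across samples.
    static double genotypeR2(int[] x, int[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i];
            sy += y[i];
            sxx += x[i] * x[i];
            syy += y[i] * y[i];
            sxy += x[i] * y[i];
        }
        double mx = sx / n;
        double my = sy / n;
        double cov = sxy / n - mx * my;
        double vx = sxx / n - mx * mx;
        double vy = syy / n - my * my;
        return cov * cov / (vx * vy);
    }
}
```

The haplotype form needs phased data, while the genotype form works on unphased allele counts. A pair would then be reported only if its value is at least the --min-r2 threshold.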