Starting GBC

GBC is now part of the KGGA toolkit. It specializes in genotype data processing, including memory, storage, and computational encodings that optimize performance for specific tasks, and in the design of efficient coordinate search algorithms. To launch GBC’s command-line interface, use the following command:

java -jar kgga.jar gbc

You can access detailed documentation for all commands by adding the --help flag.

Core Functionality: The convert Command

GBC’s primary command is convert, which performs parallel conversion between common genomic analysis file formats. It also supports filtering, LiftOver, biallelic conversion, quality control (QC), sorting, and concatenation.

Syntax

java -jar kgga.jar gbc convert <source>2<target> [options]

Examples: vcf2gtb, plink2gtb, gtb2vcf

Note: Conversion to PLINK-PGEN requires additional extensions (see "Extended Functionalities" below).
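
The subcommand name is the source format, the digit 2, and the target format joined together. As a minimal sketch, the helper below assembles a convert command from those parts and prints it for review instead of running it. The helper name gbc_convert_cmd and the file names are hypothetical, and only format pairs that GBC actually supports will work.

```shell
# Hypothetical helper: print a GBC convert command built from a source
# format, a target format, an input path, and an output path.
# It only prints the command; nothing is executed.
gbc_convert_cmd() {
    src_fmt=$1   # e.g. vcf, plink, gtb
    dst_fmt=$2   # e.g. gtb, vcf
    input=$3
    output=$4
    printf 'java -jar kgga.jar gbc convert %s2%s %s -o %s\n' \
        "$src_fmt" "$dst_fmt" "$input" "$output"
}

# Example with placeholder file names
gbc_convert_cmd vcf gtb input.vcf.gz output.gtb
```

Printing the command first makes it easy to check the subcommand spelling before starting a long conversion.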

Format Conversion Examples

Converting VCF to GTB

This example converts a VCF file into a GTB file, applying the default quality control (QC) filters plus a few explicit options.

Command:

java -jar kgga.jar gbc convert vcf2gtb ~/ukb24310_c1_b6089_v1.vcf.gz --field  --prune --seq-an 1~ --seq-af 0.000001~0.999999 -o /Users/suranyi/ukb24310_c1_b6089_v1.3.gtb

Details:

Converts the input VCF file to GTB format.
Applies default QC filters:
- GQ >= 20
- DP >= 8
- MQ >= 40
- PL >= 20
- LPL >= 20
- FT == PASS
- AD_HOM_REF <= 0.05
- AD_HOM_ALT >= 0.75
- AD_HET >= 0.25
--field: Removes INFO and FILTER fields from the VCF (no parameters specified).
--prune: Removes alternate (ALT) alleles whose allele count (AC) is 0.
--seq-an 1~: Filters by allele number range (minimum 1, no upper limit).
--seq-af 0.000001~0.999999: Filters by allele frequency range.

To disable QC, add --disable-qc. For more details on QC parameters, run:

java -jar kgga.jar gbc convert -h
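
As a rough sketch of the --disable-qc option described above, the snippet below builds a VCF-to-GTB conversion with the default QC filters switched off and prints it. The file names are placeholders, and the position of --disable-qc among the other options is an assumption.

```shell
# Placeholder file names (assumptions, not files shipped with GBC)
INPUT=input.vcf.gz
OUTPUT=output.noqc.gtb

# A VCF-to-GTB conversion with the default QC filters disabled.
CMD="java -jar kgga.jar gbc convert vcf2gtb $INPUT --disable-qc -o $OUTPUT"

# Print the command for review; to run it, use: sh -c "$CMD"
echo "$CMD"
```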

Converting VCF to PLINK-PGEN

This example converts a VCF file to PLINK-PGEN format.

Command:

java -Djava.library.path=$(pip3 show jep | grep Location | awk '{print $2"/jep"}') -jar kgga.jar gbc convert vcf2plink ./ukb24310_c1_b6089_v1.vcf.gz  -o ./ukb24310_c1_b6089_v1  --output-type pgen

Details:

Requires the jep library path for PLINK-PGEN support (see "Extended Functionalities" below).
Outputs a PLINK-PGEN file with the specified name.

Converting Multiple VCF Files to a Single GTB File

This example processes multiple VCF files with a LiftOver operation.

Command:

java -jar kgga.jar gbc convert vcf2gtb 1kg.phase3.v5.shapeit2.amr.hg19.chr*.vcf.gz -o ~/tmp/AMR.hg38.gtb --liftover hg19ToHg38

Details:

Combines multiple chromosome-specific VCF files into one GTB file.
Applies a LiftOver from hg19 to hg38 coordinates.
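
In most shells the chr* wildcard is expanded by the shell before GBC starts, so it can be worth confirming that the pattern matches the expected files first. The sketch below counts the matches; count_matches is a hypothetical helper, not part of GBC.

```shell
# Hypothetical helper: count how many of the given paths exist.
# In POSIX sh an unmatched pattern is passed through literally,
# so a pattern that matches nothing yields a count of 0.
count_matches() {
    n=0
    for f in "$@"; do
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}

# Check the pattern from the example above before converting
n=$(count_matches 1kg.phase3.v5.shapeit2.amr.hg19.chr*.vcf.gz)
echo "matched $n file(s)"
```

A count of 0 usually means the command is being run from the wrong directory or the pattern is misspelled.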

Subset Extraction: Filtering Genotypes

You can extract subsets of genotype data (e.g., specific individuals or positions) using the convert command with additional options:

--individual <id>,<id>,...: Select specific individuals by ID.
--pos [expression]: Filter by genomic position.
--index-range <start>~<end>: Filter by index range.
--allele-num <min>~<max>: Filter by allele number.
--seq-ac <min>~<max>: Filter by allele count.
--seq-af <min>~<max>: Filter by allele frequency.

Example: PLINK to VCF with Subset Extraction

Command:

java -Djava.library.path=$(pip3 show jep | grep Location | awk '{print $2"/jep"}') -jar kgga.jar gbc convert plink2vcf ./ukb24310_c1_b6089_v1 --input-type pgen --individual 1718672,2380098,5176706,4729017,1930596 --seq-an 1~  -o ./ukb24310_c1_b6089_v1.s5.vcf.gz

Details:

Converts a PLINK-PGEN file to VCF.
Filters to include only the specified individuals.
Applies an allele number filter (--seq-an 1~).
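
When the list of individuals is long, typing the IDs by hand is error-prone. The sketch below assumes a hypothetical text file with one ID per line and joins it into the comma-separated form that --individual takes; join_ids, subset_cmd, and all file names are illustrative. It prints a gtb2vcf command rather than running it.

```shell
# Join a one-ID-per-line file into a comma-separated list,
# for example 1718672,2380098,5176706.
join_ids() {
    paste -s -d , "$1"
}

# Print a subset-extraction command for review.
# $1 = input GTB, $2 = ID list file, $3 = output VCF (all placeholders)
subset_cmd() {
    printf 'java -jar kgga.jar gbc convert gtb2vcf %s --individual %s -o %s\n' \
        "$1" "$(join_ids "$2")" "$3"
}
```

The ID file should contain no blank lines or trailing spaces, since paste copies every line into the list as-is.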

Additional Command-Line Features

GBC offers several other useful commands:

Queue Merging: java -jar kgga.jar gbc merge <file1> <file2> -o <output>
- Merges two genotype files into one.

Vertical Concatenation (e.g., for chromosome files): java -jar kgga.jar gbc concat <input> <input> ... --output <output>
- Combines multiple files into a single output.

Linkage Disequilibrium (LD) Calculation: java -jar kgga.jar gbc ld

Graphical User Interface (GUI): java -jar kgga.jar gbc gui
- Launches a visual interface for GBC.

Database Creation: java -jar kgga.jar gbc make-database
- Generates a database file from genotype data.

Note: Genomic annotation and analysis tools are integrated into KGGA (visit http://pmglab.top/kgga).

Extended Functionalities: PLINK and BGEN Support

To enable support for PLINK and BGEN formats, install the required Python libraries:

pip install jep zstandard pgenlib bgen_reader

When running GBC, specify the jep library path:

java -Djava.library.path="$(pip3 show jep | grep Location | awk '{print $2"/jep"}')" -jar kgga.jar gbc

Equivalent command with the path written out (here, a Homebrew Python 3.13 installation):

java -Djava.library.path=/opt/homebrew/lib/python3.13/site-packages/jep -jar kgga.jar gbc

Finding the jep Path

For Windows users or non-standard installations, determine the jep directory by running:

pip3 show jep

Locate the Location field in the output, append /jep, and use the resulting path with -Djava.library.path.
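
The manual steps above can be sketched as a small shell function. It reads the Location field from pip3 show jep, appends /jep, and fails if pip3 reports nothing. The function name jep_dir is hypothetical, and the sketch targets Unix-like shells; on Windows the same steps apply but the path separator differs.

```shell
# Hypothetical helper: print the jep directory derived from `pip3 show jep`.
# Returns a non-zero status when jep (or pip3) is not available.
jep_dir() {
    # sed keeps everything after "Location: ", so paths with spaces survive
    location=$(pip3 show jep 2>/dev/null | sed -n 's/^Location: //p')
    [ -n "$location" ] || return 1
    printf '%s/jep\n' "$location"
}

# Print the launch command if jep was found, otherwise a hint
if JEP_DIR=$(jep_dir); then
    echo "java -Djava.library.path=\"$JEP_DIR\" -jar kgga.jar gbc"
else
    echo "jep not found; install it with pip first" >&2
fi
```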

Remarks

CCF Architecture (Version 4.x):
- Features a more flexible row-column block design and fine-grained parallelization for better performance.
- Optimized for low memory usage (e.g., encoding and filtering UK Biobank whole-genome genotypes at 1 GB per thread).
- Incompatible with version 3.x file formats.
Development Focus:
- KGGA and CCF prioritize systematic engineering improvements via Java APIs.
- Command-line tools are supplementary and may have usability limitations; these are to be addressed in the next minor release (ccf-4.6).
File Merging:
- Currently supports basic merging based on coordinate/REF consistency and standard bases (ATCG alleles).
- Future updates will enhance performance, handle multi-allelic sites, and add advanced merging modes (e.g., intersection, union, complement, left alignment).
