Quick-Start Examples

After confirming that a Java runtime environment (version 8 or higher) is installed, you can run kgga.jar directly without additional configuration. The following examples show how KGGA cleans and annotates sequence variants and genotypes. Both quick-start demos use small datasets retrieved over the internet, so no local input files are needed, and they offer an easy way to explore KGGA’s core features.
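One way to confirm the Java requirement is sketched below. It assumes a Bash shell and that `java -version` prints the version as the first quoted string on its first line, which is typical but not guaranteed for every vendor. The helper name `java_major_version` is introduced here for illustration only and is not part of KGGA.

```shell
#!/usr/bin/env bash
# Extract the major Java version from a version string such as
# "1.8.0_292" (legacy scheme, Java 8 and earlier) or "17.0.2".
java_major_version() {
    local version="$1"
    local major="${version%%.*}"
    # Legacy versions are reported as "1.x"; the major version is x.
    if [ "$major" = "1" ]; then
        local rest="${version#1.}"
        major="${rest%%.*}"
    fi
    # Drop any non-numeric suffix such as "-ea".
    major="${major%%[!0-9]*}"
    printf '%s\n' "$major"
}

if command -v java >/dev/null 2>&1; then
    # `java -version` writes to stderr, so redirect it before parsing.
    installed="$(java -version 2>&1 | awk -F'"' 'NR==1 {print $2}')"
    major="$(java_major_version "$installed")"
    if [ -n "$major" ] && [ "$major" -ge 8 ]; then
        echo "Java $installed meets the version 8 requirement"
    else
        echo "Java version '$installed' could not be confirmed as 8 or higher" >&2
    fi
else
    echo "No java executable found on PATH" >&2
fi
```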


Clean Genotypes from VCF Data and Output in PLINK BED Format

This example retrieves genotypes from an online VCF file, applies default quality control (QC) criteria, keeps variants with a minor allele frequency between 0.05 and 0.5 (that is, common variants), and exports the cleaned data in PLINK’s BED format.

Example Command

java -jar kgga.jar \
   clean \
   --input-gty-file https://idc.biosino.org/pmglab/resource/kgg/kgga/example/assoc.hg19.vcf.gz refG=hg19 \
   --ped-file https://idc.biosino.org/pmglab/resource/kgg/kgga/example/assoc.ped \
   --output ./test/demo1 \
   --local-maf 0.05~0.5 \
   --output-gty-format PLINK_BED

Key Parameters

  • java: Starts the Java Virtual Machine (JVM). KGGA is a Java-based application, so java is used to execute it.
  • -jar kgga.jar: Tells the JVM to run kgga.jar, the Java Archive (JAR) that bundles KGGA’s code and key resources and serves as its executable file.
  • --input-gty-file <URL> refG=hg19: Specifies the input VCF file (fetched from a URL) and the reference genome (hg19).
  • --ped-file <URL>: Provides the pedigree and phenotype file, also retrieved from a URL.
  • --output ./test/demo1: Defines the output directory for the cleaned data.
  • --local-maf 0.05~0.5: Keeps only variants with a minor allele frequency (MAF) between 0.05 and 0.5.
  • --output-gty-format PLINK_BED: Exports the cleaned genotypes in PLINK’s BED format.
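To make the effect of a MAF range such as 0.05~0.5 concrete, the sketch below computes the MAF of a single variant from allele counts and tests it against the range. It is a conceptual illustration only, not KGGA’s implementation: the function name `maf_in_range` is hypothetical, the counts are made up, and treating both bounds as inclusive is an assumption.

```shell
#!/usr/bin/env bash
# Succeed (exit status 0) if the minor allele frequency implied by an
# alternate-allele count and a total allele count lies within [min, max].
maf_in_range() {
    local alt_count="$1" total_alleles="$2" min="$3" max="$4"
    awk -v a="$alt_count" -v n="$total_alleles" -v lo="$min" -v hi="$max" 'BEGIN {
        if (n <= 0) exit 1
        p = a / n                       # alternate-allele frequency
        maf = (p < 1 - p) ? p : 1 - p   # the minor allele is the rarer one
        exit !(maf >= lo && maf <= hi)
    }'
}

# 30 alternate alleles among 200 (100 diploid samples): MAF = 0.15.
if maf_in_range 30 200 0.05 0.5; then
    echo "variant kept"
else
    echo "variant removed"
fi
```

A variant with 2 alternate alleles among 200 (MAF 0.01) would fall outside the same range and be removed.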

Output

The cleaned genotype data is saved in PLINK BED format in the specified directory (./test/demo1).
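A quick way to inspect the result is sketched below. It assumes the conventional PLINK binary fileset extensions (.bed, .bim, .fam); the exact file names KGGA writes, and whether it compresses them, are not stated here, so adjust the patterns if your output differs. The helper name `list_plink_files` is hypothetical.

```shell
#!/usr/bin/env bash
# List PLINK binary-fileset members (.bed, .bim, .fam) under a directory.
list_plink_files() {
    local dir="$1"
    find "$dir" -type f \( -name '*.bed' -o -name '*.bim' -o -name '*.fam' \) | sort
}

out_dir="./test/demo1"
if [ -d "$out_dir" ]; then
    list_plink_files "$out_dir"
else
    echo "Output directory $out_dir not found; run the clean command first" >&2
fi
```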


Annotate Variants from VCF Data and Output in TSV Format

This example retrieves genotypes from an online VCF file, applies default QC criteria, annotates gene features, filters for non-synonymous variants, and adds functional prediction scores from the dbNSFP database. The results are saved in a compressed TSV file.

Example Command

java -Dccf.remote.timeout=60 -jar kgga.jar \
   annotate \
   --input-gty-file https://idc.biosino.org/pmglab/resource/kgg/kgga/example/rare.disease.hg19.vcf refG=hg19 \
   --ped-file https://idc.biosino.org/pmglab/resource/kgg/kgga/example/rare.disease.ped.txt \
   --output ./test/demo2 \
   --gene-feature-included 0~6 \
   --variant-annotation-database dbnsfp

Key Parameters

  • -Dccf.remote.timeout=60: Sets a JVM system property that configures the timeout for KGGA’s remote operations; the value 60 means 60 seconds. This setting matters whenever KGGA fetches remote resources such as online databases or large datasets. A 60-second timeout is adequate in many cases, but you may need to raise it if:

    • Your network is slow, causing delays.

    • You are working with large data that takes longer to fetch or process.

    • You see timeout errors during execution.

    Setting this property controls how long KGGA waits before giving up on a remote operation.

  • --input-gty-file <URL> refG=hg19: Specifies the input VCF file (fetched from a URL) and the reference genome (hg19).
  • --ped-file <URL>: Provides the pedigree and phenotype file from a URL.
  • --output ./test/demo2: Sets the output directory for the annotated data.
  • --gene-feature-included 0~6: Annotates gene features and keeps variants whose feature index falls within 0 to 6 (e.g., frameshift, missense).
  • --variant-annotation-database dbnsfp: Adds functional prediction scores from the dbNSFP database.
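If the remote timeout needs tuning, one option is to set it from the shell rather than editing the command each time, as sketched below. The environment variable `KGGA_TIMEOUT`, the helper `remote_timeout_opt`, and the 180-second fallback are all assumptions made for this example; only the `-Dccf.remote.timeout` property itself comes from KGGA.

```shell
#!/usr/bin/env bash
# Build the JVM option for KGGA's remote timeout (in seconds).
# Rejects empty or non-numeric values so that a typo fails early.
remote_timeout_opt() {
    local seconds="$1"
    case "$seconds" in
        ''|*[!0-9]*)
            echo "timeout must be a whole number of seconds: '$seconds'" >&2
            return 1
            ;;
    esac
    printf '%s\n' "-Dccf.remote.timeout=${seconds}"
}

# KGGA_TIMEOUT is a hypothetical variable name used only in this sketch.
timeout_opt="$(remote_timeout_opt "${KGGA_TIMEOUT:-180}")" || exit 1

# Print the resulting command for review; drop the leading `echo` to run it.
echo java "$timeout_opt" -jar kgga.jar \
    annotate \
    --input-gty-file https://idc.biosino.org/pmglab/resource/kgg/kgga/example/rare.disease.hg19.vcf refG=hg19 \
    --ped-file https://idc.biosino.org/pmglab/resource/kgg/kgga/example/rare.disease.ped.txt \
    --output ./test/demo2 \
    --gene-feature-included 0~6 \
    --variant-annotation-database dbnsfp
```

Printing the command first keeps the sketch safe to try on a machine without kgga.jar or network access.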

Output

The annotated variants and their details are saved in a compressed TSV file in the specified directory (./test/demo2).
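The compressed output can be previewed without unpacking it to disk, as in the sketch below. The output file name is not given here, so the example searches the directory for any .gz file, which is an assumption about how the compressed TSV is named. The helper `preview_tsv_gz` is hypothetical.

```shell
#!/usr/bin/env bash
# Print the first rows of a gzip-compressed text file (default: 5 rows).
preview_tsv_gz() {
    local file="$1" rows="${2:-5}"
    gzip -dc "$file" | head -n "$rows"
}

out_dir="./test/demo2"
if [ -d "$out_dir" ]; then
    # Preview every gzip-compressed file found in the output directory.
    find "$out_dir" -type f -name '*.gz' | while IFS= read -r f; do
        echo "== $f"
        preview_tsv_gz "$f" 5
    done
else
    echo "Output directory $out_dir not found; run the annotate command first" >&2
fi
```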

Copyright © MiaoXin Li. All rights reserved. Last modified: 2025-04-25 02:37:24
