Input

KGGA currently accepts VCF and GTB files (one or more, with or without PED files) as input for germ-line variant research, and MAF files for somatic mutation analysis.

Option: --input-gty-file
Description: Specifies the input file. --input-gty-file takes a combination of parameters: <type> sets the format of the input file, and <refG> sets the reference genome of the input variants.
Format: --input <file> type=[AUTO/VCF/GTB/MAF] refG=[hg18/hg19/hg38]
Example: --input ./example.vcf.gz type=VCF refG=hg38
Default: type=AUTO refG=hg38
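
As a minimal sketch of how the <type> and <refG> parameters combine, the shell function below checks both values against the lists documented above and prints the resulting argument string. The function name build_input_arg is hypothetical and not part of KGGA; it uses the --input spelling from the Format line and the documented defaults (type=AUTO, refG=hg38).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of KGGA): build the input argument string
# after checking <type> and <refG> against the values listed in this section.
build_input_arg() {
  local file="$1"
  local type="${2:-AUTO}"   # documented default
  local refg="${3:-hg38}"   # documented default

  case "$type" in
    AUTO|VCF|GTB|MAF) ;;
    *) echo "unsupported type: $type" >&2; return 1 ;;
  esac
  case "$refg" in
    hg18|hg19|hg38) ;;
    *) echo "unsupported refG: $refg" >&2; return 1 ;;
  esac

  printf '%s %s type=%s refG=%s\n' '--input' "$file" "$type" "$refg"
}

# Explicit values, then the defaults.
build_input_arg ./example.vcf.gz VCF hg38
build_input_arg ./example.vcf.gz
```

Validating the values before launching a long analysis is only a convenience; the program itself remains the authority on what it accepts.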

Variant and Genotype File

For germ-line variant research, the input data should follow the VCF (Variant Call Format) format. Here is a short description. The program accepts VCF input in the following forms:

  • Text format. Suffix: .vcf.

  • GZ-compressed format, produced by gzip <path/to/VCF>. Suffix: .vcf.gz.

  • BGZ-compressed format, produced by bgzip <path/to/VCF>. Suffix: .vcf.bgz or .vcf.gz.

  • GenoType Block (GTB) format, produced by the Genotype Blocking Compressor (GBC), which provides ultra-fast access to large-scale genotypes of hundreds of thousands of subjects. Suffix: .gtb. The program automatically converts the input VCF file to GTB format in the first step, and the generated GTB file can be used as input for subsequent analyses. Alternatively, you can generate the GTB file manually with the command:

    java -jar kgga.jar gbc convert vcf2gtb <path/to/VCF> --output <output> [options]
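
The compressed variants listed above can be prepared with standard tools. The following is a small sketch on toy data: the file names and the single variant record are made up for illustration, and bgzip (distributed with HTSlib) is used only if it happens to be installed.

```shell
#!/usr/bin/env bash
# Toy data for illustration only; paths and the record are hypothetical.
workdir="$(mktemp -d)"
vcf="$workdir/example.vcf"

# Write a minimal VCF: one meta line, the column header, one record.
printf '##fileformat=VCFv4.2\n' > "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$vcf"
printf '1\t10177\t.\tA\tC\t.\tPASS\t.\n' >> "$vcf"

# gzip -c writes to stdout, so the plain-text .vcf is kept next to the .vcf.gz.
gzip -c "$vcf" > "$vcf.gz"

# bgzip may not be available on every system; skip it quietly if missing.
if command -v bgzip >/dev/null 2>&1; then
  bgzip -c "$vcf" > "$workdir/example.vcf.bgz"
fi

# Verify that the gzip archive is intact before passing it on.
gzip -t "$vcf.gz" && echo "ok: $vcf.gz"
```

Using -c keeps the original file, which is convenient when the same VCF is needed in more than one form.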
    

Pedigree and Phenotype File (optional)

To specify the phenotypes of subjects in the VCF file and the pedigree relationships between them, or to analyze only a subset of those subjects, you must record the sample information in a PED file. The PED file should be in the LINKAGE Pedigree format. Here is a short description.

Option: --ped-file
Description: Specifies the PED file with phenotypes. --ped-file takes a combination of parameters: <pheno> sets the column name of the major phenotype in the PED file, and <covar> sets the column name(s) used as covariate phenotype(s). By default, the individual IDs in the PED file must be unique and identical to the ones defined in the VCF file(s). Alternatively, users can ask KGGA to use a composite individual ID, formed as "FamilyID$IndividualID", by setting composite to Y (true) to match the VCF file(s).
Format: --ped-file <file> pheno=[columnName] covar=[columnName1,columnName2,...] composite=[Y/N]
Example: --ped-file ./example.ped pheno=disease covar=QT,age composite=Y
Default: composite=N
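
To make the composite ID concrete, the sketch below writes a toy PED file and prints the "FamilyID$IndividualID" string for each subject. The header names and all values are hypothetical and chosen only to match the example above (disease, QT, age); the exact header that KGGA expects is not shown here, and composite_ids is an illustrative helper, not a KGGA command.

```shell
#!/usr/bin/env bash
# Toy PED file; header names and values are hypothetical.
ped="$(mktemp)"
printf 'FamilyID\tIndividualID\tPaternalID\tMaternalID\tSex\tdisease\tQT\tage\n' > "$ped"
printf 'F1\tS1\t0\t0\t1\t1\t0.52\t61\n' >> "$ped"
printf 'F1\tS2\t0\t0\t2\t1\t0.47\t58\n' >> "$ped"
printf 'F1\tS3\tS1\tS2\t1\t2\t0.91\t30\n' >> "$ped"

# Print the composite ID of each subject: FamilyID, a literal "$", IndividualID.
# NR > 1 skips the header line.
composite_ids() {
  awk -F'\t' 'NR > 1 { print $1 "$" $2 }' "$1"
}

composite_ids "$ped"
```

With composite=Y, these are the strings that would have to appear as sample names in the VCF file(s); with the default composite=N, the second column alone must match.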

Mutation Annotation Format File

For somatic mutation research, the input data should follow the MAF (Mutation Annotation Format) format. Here is a short description. The program accepts MAF input in the following forms:

  • Text format. Suffix: .maf.

  • GZ-compressed format, produced by gzip <path/to/MAF>. Suffix: .maf.gz.

  • BGZ-compressed format, produced by bgzip <path/to/MAF>. Suffix: .maf.bgz or .maf.gz.

  • GenoType Block (GTB) format, produced by the Genotype Blocking Compressor (GBC), which provides ultra-fast access to large-scale genotypes of hundreds of thousands of subjects. Suffix: .gtb. The program automatically converts the input MAF file to GTB format in the first step, and the generated GTB file can be used as input for subsequent analyses. Alternatively, you can generate the GTB file manually with the command:

    java -jar kgga.jar gbc maf2gtb <path/to/MAF> -o <path/to/out>
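
The suffix conventions in this section can be summarized as a small lookup. The sketch below maps a file name to a <type> value by suffix alone; infer_type is a hypothetical helper that only mirrors the lists above and is not a description of how KGGA's type=AUTO detection works internally. The file names are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a file name to a <type> value by its suffix.
# This mirrors the suffix lists in this section; it is not KGGA's AUTO logic.
infer_type() {
  case "$1" in
    *.vcf|*.vcf.gz|*.vcf.bgz) echo VCF ;;
    *.maf|*.maf.gz|*.maf.bgz) echo MAF ;;
    *.gtb)                    echo GTB ;;
    *)                        echo AUTO ;;  # unknown suffix: fall back to the documented default
  esac
}

for f in cohort.vcf.gz tumor.maf cohort.gtb; do
  printf '%s type=%s\n' "$f" "$(infer_type "$f")"
done
```

Stating the type explicitly is mainly useful when a file name does not follow these suffix conventions.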
    
Copyright © MiaoXin Li, all rights reserved. Last modified time: 2025-04-26 15:31:41
