Input

KGGA currently accepts VCF and GTB files (one or more, with or without PED files) as input for germ-line variant research, and MAF files as input for somatic mutation analysis.

Option	Description	Default
`--input-gty-file`	Specify the input file. `--input-gty-file` is a combination of parameters. `<type>` is used to specify the format of the input file. `<refG>` is used to specify the reference genome of the input variants. Format: `--input <file> type=[AUTO/VCF/GTB/MAF] refG=[hg18/hg19/hg38]` Example: `--input ./example.vcf.gz type=VCF refG=hg38`	type=AUTO refG=hg38
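
As a rough illustration of how the file path and the `type`/`refG` parameters combine, the sketch below assembles the parameter string from a file name. The `guess_type` helper is hypothetical: it only mirrors the file suffixes listed in this section and is not KGGA's actual `type=AUTO` detection, which is not described here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess the `type=` value from a file suffix.
# It mirrors the suffixes listed in this section; it is NOT KGGA's
# own AUTO detection logic.
guess_type() {
  case "$1" in
    *.vcf|*.vcf.gz|*.vcf.bgz) echo "VCF" ;;
    *.maf|*.maf.gz|*.maf.bgz) echo "MAF" ;;
    *.gtb)                    echo "GTB" ;;
    *)                        echo "AUTO" ;;  # fall back to the documented default
  esac
}

# Assemble the combined parameter string in the documented format:
#   <file> type=[AUTO/VCF/GTB/MAF] refG=[hg18/hg19/hg38]
input_file="./example.vcf.gz"
input_arg="${input_file} type=$(guess_type "$input_file") refG=hg38"
echo "$input_arg"
```

Spelling out `type` and `refG` explicitly, rather than relying on the defaults, makes a command line easier to audit later.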

Variant and Genotype File

For germ-line variant research, the input data should follow the VCF (Variant Call Format) format. Here is a short description. The program accepts VCF input in the following forms:

Text format, with the suffix .vcf.
GZ compression format, produced by gzip <path/to/VCF>, with the suffix .vcf.gz.
BGZ compression format, produced by bgzip <path/to/VCF>, with the suffix .vcf.bgz or .vcf.gz.
GenoType Block (GTB) format, produced by the Genotype Blocking Compressor (GBC), which provides ultra-fast access to large-scale genotypes of hundreds of thousands of subjects. The suffix is .gtb. The program automatically converts an input VCF file to GTB format in the first step, and the generated GTB file can be used as the input file for subsequent analyses. Alternatively, you can generate the GTB file manually with the command:
```
java -jar kgga.jar gbc convert vcf2gtb <path/to/VCF> --output <output> [options]
```
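
For the compressed forms listed above, a minimal sketch of preparing a gzip-compressed VCF might look like the following. The file names and the one-variant VCF content are made up purely for illustration; only standard `gzip` options are used.

```shell
#!/usr/bin/env bash
set -eu

# Work in a scratch directory so no existing file is overwritten.
workdir="$(mktemp -d)"
vcf="$workdir/example.vcf"

# A tiny, made-up VCF with one variant and two samples (illustrative data only).
printf '##fileformat=VCFv4.2\n' > "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> "$vcf"
printf 'chr1\t12345\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\n' >> "$vcf"

# gzip -c writes the compressed stream to stdout, leaving example.vcf in place.
gzip -c "$vcf" > "$vcf.gz"

# gzip -t verifies the integrity of the compressed file.
gzip -t "$vcf.gz"

# Count the variant records (lines that do not start with '#').
n_records="$(gzip -dc "$vcf.gz" | grep -vc '^#')"
echo "records: $n_records"
```

If htslib is installed, `bgzip -c` can be substituted for `gzip -c` to produce the BGZ form instead.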

Pedigree and Phenotype File (optional)

To specify the phenotypes of the subjects in the VCF file and the pedigree relationships between them, or to analyze only a subset of the subjects in the VCF file, you must record the sample information in a PED file. The PED file should be in the LINKAGE Pedigree format. Here is a short description.

Option	Description	Default
`--ped-file`	Specify the PED file with phenotypes. `--ped-file` is a combination of parameters. `<pheno>` is used to set the column name of the major phenotype in the PED file. `<covar>` is used to set the column name(s) used as covariate phenotype(s). By default, the individual IDs in the PED file must be unique and identical to the ones defined in the VCF file(s). However, users can also ask KGGA to use a composite individual ID, formed as "FamilyID$IndividualID", to match the VCF file(s) by setting `composite` to Y (true). Format: `--ped-file <file> pheno=[columnName] covar=[columnName1,columnName2,...] composite=[Y/N]` Example: `--ped-file ./example.ped pheno=disease covar=QT,age composite=Y`	composite=N
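
Because the individual IDs in the PED file must be unique and must match the sample IDs in the VCF file, it can help to check this before running an analysis. The sketch below is one way to do so with standard Unix tools. The file contents are made up, and the PED header names (`FamilyID`, `IndividualID`, and so on) are assumptions chosen to match the `pheno=disease covar=QT,age` example; only the first-six-column order follows the usual LINKAGE convention.

```shell
#!/usr/bin/env bash
set -eu

workdir="$(mktemp -d)"
vcf="$workdir/example.vcf"
ped="$workdir/example.ped"

# Made-up VCF header with two samples, S1 and S2.
printf '##fileformat=VCFv4.2\n' > "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> "$vcf"

# Made-up PED file; the header names are assumptions for illustration.
printf 'FamilyID\tIndividualID\tPaternalID\tMaternalID\tSex\tdisease\tQT\tage\n' > "$ped"
printf 'F1\tS1\t0\t0\t1\t2\t1.5\t40\n' >> "$ped"
printf 'F1\tS2\t0\t0\t2\t1\t0.7\t38\n' >> "$ped"

# VCF sample IDs are columns 10 onward of the #CHROM header line.
awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$vcf" \
  | sort > "$workdir/vcf_ids.txt"

# composite=N: the individual ID (column 2) is matched as-is.
awk -F'\t' 'NR > 1 { print $2 }' "$ped" | sort > "$workdir/ped_ids.txt"

# composite=Y: the ID is formed as FamilyID$IndividualID.
awk -F'\t' 'NR > 1 { print $1 "$" $2 }' "$ped" > "$workdir/composite_ids.txt"

# uniq -d prints only repeated lines, i.e. duplicated individual IDs.
dup_ids="$(uniq -d "$workdir/ped_ids.txt")"

# comm -13 prints lines found only in the second file:
# PED individuals that have no matching sample in the VCF.
missing="$(comm -13 "$workdir/vcf_ids.txt" "$workdir/ped_ids.txt")"

if [ -z "$dup_ids" ] && [ -z "$missing" ]; then
  echo "PED IDs are unique and all present in the VCF"
else
  echo "Problem IDs: $dup_ids $missing"
fi
```

The check only requires that every PED individual appears in the VCF, not the reverse, since the PED file may list just a subset of the subjects.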

Mutation Annotation Format File

For somatic mutation research, the input data should follow the MAF (Mutation Annotation Format) format. Here is a short description. The program accepts MAF input in the following forms:

Text format, with the suffix .maf.
GZ compression format, produced by gzip <path/to/MAF>, with the suffix .maf.gz.
BGZ compression format, produced by bgzip <path/to/MAF>, with the suffix .maf.bgz or .maf.gz.
GenoType Block (GTB) format, produced by the Genotype Blocking Compressor (GBC), which provides ultra-fast access to large-scale genotypes of hundreds of thousands of subjects. The suffix is .gtb. The program automatically converts an input MAF file to GTB format in the first step, and the generated GTB file can be used as the input file for subsequent analyses. Alternatively, you can generate the GTB file manually with the command:
```
java -jar kgga.jar gbc maf2gtb <path/to/MAF> -o <path/to/out>
```
