Mutation Annotation Format
MAF is a tab-delimited text file that aggregates mutation information from VCF files and is generated at the project level. It is often used to describe somatic mutations. KGGSeq requires six columns; their header names are fixed, but the columns can appear in any order. The six columns are:
- Tumor_Sample_UUID: Aliquot UUID for the tumor sample.
- Chromosome: The affected chromosome.
- Start_Position: Lowest numeric position of the reported variant on the genomic reference sequence, i.e. the mutation start coordinate.
- Reference_Allele: The plus strand reference allele at this position. Includes the deleted sequence for a deletion or "-" for an insertion.
- Tumor_Allele1: Primary data genotype for tumor sequencing (discovery) allele 1. For a deletion, a "-" symbol represents the variant allele. For an insertion, a "-" symbol represents the wild-type allele, and the novel inserted sequence does not include flanking reference bases.
- Tumor_Allele2: Tumor sequencing (discovery) allele 2.
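Because the six columns are identified by header name rather than by position, a reader can look them up the same way. The Python sketch below is only an illustration, not KGGSeq's implementation: `read_maf_records` and `REQUIRED_COLUMNS` are hypothetical names, and skipping lines that begin with "#" is an assumption about comment lines in the file.

```python
import csv
import io
from typing import Dict, List, TextIO

# The six column names listed above; their order in the file is not fixed.
REQUIRED_COLUMNS = (
    "Tumor_Sample_UUID",
    "Chromosome",
    "Start_Position",
    "Reference_Allele",
    "Tumor_Allele1",
    "Tumor_Allele2",
)


def read_maf_records(handle: TextIO) -> List[Dict[str, str]]:
    """Read a tab-delimited MAF and keep only the six required columns.

    Columns are found by header name, so any column order is accepted.
    Lines starting with '#' are skipped (an assumption made for this sketch).
    """
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(lines, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"MAF header is missing required columns: {missing}")
    return [{name: row[name] for name in REQUIRED_COLUMNS} for row in reader]


# Usage: the same columns in a shuffled order still parse correctly.
sample = io.StringIO(
    "Chromosome\tTumor_Allele2\tStart_Position\tTumor_Sample_UUID\tReference_Allele\tTumor_Allele1\n"
    "chr19\tC\t58864307\tTCGA-A8-A06P\tC\tA\n"
)
records = read_maf_records(sample)
print(records[0]["Tumor_Allele1"])  # A
```

Raising an error when a required header is absent mirrors the requirement that all six columns be present; any extra columns in the file are simply ignored by this sketch.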
Here is an example:
Tumor_Sample_UUID Chromosome Start_Position Reference_Allele Tumor_Allele1 Tumor_Allele2
TCGA-A8-A06P chr19 58864307 C A C
TCGA-A8-A06P chr19 58864307 C A C
TCGA-E9-A1NH chr19 58864366 G A G
TCGA-E9-A22B chr19 58862784 C T C
TCGA-BH-A0HP chr10 52595854 G A G
TCGA-BH-A18P chr10 52595937 G A G
TCGA-A2-A0EY chr12 9246090 C T C
TCGA-A8-A08G chr12 9251298 G A G
TCGA-B6-A0IC chr12 9220358 - T -
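The "-" conventions for Reference_Allele and the two tumor alleles can be illustrated by classifying the example rows. This Python sketch is not KGGSeq's own logic: the helper names `allele_change` and `row_changes` and the labels ("wild-type", "substitution", "insertion", "deletion", "complex") are hypothetical, and it assumes the column order shown in the example.

```python
from typing import Tuple


def allele_change(ref: str, allele: str) -> str:
    """Classify one tumor allele against Reference_Allele (labels are hypothetical)."""
    if allele == ref:
        return "wild-type"      # also covers "-" in an insertion row (ref is "-")
    if ref == "-":
        return "insertion"      # inserted bases, without flanking reference bases
    if allele == "-":
        return "deletion"       # "-" marks the deleted (variant) allele
    if len(ref) == len(allele):
        return "substitution"
    return "complex"            # hypothetical catch-all for other length changes


def row_changes(line: str) -> Tuple[str, str, str]:
    """Return (sample, Tumor_Allele1 change, Tumor_Allele2 change) for one data line.

    Assumes the example's column order; split() accepts tabs or spaces.
    """
    sample, _chrom, _pos, ref, allele1, allele2 = line.split()
    return sample, allele_change(ref, allele1), allele_change(ref, allele2)


example_lines = [
    "TCGA-A8-A06P chr19 58864307 C A C",
    "TCGA-A2-A0EY chr12 9246090 C T C",
    "TCGA-B6-A0IC chr12 9220358 - T -",
]
for line in example_lines:
    print(row_changes(line))
# ('TCGA-A8-A06P', 'substitution', 'wild-type')
# ('TCGA-A2-A0EY', 'substitution', 'wild-type')
# ('TCGA-B6-A0IC', 'insertion', 'wild-type')
```

In the insertion row, Tumor_Allele2 is "-", which the rules above read as the wild-type allele rather than a deletion; checking equality with Reference_Allele first is what keeps those two cases apart.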