gnomAD Allele Frequency
The Genome Aggregation Database (gnomAD) is a widely used resource for genetic studies, providing allele frequencies for human genetic variants across large, ancestry-diverse cohorts. The gnomAD v4.1 release incorporates sequencing data from 807,162 individuals, roughly a fivefold increase over previous releases. It includes two core datasets:
- Exome Dataset: Covers 730,947 individuals, including 416,555 samples from the UK Biobank.
- Whole Genome Dataset: Includes 76,215 individuals with whole-genome sequencing data.
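As a quick consistency check, the two dataset sizes listed above add up to the reported total of 807,162 individuals. The variable names below are illustrative only:

```shell
# Sample counts quoted above for gnomAD v4.1
exomes=730947    # exome dataset
genomes=76215    # whole-genome dataset

total=$((exomes + genomes))
echo "$total"    # prints 807162
```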
All data are aligned to the GRCh38 (hg38) reference genome and integrate resources from projects such as ExAC, 1000 Genomes, and UK Biobank. A standardized QC pipeline filters out low-quality samples and variants to keep the frequency estimates reliable.
KGGA extracts three datasets from gnomAD v4.1: the whole-genome sequencing frequency data, the exome sequencing frequency data, and the joint dataset that combines the two. Together they provide allele frequency information across global populations for use in disease association studies.
Generate gnomAD Annotation Files
- Switch to the gnomAD directory:
cd path/to/gnomad/vcf/files
- Run the command to generate the annotation file:
java -Dccf.compressor.zstd.level=16 -jar kgga.jar gbc make-database --gnomad $(echo $(ls *.vcf.bgz) | tr ' ' ',') -o gnomad.joint.v4.1.sites.hg38.gtb -t 20
Command Explanation
Parameter | Description |
---|---|
-Dccf.compressor.zstd.level=16 | Sets the Zstandard compression level to 16. |
-jar kgga.jar | Runs the kgga.jar program. |
gbc make-database | Invokes the gbc tool to create a database. |
--gnomad | Specifies that the input files are gnomAD VCF files. |
echo $(ls *.vcf.bgz) | Prints the names of all .vcf.bgz files in the current directory on a single line, separated by spaces. |
tr ' ' ',' | Replaces the spaces in the file list with commas to produce a comma-separated list. |
-o gnomad.joint.v4.1.sites.hg38.gtb | Specifies the output file name. |
-t 20 | Uses 20 threads for parallel processing. |
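The command above builds the comma-separated input list by parsing the output of ls, which breaks if a file name contains spaces. A minimal sketch of an alternative that lets the shell expand the glob directly is shown below. The helper name join_vcfs is hypothetical and is not part of KGGA. File names that themselves contain commas would still need separate handling.

```shell
# join_vcfs DIR
# Print the *.vcf.bgz file names in DIR as one comma-separated line.
# Hypothetical helper for illustration; not part of KGGA.
join_vcfs() {
    (
        cd "$1" || exit 1
        set -- *.vcf.bgz                      # glob expansion, no ls parsing
        if [ ! -e "$1" ]; then                # unmatched glob stays literal
            echo "no .vcf.bgz files in $PWD" >&2
            exit 1
        fi
        IFS=,                                 # "$*" joins on the first IFS char
        printf '%s\n' "$*"
    )
}

# Usage with the command from this section (run inside the gnomAD directory):
# java -Dccf.compressor.zstd.level=16 -jar kgga.jar gbc make-database \
#     --gnomad "$(join_vcfs .)" -o gnomad.joint.v4.1.sites.hg38.gtb -t 20
```

The function runs in a subshell, so the directory change and the IFS setting do not leak into the calling shell.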
gnomAD Annotation Fields
Annotation Field | Description |
---|---|
gnomAD_joint@ALL | Alternate allele frequency in joint dataset. |
gnomAD_joint@AFR | Alternate allele frequency in samples of African/African-American ancestry in joint dataset. |
gnomAD_joint@AMI | Alternate allele frequency in samples of Amish ancestry in joint dataset. |
gnomAD_joint@AMR | Alternate allele frequency in samples of Latino ancestry in joint dataset. |
gnomAD_joint@ASJ | Alternate allele frequency in samples of Ashkenazi Jewish ancestry in joint dataset. |
gnomAD_joint@EAS | Alternate allele frequency in samples of East Asian ancestry in joint dataset. |
gnomAD_joint@FIN | Alternate allele frequency in samples of Finnish ancestry in joint dataset. |
gnomAD_joint@MID | Alternate allele frequency in samples of Middle Eastern ancestry in joint dataset. |
gnomAD_joint@NFE | Alternate allele frequency in samples of Non-Finnish European ancestry in joint dataset. |
gnomAD_joint@SAS | Alternate allele frequency in samples of South Asian ancestry in joint dataset. |
gnomAD_genomes@ALL | Alternate allele frequency in genomes dataset. |
gnomAD_genomes@AFR | Alternate allele frequency in samples of African/African-American ancestry in genomes dataset. |
gnomAD_genomes@AMI | Alternate allele frequency in samples of Amish ancestry in genomes dataset. |
gnomAD_genomes@AMR | Alternate allele frequency in samples of Latino ancestry in genomes dataset. |
gnomAD_genomes@ASJ | Alternate allele frequency in samples of Ashkenazi Jewish ancestry in genomes dataset. |
gnomAD_genomes@EAS | Alternate allele frequency in samples of East Asian ancestry in genomes dataset. |
gnomAD_genomes@FIN | Alternate allele frequency in samples of Finnish ancestry in genomes dataset. |
gnomAD_genomes@MID | Alternate allele frequency in samples of Middle Eastern ancestry in genomes dataset. |
gnomAD_genomes@NFE | Alternate allele frequency in samples of Non-Finnish European ancestry in genomes dataset. |
gnomAD_genomes@SAS | Alternate allele frequency in samples of South Asian ancestry in genomes dataset. |
gnomAD_exomes@ALL | Alternate allele frequency in exomes dataset. |
gnomAD_exomes@AFR | Alternate allele frequency in samples of African/African-American ancestry in exomes dataset. |
gnomAD_exomes@AMI | Alternate allele frequency in samples of Amish ancestry in exomes dataset. |
gnomAD_exomes@AMR | Alternate allele frequency in samples of Latino ancestry in exomes dataset. |
gnomAD_exomes@ASJ | Alternate allele frequency in samples of Ashkenazi Jewish ancestry in exomes dataset. |
gnomAD_exomes@EAS | Alternate allele frequency in samples of East Asian ancestry in exomes dataset. |
gnomAD_exomes@FIN | Alternate allele frequency in samples of Finnish ancestry in exomes dataset. |
gnomAD_exomes@MID | Alternate allele frequency in samples of Middle Eastern ancestry in exomes dataset. |
gnomAD_exomes@NFE | Alternate allele frequency in samples of Non-Finnish European ancestry in exomes dataset. |
gnomAD_exomes@SAS | Alternate allele frequency in samples of South Asian ancestry in exomes dataset. |
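A common use of these fields is to keep only rare variants. The sketch below assumes the annotated results have been exported as a tab-delimited text file whose header row uses the field names from the table above, and that variants absent from gnomAD carry an empty value or a dot. Both points are assumptions about the export, not documented KGGA behavior, and the helper name filter_rare and the file name annotated.tsv are hypothetical.

```shell
# filter_rare FILE COLUMN MAX_AF
# Print the header plus every row whose COLUMN value is below MAX_AF.
# Rows with an empty or "." value (assumed to mean "not in gnomAD") are kept.
# Hypothetical helper for illustration; not part of KGGA.
filter_rare() {
    awk -F'\t' -v col="$2" -v max="$3" '
        NR == 1 {
            # Locate the requested column by its header name.
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            print
            next
        }
        $idx == "" || $idx == "." || $idx + 0 < max + 0
    ' "$1"
}

# Usage: keep variants with a joint allele frequency below 1%.
# filter_rare annotated.tsv 'gnomAD_joint@ALL' 0.01
```

Looking the column up by name rather than by position keeps the filter valid if the export adds or reorders fields. The same call works for any population-specific field, such as gnomAD_joint@EAS.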