|
Introduction |
|
FAPI (Fast and Accurate P-value Imputation) is a multi-threaded Java application that infers the association p-value of a single-nucleotide polymorphism (SNP) from the association p-values of the SNPs in linkage disequilibrium (LD) with it. The p-value imputation method is described in the reference paper. Compared with genotype imputation tools such as IMPUTE and MACH, FAPI is very fast while retaining high imputation accuracy, and it requires neither phased alleles nor raw genotypes.
|
|
FAPI has three main functions: |
1) impute p-values for untyped SNPs;
2) assess the quality of p-values for typed SNPs;
3) perform meta-analysis at both untyped and typed SNPs.
|
|
Installation |
|
Installation of Java Runtime Environment (JRE) |
|
The JRE is required to run FAPI on any operating system (OS).
It can be downloaded for free from http://java.sun.com/javase/downloads/index.jsp.
|
|
Installation of FAPI |
FAPI does not yet have an installation wizard. After being downloaded and decompressed, it can be launched from a command prompt window provided by the OS with the command java -Xmx1g -jar "./fapi.jar" [arguments].
In this command, -Xmx[size] sets the maximum Java heap size for FAPI. A larger maximum heap size can speed up the analysis.
A higher setting such as -Xmx4g is suggested for a large number of SNPs, say more than 5,000,000. The value, however, should be less than the size of the physical memory.
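The launch command can also be assembled from a script. The following is a minimal sketch and not part of FAPI: the helper name fapi_command and its heap-size rule are assumptions for illustration, based only on the guidance above (1 GB in the example command, 4 GB suggested above 5,000,000 SNPs, always below the physical memory). The -jar option is the standard Java launcher option for running a JAR file.

```python
def fapi_command(jar_path, fapi_args, n_snps, physical_mem_gb):
    """Build (but do not run) a java command line for FAPI.

    Hypothetical helper: the heap-size rule only mirrors the guidance in
    this manual and is not taken from FAPI itself.
    """
    # Suggest 4 GB for more than 5,000,000 SNPs, but only when the machine
    # has more physical memory than that; otherwise keep the 1 GB used in
    # the example command.
    heap_gb = 4 if n_snps > 5_000_000 and physical_mem_gb > 4 else 1
    return ["java", f"-Xmx{heap_gb}g", "-jar", jar_path, *fapi_args]


if __name__ == "__main__":
    cmd = fapi_command("./fapi.jar", ["--impute"], n_snps=6_000_000,
                       physical_mem_gb=16)
    print(" ".join(cmd))
```

The returned list can be passed to subprocess.run once the remaining FAPI arguments have been filled in.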
|
|
Input files |
|
Summary statistics (i.e., p values) input |
The primary input of FAPI is a text file (optionally compressed as *.gz) containing summary statistics of SNPs, with the first row as the header.
The columns are delimited by spaces or tabs. Four types of information about SNPs are required: chromosome, marker ID, coordinate, and p-value. The following is an example:
|
CHR | SNPID | POS | P-value1 | Test-Mode | P-value2 | ... |
4 | rs1513559 | 12232332 | 0.02301 | additive | 0.007688 | ... |
4 | rs1841043 | 122323365 | 0.01115 | additive | 0.119 | ... |
... | ... | ... | ... | ... | ... | ... |
|
By default, FAPI expects the first four columns to contain these four types of information in the same order as in the example above (chromosome, marker ID, position, p-value).
Otherwise, users need to specify these columns with the tags --chrom-col, --marker-col, --position-col and --p-col. See the Options section below for a description of each tag.
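To check a summary-statistics file before passing it to FAPI, the layout described above can be read with a few lines of Python. This is only a sketch under stated assumptions: the function read_pvalues is hypothetical, it selects columns by 0-based position (FAPI's own tags take header names instead), and it does not handle missing-value labels.

```python
import gzip


def read_pvalues(path, chrom_col=0, marker_col=1, position_col=2, p_col=3):
    """Yield (chrom, marker, position, p) tuples from a summary-statistics file.

    Column indices are 0-based and default to the first four columns, in
    the order chromosome, marker ID, position, p-value.
    """
    # Plain text and *.gz files are both accepted.
    opener = gzip.open if path.endswith(".gz") else open
    wanted = max(chrom_col, marker_col, position_col, p_col)
    with opener(path, "rt") as handle:
        next(handle, None)  # skip the header row
        for line in handle:
            fields = line.split()  # splits on spaces or tabs
            if len(fields) <= wanted:
                continue  # skip blank or truncated lines
            yield (fields[chrom_col], fields[marker_col],
                   int(fields[position_col]), float(fields[p_col]))
```

Passing p_col=5 would read the second p-value column (P-value2) of the example file instead of the first.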
|
Hint: If you only have rsIDs on hand, you can first use another simple tool, SnpTracker, to retrieve the chromosomes and coordinates of the SNPs.
|
|
|
Data sets containing LD information of SNPs |
FAPI now supports two input formats for calculating reference LD information between SNPs. Users can choose whichever format is appropriate. |
|
Phased or unphased genotypes in VCF format |
|
|
FAPI can read either phased or unphased genotypes in VCF format to account for LD between SNPs.
To ease the preparation of the LD data, we provide VCF data for the most widely used haplotypes, originally released by the HapMap
project and the 1000 Genomes Project.
In the analysis, users can use the resource tags below to specify the data of interest. FAPI checks whether the data exist on the local machine. If they do not,
FAPI automatically downloads them from the FAPI website with a multi-threaded download function.
|
Resource tag |
Description |
hapmap2.r22.ceu.hg19 |
Haplotypes of HapMap 2 release 22. Coordinates were converted from hg18 to hg19 with the UCSC liftOver tool. Compiled from
here.
|
hapmap2.r22.chbjpt.hg19 |
|
hapmap2.r22.yri.hg19 |
|
hapmap3.r2.ceu.hg19 |
Haplotypes of HapMap 3 release 2. Coordinates were converted from hg18 to hg19 with the UCSC liftOver tool. Compiled from
here.
|
hapmap3.r2.chbjpt.hg19 |
|
hapmap3.r2.mex.hg19 |
|
hapmap3.r2.tsi.hg19 |
|
hapmap3.r2.yri.hg19 |
|
1kg.phase1.v3.asn.hg19 |
Haplotypes of 1000 Genomes Project phase 1 version 3. Downloaded from
here.
|
1kg.phase1.v3.eur.hg19 |
|
1kg.phase1.v3.afr.hg19 |
|
1kg.phase1.v3.amr.hg19 |
|
1kg.phase3.v5.afr.hg19 |
Haplotypes of 1000 Genomes Project phase 3 version 5. Downloaded from
here.
|
1kg.phase3.v5.amr.hg19 |
|
1kg.phase3.v5.eas.hg19 |
|
1kg.phase3.v5.sas.hg19 |
|
1kg.phase3.v5.eur.hg19 |
|
|
Note: The resource files are huge. If FAPI fails to download a complete copy of the 1KG resource files, you can go to our resource file page and download them with a dedicated download tool.
We acknowledge the author of MACH, Dr. LI Yun, for the compiled VCF data of the 1000 Genomes Project. For a detailed description of the data,
please visit http://www.sph.umich.edu/csg/abecasis/MACH/download/1000G.2012-03-14.html
|
|
Unphased genotypes in Plink binary format |
|
|
FAPI can directly read unphased genotypes in Plink binary format,
a compressed format that can be stored and processed more efficiently.
FAPI calculates the genotypic correlation to approximate the degree of LD between SNPs. A Plink binary file set always includes three linked files, *.fam, *.bim and *.bed, which should be placed in the same folder.
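The idea of approximating LD by genotypic correlation can be illustrated with the squared Pearson correlation between the allele counts of two SNPs. This is a generic sketch, not FAPI's own code: the function name genotype_r2, the 0/1/2 coding and the use of None for missing genotypes are assumptions made for the example.

```python
def genotype_r2(g1, g2):
    """Squared Pearson correlation between the genotype codes of two SNPs.

    g1 and g2 hold allele counts (0, 1 or 2) for the same individuals;
    None marks a missing genotype, and that individual is skipped.
    """
    pairs = [(a, b) for a, b in zip(g1, g2)
             if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean1 = sum(a for a, _ in pairs) / n
    mean2 = sum(b for _, b in pairs) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in pairs)
    var1 = sum((a - mean1) ** 2 for a, _ in pairs)
    var2 = sum((b - mean2) ** 2 for _, b in pairs)
    if var1 == 0 or var2 == 0:
        return float("nan")  # a monomorphic SNP has no defined correlation
    return (cov * cov) / (var1 * var2)
```

With unphased data this quantity stands in for the haplotype-based r-square, which is why phase information is not needed for this input format.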
|
|
Function & examples |
|
FAPI has three main functions. Please see the demo website for details. |
|
Impute association p-values at untyped SNPs |
Link |
|
Impute p-value at untyped SNPs and conduct meta-analysis |
Link |
|
Validate association p values at typed SNPs |
Link |
|
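As a rough illustration of how the three functions map onto command-line arguments, the sketch below assembles an argument list from the tags documented in the Options section. It is hedged throughout: the helper fapi_args and all file paths are hypothetical, and it assumes that each tag is followed by its value as a separate argument, which this manual does not state explicitly. The demo website remains the authoritative source of working examples.

```python
def fapi_args(function, pfile, gfile, out, p_cols=None):
    """Assemble FAPI arguments for one of the three main functions.

    Hypothetical helper; assumes "--tag value" syntax for every option.
    """
    # Analysis tags as listed in the Options section.
    flags = {"impute": "--impute", "meta": "--meta", "qc": "--qc"}
    args = [flags[function], "--pfile", pfile, "--gfile", gfile, "--out", out]
    if p_cols:
        # Several p-value columns are joined with commas for --p-col.
        args += ["--p-col", ",".join(p_cols)]
    return args


if __name__ == "__main__":
    # Placeholder paths; replace them with real files.
    print(fapi_args("meta", "gwas.txt.gz", "ref.vcf.gz::vcf", "out/meta",
                    p_cols=["P-value1", "P-value2"]))
```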
|
Options |
|
Tag Name |
Description |
Analysis functions |
--impute |
Impute the p-values of untyped SNPs given the p-values of typed SNPs according to LD information. |
--meta |
Impute the p-values of untyped SNPs and perform meta-analysis of multiple p-value sets.
Hint: If you want to perform meta-analysis without imputing p-values, please use the --meta and --noimpute options together.
|
--qc |
Impute the p-values of typed SNPs given the p-values of typed SNPs according to LD information, and estimate the chance of getting the p-values of the typed SNPs. |
--size |
Set the sample sizes of cases and controls, separated by a colon, that were used to generate the p-values. These are used as weights in meta-analysis. The default value is 1:1. |
Input file settings |
--pfile |
Specify the path of a file containing p values and genomic information of sequence variants. |
--gfile |
Specify the path and type of the files containing genotypes for calculating LD, in VCF or Plink binary format. The path and type of each file are separated by a double colon, e.g., path/to/file::vcf or path/to/file::plink.
If the reference genotypes are stored in separate files chromosome by chromosome, you can use _CHROM_ to denote the chromosome names [1...Y] in the file name, e.g., chr_CHROM_.phase1.cvf.chinese.hg19
|
--chrom-col |
The header name of the column containing chromosome information in the file specified by --pfile. The 1st column is used by default if this tag is not specified. |
--marker-col |
The header name of the column containing SNP rsID information in the file specified by --pfile. The 2nd column is used by default if this tag is not specified. |
--position-col |
The header name of the column containing coordinate information in the file specified by --pfile. The 3rd column is used by default if this tag is not specified. |
--p-col |
The header name of the column containing p-value information in each file specified by --pfile. For multiple pfiles, the column names of the p-value sources are delimited by commas. The 4th column is used by default if this tag is not specified.
|
--missing-p |
The labels of missing p values in a file specified by --pfile. The number starts from 1. |
--maf |
Filter out SNPs whose minor allele frequency in the reference panel is less than the specified value. |
Performance and Accuracy settings |
--nt |
The maximum number of CPU threads to run in parallel. |
--window-size |
Set the maximal number of SNPs with actual p-values in a scan window for imputation. The default value is 10. |
--window-len |
Set the maximal length of a scan window for imputation. The default value is 1,000,000 bp. |
--ignore-r2 |
Set the maximal pairwise LD (r-square) between SNPs that is ignored in imputation. The default value is 0.01. |
--conf-filter |
Filter out imputed p-values with confidence score over the set value. The default value is 0.3. |
Miscellaneous |
--out |
Specify the path with prefix name for output data |
--resource |
Specify the path of resource files. By default, it is a sub-folder named resources of the main program folder |
--no-web |
Switch off the automatic self-update function. |
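The --gfile specification described in the options above (a path and a type joined by a double colon, with an optional _CHROM_ placeholder) can be expanded as in the following sketch. The function expand_gfile is hypothetical, and reading "[1...Y]" as chromosomes 1 to 22 plus X and Y is an assumption of this example rather than something the manual states.

```python
def expand_gfile(spec, chromosomes=None):
    """Split a 'path::type' specification and expand the _CHROM_ placeholder.

    Returns a list of (path, file_type) tuples: one per chromosome when the
    path contains _CHROM_, otherwise a single tuple.
    """
    if chromosomes is None:
        # Assumed reading of "[1...Y]": autosomes 1-22 plus X and Y.
        chromosomes = [str(i) for i in range(1, 23)] + ["X", "Y"]
    # Split on the last double colon so the type is always the final part.
    path, sep, file_type = spec.rpartition("::")
    if not sep or file_type not in ("vcf", "plink"):
        raise ValueError(f"expected path::vcf or path::plink, got {spec!r}")
    if "_CHROM_" not in path:
        return [(path, file_type)]
    return [(path.replace("_CHROM_", c), file_type) for c in chromosomes]
```

Listing the expanded paths and checking that each one exists is a quick way to catch a misnamed per-chromosome file before starting a long run.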
|
|
|
Comments and suggestions are welcome; please e-mail limx54@163.com
|
Reference: |
Kwan JS^, Li MX^*, Deng JE, Sham PC*. FAPI: Fast and Accurate P-value Imputation for genome-wide association study. Eur J Hum Genet. 2015 Aug 26. doi: 10.1038/ejhg.2015.190. |