FAPI: P-value imputation User Manual

FAPI: Fast and Accurate P-value imputation for genetic association

Online Manual

Introduction
Installation
Input files
Functions & examples
Options

Miaoxin Li

Introduction

FAPI (Fast and Accurate P-value Imputation) is a multi-threaded Java application that infers the association p-value of a single-nucleotide polymorphism (SNP) from the association p-values of SNPs in linkage disequilibrium (LD) with it. The p-value imputation method is described in the reference paper. Compared with genotype imputation tools such as IMPUTE and MACH, FAPI is very fast while retaining high imputation accuracy, and it requires neither phased alleles nor raw genotypes.

FAPI has three main functions:

1) impute p-values for untyped SNPs;
2) assess the quality of p-values for typed SNPs;
3) perform meta-analysis at both untyped and typed SNPs.

Installation

Installation of Java Runtime Environment (JRE)

The JRE is required to run FAPI on any operating system (OS). It can be downloaded for free from http://java.sun.com/javase/downloads/index.jsp.

Installation of FAPI

FAPI does not yet have an installation wizard. After downloading and decompressing it, launch it from a command prompt window with the command java -Xmx1g -jar "./fapi.jar" [arguments]. In this command, -Xmx[size] sets the maximum Java heap size for FAPI. A larger maximum heap size can speed up the analysis. A higher setting such as -Xmx4g is suggested for a large number of SNPs, say more than 5,000,000. The heap size, however, should be less than the size of physical memory.
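As a minimal sketch of how the launch command fits together, the Python helper below assembles the java invocation from a heap size and a list of FAPI arguments. The helper name build_fapi_command and the file name gwas.pvalues.txt are hypothetical. -Xmx and -jar are standard Java launcher options, and the tags come from the Options section; the assumption that --impute takes no value is inferred from its description there.

```python
import shlex

def build_fapi_command(heap="1g", jar="./fapi.jar", args=()):
    """Return the argument list for launching FAPI (hypothetical helper)."""
    # -Xmx sets the maximum Java heap size; -jar runs the packaged application.
    return ["java", f"-Xmx{heap}", "-jar", jar, *args]

# Example: a larger heap for a data set with millions of SNPs.
# The input file name is a placeholder, not a file shipped with FAPI.
cmd = build_fapi_command(heap="4g", args=["--impute", "--pfile", "gwas.pvalues.txt"])
print(shlex.join(cmd))
```

The list form can be passed directly to subprocess.run, which avoids quoting problems when paths contain spaces.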

Input files

Summary statistics (i.e., p values) input

The primary input of FAPI is a text file (optionally compressed as *.gz) containing summary statistics of SNPs, with the first row as the header. The columns are delimited by spaces or tabs. Four types of information about each SNP are required: chromosome, marker ID, coordinate, and p-value.
The following is an example:

CHR	SNPID	POS	P-value1	Test-Mode	P-value2	...
4	rs1513559	12232332	0.02301	additive	0.007688	...
4	rs1841043	122323365	0.01115	additive	0.119	...
...	...	...	...	...	...	...

By default, FAPI requires the first four columns to contain the four types of information in the same ORDER as above. Otherwise, users need to specify the columns with the tags --chrom-col, --marker-col, --position-col and --p-col. See the Options section below for a description of these tags.
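Before running FAPI, it can help to check that a summary-statistics file parses as expected. The following is a rough sketch, not FAPI's own parser: the function name parse_pvalue_lines, the zero-based columns argument, and the "NA" missing-value label are all assumptions made for illustration.

```python
def parse_pvalue_lines(lines, columns=(0, 1, 2, 3), missing=("NA",)):
    """Yield (chrom, marker, position, p) tuples from summary-statistics lines.

    `columns` holds the zero-based indices of chromosome, marker ID,
    coordinate and p-value; the default is the first four columns.
    """
    rows = iter(lines)
    next(rows, None)  # the first row is the header
    for line in rows:
        fields = line.split()  # splits on spaces or tabs
        if len(fields) <= max(columns):
            continue  # skip blank or truncated lines
        chrom, marker, pos, p = (fields[i] for i in columns)
        if p in missing:
            continue  # skip SNPs without a p-value (label is an assumption)
        yield chrom, marker, int(pos), float(p)

example = [
    "CHR\tSNPID\tPOS\tP-value1\tTest-Mode\tP-value2",
    "4\trs1513559\t12232332\t0.02301\tadditive\t0.007688",
    "4\trs1841043\t122323365\t0.01115\tadditive\t0.119",
]
for row in parse_pvalue_lines(example):
    print(row)
```

For a file on disk, pass an open text handle as lines, for example gzip.open(path, "rt") for a *.gz file or open(path) otherwise.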

Hint: If you only have rsIDs on hand, you can first use another easy tool, SnpTracker, to retrieve the chromosomes and coordinates of the SNPs.

Data sets containing LD information of SNPs

FAPI currently supports two input formats for calculating reference LD information between SNPs. Users can choose whichever format is appropriate.

Phased or unphased genotypes in VCF format

FAPI can read either phased or unphased genotypes in VCF format to account for LD between SNPs.

To ease the preparation of the LD data, we provide VCF data for the most widely used haplotypes, originally released by the HapMap project and the 1000 Genomes Project.

In the analysis, users can use resource tags to specify the data of interest. FAPI will check whether the data exist on the local machine. If they do not, FAPI will automatically download them from the FAPI website using a multi-threaded download function.

Resource tag	Description
hapmap2.r22.ceu.hg19	Haplotypes of HapMap 2 release 22. Coordinates converted from hg18 to hg19 with the UCSC liftOver function. Compiled from here.
hapmap2.r22.chbjpt.hg19
hapmap2.r22.yri.hg19
hapmap3.r2.ceu.hg19	Haplotypes of HapMap 3 release 2. Coordinates converted from hg18 to hg19 with the UCSC liftOver function. Compiled from here.
hapmap3.r2.chbjpt.hg19
hapmap3.r2.mex.hg19
hapmap3.r2.tsi.hg19
hapmap3.r2.yri.hg19
1kg.phase1.v3.asn.hg19	Haplotypes of 1000 Genomes Project phase 1 version 3. Download from here.
1kg.phase1.v3.eur.hg19
1kg.phase1.v3.afr.hg19
1kg.phase1.v3.amr.hg19
1kg.phase3.v5.afr.hg19	Haplotypes of 1000 Genomes Project phase 3 version 5. Download from here.
1kg.phase3.v5.amr.hg19
1kg.phase3.v5.eas.hg19
1kg.phase3.v5.sas.hg19
1kg.phase3.v5.eur.hg19

Note: The resource files are huge. If FAPI fails to download a complete copy of the 1KG resource files, you can download them from our resource file page with a dedicated download tool.

We acknowledge the compiled VCF data of the 1000 Genomes Project provided by the author of MACH, Dr. LI Yun. For a detailed description of the data, please visit
http://www.sph.umich.edu/csg/abecasis/MACH/download/1000G.2012-03-14.html

Unphased genotypes in Plink binary format

FAPI can directly read unphased genotypes in Plink binary format, a compressed format that can be stored and processed more efficiently.

FAPI will calculate the genotypic correlation to approximate the degree of LD between SNPs. A Plink binary file set always includes three linked files, *.fam, *.bim and *.bed, which should be placed in the same folder.
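To illustrate what a genotypic correlation looks like, the sketch below computes the squared Pearson correlation between two SNPs from unphased genotypes coded as 0, 1 or 2 copies of an allele. It illustrates the general idea under that coding assumption and is not FAPI's internal code; the function name genotype_r2 and the use of None for missing genotypes are hypothetical.

```python
import math

def genotype_r2(g1, g2):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2.

    Individuals with a missing genotype (None) at either SNP are skipped.
    """
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return math.nan  # not enough data to estimate a correlation
    mean1 = sum(a for a, _ in pairs) / n
    mean2 = sum(b for _, b in pairs) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in pairs)
    var1 = sum((a - mean1) ** 2 for a, _ in pairs)
    var2 = sum((b - mean2) ** 2 for _, b in pairs)
    if var1 == 0 or var2 == 0:
        return math.nan  # monomorphic SNP: correlation is undefined
    return cov * cov / (var1 * var2)

# Two SNPs typed in five individuals; the second individual lacks a call at SNP 2.
print(genotype_r2([0, 1, 2, 1, 0], [0, None, 2, 2, 0]))
```

Because the correlation is computed on genotype counts rather than haplotypes, no phasing is needed, which is why it serves as an approximation to haplotype-based r-squared.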

Functions & examples

FAPI has three main functions. Please see the demo website for details.

	Impute association p-values at untyped SNPs		Link
	Impute p-value at untyped SNPs and conduct meta-analysis		Link
	Validate association p values at typed SNPs		Link

Options

Tag Name	Description
Analysis functions
--impute	Impute the p-values of untyped SNPs given the p-values of typed SNPs according to LD information.
--meta	Impute the p-values of untyped SNPs and perform meta-analysis of multiple p-value sets. Hint: If you want to perform meta-analysis without imputing the p-values, please use the --meta and --noimpute options.
--qc	Impute the p-values of typed SNPs given the p-values of typed SNPs according to LD information and estimate the chance of getting the p-values of typed SNPs.
--size	Set the sample sizes of cases and controls, separated by a colon, that were used to generate the p-values. These are used as weights for meta-analysis. The default value is 1:1.
Input file settings
--pfile	Specify the path of a file containing p values and genomic information of sequence variants.
--gfile	Specify the path and type of files containing genotypes for calculating LD, in VCF or Plink binary format. The path and type of each file are separated by a double colon, e.g., path/to/file::vcf or path/to/file::plink. If the reference genotypes are stored in separate files chromosome by chromosome, you can use _CHROM_ to denote the chromosome names [1...Y] in the file name, e.g., chr_CHROM_.phase1.cvf.chinese.hg19
--chrom-col	The header name of the column containing chromosome information in the file specified by --pfile. The 1st column will be used by default if this tag is not specified.
--marker-col	The header name of the column containing SNP rsID information in the file specified by --pfile. The 2nd column will be used by default if this tag is not specified.
--position-col	The header name of the column containing coordinate information in the file specified by --pfile. The 3rd column will be used by default if this tag is not specified.
--p-col	The header name of the column containing p-value information in each file specified by --pfile. For multiple pfiles, the column names of the p-value sources are delimited by commas. The 4th column will be used by default if this tag is not specified.
--missing-p	The labels of missing p-values in the file specified by --pfile. The number starts from 1.
--maf	Filter out genotypes with minor allele frequency less than the given number in the reference panel.
Performance and Accuracy settings
--nt	The maximum number of CPUs to use in parallel.
--window-size	Set the maximum number of SNPs with actual p-values in a scan window for imputation. The default value is 10.
--window-len	Set the maximum length of a scan window for imputation. The default value is 1,000,000 bp.
--ignore-r2	Set the pairwise LD (r-squared) threshold between SNPs; SNP pairs with LD up to this value are ignored in imputation. The default value is 0.01.
--conf-filter	Filter out imputed p-values with confidence score over the set value. The default value is 0.3.
Miscellaneous
--out	Specify the path and prefix name for the output data.
--resource	Specify the path of the resource files. By default, it is a sub-folder named resources in the main program folder.
--no-web	Switch off the automatic self-update function.
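The _CHROM_ placeholder and the path::type syntax described for --gfile can be previewed before a run. The sketch below is not part of FAPI: it expands a template into per-chromosome file names and builds the --gfile value. It assumes the chromosome names are 1 to 22, X and Y, and the template ref/chr_CHROM_.vcf.gz is a made-up example path.

```python
import os

# Assumed chromosome naming; the actual names accepted by FAPI may differ.
CHROMOSOMES = [str(i) for i in range(1, 23)] + ["X", "Y"]

def expand_chrom_template(template, chromosomes=CHROMOSOMES):
    """Map each chromosome name to the file name with _CHROM_ substituted."""
    return {c: template.replace("_CHROM_", c) for c in chromosomes}

def gfile_value(path, file_type):
    """Join path and type with the double colon used by --gfile."""
    if file_type not in ("vcf", "plink"):
        raise ValueError("file type must be 'vcf' or 'plink'")
    return f"{path}::{file_type}"

def missing_reference_files(template, chromosomes=CHROMOSOMES):
    """List expanded file names that do not exist on disk."""
    expanded = expand_chrom_template(template, chromosomes)
    return [p for p in expanded.values() if not os.path.exists(p)]

template = "ref/chr_CHROM_.vcf.gz"  # hypothetical reference panel layout
print(gfile_value(template, "vcf"))
print(expand_chrom_template(template)["X"])
```

Calling missing_reference_files(template) before launching a long analysis gives an early warning when some chromosome files are absent.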

Comments and suggestions are welcome; please e-mail limx54@163.com

Reference:

Kwan JS^, Li MX^*, Deng JE, Sham PC*. FAPI: Fast and Accurate P-value Imputation for genome-wide association study. Eur J Hum Genet. 2015 Aug 26. doi: 10.1038/ejhg.2015.190.