VCF-based Quality Control
Genotype-level Quality Control
A genotype is removed if it meets the exclusion condition of at least one of the following options.
| Option | Description | Default |
|---|---|---|
| `--gty-gq` | Exclude genotypes whose genotype quality (GQ, Phred-scaled) is < minGq. Set to `0` to disable this filter.<br>Format: `--gty-gq <minGq>`<br>Example: `--gty-gq 20`<br>Valid setting: [int] >= 0 | 20 |
| `--gty-dp` | Exclude genotypes whose read depth (DP) is < minDp. Set to `0` to disable this filter.<br>Format: `--gty-dp <minDp>`<br>Example: `--gty-dp 8`<br>Valid setting: [int] >= 0 | 8 |
| `--gty-pl` | Exclude genotypes whose second-smallest normalized Phred-scaled genotype likelihood (PL) is < minPl. Normalized PLs set the most likely genotype to 0, so a small second-smallest value means the called genotype is hard to distinguish from the next most likely one. Set to `0` to disable this filter.<br>Format: `--gty-pl <minPl>`<br>Example: `--gty-pl 20`<br>Valid setting: [int] >= 0 | 20 |
| `--gty-ad-hom-ref` | Exclude homozygous reference genotypes for which the fraction of reads carrying the alternative allele is > maxAdHomRef. Set to `1` to disable this filter.<br>Format: `--gty-ad-hom-ref <maxAdHomRef>`<br>Example: `--gty-ad-hom-ref 0.05`<br>Valid setting: [float] 0.0 ~ 1.0 | 0.05 |
| `--gty-ad-hom-alt` | Exclude homozygous alternative genotypes for which the fraction of reads carrying the alternative allele is < minAdHomAlt. Set to `0` to disable this filter.<br>Format: `--gty-ad-hom-alt <minAdHomAlt>`<br>Example: `--gty-ad-hom-alt 0.75`<br>Valid setting: [float] 0.0 ~ 1.0 | 0.75 |
| `--gty-ad-het` | Exclude heterozygous genotypes for which the fraction of reads carrying the alternative allele is < minAdHet. Set to `0` to disable this filter.<br>Format: `--gty-ad-het <minAdHet>`<br>Example: `--gty-ad-het 0.25`<br>Valid setting: [float] 0.0 ~ 1.0 | 0.25 |
| `--gty-qc` | Exclude genotypes whose value for the specified keyword does not pass a user-defined Java expression. `--gty-qc` combines a keyword, a rule and a `default` tag to meet custom genotype QC needs; the `default` tag controls whether the genotype is retained or discarded when a parsing error occurs during quality control.<br>Format: `--gty-qc <keyword> <rule> default=[RETAIN/DISCARD]`<br>Example: `--gty-qc DP "DP.toInt()>=10" default=DISCARD` | [OFF] |
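The genotype-level filters above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: the `Genotype` record and the `passes_genotype_qc` function are hypothetical, the alternative-allele fraction is computed as non-reference reads over all reads in FORMAT/AD, and genotypes with no reads skip the AD filters. A genotype is excluded as soon as one enabled filter fails.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Genotype:
    """Hypothetical per-sample record holding the FORMAT fields used below."""
    alleles: Tuple[int, int]  # e.g. (0, 1) for a 0/1 call
    gq: int                   # FORMAT/GQ
    dp: int                   # FORMAT/DP
    pl: Sequence[int]         # FORMAT/PL, normalized so the best genotype has 0
    ad: Sequence[int]         # FORMAT/AD, read counts per allele, REF first


def passes_genotype_qc(g: Genotype, min_gq: int = 20, min_dp: int = 8,
                       min_pl: int = 20, max_ad_hom_ref: float = 0.05,
                       min_ad_hom_alt: float = 0.75,
                       min_ad_het: float = 0.25) -> bool:
    """Return False if the genotype fails at least one filter (defaults mirror the table)."""
    # A lower bound of 0 never triggers, which is how '0' disables these filters.
    if g.gq < min_gq or g.dp < min_dp:
        return False
    # Second-smallest PL: how much less likely the runner-up genotype is than the call.
    if len(g.pl) >= 2 and sorted(g.pl)[1] < min_pl:
        return False
    total = sum(g.ad)
    if total == 0:
        return True  # assumption: AD filters are skipped when no reads are reported
    alt_fraction = (total - g.ad[0]) / total
    first, second = g.alleles
    if first == second == 0:                  # homozygous reference
        return alt_fraction <= max_ad_hom_ref  # maxAdHomRef = 1 disables this check
    if first == second:                       # homozygous alternative
        return alt_fraction >= min_ad_hom_alt
    return alt_fraction >= min_ad_het         # heterozygous
```

For example, a 0/1 call with AD = [28, 2] has an alternative-allele fraction of about 0.07 and is excluded under the default `--gty-ad-het 0.25`.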
Variant-level Quality Control
| Option | Description | Default |
|---|---|---|
| `--allele-num` | Exclude variants whose alternative allele number is outside the range [minAlleleNum, maxAlleleNum].<br>Format: `--allele-num <minAlleleNum>~<maxAlleleNum>`<br>Example: `--allele-num 2~4`<br>Valid setting: [int] 0 ~ 255 | [OFF] |
| `--seq-ac` | Exclude variants whose alternative allele count (AC) is outside the range [minAc, maxAc].<br>Format: `--seq-ac <minAc>~<maxAc>`<br>Example: `--seq-ac 1~10`<br>Valid setting: [int] >= 0 | [OFF] |
| `--seq-an` | Exclude variants whose non-missing allele number (AN) is outside the range [minAn, maxAn].<br>Format: `--seq-an <minAn>~<maxAn>`<br>Example: `--seq-an 160~200`<br>Valid setting: [int] >= 0 | [OFF] |
| `--seq-af` | Exclude variants whose alternative allele frequency (AF) is outside the range [minAf, maxAf].<br>Format: `--seq-af <minAf>~<maxAf>`<br>Example: `--seq-af 0.05~1.0`<br>Valid setting: [float] 0.0 ~ 1.0 | [OFF] |
| `--seq-qual` | Exclude variants whose overall sequencing quality score (QUAL, Phred-scaled) is < minQual.<br>Format: `--seq-qual <minQual>`<br>Example: `--seq-qual 30`<br>Valid setting: [float] >= 0.0 | 30 |
| `--seq-mq` | Exclude variants whose overall mapping quality score (MQ) is < minMq.<br>Format: `--seq-mq <minMq>`<br>Example: `--seq-mq 20`<br>Valid setting: [float] >= 0.0 | 20 |
| `--seq-fs` | Exclude variants whose Phred-scaled strand bias p-value (Fisher's exact test, FS) is > maxFs. This strand bias estimate is best suited to low-coverage data. Set to `100` to disable this filter, since the maximal Phred-scaled p-value is 100.<br>Format: `--seq-fs <maxFs>`<br>Example: `--seq-fs 60`<br>Valid setting: [float] >= 0.0 | 100 |
| `--seq-info` | Exclude variants whose value for the specified INFO keyword does not pass a user-defined Java expression. `--seq-info` combines a keyword, a rule and a `default` tag; the `default` tag controls whether the variant is retained or discarded when a parsing error occurs during quality control.<br>Format: `--seq-info <keyword> <rule> default=[RETAIN/DISCARD]`<br>Example: `--seq-info keyword=MQ rule=MQ.char2Float()>=20 default=DISCARD` | [OFF] |
| `--seq-filter` | Exclude variants whose FILTER field does not pass a user-defined Java expression. In the expression, `value` holds the FILTER field as a Java String, so String methods can be called on it, e.g. `value.XXX(string)`; variants for which the expression is false are excluded.<br>Format: `--seq-filter <Java String expression>`<br>Example: `--seq-filter value.valueEquals(\"PASS\")` excludes variants whose FILTER field is not equal to PASS.<br>Example: `--seq-filter value.indexOf("q10") == -1` excludes variants whose FILTER field contains q10. | [OFF] |
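The AC, AN and AF statistics and the `<min>~<max>` range syntax used by the variant-level options can be sketched as follows. The helper names are hypothetical; treating both range bounds as inclusive and pooling all alternative alleles into a single AC (instead of the per-ALT counts a multi-allelic VCF AC field holds) are assumptions for illustration.

```python
from typing import Optional, Sequence, Tuple

Call = Optional[Tuple[int, int]]  # diploid allele indices; None = missing genotype


def parse_range(spec: str) -> Tuple[float, float]:
    """Split the '<min>~<max>' syntax, e.g. '160~200' -> (160.0, 200.0)."""
    low, high = spec.split("~")
    return float(low), float(high)


def in_range(value: float, spec: str) -> bool:
    """Assumption: both bounds are inclusive, as the [min, max] notation suggests."""
    low, high = parse_range(spec)
    return low <= value <= high


def ac_an_af(calls: Sequence[Call]) -> Tuple[int, int, float]:
    """AC = non-reference alleles, AN = non-missing alleles, AF = AC / AN."""
    observed = [allele for call in calls if call is not None for allele in call]
    an = len(observed)
    ac = sum(1 for allele in observed if allele != 0)
    return ac, an, (ac / an if an else 0.0)


def passes_site_thresholds(qual: float, mq: float, fs: float,
                           min_qual: float = 30.0, min_mq: float = 20.0,
                           max_fs: float = 100.0) -> bool:
    """QUAL and MQ are lower bounds, FS is an upper bound (defaults mirror the table)."""
    return qual >= min_qual and mq >= min_mq and fs <= max_fs
```

With calls 0/1, 1/1, ./. and 0/0, the sketch gives AC = 3, AN = 6 and AF = 0.5, so the variant is kept by `--seq-ac 1~10`.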
Turn Off Quality Control
All quality control options mentioned above, including genotype-level and variant-level quality control options, can be turned off by --disable-qc.
| Option | Description | Default |
|---|---|---|
| `--disable-qc` | Disable all of the quality control options above. Note: do not combine it with other quality control options, because it overrides them and renders them ineffective.<br>Format: `--disable-qc` | [OFF] |
Mutation Type
| Option | Description | Default |
|---|---|---|
| `--only-snv` | Retain and analyze only single-nucleotide variants (SNVs).<br>Format: `--only-snv` | [OFF] |
| `--only-indel` | Retain and analyze only small insertion or deletion variants (InDels, <= 50 bp).<br>Format: `--only-indel` | [OFF] |
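A minimal sketch of how a REF/ALT pair might be classified for `--only-snv` and `--only-indel`. The function `variant_type` is hypothetical; measuring InDel length as the difference between the REF and ALT lengths, and sending symbolic alleles and multi-nucleotide substitutions to an "other" class, are assumptions rather than documented behavior.

```python
VALID_BASES = set("ACGTNacgtn")


def variant_type(ref: str, alt: str, max_indel_len: int = 50) -> str:
    """Classify one REF/ALT pair as 'SNV', 'InDel' or 'other' (hypothetical helper)."""
    # Symbolic or breakend alleles such as <DEL>, '*' or N[chr2:100[ are not sequence.
    if not set(ref + alt) <= VALID_BASES:
        return "other"
    if len(ref) == 1 and len(alt) == 1:
        return "SNV" if ref.upper() != alt.upper() else "other"
    length_change = abs(len(ref) - len(alt))
    if 0 < length_change <= max_indel_len:
        return "InDel"
    return "other"  # e.g. multi-nucleotide substitutions or InDels longer than 50 bp
```

Under these assumptions, `A>G` is an SNV, `A>AT` and `ATTT>A` are InDels, and `A><DEL>` or `AC>GT` fall into neither class.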
In the following options, if a PED file is provided, only the genotypes of individuals present in both the VCF and the PED file are used for selection. Without a PED file, selection is based on the genotypes of all individuals in the VCF file.
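The sample selection described above can be sketched as an intersection of the VCF sample list with the PED file. This assumes a PLINK-style PED layout (individual ID in column 2, phenotype in column 6 coded `1` = unaffected control, `2` = affected case); `select_samples` is a hypothetical helper, and the tool's own PED parsing may differ.

```python
from typing import Dict, List, Optional, Sequence


def select_samples(vcf_samples: Sequence[str],
                   ped_lines: Optional[Sequence[str]] = None) -> Dict[str, List[str]]:
    """Return the samples used as 'all', 'case' and 'control' (hypothetical helper)."""
    if ped_lines is None:
        # No PED file: every VCF sample is used; there are no case/control labels.
        return {"all": list(vcf_samples), "case": [], "control": []}
    status: Dict[str, str] = {}
    for line in ped_lines:
        fields = line.split()
        if len(fields) < 6 or fields[0].startswith("#"):
            continue
        status[fields[1]] = fields[5]  # individual ID -> phenotype code (column 6)
    used = [s for s in vcf_samples if s in status]  # intersection, in VCF order
    return {
        "all": used,
        "case": [s for s in used if status[s] == "2"],     # assumed: 2 = affected
        "control": [s for s in used if status[s] == "1"],  # assumed: 1 = unaffected
    }
```

Samples listed only in the PED file, or only in the VCF file, are ignored when a PED file is given.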
Allele Frequency
| Option | Description | Default |
|---|---|---|
| `--local-af` | Exclude variants whose alternative allele frequency (AF) across all subjects is outside the range [minAF, maxAF].<br>Format: `--local-af <minAF>~<maxAF>`<br>Example: `--local-af 0.05~1.0`<br>Valid setting: [float] 0.0 ~ 1.0 | [OFF] |
| `--local-af-case` | Exclude variants whose alternative allele frequency (AF) in cases is outside the range [minAF, maxAF].<br>Format: `--local-af-case <minAF>~<maxAF>`<br>Example: `--local-af-case 0.05~1.0`<br>Valid setting: [float] 0.0 ~ 1.0 | [OFF] |
| `--local-af-control` | Exclude variants whose alternative allele frequency (AF) in controls is outside the range [minAF, maxAF].<br>Format: `--local-af-control <minAF>~<maxAF>`<br>Example: `--local-af-control 0.05~1.0`<br>Valid setting: [float] 0.0 ~ 1.0 | [OFF] |
| `--min-case-control-af-ratio` | Exclude variants whose alternative allele frequency (AF) in cases is less than the AF in controls multiplied by the specified ratio.<br>Format: `--min-case-control-af-ratio <ratio>`<br>Example: `--min-case-control-af-ratio 2.0`<br>Valid setting: [float] >= 0.0 | [OFF] |
| `--local-maf` | Exclude variants whose minor allele frequency (MAF) across all subjects is outside the range [minMAF, maxMAF]. By definition, MAF is the frequency of the less common allele. Note that the reference allele in the human reference genome is not always the common ("major") allele in the population: when AF <= 0.5, MAF equals AF; when AF > 0.5, MAF is calculated as 1 - AF.<br>Format: `--local-maf <minMAF>~<maxMAF>`<br>Example: `--local-maf 0.05~0.5`<br>Valid setting: [float] 0.0 ~ 0.5 | [OFF] |
| `--local-maf-case` | Exclude variants whose minor allele frequency (MAF) in cases is outside the range [minMAF, maxMAF].<br>Format: `--local-maf-case <minMAF>~<maxMAF>`<br>Example: `--local-maf-case 0.05~0.5`<br>Valid setting: [float] 0.0 ~ 0.5 | [OFF] |
| `--local-maf-control` | Exclude variants whose minor allele frequency (MAF) in controls is outside the range [minMAF, maxMAF].<br>Format: `--local-maf-control <minMAF>~<maxMAF>`<br>Example: `--local-maf-control 0.05~0.5`<br>Valid setting: [float] 0.0 ~ 0.5 | [OFF] |
| `--min-case-control-maf-ratio` | Exclude variants whose minor allele frequency (MAF) in cases is less than the MAF in controls multiplied by the specified ratio.<br>Format: `--min-case-control-maf-ratio <ratio>`<br>Example: `--min-case-control-maf-ratio 2.0`<br>Valid setting: [float] >= 0.0 | [OFF] |
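The frequency options can be sketched with the formulas given in the table: AF over the non-missing alleles of a group, MAF = AF or 1 - AF, and the case/control ratio check. The function names are hypothetical, and genotypes are assumed to be diploid allele-index pairs with `None` for missing calls.

```python
from typing import Optional, Sequence, Tuple

Call = Optional[Tuple[int, int]]  # None = missing genotype


def alt_allele_freq(calls: Sequence[Call]) -> float:
    """AF over non-missing alleles in one group (all subjects, cases or controls)."""
    observed = [allele for call in calls if call is not None for allele in call]
    if not observed:
        return 0.0
    return sum(1 for allele in observed if allele != 0) / len(observed)


def minor_allele_freq(af: float) -> float:
    """MAF = AF when AF <= 0.5, otherwise 1 - AF."""
    return af if af <= 0.5 else 1.0 - af


def passes_case_control_ratio(case_freq: float, control_freq: float,
                              min_ratio: float) -> bool:
    """Excluded when case_freq < control_freq * min_ratio."""
    return case_freq >= control_freq * min_ratio
```

With case AF = 0.75 and control AF = 0.25, a ratio of 2.0 keeps the variant (0.75 >= 0.5) while a ratio of 4.0 excludes it (0.75 < 1.0). Passing MAFs instead of AFs gives the `--min-case-control-maf-ratio` check.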
Missing Genotype Rate
| Option | Description | Default |
|---|---|---|
| `--min-obs-rate` | Exclude variants whose observed rate of non-missing genotypes across all subjects is < minObsRate.<br>Format: `--min-obs-rate <minObsRate>`<br>Example: `--min-obs-rate 0.8`<br>Valid setting: [float] 0.0 ~ 1.0 | [OFF] |
| `--min-obs-rate-case` | Exclude variants whose observed rate of non-missing genotypes in cases is < minObsRate.<br>Format: `--min-obs-rate-case <minObsRate>`<br>Example: `--min-obs-rate-case 0.8`<br>Valid setting: [float] 0.0 ~ 1.0 | [OFF] |
| `--min-obs-rate-control` | Exclude variants whose observed rate of non-missing genotypes in controls is < minObsRate.<br>Format: `--min-obs-rate-control <minObsRate>`<br>Example: `--min-obs-rate-control 0.8`<br>Valid setting: [float] 0.0 ~ 1.0 | [OFF] |
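The observed rate is the fraction of subjects in the chosen group whose genotype is not missing. A minimal sketch with hypothetical names, assuming a genotype counts as missing when it is represented as `None`:

```python
from typing import Optional, Sequence, Tuple

Call = Optional[Tuple[int, int]]  # None = missing genotype


def observed_rate(calls: Sequence[Call]) -> float:
    """Fraction of subjects whose genotype is not missing."""
    if not calls:
        return 0.0
    return sum(1 for call in calls if call is not None) / len(calls)


def passes_min_obs_rate(calls: Sequence[Call], min_obs_rate: float) -> bool:
    """Excluded when the observed rate is < minObsRate."""
    return observed_rate(calls) >= min_obs_rate
```

Three observed genotypes out of five subjects give a rate of 0.6, so the variant is excluded by `--min-obs-rate 0.8`.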
Hardy-Weinberg Equilibrium
| Option | Description | Default |
|---|---|---|
| `--hwe` | Exclude variants whose Hardy-Weinberg equilibrium test p-value across all subjects is <= pThreshold.<br>Format: `--hwe <pThreshold>`<br>Example: `--hwe 1E-5`<br>Valid setting: [double] 0.0 ~ 1.0 | [OFF] |
| `--hwe-case` | Exclude variants whose Hardy-Weinberg equilibrium test p-value in cases is <= pThreshold.<br>Format: `--hwe-case <pThreshold>`<br>Example: `--hwe-case 1E-5`<br>Valid setting: [double] 0.0 ~ 1.0 | [OFF] |
| `--hwe-control` | Exclude variants whose Hardy-Weinberg equilibrium test p-value in controls is <= pThreshold.<br>Format: `--hwe-control <pThreshold>`<br>Example: `--hwe-control 1E-5`<br>Valid setting: [double] 0.0 ~ 1.0 | [OFF] |
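A sketch of the HWE filter using Pearson's chi-square test with one degree of freedom, computed from the genotype counts at a biallelic site. This is a swapped-in technique for illustration only: the options above do not state which HWE test is used, and many tools (PLINK, for example) use an exact test, which is more reliable for rare variants. `hwe_chi2_pvalue` and `passes_hwe` are hypothetical names.

```python
import math
from typing import Tuple


def hwe_chi2_pvalue(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Pearson chi-square HWE test (1 df) from biallelic genotype counts."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return 1.0
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic site: the test is undefined, so never exclude it
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def passes_hwe(counts: Tuple[int, int, int], p_threshold: float) -> bool:
    """Excluded when p <= pThreshold."""
    return hwe_chi2_pvalue(*counts) > p_threshold
```

Counts of 25/50/25 match HWE exactly (p = 1.0), whereas 50/0/50 (no heterozygotes) gives a p-value far below `1E-5` and would be excluded by `--hwe 1E-5`.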