Functional Annotation
To further locate susceptible SVs and explore their biological mechanisms, it is often necessary to perform functional annotation on SVs. In order to quickly conduct functional annotation for multiple samples and multiple resources, SDFA has designed the index sliding window algorithm shown in the figure below to accelerate the annotation process.
At the same time, in order to perform customized annotation for multiple resources, SDFA has designed the post-annotate annotation method. The output file can be customized through the configuration file shown in the figure below:

Configuration File
SDFA draws on the post annotation concept of the Vcfanno tool and defines a configuration file for configuring annotation resources and output results. The specific explanation is as follows:
Parameters | Parameter Explanation | Required or not ( * represents required, . represents optional) |
---|---|---|
[[annotation]] |
The start identifier for a new annotation resource | * |
file |
The full path of the annotation resource file | * |
type |
Type of annotation resource (3 types are supported: gene , interval , and svdatabase ) |
* |
names |
Column names of the output results | . |
fields |
Column names of the input file | . |
opts |
Functions corresponding to the output columns | Needs to match the size of names |
Here we explain the above parameters through an example:
[!NOTE|label:Example 1]
Next, we will operate on the
interval_1.txt
file. This file is of the interval type, and its general content is:
Elements between single lines are separated by
\t
. Now we want to obtain theCOL1
column related to SV, so we setfields=["COL1"]
here. At the same time, we want the output column to be namedname
, so we setnames=["name"]
. The final configuration file is as follows:[[annotation]] file=/Users/wenjiepeng/Desktop/SDFA_4.0/test/annotation/data/interval_1.txt type=interval names=["name"] fields=["COL1"] opts=["concat"]
After completing the above configuration file, we perform annotation through the following command:
java -jar sdfa.jar annotate \ --config /Users/wenjiepeng/Desktop/SDFA_4.0/test/annotation/data/config.txt \ -t 4 -d /Users/wenjiepeng/Desktop/SDFA_4.0/test/vcf \ -o /Users/wenjiepeng/Desktop/SDFA_4.0/test/annotation/res
The results after annotation are as follows:
Among them, the fileID corresponds to different SV files in this folder (
/Users/wenjiepeng/Desktop/SDFA_4.0/test/vcf
).
Annotation Resources
SDFA supports integrating external databases for SV annotation, provided that the external data files meet the following basic format requirements:
- The tab character (
\t
) must be used as the delimiter. - The annotation file must contain a header line, and the column names should start with the number sign (
#
) (lines starting with##
will be ignored).
In addition, when SDFA introduces interval annotation files and SV database files to annotate existing SV data, the following specific conditions must be met:
For interval annotation files, the following requirements must be met:
| Column | Name | Type | Example | | ------ | ---------------- | ------- | ------- | | 1 | Chromosome | String | chr1 | | 2 | Start Position | Integer | 1000 | | 3 | End Position | Integer | 3000 | | 4 | [Feature 1 Name] | String | V1 | | … | … | … | … |
For SV database files, the following requirements must be met:
| Column | Name | Type | Example | | ------ | ---------------- | ------- | ------- | | 1 | Chromosome | String | chr1 | | 2 | Start Position | Integer | 1000 | | 3 | End Position | Integer | 3000 | | 4 | SV Length | Integer | 2000 | | 5 | SV Type | String | DEL | | 6 | [Feature 1 Name] | String | V1 | | … | … | … | … |