SDF Archive
The SDF archive shown in the figure above is one of the core foundations of the SDFA tool and the basis for all downstream analysis. The table below describes each field of the SDF structure in more detail:
Group | Field | Value Type | Description |
---|---|---|---|
LOCATION | coordinate | int[3] | The chromosome where the current SV is located, together with its start and end positions |
LOCATION | length | int | The length of the current SV (for an insertion, for example, the length cannot be determined from the coordinate field alone) |
LOCATION | type | int | The type of the current SV |
GENOTYPE | genotypes | bytecode | The genotype of the current sample for this SV |
GENOTYPE | metrics | bytecodeList | Quality metrics of the current genotype |
VCF Field | id | bytecode | The ID value of the current SV in the original VCF file |
VCF Field | ref | bytecode | The REF value of the current SV in the original VCF file |
VCF Field | alt | bytecode | The ALT value of the current SV in the original VCF file |
VCF Field | qual | bytecode | The QUAL value of the current SV in the original VCF file |
VCF Field | filter | bytecode | The FILTER value of the current SV in the original VCF file |
VCF Field | info | bytecodeList | The INFO values of the current SV in the original VCF file |
CSV INDEX | line | int | The line number of the current SV in the original VCF file |
CSV INDEX | chr | int[N] | If the current SV is a complex SV, the chromosomes on which all of its split SVs are located |
ANNOTATION INDEX | indexes | int[N] | The line intervals of the annotations related to the current SV |
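To make the field layout concrete, the table can be mirrored as a small in-memory record. This is only an illustrative sketch: the class name `SDFRecord`, the Python types standing in for `bytecode` and `bytecodeList`, the order of the three `coordinate` values, and the type code `1` for an insertion are all assumptions, not the actual SDFA encoding.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SDFRecord:
    """Hypothetical in-memory mirror of one SDF record (not the SDFA API)."""

    # LOCATION group
    coordinate: Tuple[int, int, int]  # assumed order: (chromosome index, start, end)
    length: int                       # stored explicitly, see the insertion example below
    sv_type: int                      # integer type code; the mapping is an assumption

    # GENOTYPE group (bytes stands in for "bytecode")
    genotypes: bytes = b""
    metrics: List[bytes] = field(default_factory=list)

    # VCF Field group: raw values carried over from the original VCF line
    vcf_id: bytes = b"."
    ref: bytes = b"."
    alt: bytes = b"."
    qual: bytes = b"."
    vcf_filter: bytes = b"."
    info: List[bytes] = field(default_factory=list)

    # CSV INDEX group
    line: int = -1                                 # line number in the original VCF file
    chr: List[int] = field(default_factory=list)   # chromosomes of all split SVs

    # ANNOTATION INDEX group
    indexes: List[int] = field(default_factory=list)


# An insertion occupies a single reference position, so start == end and the
# inserted length has to come from the separate `length` field.
insertion = SDFRecord(coordinate=(0, 1000, 1000), length=300, sv_type=1,
                      alt=b"<INS>", line=57)
span_from_coordinate = insertion.coordinate[2] - insertion.coordinate[1]
```

Here `span_from_coordinate` is zero while `insertion.length` is 300, which is the situation the `length` row of the table describes.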
Decomposition and Assembly of SV
SDF storage introduces the concept of SV "decomposition". Specifically, every SV is decomposed into one or more single intervals, each lying on a single chromosome. Each resulting single-interval SV is called a Standardized Decomposition SV (SDSV), which is the basic unit of SDF file analysis.
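The idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SDFA's implementation: the names `SDSV`, `decompose`, and `assemble` are hypothetical, chromosomes are written as strings for readability (the table lists them as integers), and regrouping by the source line number is only one plausible way to use the `line` and `chr` fields of the CSV INDEX group.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end)


@dataclass(frozen=True)
class SDSV:
    """One single-interval piece of an SV (hypothetical structure)."""
    chrom: str
    start: int
    end: int
    line: int                   # line of the source SV in the original VCF file
    csv_chrs: Tuple[str, ...]   # chromosomes of all pieces of the same source SV


def decompose(line: int, intervals: List[Interval]) -> List[SDSV]:
    """Split one SV, given as a list of intervals, into single-interval SDSVs."""
    chrs = tuple(chrom for chrom, _, _ in intervals)
    return [SDSV(chrom, start, end, line, chrs) for chrom, start, end in intervals]


def assemble(sdsvs: List[SDSV]) -> Dict[int, List[Interval]]:
    """Regroup SDSVs by source line to recover the intervals of each SV."""
    grouped: Dict[int, List[Interval]] = defaultdict(list)
    for sv in sdsvs:
        grouped[sv.line].append((sv.chrom, sv.start, sv.end))
    return dict(grouped)


# A made-up complex SV touching two chromosomes, plus a simple one-interval SV.
complex_parts = decompose(42, [("chr1", 1000, 1000), ("chr5", 20000, 20000)])
simple_parts = decompose(43, [("chr2", 500, 900)])
restored = assemble(complex_parts + simple_parts)
```

In this sketch a simple SV decomposes into exactly one SDSV, while a complex SV yields one SDSV per interval, each remembering the chromosomes of its siblings so that the original SV can be put back together.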
To make this easier to follow, the "Schematic Diagram of Decomposition and Reconstruction Principle" is shown below:

Next, a concrete decomposition example is shown, using a VCF file as input:
