As before, under the workflow-based programming model, programmers can annotate sequence variants by calling the API directly. The input variants must be stored in the GTB format. The following code annotates variants with gene features and allele frequencies.
// This code snippet demonstrates how to configure and execute an annotation workflow using the KGGA
// platform. The snippet includes setting up the annotation resources, defining input data, configuring
// annotation databases, and exporting annotated variants to a TSV file format. The code also shows how
// to add tasks to a workflow and execute them sequentially.
// Standard-library imports used below. The KGGA classes (AnnotationOptions, Channel, Executor,
// and so on) must also be imported from the KGGA library.
import java.io.File;
import java.io.IOException;
public class AnnotationExample {
    public static void main(String[] args) {
        // Step 1: Create an object for annotation options.
        // The `AnnotationOptions` object is used to configure various settings related to the annotation
        // process, such as specifying databases and frequency cutoffs.
        AnnotationOptions annotationOptions = new AnnotationOptions();
        try {
            // Step 2: Set the source of annotation resources.
            // The `Channel` class is used to add local and online channels (resource paths) where
            // annotation databases and other resources are stored. These channels provide access to
            // annotation data, such as population allele frequencies or gene annotations.
            Channel.addChannel("./resources"); // Local resource path
            Channel.addChannel("https://idc.biosino.org/pmglab/resource/kgg/kgga/resources"); // Online resource URL
            // Step 3: Specify the workspace directory for the workflow.
            // The workspace is a directory that stores intermediate files and results generated during the
            // annotation and analysis processes.
            File workspace = new File("./test1");
            // Step 4: Define the number of threads for parallel processing.
            // The number of threads determines how many tasks will be processed simultaneously, allowing
            // for faster execution when running on multi-core systems.
            int threadNum = 4;
            // Step 5: Create an executor object to manage and execute the workflow.
            // The `Executor` class provides a framework to handle the various tasks required for annotation
            // and other computational steps.
            Workflow workflow = new Executor();
            // Step 6: Track the workflow and workspace for output management.
            // This utility function links the workflow with the specified workspace, ensuring that
            // all intermediate files and outputs are managed properly.
            Utility.addTrack(workflow, workspace);
            // Step 7: Execute any initial tasks defined in the workflow.
            // Although there are no specific tasks at this point, this command ensures that the workflow
            // is in a clean state before adding new tasks.
            workflow.execute();
            workflow.clearTasks(); // Clear any residual tasks to prevent conflicts.
            // Step 8: Set the path to the input GTB file for annotation.
            // This file contains the genetic variants to be annotated. The file is passed to the
            // workflow as a parameter named "AnnotationBaseVariantSet".
            // The path below is machine-specific; replace it with the location of your own GTB file.
            String inputAnnotationBasedGTBPath =
                    "/Users/jianglin/Working/tools/kgga/test/demo2/GenerateAnnotationBaseTask/variants.annot.hg38.gtb";
            File annotationBasedGTB = new File(inputAnnotationBasedGTBPath);
            workflow.setParam("AnnotationBaseVariantSet", annotationBasedGTB);
            // Step 9: Configure the annotation databases.
            // Various annotation databases can be specified to provide additional information about
            // genetic variants, such as allele frequencies or gene annotations.
            // - `freqDatabase`: Specifies databases for allele frequency annotations. Here, the `gnomad`
            //   database is used with specific subpopulations: "EAS" (East Asian) and "AFR" (African).
            annotationOptions.freqDatabase.add(
                    new DatabaseDescription("gnomad", new String[]{"gnomADv4.0::EAS", "gnomADv4.0::AFR"}));
            // Set the allele frequency (AF) range filter.
            // This filter restricts the annotated variants to those with allele frequencies between 0 and
            // 0.01, ensuring that only rare variants are considered.
            annotationOptions.dbAf = new FloatInterval(0, 0.01f);
            // Specify the gene annotation database.
            // The `gencode` database is used here, providing comprehensive gene annotation information.
            // Gene annotations help link genetic variants to genes and identify their functional context.
            annotationOptions.geneDatabase = List.singleton(new DatabaseDescription("gencode"));
            // Step 10: Add the annotation tasks to the workflow.
            // The `AnnotationPipeline` class builds the necessary tasks for variant annotation using the
            // specified options, workspace, and number of threads. These tasks are then added to the workflow.
            workflow.addTasks(new AnnotationPipeline(annotationOptions, workspace, threadNum).build());
            // Step 11: Add a task to export annotated variants to a TSV file.
            // The `OutputVariants2TSVTask` is responsible for exporting the annotated variants stored in
            // the GTB format to a tab-separated value (TSV) file. This file can be easily viewed and
            // processed by other bioinformatics tools.
            workflow.addTasks(new OutputVariants2TSVTask(workspace, threadNum));
            // Step 12: Execute the workflow to perform annotation and export tasks.
            // This command initiates the workflow, executing all tasks sequentially and producing the
            // desired output files (e.g., annotated TSV files) in the specified workspace.
            workflow.execute();
        } catch (IOException e) {
            // Rethrow file I/O failures as an unchecked exception so the program stops
            // and reports the original cause.
            throw new RuntimeException(e);
        }
    }
}
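To make the effect of the allele-frequency range configured in Step 9 more concrete, the following standalone sketch applies a closed interval [0, 0.01] to a few made-up variants. It does not use the KGGA API. The class `AfRangeSketch`, the `Variant` type, the sample values, and the rule that a variant is kept only if its frequency lies inside the interval in every listed population are all assumptions made for illustration; how KGGA itself treats multiple populations or missing frequencies is not stated here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these types belong to KGGA.
final class AfRangeSketch {

    /** A variant ID with one allele frequency per population (made-up data). */
    static final class Variant {
        final String id;
        final Map<String, Float> afByPopulation;

        Variant(String id, Map<String, Float> afByPopulation) {
            this.id = id;
            this.afByPopulation = afByPopulation;
        }
    }

    /** Returns true if min <= af <= max (a closed interval is assumed). */
    static boolean inRange(float af, float min, float max) {
        return af >= min && af <= max;
    }

    /** Keeps the IDs of variants whose frequency lies in [min, max] in every population. */
    static List<String> keepRare(List<Variant> variants, float min, float max) {
        List<String> kept = new ArrayList<>();
        for (Variant v : variants) {
            boolean allInRange = true;
            for (float af : v.afByPopulation.values()) {
                if (!inRange(af, min, max)) {
                    allInRange = false;
                    break;
                }
            }
            if (allInRange) {
                kept.add(v.id);
            }
        }
        return kept;
    }

    /** Builds a frequency map for the two populations used in the example above. */
    static Map<String, Float> af(float eas, float afr) {
        Map<String, Float> m = new LinkedHashMap<>();
        m.put("EAS", eas);
        m.put("AFR", afr);
        return m;
    }

    /** Three invented variants: rare in both, common in EAS, and on the interval bounds. */
    static List<Variant> demoVariants() {
        List<Variant> variants = new ArrayList<>();
        variants.add(new Variant("var1", af(0.001f, 0.004f)));
        variants.add(new Variant("var2", af(0.2f, 0.005f)));
        variants.add(new Variant("var3", af(0.0f, 0.01f)));
        return variants;
    }

    public static void main(String[] args) {
        // Same bounds as the FloatInterval(0, 0.01f) in the annotation example.
        System.out.println(keepRare(demoVariants(), 0f, 0.01f)); // prints [var1, var3]
    }
}
```

Under these assumptions, `var2` is dropped because its EAS frequency exceeds the upper bound, while `var3` is kept because both of its frequencies sit exactly on the interval bounds.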