Annotating genomic variation is a critical step when using Next Generation Sequencing (NGS) in the clinic, as well as for gaining a deeper understanding of population genetics, evolution and biological function. In addition to a range of methods for variant annotation (automatic predictions, manual annotation, and cross-species evidence), there are different classes of annotation: factual annotation (e.g., population minor allele frequency, occurrence of an allele in a compound heterozygote), gene annotation comparisons (location information, e.g., ‘variant in CDS’; transcript impact, e.g., ‘missense variant’) and statistical inferences (e.g., variant causes nonsense-mediated decay (NMD), SIFT and PolyPhen predictions). Some annotations are absolute, while others depend upon context; e.g., a base change is independent of context, but whether it introduces a stop codon may depend upon the transcript isoform in which it occurs, or on whether other variations lie upstream of it.
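The context dependence described above can be illustrated with a minimal Python sketch. It is not taken from any annotation tool: the function name introduces_stop and both toy coding sequences are hypothetical, and the sketch assumes a single-base substitution given as a 0-based position within an in-frame coding sequence. The same C-to-G change in the shared TACGGA stretch falls in a different reading frame in each toy isoform.

```python
# Standard stop codons (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def introduces_stop(cds: str, pos: int, alt: str) -> bool:
    """Return True if substituting `alt` at 0-based CDS position `pos`
    turns a non-stop codon into a stop codon.

    Hypothetical helper for illustration; assumes `cds` starts in frame.
    """
    codon_start = pos - pos % 3          # first base of the affected codon
    offset = pos - codon_start           # position of the change within the codon
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    return ref_codon not in STOP_CODONS and alt_codon in STOP_CODONS


# Two toy isoforms sharing the bases TACGGA, but in different reading frames.
isoform_a = "ATGTACGGA"      # codons: ATG TAC GGA; the C is at CDS position 5
isoform_b = "ATGCTACGGATT"   # codons: ATG CTA CGG ATT; the same C is at position 6

print(introduces_stop(isoform_a, 5, "G"))  # TAC becomes TAG
print(introduces_stop(isoform_b, 6, "G"))  # CGG becomes GGG
```

In the first isoform the substitution creates a stop codon, while in the second it does not, so the base change itself is the same fact in both cases but the consequence annotation differs by transcript.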
In recent years, many in silico tools have been developed to help understand variation data, including SnpEff, ANNOVAR, VEP and Varant. No tool has emerged as a de facto standard for interpreting genomic variation, and even evaluating the tools is challenging because there are no common variant annotation standards.
The goal of the GA4GH annotation task team is to create common standards for reporting variant annotation, including result formats and query methods (GA4GH schemas), for different classes of annotation. The outcome will be consistent reporting in a manner that facilitates benchmarking and evaluation, as well as the development of increasingly sophisticated annotation.
Once implemented across all annotation tools, these common standards will be invaluable for method evaluation. Enabling cross-comparisons will highlight any systematic errors or differences in results, and analysis of these will improve the quality of variant annotation. These standards are needed to drive further developments in this field.
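As a rough illustration of the kind of cross-comparison a shared reporting format would allow, the sketch below assumes each tool's output has already been reduced to a mapping from a (variant, transcript) pair to a single consequence term. The function name discordant_calls, the identifiers and the example records are all hypothetical and do not represent the GA4GH schemas or the output of any real tool.

```python
from typing import Dict, Tuple

# Hypothetical key: (variant identifier, transcript identifier).
Key = Tuple[str, str]


def discordant_calls(
    a: Dict[Key, str], b: Dict[Key, str]
) -> Dict[Key, Tuple[str, str]]:
    """Return the keys annotated by both tools whose consequence terms differ."""
    shared = a.keys() & b.keys()
    return {k: (a[k], b[k]) for k in sorted(shared) if a[k] != b[k]}


# Made-up records standing in for two tools' normalised output.
tool_a = {("var1", "tx1"): "missense_variant", ("var2", "tx1"): "stop_gained"}
tool_b = {("var1", "tx1"): "missense_variant", ("var2", "tx1"): "missense_variant"}

for key, (call_a, call_b) in discordant_calls(tool_a, tool_b).items():
    print(key, call_a, call_b)
```

A comparison of this kind is only meaningful when both tools report against the same vocabulary and the same transcript set, which is the gap the common standards are intended to close.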
