Variant Annotation Task Team
Annotating genomic variation is a critical step when using Next Generation Sequencing (NGS) in the clinic, as well as for gaining a deeper understanding of population genetics, evolution and biological function. In addition to a range of methods for variant annotation (automatic predictions, manual annotation, and cross-species evidence), there are different classes of annotation: factual annotation (e.g., population minor allele frequency, occurrence of an allele in a compound heterozygote), gene annotation comparisons (location information, e.g., ‘variant in CDS’; transcript impact, e.g., ‘missense variant’) and statistical inferences (e.g., variant causes NMD, SIFT and PolyPhen predictions). Some annotations are absolute, while others depend upon context; e.g., a base change is independent of context, but whether it introduces a stop codon may depend upon the transcript isoform in which it exists, or whether other variations are upstream of it.
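The context dependence described above can be illustrated with a minimal sketch. The sequence, the isoform names and the reading-frame offsets are all hypothetical and chosen only for illustration; real annotation tools work from full transcript models rather than a simple frame offset.

```python
# Stop codons of the standard genetic code, written in the DNA alphabet.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_at(cds, index, alt=None):
    """Return the codon containing 0-based position `index` of `cds`.

    If `alt` is given, the base at `index` is replaced by `alt` first.
    """
    start = index - index % 3
    codon = cds[start:start + 3]
    if alt is not None:
        offset = index - start
        codon = codon[:offset] + alt + codon[offset + 1:]
    return codon


def introduces_stop(genomic, frame_offset, variant_index, alt):
    """Report whether a substitution creates a new stop codon.

    `frame_offset` is the position in `genomic` at which this (hypothetical)
    isoform's reading frame begins.
    """
    cds = genomic[frame_offset:]
    index = variant_index - frame_offset
    ref_codon = codon_at(cds, index)
    alt_codon = codon_at(cds, index, alt)
    return ref_codon not in STOP_CODONS and alt_codon in STOP_CODONS


# Hypothetical genomic window read in two different frames.
GENOMIC = "GCACAAGGTCTG"
ISOFORM_FRAMES = {"isoform_A": 0, "isoform_B": 1}

# The same C>T change at position 3 of the window, evaluated per isoform.
results = {
    name: introduces_stop(GENOMIC, offset, 3, "T")
    for name, offset in ISOFORM_FRAMES.items()
}
print(results)
```

In this sketch the base change itself is identical for both isoforms, but it turns CAA into TAA in one reading frame and CAC into CAT in the other, so only one isoform gains a stop codon.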
In recent years, many in silico tools have been developed to help understand variation data; these include SnpEff, ANNOVAR, VEP and Varant. No tool has emerged as a de facto standard for interpreting genomic variation, and even evaluating the tools is challenging because there are no common variant annotation standards.
The goal of the GA4GH annotation task team is to create common standards for reporting variant annotation including results formats and query methods (GA4GH schemas) for different classes of annotation. The outcome will be consistent reporting in a manner that facilitates benchmarking and evaluation, and the development of increasingly sophisticated annotation.
Areas of focus
- A document describing a standard way of representing variant annotations within VCF files: http://snpeff.sourceforge.net/VCFannotationformat_v1.0.pdf
- Use cases for variant annotation: https://github.com/ga4gh/schemas/issues/226
- A collection of complicated annotation examples. More examples, in VCF format, are welcome
- A list of some in silico tools for annotating variants
- A prototype of the GA4GH variant annotation API: http://rest.ensembl.org:8082/
- Contributions to the compliance suite and reference server implementation
- Discussions of consistent nomenclature for describing variants

The group will also discuss challenges and subtleties of genomic data that make variant annotation far from straightforward. For example, handling of multiple transcript isoforms, ambiguity in variant interpretation, challenges using variant nomenclature, annotation of regions of the genome with alternative reference representations, and multiple variants in the same haplotype.
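As a rough illustration of how multiple transcript isoforms show up in practice, the sketch below splits a VCF `ANN` INFO value into one record per transcript. It assumes the pipe-separated, comma-delimited layout of the VCF annotation format document listed above; the field names are shorthand invented here, the field order should be checked against that document, and the example value (gene, transcript IDs, HGVS strings) is entirely hypothetical.

```python
# Shorthand names for the 16 pipe-separated ANN sub-fields (assumed order;
# verify against the VCF annotation format specification).
ANN_FIELDS = [
    "allele", "annotation", "impact", "gene_name", "gene_id",
    "feature_type", "feature_id", "transcript_biotype", "rank",
    "hgvs_c", "hgvs_p", "cdna_pos", "cds_pos", "protein_pos",
    "distance", "messages",
]


def parse_ann(info_value):
    """Split an ANN value into one dict per annotated feature."""
    records = []
    for entry in info_value.split(","):
        values = entry.split("|")
        records.append(dict(zip(ANN_FIELDS, values)))
    return records


# Hypothetical ANN value: one variant annotated against two transcripts.
EXAMPLE_ANN = (
    "T|stop_gained|HIGH|GENE1|GENE1_ID|transcript|TX_A|protein_coding"
    "|2/5|c.4C>T|p.Gln2*|4/120|4/90|2/29||,"
    "T|synonymous_variant|LOW|GENE1|GENE1_ID|transcript|TX_B|protein_coding"
    "|2/4|c.3C>T|p.His1His|3/110|3/80|1/26||"
)

records = parse_ann(EXAMPLE_ANN)
by_transcript = {r["feature_id"]: r["annotation"] for r in records}
print(by_transcript)
```

A single variant therefore carries several annotations at once, and any comparison between tools has to decide how to match and rank them per transcript.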
Once implemented across all annotation tools, these common standards will be invaluable for method evaluation. Enabling cross-comparisons will highlight any systematic errors or differences in results, and analysis of these will improve the quality of variant annotation. These standards are needed to drive further developments in this field.
- EMBL-European Bioinformatics Institute
- Hinxton, United Kingdom
- Manager, Data Working Group
Contact Stephen for more information or to get involved with the Data Working Group.
- Wellcome Trust Sanger Institute
- Hinxton, United Kingdom
- Coordinator, Data Working Group
Contact David for more information or to get involved with the Data Working Group.
- San Francisco, United States
To review Team membership, please visit ga4gh.org