Annotating genomic variation is a critical step when using Next Generation Sequencing (NGS) in the clinic, as well as for gaining a deeper understanding of population genetics, evolution and biological function. In addition to a range of methods for variant annotation (automatic predictions, manual annotation, and cross-species evidence), there are different classes of annotation: factual annotation (e.g., population minor allele frequency, occurrence of an allele in a compound heterozygote), gene annotation comparisons (location information, e.g., ‘variant in CDS’; transcript impact, e.g., ‘missense variant’) and statistical inferences (e.g., variant causes nonsense-mediated decay (NMD), SIFT and PolyPhen predictions). Some annotations are absolute, while others depend on context; e.g., a base change is independent of context, but whether it introduces a stop codon may depend on the transcript isoform in which it occurs, or on whether other variants lie upstream of it.
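To make the context dependence concrete, here is a minimal sketch in Python. The sequence, the two reading frames and the helper names (`apply_snv`, `codon_at`, `introduces_stop`) are invented for illustration and do not come from any real gene or annotation tool; the sketch also ignores splicing and strand. The same single-base substitution is applied to one sequence, and two hypothetical isoforms read it in different frames.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def apply_snv(seq, pos, alt):
    """Return seq with the base at 0-based index pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]

def codon_at(seq, frame_start, pos):
    """Return the codon covering index pos when codons are read from frame_start.

    Assumes pos >= frame_start (toy model: one contiguous coding region).
    """
    codon_start = frame_start + (pos - frame_start) // 3 * 3
    return seq[codon_start:codon_start + 3]

def introduces_stop(ref_seq, frame_start, pos, alt):
    """True if the substitution turns a non-stop codon into a stop codon."""
    ref_codon = codon_at(ref_seq, frame_start, pos)
    alt_codon = codon_at(apply_snv(ref_seq, pos, alt), frame_start, pos)
    return ref_codon not in STOP_CODONS and alt_codon in STOP_CODONS

# Invented toy sequence; index 5 is a C that is changed to A.
TOY_SEQ = "ATGTACGCATGG"

# Hypothetical isoform 1 reads codons from index 0: TAC becomes TAA (stop).
stop_in_isoform_1 = introduces_stop(TOY_SEQ, 0, 5, "A")
# Hypothetical isoform 2 reads codons from index 1: ACG becomes AAG (no stop).
stop_in_isoform_2 = introduces_stop(TOY_SEQ, 1, 5, "A")
```

The base change itself is identical in both calls; only the frame supplied by the transcript differs, which is why a consequence such as a gained stop has to be reported per transcript rather than per variant.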
In recent years, many in silico tools have been developed to help understand variation data; these include SnpEff, ANNOVAR, VEP and Varant. No tool has emerged as a de facto standard for interpreting genomic variation, and even evaluating the tools is challenging because there are no common variant annotation standards.
The goal of the GA4GH annotation task team is to create common standards for reporting variant annotation including results formats and query methods (GA4GH schemas) for different classes of annotation. The outcome will be consistent reporting in a manner that facilitates benchmarking and evaluation, and the development of increasingly sophisticated annotation.
Areas of focus
- A document describing a standard way of representing variant annotations within VCF files: http://snpeff.sourceforge.net/VCFannotationformat_v1.0.pdf
- Use cases for variant annotation: https://github.com/ga4gh/schemas/issues/226
- A collection of complicated annotation examples; further examples in VCF format are welcome
- A list of in silico tools for annotating variants
- A prototype of the GA4GH variant annotation API: http://rest.ensembl.org:8082/
- Contributions to the compliance suite and reference server implementation
- Discussions of consistent nomenclature for describing variants
The group will also discuss challenges and subtleties of genomic data that make variant annotation far from straightforward. Examples include the handling of multiple transcript isoforms, ambiguity in variant interpretation, challenges in using variant nomenclature, annotation of regions of the genome with alternative reference representations, and multiple variants in the same haplotype.
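As a rough illustration of the VCF-based representation listed among the areas of focus, and of why multiple transcript isoforms matter, the following Python sketch splits an `ANN` INFO value into one record per transcript. The field names follow the v1.0 annotation format as I understand it and should be checked against the linked document; the example value, gene name and transcript identifiers are invented, and the parser assumes field values contain no unescaped `,` or `|`.

```python
# Field order assumed from the VCF annotation format v1.0 (verify against the spec).
ANN_FIELDS = [
    "Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
    "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank",
    "HGVS.c", "HGVS.p", "cDNA.pos/cDNA.length", "CDS.pos/CDS.length",
    "AA.pos/AA.length", "Distance", "ERRORS/WARNINGS/INFO",
]

def parse_ann(ann_value):
    """Split an ANN value into a list of dicts, one per annotated feature."""
    records = []
    for entry in ann_value.split(","):      # one entry per allele/feature pair
        values = entry.split("|")           # fields within an entry
        record = dict(zip(ANN_FIELDS, values))
        # Several effects on the same feature are joined with '&'.
        record["Annotation"] = record["Annotation"].split("&")
        records.append(record)
    return records

# Invented example: one variant annotated against two hypothetical isoforms.
EXAMPLE_ANN = (
    "A|stop_gained|HIGH|GENE1|GENE1|transcript|TX1.1|protein_coding|2/5|"
    "c.6C>A|p.Tyr2*|6/120|6/90|2/29||,"
    "A|missense_variant|MODERATE|GENE1|GENE1|transcript|TX1.2|protein_coding|2/5|"
    "c.5C>A|p.Thr2Lys|5/120|5/90|2/29||"
)

records = parse_ann(EXAMPLE_ANN)
for rec in records:
    print(rec["Feature_ID"], rec["Annotation"], rec["Annotation_Impact"])
```

Keeping one record per feature, rather than collapsing to a single "worst" consequence, preserves the per-isoform detail that the challenges above depend on.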
Once implemented across all annotation tools, these common standards will be invaluable for method evaluation. Enabling cross-comparisons will highlight any systematic errors or differences in results, and analysis of these will improve the quality of variant annotation. These standards are therefore needed to drive further developments in this field.
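A cross-comparison of this kind might look like the following Python sketch, which assumes each tool's output has already been reduced to a mapping from (variant, transcript) to a consequence term. The function `compare_annotations`, the variant keys, the transcript identifiers and the calls for the two hypothetical tools are all invented for illustration.

```python
from collections import Counter

def compare_annotations(calls_a, calls_b):
    """Compare two {(variant, transcript): consequence} mappings.

    Returns the concordance over shared keys (None if nothing is shared),
    the discordant calls, and a count of each (tool A, tool B) disagreement.
    """
    shared = calls_a.keys() & calls_b.keys()
    discordant = {
        key: (calls_a[key], calls_b[key])
        for key in shared
        if calls_a[key] != calls_b[key]
    }
    concordance = (len(shared) - len(discordant)) / len(shared) if shared else None
    patterns = Counter(discordant.values())
    return concordance, discordant, patterns

# Invented calls from two hypothetical tools.
tool_a = {
    ("1:1000:C>A", "TX1.1"): "stop_gained",
    ("1:1000:C>A", "TX1.2"): "missense_variant",
    ("2:500:G>T", "TX2.1"): "splice_donor_variant",
    ("3:42:A>G", "TX3.1"): "synonymous_variant",   # only reported by tool A
}
tool_b = {
    ("1:1000:C>A", "TX1.1"): "stop_gained",
    ("1:1000:C>A", "TX1.2"): "missense_variant",
    ("2:500:G>T", "TX2.1"): "intron_variant",
}

concordance, discordant, patterns = compare_annotations(tool_a, tool_b)
```

A disagreement pattern that recurs many times in `patterns` would point to a systematic difference between tools rather than an isolated error. Such a comparison is only meaningful when both tools report against the same transcript set and vocabulary, which is what common standards would provide.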
- EMBL-European Bioinformatics Institute
- Hinxton, United Kingdom
- Manager, Data Working Group
Contact Stephen for more information or to get involved with the Data Working Group.
- Wellcome Trust Sanger Institute
- Hinxton, United Kingdom
- Coordinator, Data Working Group
Contact David for more information or to get involved with the Data Working Group.
- Stephen Brenner, University of California Berkeley, Berkeley, CA, United States
- Fiona Cunningham, EMBL-European Bioinformatics Institute, Hinxton, United Kingdom
- Reece Hart, Invitae, San Francisco, United States
- Lee Lichtenstein, Broad Institute, Cambridge, United States
- Vartika Agrawal, Philips, Amsterdam, Netherlands
- CH Albach, Google Inc., San Francisco, United States
- Aparna Chhibber, Bina Technologies Inc., Redwood City, United States
- Deanna Church, 10x Genomics, Pleasanton, United States
- Pablo Cingolani, KEW, Inc., Cambridge, United States
- Mark Diekhans, UC Santa Cruz, Santa Cruz, United States
- Karen Eilbeck, University of Utah, Salt Lake City, United States
- Steven Hart, Mayo Clinic, Minnesota, United States
- David Haussler, UC Santa Cruz, Santa Cruz, United States
- Angie Hinrichs, UC Santa Cruz, Santa Cruz, United States
- Sarah Hunt, EMBL-European Bioinformatics Institute, Hinxton, United Kingdom
- Andrew Jesaitis, Georgia Institute of Technology, Atlanta, United States
- Rachel Karchin, Johns Hopkins University School of Medicine, Baltimore, United States
- Rick Kim, In Silico Solutions, Falls Church, United States
- Uri Laserson, Maine School of Science and Mathematics, Limestone, United States
- Shirley Li, MolecularMatch, Houston, United States
- Daniel MacArthur, Massachusetts General Hospital, Boston, United States
- Elliott Margulies, Illumina, Inc., Cambridge, United Kingdom
- Davis McCarthy, Wellcome Trust, London, United Kingdom
- Will McLaren, EMBL-European Bioinformatics Institute, Hinxton, United Kingdom
- Justin Paschall, EMBL-European Bioinformatics Institute, Hinxton, United Kingdom
- Nazneen Rahman, Institute of Cancer Research, London, United Kingdom
- Alex Romos, Broad Institute, Cambridge, United States
- Gabe Rudy, Golden Helix, Bozeman, United States
- Jaclyn Smith, Oregon Health & Science University, Portland, United States
- David Steinberg, UC Santa Cruz, Santa Cruz, United States
- Rebecca Truty, Complete Genomics, Mountain View, United States
- Justin Zook, National Institute of Standards and Technology, Gaithersburg, United States