Functional annotation of novel sequence data is a primary requirement for the utilization of functional genomics approaches in plant research. In this paper, we describe the Blast2GO suite as a comprehensive bioinformatics tool for functional annotation of sequences and data mining on the resulting annotations, primarily based on the gene ontology (GO) vocabulary. Blast2GO optimizes function transfer from homologous sequences through an elaborate algorithm that considers similarity, the extension of the homology, the database of choice, the GO hierarchy, and the quality of the original annotations.
The tool includes numerous functions for the visualization, management, and statistical analysis of annotation results, including gene set enrichment analysis. Blast2GO is a suitable tool for plant genomics research because of its versatility, easy installation, and friendly use.
Functional genomics research has expanded enormously in the last decade, and the plant biology research community in particular has extensively included functional genomics approaches in its recent research proposals. The number of Affymetrix plant GeneChips, for example, has doubled in the last two years [ 1 ] and extensive international genomics consortia exist for major crops (see last PAG Conference reports for an updated impression on current plant genomics, http: ). Equally important, many middle-sized research groups are also setting up plant EST projects and producing custom microarray platforms [ 2 ].
This massive generation of plant sequence data and the rapid spread of functional genomics technologies among plant research labs have created a strong demand for bioinformatics resources adapted to vegetative species.
Functional annotation of novel plant DNA sequences is probably one of the top requirements in plant functional genomics as this holds, to a great extent, the key to the biological interpretation of experimental results. Controlled vocabularies have emerged along the way as the strategy of choice for the effective annotation of the function of gene products. The use of controlled vocabularies greatly facilitates the exchange of biological knowledge and the benefit from computational resources that manage this knowledge.
The gene ontology GO, http: Many bioinformatics tools and methods have been developed to assist in the assignment of functional terms to gene products (reviewed in [ 8 ]). Fewer resources, however, are available when it comes to the large-scale functional annotation of novel sequence data of nonmodel species, as would be specifically required in many plant functional genomics projects.
Additionally, functional annotation capabilities are usually incorporated in EST analysis pipelines. These resources are valuable tools for the assignment of functional terms to uncharacterized sequences but usually lack high-throughput and data mining capabilities, in the first case, or provide automatic solutions without much user interactivity, in the second.
The philosophy behind B2G development was the creation of an extensive, user-friendly, and research-oriented framework for large-scale function assignments. The main application domain of the tool is the functional genomics of nonmodel organisms and it is primarily intended to support research in experimental labs where bioinformatics support may not be strong.
Since its release in September [ 20 ], more than labs worldwide have become B2G users and the application has been referenced in over thirty peer-reviewed publications www. Although B2G has a broad species application scope, the project originated in a crop genomics research environment and there is considerable accumulated experience in the use of B2G in plants, including maize, tobacco, citrus, soybean, grape, and tomato. Projects range from functional assignments of ESTs [ 21–24 ] to GO term annotation of custom or commercial plant microarrays [ 25, 26 ], functional profiling studies [ 27–29 ], and functional characterization of specific plant gene families [ 30, 31 ].
In the following sections we will explain more extensively the concepts behind Blast2GO.
We will describe in detail the main functionalities of the application and show a use case that illustrates the applicability of B2G to plant functional genomics research. Four main driving concepts form the foundation of the Blast2GO software: The target users of Blast2GO are biology researchers working on functional genomics projects in labs where strong bioinformatics support is not necessarily present.
Therefore, the application has been conceived to be easy to install, to have minimal setup and maintenance requirements, and to offer an intuitive user interface.
B2G has been implemented as a multiplatform Java desktop application made accessible by Java Webstart technology. This solution offers the higher versatility of a locally running application while assuring automatic updates, provided that an internet connection is available. This implementation has proven to work very efficiently for the fast transfer of new functionalities and bug fixes to users. Furthermore, access to data in B2G is reinforced by graphical parameters that, on one hand, allow the easy identification and selection of sequences at various stages of the annotation process and, on the other hand, permit the joint visualization of annotation results and highlighting of the most relevant features.
Blast2GO strives to be the application of choice for the annotation of novel sequences in functional genomics projects where thousands of fragments need to be characterized.
In principle, B2G accepts any number of records within the memory resources of the user's workstation. During the annotation process, intermediate results can be accessed and modified by the user if desired. Functional annotation in Blast2GO is based on homology transfer.
Within this framework, the actual annotation procedure is configurable and permits the design of different annotation strategies. Blast2GO annotation parameters include the choice of search database, the strength and number of blast results, the extension of the query-hit match, the quality of the transferred annotations, and the inclusion of motif annotation.
Data mining on annotation results. Blast2GO is not a mere generator of functional annotations. The application includes a wide range of statistical and graphical functions for the evaluation of the annotation procedure and the final results. In particular, the relative abundance of functional terms can be easily assessed and visualized. The first release of B2G covered basic application functionalities: Enhanced modules for massive blast, modification of annotation intensity, curation, additional vocabularies, high-performing customizable graphs and pathway charts, data mining and sequence handling, as well as a wide array of input and output formats have been incorporated into the Blast2GO suite.
Figure 1 shows the basic components of the Blast2GO suite. Functional assignments proceed through an elaborate annotation procedure that comprises a central strategy plus refinement functions.
Next, visualization and data mining engines permit exploiting the annotation results to gain functional knowledge.
Figure 1: Schematic representation of the Blast2GO application. GO annotations are generated through a 3-step process: blast, mapping, and annotation. Additional annotation data-mining tools include statistical charts and gene set enrichment analysis functions.
The Blast2GO annotation procedure consists of three main steps: blast, mapping, and annotation. Once GO terms have been gathered, additional functionalities enable processing and modification of annotation results.
The first step in B2G is to find sequences similar to a query set by blast [ 32 ].
Blast2GO: A Comprehensive Suite for Functional Analysis in Plant Genomics
B2G accepts nucleotide and protein sequences in FASTA format and supports the four basic blast programs (blastx, blastp, blastn, and tblastx). Homology searches can be launched against public databases such as the NCBI nr using a query-friendly version of blast (QBlast).
This is the default option and, in this case, no additional installations are needed. Alternatively, blast can be run locally against a proprietary FASTA-formatted database, which requires a working www-blast installation. The Make Filtered Blast-GO-BD function in the Tools menu allows the creation of customized databases containing only GO-annotated entries, which can be used in combination with the local blast option.
Other configurable parameters at the blast step are the expectation value (e-value) threshold, the number of retrieved hits, and the minimal alignment length (hsp length), which permits the exclusion of hits with short, low e-value matches from the sources of functional terms.
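The three blast-step filters described above can be illustrated with a small Python sketch. This is not Blast2GO code: the `BlastHit` record, the `filter_hits` function, and every threshold value below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """Hypothetical record holding the fields the filters need."""
    accession: str
    evalue: float
    hsp_length: int      # length of the aligned region
    similarity: float    # percent similarity between query and hit


def filter_hits(hits, max_evalue=1e-3, max_hits=20, min_hsp_length=33):
    """Keep hits passing the e-value and hsp length filters, best e-value first.

    All default values are illustrative assumptions, not documented defaults.
    """
    kept = [
        h for h in hits
        if h.evalue <= max_evalue and h.hsp_length >= min_hsp_length
    ]
    kept.sort(key=lambda h: h.evalue)
    return kept[:max_hits]


hits = [
    BlastHit("hitA", 1e-50, 210, 92.0),
    BlastHit("hitB", 1e-8, 20, 88.0),    # good e-value but short match: dropped
    BlastHit("hitC", 0.5, 150, 41.0),    # e-value above the threshold: dropped
    BlastHit("hitD", 1e-12, 95, 75.0),
]
selected = filter_hits(hits)
print([h.accession for h in selected])   # ['hitA', 'hitD']
```

The second hit shows why a length filter is useful in addition to the e-value threshold: a short match can still have a low e-value, yet it is a weak basis for function transfer.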
Annotation, however, will ultimately be based on sequence similarity levels, as similarity percentages are independent of database size and more intuitive than e-values. Blast2GO parses blast results and presents the information for each sequence in table format. Mapping is the process of retrieving GO terms associated to the hits obtained after a blast search.
B2G performs different mappings as follows. Identified gene names are searched in the species-specific entries of the gene product table of the GO database. Annotation is the process of assigning functional terms to query sequences from the pool of GO terms gathered in the mapping step. Function assignment is based on the gene ontology vocabulary.
The B2G annotation algorithm takes into consideration the similarity between query and hit sequences, the quality of the source of GO assignments, and the structure of the GO DAG. The annotation score (AS) is composed of two terms. The first, the direct term (DT), represents the highest similarity value among the hit sequences bearing this GO term, weighted by a factor corresponding to its evidence code (EC), the EC weight (ECw).
ECs vary from experimental evidence, such as inferred by direct assay (IDA), to unsupervised assignments, such as inferred by electronic annotation (IEA). The second term (AT) of the annotation rule introduces the possibility of abstraction into the annotation algorithm.
Abstraction is defined as the annotation to a parent node when several child nodes are present in the GO candidate pool. This term multiplies the number of total GOs unified at the node by a user-defined factor or GO weight (GOw) that controls the possibility and strength of abstraction.
When all ECw's are set to 1 (no EC control) and the GOw is set to 0 (no abstraction is possible), the annotation score of a given GO term equals the highest similarity value among the blast hits annotated with that term.
If the ECw is smaller than one, the DT decreases and higher query-hit similarities are required to surpass the annotation threshold. If the GOw is not equal to zero, the AT becomes contributing and the annotation of a parent node is possible if multiple child nodes coexist that do not reach the annotation cutoff.
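The behaviour of the two terms can be made concrete with a minimal Python sketch. It follows the description above (direct term plus abstraction term), but the function name, the input layout, and all numeric values are hypothetical, and the exact way Blast2GO counts the GOs unified at a node is an assumption here.

```python
def annotation_score(hits, n_unified, go_weight):
    """Sketch of an annotation score for one candidate GO term.

    hits: list of (similarity_percent, ec_weight) pairs, one per blast hit
          bearing this GO term (hypothetical input layout).
    n_unified: number of GO terms unified at this node (counting assumed).
    go_weight: user-defined GO weight (GOw).
    """
    # Direct term: highest similarity after weighting by the evidence code weight.
    direct_term = max((sim * ecw for sim, ecw in hits), default=0.0)
    # Abstraction term: number of unified GOs multiplied by the GO weight.
    abstraction_term = n_unified * go_weight
    return direct_term + abstraction_term


# ECw = 1 and GOw = 0: the score equals the highest similarity among the hits.
no_control = annotation_score([(80.0, 1.0), (65.0, 1.0)], n_unified=3, go_weight=0)

# ECw < 1 lowers the direct term, so a higher similarity is needed to pass a cutoff.
down_weighted = annotation_score([(80.0, 0.5)], n_unified=0, go_weight=0)

# GOw > 0: a parent node gains score from the child terms unified under it.
parent = annotation_score([(50.0, 1.0)], n_unified=3, go_weight=5)

print(no_control, down_weighted, parent)   # 80.0 40.0 65.0
```

With a hypothetical cutoff of 55, the down-weighted term (40.0) would no longer be annotated, while the parent node (65.0) would pass only because of the abstraction term.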
Default values of B2G annotation parameters were chosen to optimize the ratio between annotation coverage and annotation accuracy [ 20 ]. Finally, the annotation rule (AR) selects the lowest terms per branch that exceed a user-defined threshold. The annotation step in B2G can be further adjusted by setting additional filters to the hit sequences considered as annotation source.
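The selection of the lowest terms per branch can be sketched as follows. This is an illustrative Python example on a toy hierarchy, not the Blast2GO implementation: the term names, the scores, the threshold, and the use of "greater than or equal" for the cutoff comparison are all assumptions.

```python
def ancestors(term, parents):
    """Return all ancestors of a term in a DAG given as {child: [parents]}."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def select_lowest_terms(scores, parents, threshold):
    """Keep terms at or above the threshold, then drop any term that is an
    ancestor of another kept term, leaving the most specific term per branch."""
    passing = {term for term, score in scores.items() if score >= threshold}
    redundant = set()
    for term in passing:
        redundant |= ancestors(term, parents)
    return passing - redundant


# Toy hierarchy with hypothetical term names (child -> parents).
parents = {
    "binding": ["root"],
    "ion_binding": ["binding"],
    "metal_ion_binding": ["ion_binding"],
    "catalysis": ["root"],
}
scores = {
    "root": 90,
    "binding": 70,
    "ion_binding": 60,
    "metal_ion_binding": 40,   # below the cutoff, so not annotated
    "catalysis": 58,
}
annotated = select_lowest_terms(scores, parents, threshold=55)
print(sorted(annotated))   # ['catalysis', 'ion_binding']
```

In the binding branch, the most specific term that reaches the cutoff is kept and its ancestors are discarded as redundant.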
A lower limit can be set at the e-value parameter to ensure a minimum confidence at the level of homology. This parameter is of importance to prevent potential function transfer from nonmatching sequence regions of modular proteins. Additionally, the minimal hsp length required at the blast step permits control of the length of the matching region. Blast2GO includes different functionalities to complete and modify the annotations obtained through the above-defined procedure.
Enzyme codes and KEGG pathway annotations are generated from the direct mapping of GO terms to their enzyme code equivalents. B2G launches sequence queries in batch, and recovers, parses, and uploads InterPro results. In this process, B2G ensures that only the lowest term per branch remains in the final annotation set, removing possible parent-child relationships originating from the merging action.
Blast2GO incorporates three additional functionalities for the refinement of annotation results. Firstly, the Annex function allows annotation augmentation through the Second Layer concept developed by The Norwegian University of Science and Technology http: Basically, the Second Layer database is a collection of manually curated univocal relationships between GO terms from the different GO categories that permits the inference of biological process and cellular component terms from molecular function annotations.
Secondly, annotation results can be summarized through GOSlim mapping. GOSlim consists of a subset of the gene ontology vocabulary encompassing key ontological terms and a mapping function between the full GO and the GOSlim. Different GOSlim mappings are available, adapted to specific biological domains. Thirdly, the manual curation function gives the user the possibility of editing annotation results and manually modifying GO terms and sequence descriptors.
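The idea of a mapping function between the full GO and a GOSlim can be sketched in Python as a walk up the hierarchy that stops at the first slim term found on each path. The hierarchy, the term names, and the choice of slim terms are hypothetical, and real GOSlim mapping tools may resolve multiple paths differently.

```python
def map_to_slim(term, parents, slim_terms):
    """Map a term to its nearest slim ancestor(s) in a DAG given as
    {child: [parents]}. A term that is itself a slim term maps to itself."""
    if term in slim_terms:
        return {term}
    mapped = set()
    for parent in parents.get(term, ()):
        mapped |= map_to_slim(parent, parents, slim_terms)
    return mapped


# Toy hierarchy with hypothetical term names (child -> parents).
parents = {
    "binding": ["root"],
    "ion_binding": ["binding"],
    "metal_ion_binding": ["ion_binding"],
    "catalysis": ["root"],
    "kinase_activity": ["catalysis"],
}
slim_terms = {"binding", "catalysis"}

annotations = ["metal_ion_binding", "kinase_activity", "binding"]
summary = {term: sorted(map_to_slim(term, parents, slim_terms)) for term in annotations}
print(summary)
```

Each detailed annotation collapses onto a broad slim category, which is what makes the summarized results easier to compare across datasets.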
One aspect of the uniqueness of the Blast2GO software is the availability of a wide array of functions to monitor, evaluate, and visualize the annotation process and results. The purpose of these functions is to help understand how functional annotation proceeds and to optimize performance.
Summary statistics charts are generated after each of the annotation steps. Distribution plots for e-value and similarity within blast results give an idea of the degree of homology that query sequences have in the searched database. Once mapping has been completed, the user can check the distribution of evidence codes in the recovered GO terms and the original database sources of annotations.
These charts give an indication of suitable values for B2G annotation parameters. For example, when a good overall level of sequence similarity is obtained for the dataset, the default annotation cutoff value could be raised to improve annotation accuracy.