CroGO: Identifying Cross-category Relations in Gene Ontology and Constructing Genome-specific Term Association Networks
MSU-DOE Plant Research Lab, Michigan State University
Gene Ontology (GO) has been widely used in biological databases, annotation projects, and
computational analyses. Although the three GO categories are structured as independent ontologies, the biological
relationships across the categories are not negligible for biological reasoning and knowledge integration. However,
the existing cross-category ontology term similarity measures either rely on the GO data alone
or are based on manually curated term name similarities, ignoring the fact that GO is evolving quickly and the gene
annotations are far from complete.
In this project, we introduce a new cross-category similarity measurement called CroGO by incorporating
genome-specific gene co-function network data. The performance study showed that our measurement
outperforms the existing algorithms. We also generated genome-specific term association networks for yeast and
human. An enrichment-based test showed that our networks are better than those generated by the other measures.
Conclusions: The genome-specific term association networks constructed using CroGO provide a platform
to enable a more consistent use of GO. In the networks, the frequently occurring MF-centered hubs indicate that
a molecular function may be shared by different genes in multiple biological processes, or that a set of genes with the
same functions may participate in distinct biological processes. Common subgraphs found in multiple organisms
also reveal conserved GO term relationships.
A mini-example is shown in Figure 1.
Genome-specific MF-BP Association Networks
Network G_yeast has 613 MF terms, 843 BP terms and 1,485 edges between them. As shown in the figure below, the yeast association network consists of many small disconnected graphs.
Network G_human has 1,209 MF terms, 2,250 BP terms and 5,138 edges between them, among which 1,583 edges connect terms
that share no annotated genes.
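To make the last statistic concrete, the following minimal Python sketch counts the edges of an MF-BP network whose two terms have disjoint sets of annotated genes. The term names, gene names, and the helper `edges_without_shared_genes` are hypothetical and for illustration only; they are not taken from the CroGO data files.

```python
# Toy annotations: GO term -> set of annotated genes (hypothetical data).
annotations = {
    "MF:a": {"g1", "g2"},
    "MF:b": {"g3"},
    "BP:x": {"g2", "g4"},
    "BP:y": {"g5"},
}

# Toy MF-BP association edges (hypothetical data).
edges = [("MF:a", "BP:x"), ("MF:a", "BP:y"), ("MF:b", "BP:x")]


def edges_without_shared_genes(edges, annotations):
    """Return the edges whose two terms have disjoint annotated gene sets."""
    return [
        (mf, bp)
        for mf, bp in edges
        if annotations.get(mf, set()).isdisjoint(annotations.get(bp, set()))
    ]


novel = edges_without_shared_genes(edges, annotations)
print(len(novel), "of", len(edges), "edges link terms with no shared genes")
```

Edges of this kind cannot be found from annotation overlap alone, which is why they are reported separately for the human network.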
To measure the similarity between the terms in different GO categories, CroGO has three steps. First, the
association between two sets of genes that are annotated to any two given GO terms is calculated. Second,
the gene annotations and gene set associations are integrated to calculate the pair-wise term similarity.
Third, the directions of all the pair-wise term relationships are inferred with a GO structure based approach.
Please refer to the paper for the detailed method.
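As a rough illustration of the first step only, the sketch below scores the association between two gene sets as the mean co-function link weight over all cross pairs. This averaging rule is an assumption chosen for simplicity and is not the formula used by CroGO, which is given in the paper. The network weights, the gene names, and the function `gene_set_association` are likewise hypothetical.

```python
from itertools import product

# Hypothetical co-function network: unordered gene pair -> link weight.
cofunction = {
    frozenset(("g1", "g3")): 0.9,
    frozenset(("g2", "g3")): 0.5,
}


def gene_set_association(genes_a, genes_b, network):
    """Mean co-function weight over all cross pairs of the two gene sets.

    Assumptions of this sketch: a missing link counts as 0.0, and a gene
    that appears in both sets counts as a link of weight 1.0 with itself.
    """
    if not genes_a or not genes_b:
        return 0.0
    total = 0.0
    for a, b in product(genes_a, genes_b):
        if a == b:
            total += 1.0
        else:
            total += network.get(frozenset((a, b)), 0.0)
    return total / (len(genes_a) * len(genes_b))


mf_genes = {"g1", "g2"}  # genes annotated to a hypothetical MF term
bp_genes = {"g3"}        # genes annotated to a hypothetical BP term
score = gene_set_association(mf_genes, bp_genes, cofunction)
print(round(score, 3))
```

The point of the sketch is that two terms can receive a non-zero association through network links even when they share no annotated gene.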
We compared the performance of CroGO with the existing measures against confirmed biological knowledge, using a
small gold-standard set based on the known reaction-to-pathway relationships in yeast. We calculated
pair-wise term similarities for the term pairs in the gold-standard set and the term pairs
in the random set using CroGO, and compared its performance with the ASR and VSM based measures by
drawing a receiver operating characteristic (ROC) curve for each measure. The ROC curves in Figure
2 showed clearly that CroGO has the best performance. When the false positive
threshold is 15%, the true positive rate of CroGO is 88%, while the true positive rates of the ASR and
VSM based measures are both 83%. This analysis also showed that 102 more MF-BP pairs were recognized
by CroGO than the ASR and VSM based measures when the number of true positives equals the number of
false positives. This indicates that by incorporating the co-function network, CroGO has produced better
coverage than the other measures by recognizing more gene associations between genes which are annotated
to the gold-standard connected GO terms. In addition, the same experiments were performed on human data,
and the results are consistent with the yeast data.
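The kind of figure quoted above (a true positive rate at a fixed false positive threshold) can be read off a ROC curve as in the following minimal sketch. The scores are invented toy values, not results from the paper, and the function `tpr_at_fpr` is a hypothetical helper.

```python
def tpr_at_fpr(positive_scores, negative_scores, max_fpr):
    """Highest true positive rate reachable with false positive rate <= max_fpr.

    A term pair is predicted positive when its score is >= the threshold.
    """
    best_tpr = 0.0
    # Every observed score is a candidate decision threshold.
    for threshold in sorted(set(positive_scores) | set(negative_scores)):
        fpr = sum(s >= threshold for s in negative_scores) / len(negative_scores)
        tpr = sum(s >= threshold for s in positive_scores) / len(positive_scores)
        if fpr <= max_fpr:
            best_tpr = max(best_tpr, tpr)
    return best_tpr


# Hypothetical similarity scores for gold-standard and random term pairs.
gold_scores = [0.95, 0.90, 0.80, 0.40]
random_scores = [0.85, 0.30, 0.20, 0.10]

print(tpr_at_fpr(gold_scores, random_scores, max_fpr=0.25))
```

Running the same function on the scores of each measure, with the same gold-standard and random sets, gives directly comparable numbers.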
CroGO User Manual
CroGO was developed on a Windows 7 (x64) computer and implemented with JDK 1.6 and the JUNG library. The JAR file CroGO.jar is platform independent
(tested on Windows 7 and CentOS release 5.4).
To run the JAR file, a user must prepare input files and place them and the JUNG library files in the same folder as the JAR file. The sample
input files (for human and yeast) are provided in the “data” folder, which include all the data needed in our experiments; the JUNG library
files are in the “lib” folder.
Usage: to compute cross-category GO term-term similarities, run the following command in a command line:
java -jar CroGO.jar <organism name> <MFtermID> <BPtermID>
Where “organism name” is either “yeast” or “human”, “MFtermID” is the term ID of an MF term, and “BPtermID” is the term ID of a BP term.
The output of the program is:
Similarity = <similarity score>
Where “similarity score” is the cross-category GO term-term similarity score.
Example: to calculate the similarity between MF term GO:0004652 and BP term GO:0043629 based on the yeast co-function network, the command is:
java -jar CroGO.jar yeast 0004652 0043629
The output is:
Similarity = 0.9970899415729219
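For scripted use, the command above can be wrapped in a few lines of Python. This is a minimal sketch, not part of the CroGO distribution: the function names are hypothetical, and it assumes that `java` is on the PATH, that the program prints a line of the form shown above to standard output, and that term IDs are passed without the "GO:" prefix, as in the example.

```python
import re
import subprocess


def build_command(organism, mf_term_id, bp_term_id, jar_path="CroGO.jar"):
    """Build the documented command line as an argument list."""
    if organism not in ("yeast", "human"):
        raise ValueError("organism name must be 'yeast' or 'human'")
    return ["java", "-jar", jar_path, organism, mf_term_id, bp_term_id]


def parse_similarity(output):
    """Extract the score from a 'Similarity = <similarity score>' line."""
    match = re.search(r"Similarity\s*=\s*([-+0-9.eE]+)", output)
    if match is None:
        raise ValueError("no 'Similarity = ...' line found in program output")
    return float(match.group(1))


def crogo_similarity(organism, mf_term_id, bp_term_id, jar_dir="."):
    """Run CroGO.jar from jar_dir (the folder holding the JAR, the input
    files, and the JUNG library files) and return the score as a float."""
    result = subprocess.run(
        build_command(organism, mf_term_id, bp_term_id),
        cwd=jar_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_similarity(result.stdout)
```

With the files in place, calling crogo_similarity("yeast", "0004652", "0043629") runs the same command as the example above and returns the printed score as a float.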
Package: CroGO.jar. The main package for computing cross-category term-term similarity.
Supporting Package: JUNG library. A software library that provides a common and extendible language
for the modeling, analysis, and visualization of data that can be represented as a graph or network.
Experiment data: CroGOdata.zip. All the sample data used in our experiments.
MF-BP networks: Yeast network and Human network
How to cite CroGO
If you use CroGO or the MF-BP networks generated by CroGO, please cite:
Peng J, Chen J and Wang Y, Identifying Cross-category Relations in Gene Ontology and Constructing Genome-specific Term Association Networks. BMC Bioinformatics (special issue for selected papers presented at the 11th Asia-Pacific Bioinformatics Conference) 2012
If you have any questions, please contact Jiajie Peng via firstname.lastname@example.org.