Computational biology is the science that answers the question “How can we learn and use models of biological systems constructed from experimental measurements?” These models may describe what biological tasks are carried out by particular nucleic acid or peptide sequences, which gene or genes, when expressed, produce a particular phenotype or behavior, what sequence of changes in gene or protein expression or localization leads to a particular disease, and how changes in cell organization influence cell behavior. This field is sometimes referred to as bioinformatics, but many scientists use the latter term to describe the field that answers the question “How can I efficiently store, annotate, search and compare information from biological measurements and observations?” (This subject has been discussed previously in an early NIH task force report and by Raul Isea.)
Several factors contribute to the confusion between the terms. One of the top journals in computational biology is entitled “Bioinformatics,” and in German, for example, computer science is called “Informatik” and computational biology “Bioinformatik.” Some also feel that bioinformatics emphasizes the information flow in biology. In any case, the two fields are closely linked: “bioinformatics” systems are typically needed to provide data to the “computational biology” systems that create models, and the results of those models are often returned for storage in “bioinformatics” databases.
Computational biology is a very broad discipline. It seeks to build models for diverse types of experimental data (e.g., concentrations, sequences, and images) and of biological systems (e.g., molecules, cells, tissues, and organs), and it draws on methods from a wide range of mathematical and computational fields (e.g., complexity theory, algorithmics, machine learning, and robotics).
Perhaps the most important task that computational biologists carry out (and that training in computational biology should equip prospective computational biologists to do) is to frame biomedical problems as computational problems. This often means looking at a biological system in a new way, challenging current assumptions or theories about the relationships between parts of the system, or integrating different sources of information to make a more comprehensive model than had been attempted before. In this context, it is worth noting that the primary goal need not be to increase human understanding of the system; even small biological systems can be sufficiently complex that scientists cannot fully comprehend or predict their properties. Thus the goal can be the creation of the model itself, and the model should account for as much of the currently available experimental data as possible. Note that accounting for the data does not mean that the model has been proven, even if it makes one or more correct predictions about new experiments. Except in very restricted cases, it is not possible to prove that a model is correct, only to disprove it and then improve it by modifying it to incorporate the new results.
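The cycle described above (build a model from the available data, test it against a new experiment, and revise it when it is disproven) can be sketched in a few lines of Python. This is only an illustrative sketch: the dose and expression values are invented, the straight-line model and the tolerance are arbitrary assumptions, and the helper names `fit_line` and `is_refuted` are hypothetical rather than taken from any particular library.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def is_refuted(model, x, y, tolerance):
    """True if the model's prediction misses a new measurement by more than tolerance."""
    a, b = model
    return abs((a + b * x) - y) > tolerance

# Invented measurements: inducer dose vs. reporter expression (arbitrary units).
doses = [0.0, 1.0, 2.0, 3.0]
expression = [0.1, 1.1, 1.9, 3.0]
first_model = fit_line(doses, expression)

# A new experiment that matches the prediction supports the model but does not prove it.
consistent = not is_refuted(first_model, 4.0, 4.0, tolerance=0.5)

# A later experiment at a higher dose (where expression saturates) contradicts the model.
new_dose, new_value = 8.0, 4.2
refuted = is_refuted(first_model, new_dose, new_value, tolerance=0.5)

# Revise the model so that it accounts for all data collected so far.
revised_model = first_model
if refuted:
    revised_model = fit_line(doses + [4.0, new_dose], expression + [4.0, new_value])
```

Here the revision simply refits the same straight-line form to the enlarged data set. In practice a disproven model would often be replaced by one of a different form (for example, a saturating curve), but the logic of the loop is the same.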