Genetic research is increasingly turning to studies of sequence variation in genes encoding proteins of known structure and function. The principal question in these studies is whether sequence variation affects protein structure or function and, for certain genes, whether it affects human health. The proliferation of published sequence data and the growth in the number of publications are a boon to this research, but they also make it difficult to keep track of what is known about a gene. The primary sequence databases of the International Nucleic Acid Sequence Data Library (for example, GenBank) provide powerful sequence-similarity search tools that help researchers deduce the functions of newly identified proteins. However, they do not contain the annotation required to map sequence variation within a single gene, or to correlate such variation with data about the gene's product or the phenotype arising from variation in the gene.
Sequence variation is particularly relevant to infectious pathogens that mutate in response to antimicrobial therapy. The genes encoding human immunodeficiency virus type 1 (HIV-1) reverse transcriptase (RT) and protease, the molecular targets of anti-retroviral drug therapy, are prime examples of genes in which sequence variation has both biological and medical implications. Although HIV-1-infected individuals with drug-susceptible HIV-1 isolates experience substantial reductions in morbidity and mortality with appropriate anti-retroviral drug therapy, individuals infected with drug-resistant isolates generally do not respond to drug therapy.