In 1990, the Human Genome Project, a publicly funded international effort led in the United States by the National Institutes of Health (through what is now the National Human Genome Research Institute) and the Department of Energy, began an ambitious endeavor to sequence the entire human genome. Celera Genomics, a private company founded in 1998, later pursued a parallel sequencing effort. In 2003, the project was declared complete, resulting in a sequence covering all human chromosomes. The Human Genome Project revealed that the human genome contains only about 20,000 to 25,000 genes. This estimate, based on the sequence data, is substantially below previous predictions. The sequence data also indicate that only about 1 percent of the human genome actually encodes functional proteins. The completed sequence is expected to help researchers devise new diagnostics and treatments for genetic diseases.
In addition to sequencing the human genome, researchers have sequenced the genomes of Drosophila melanogaster (fruit fly), Arabidopsis thaliana (plant), Saccharomyces cerevisiae (budding yeast), and Caenorhabditis elegans (roundworm). The mouse, rat, and zebrafish genomes have also been sequenced. These eukaryotic model organisms are widely used by the research community. The genome of Plasmodium (the organism that causes malaria) has been sequenced as well. The goals of these sequencing projects include preparing gene linkage maps and physical maps. A gene linkage map gives the relative location of genes based on how often they are inherited together with known marker sequences. A physical map, in comparison, gives the actual number of base pairs between genes on a chromosome; therefore, it locates the gene of interest more precisely.
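The difference between the two kinds of map can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the locus names, base-pair coordinates, and offspring counts are invented, and the linkage distance uses the common approximation that 1 percent recombinant offspring corresponds to about 1 centimorgan, which holds only for closely linked loci.

```python
from dataclasses import dataclass


@dataclass
class Locus:
    name: str          # hypothetical label, not a real gene or marker
    position_bp: int   # hypothetical base-pair coordinate on one chromosome


def physical_distance_bp(a: Locus, b: Locus) -> int:
    """Physical-map distance: number of base pairs separating two loci."""
    return abs(a.position_bp - b.position_bp)


def linkage_distance_cm(recombinant_offspring: int, total_offspring: int) -> float:
    """Linkage-map distance in centimorgans, approximated as the
    percentage of offspring in which the two loci were separated
    by recombination (reasonable only for small distances)."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    return 100.0 * recombinant_offspring / total_offspring


# Invented example data, for illustration only.
marker = Locus("marker_M", 1_200_000)
gene = Locus("gene_G", 3_700_000)

physical = physical_distance_bp(marker, gene)
linkage = linkage_distance_cm(recombinant_offspring=24, total_offspring=1000)

print(f"Physical distance: {physical} bp")
print(f"Linkage distance: {linkage:.1f} cM")
```

The linkage distance is inferred from inheritance patterns and only places the gene relative to the marker, while the physical distance is a direct count of base pairs, which is why a physical map locates a gene more precisely.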
Ultimately, scientists hope to determine the identity and order of all 3 billion nitrogenous base pairs in the human genome. Automation and computerization are essential tools in sequencing, and the technology continues to be developed.