Jim Kent was a graduate student in biology at the University of California, Santa Cruz (UCSC), when he wrote the program that allowed the public human genome team to assemble its fragments just before Celera's private commercial effort. His program ensured that the human genome data would remain in the public domain. Kent wrote the 10,000-line program in a month because he didn't want to see the genome data locked up by commercial patents.
Kent will accept the annual Bioinformatics.Org award in a plenary session at O'Reilly's upcoming Bioinformatics Technology Conference, February 3-6, in San Diego. Here we talk to Kent about the future of genomics, modeling biological processes, and open source bioinformatics.
Stewart: With the completion of the human genome project, how do you think bioinformatics as a discipline (or industry) has changed?
Kent: For me at least things are getting calmer and more thoughtful. The big public/Celera race is over. We're not rushing to get a rough assembly of 85% of the genome, but figuring out how best to get a very fine assembly of 98% of it. Dealing with the sheer volume of the data is not the struggle it once was, partly because now we have experience with the volume, partly because computers and hard disks have continued to get faster and cheaper, and partly because the homology searching algorithms, which used to be the bottleneck, are so much faster now.
There is still quite a bit of useful data that can and should be mass produced: genome wide microarray data, cellular localization of all genes, a dozen animal genomes for comparative genomics, immunoprecipitations with each gene followed by mass spec to see what interacts with what, etc. Still, I think soon the most important advances are going to come out of the cottages rather than the factories. That is, I think the advances are going to come out of small groups of people and individuals thinking about information that's at anybody's fingertips over the web, doing relatively small scale directed experiments on top of that. Of course if the advance is a new medical treatment, you'll still need a small army and deep pockets to get it through the FDA. This is better than not having thorough testing of drugs though.
Stewart: What is your view on establishing ontologies in bioinformatics/genomics?
Kent: I'm relying on Michael Ashburner's Gene Ontology group for that. They tend to be very descriptive with their names, not just three letters and a number, thank goodness. Like operating systems, the most useful number of ontologies is 1. My own mind tends more to chaotic nets than elegant hierarchies so I'm happy to leave the development of the ontology to the folks at EBI. I do plan to support it to the extent that I can. We've got some plans on the drawing board for incorporating it in a "protein browser" for example.
Stewart: Several research and industry groups are developing methods for simulating biological processes -- modeling a cell, systems biology, etc. How do you feel about this work? Is it viable? Do we have enough genomic information to simulate an organism?
Kent: You need to be very careful at what level you model. I joke with people about trying to learn French by dissecting the brain of a Frenchman. Maybe it's possible to do. I have my doubts. There are certainly easier ways to learn French.
You can view the cell as a set of partial differential equations in 30,000 variables (one for each gene) being driven by receptors and the like. This really makes my brain hurt! I take comfort in the fact that there are only perhaps 300 different types of cells. You could think of these cell types as attractors in the dynamical system of gene expression. I very much want to figure out how to take one type of cell into another type of cell. I think this will be the key to many of the medical advances of the 21st century. I think I'm a lot more likely to figure this out by studying developmental biology and doing tissue culture experiments than by computer simulation. However, there are a lot of uses for bioinformatics in studying developmental biology.
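To make the attractor picture concrete, here is a minimal sketch that is not from the interview: a toy circuit of two genes in which each represses the other. The Hill-function form, the parameter values (alpha = 4, n = 2), the use of ordinary rather than partial differential equations, and all function names are assumptions chosen for illustration. Different starting expression levels settle into one of two stable states, which stand in for "cell types".

```python
def step(a, b, dt=0.01, alpha=4.0, n=2):
    """One forward-Euler step of a hypothetical two-gene mutual-repression model.

    a and b are expression levels; each gene's production is repressed by
    the other (Hill function) and each decays at unit rate.
    """
    da = alpha / (1.0 + b ** n) - a
    db = alpha / (1.0 + a ** n) - b
    return a + dt * da, b + dt * db


def settle(a, b, steps=20000):
    """Integrate from (a, b) long enough to reach a stable state."""
    for _ in range(steps):
        a, b = step(a, b)
    return a, b


def cell_type(a, b):
    """Label the attractor by which gene ends up dominant."""
    return "type A" if a > b else "type B"


if __name__ == "__main__":
    for start in [(2.0, 1.0), (1.0, 2.0)]:
        a, b = settle(*start)
        print(start, "->", round(a, 3), round(b, 3), cell_type(a, b))
```

Starting from (2.0, 1.0) the system settles near (3.73, 0.27), with gene a dominant; swapping the starting values lands in the mirror-image state. In this toy model, "taking one type of cell into another" would mean pushing the expression levels across the a = b boundary that separates the two basins.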
Stewart: How will systems biology affect the future direction of bioinformatics?
Kent: Extensive experiments with large clusters of computers in the presence and absence of biologically-based system administrators suggest that biological systems are required to keep bioinformatics systems running.
Ah, you said "systems biology" not "biology systems". I have sadly only the vaguest idea what systems biology means. Leroy Hood has an institute devoted to systems biology. I know he would like to be able to take an easily grown cell like a fibroblast and convince it to become an islet cell capable of naturally secreting insulin after a meal. It's one of my dreams too. We won't be able to do this without understanding the biological system. This understanding is going to require a lot of new computer programs among many other things no doubt, so it will be a good thing for us bioinformaticians.
Stewart: Can you compare Bioinformatics in 2002 with AI in the 1980s and '90s?
Kent: They are very similar in a lot of ways. A lot of very interesting research was done in academic environments. The commercial ventures frequently raised money from the stock market, but rarely managed to raise money from sales. SciFi writers and readers had a lot of fun speculating where it would lead. The nightmare scenarios for bioinformatics and AI are surprisingly similar: creating a new form of entity that ultimately displaces us.
There are some important differences as well though. At a very high level AI is about understanding how the mind works while bioinformatics is about understanding how the body works. The practical benefits of AI are getting machines to do tasks that require brain power but that not enough people are willing and able to do. The practical benefits of bioinformatics are largely about curing diseases.
A paradox of AI is that as soon as a machine is able to do something, we decide that doing it really doesn't require "intelligence" after all. A great chess program was one of the early holy grails of AI. We have one now that can beat the human world champion, but nobody thinks of a chess program as intelligent any more. Speech recognition is a really critical AI contribution that is now well along to becoming a reality and not just in voice mail mazes. It's curious that some of the most important speech recognition technologies, HMMs, also have broad application in bioinformatics.
Stewart: What is your view on open source in bioinformatics?
Kent: Yea! Go. The genome is hard enough to decompile. Don't make me have to decompile your source as well.
On the other hand, to be totally honest, if a tool works well, is well documented, and the people maintaining the tool make regular improvements in it in response to user requests, I would very likely never get around to looking at the source anyway.
I think that source that is developed from government and charity grants should be not only open, but freely distributable, modifiable, etc. Not even "copyleft". This applies to most academics most of the time. Since academics typically want to go onto new research rather than supporting what they've already written, many aspects of the standard open source model work well for them.
I'm not sure if open source works so well in commercial environments. These days at least, the GPL is absolutely disastrous, since it really doesn't mix peacefully with existing proprietary stuff. I want companies to be free to add value to their proprietary stuff with things generated by public funds. I want companies to occasionally give away useful source as a public service, and never, never to keep source locked up more than 17 years. I also want Walt Disney to free Mickey!
I think that open source for computer operating systems is particularly beneficial. We all want to speak the same language so that we can communicate with each other. We all want to use the same operating system so that we can use our favorite programs everywhere, and so that we don't have to port the programs we've written. An operating system in private hands will therefore tend to die out, become a monopoly, or encourage the people who use it to become a closed group. The pro bono work of the many people developing open source operating systems really is helping lift the world out of these three gloomy alternatives. Displacing Microsoft in the Windows/Icon/Mouse/Pointer world may not happen, but I would not be in the least surprised if a computer with good out of the box speech recognition, a nice web browser, and a strictly optional keyboard based on Linux takes over China in 3 years and the USA in 6.
Bruce Stewart is a freelance technology writer and editor.
Copyright © 2009 O'Reilly Media, Inc.