O'Reilly Network    
 Published on O'Reilly Network (http://www.oreillynet.com/)

O'Reilly Emerging Technology Conference

Microsoft's Research Director Taps Top Tech Trends

by Richard Koman

Rick Rashid Gives an Insider's Glimpse at the Future

When Richard Koman went looking for the best price for a RAM upgrade for his wife's blue-and-white Mac G3 tower, he found a MacWorld article from 1999 stating that Viking had just released a 256 MB module for $1,499. The price at MacMall this week is $109, a price decrease of more than 13x in three years.

Hard drives are running about 33 cents per GB; it seems like it wasn't that long ago that the rule of thumb was a dollar a megabyte. Now these seemingly mundane facts may not indicate anything more to you than the unattractiveness of the disk drive industry as an investment opportunity, but to Rick Rashid they're "very exciting."

For Rashid, director of Microsoft Research, the affordable price of massive disks on desktop PCs is a critical factor in the future of technology. Read the following interview to find out why.

Rashid left Carnegie Mellon University in 1992 to start Microsoft Research, a division of the software giant tasked with "moving forward the state of the art" and moving technology into Microsoft products. One of the major successes there is the Windows Media Group, which started out as a research project and is now a leading part of the company.

Koman asked Rashid, who will be a keynote speaker at O'Reilly's Emerging Technology conference in May, to point to the top trends in technology, as seen from the perspective of someone who's working on it years before it turns into product.

Trend #1: Storage Space

Richard Koman: Where are technology and computer science going? Can you point to three or four upcoming trends you see?

Rick Rashid: If you look at the kinds of things that are happening right now in the hardware space and the software space, there are several areas I think are pretty important.

One thing, of course, is that the amount of storage that's available has just been mushrooming. It's growing much faster than Moore's Law. We're seeing an increase in storage size of roughly a factor of two every six months. And we're only probably two or three years at most away from individuals having a terabyte of storage on their PCs or in their laptops.
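
To make the growth arithmetic concrete, here is a rough Python sketch (not part of the interview). It takes the six-month doubling period Rashid quotes and asks how long it takes to reach a terabyte. The 80 GB starting size is an assumption about a typical 2002 desktop disk, and the function name is made up for illustration.

```python
import math

def years_until(target_gb: float, current_gb: float, doubling_months: float = 6.0) -> float:
    """Years for capacity to grow from current_gb to target_gb at a fixed doubling period."""
    doublings = math.log2(target_gb / current_gb)
    return doublings * doubling_months / 12.0

# Assumption: a typical desktop disk in 2002 holds about 80 GB (not stated in the interview).
START_GB = 80.0
TERABYTE_GB = 1024.0

print(round(years_until(TERABYTE_GB, START_GB), 2))
```

Under these assumptions the answer comes out a little under two years, which is in line with the "two or three years at most" estimate above. A slower doubling period stretches the result proportionally.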

So this creates a lot of other opportunities. You can start thinking about your computer now as a place where you can store almost everything that happens to you. For example, you could literally put every conversation you've ever had, from the time you're born to the time you die, in a terabyte of disk.
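
The "every conversation" claim can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the two hours of talk per day, the 80-year lifespan, and the 8 kbit/s compressed-speech rate are all assumptions, not figures from the interview.

```python
def lifetime_speech_bytes(hours_per_day: float, years: float, bits_per_second: int) -> int:
    """Bytes needed to store compressed speech for a whole lifetime."""
    seconds = hours_per_day * 3600 * 365 * years
    return int(seconds * bits_per_second / 8)

TERABYTE = 10**12  # decimal terabyte, the way disk vendors count

# Assumptions: 2 hours of conversation a day, 80 years, 8 kbit/s speech coding.
needed = lifetime_speech_bytes(2, 80, 8000)
print(needed, needed / TERABYTE)
```

With these inputs the total is about 210 GB, roughly a fifth of a terabyte. Higher-quality audio or more talkative assumptions raise the figure, so the claim depends on accepting telephone-grade compression.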

So, it becomes kind of a digital memory for you -- you may not want to keep all that information and that's fine. But now you don't have to think about storage as a limiter for a lot of data that you would like to store -- [such as] managing a lot of the things that happen in a small business, in your own personal life -- in an online digital form. You don't have to think about deleting things or removing things or restoring things all the time, just for the purposes of managing your storage.

Koman: I interviewed Brewster Kahle of Internet Archive recently and asked him what he thought were the impacts of having 100 terabytes of storage available. And he said you start to think about doing everything, doing it all. That AltaVista had the great breakthrough of "let's collect the whole Web, we don't need to cut anything out, we can store whatever we can get our hands on."

Rashid: Well, I think that happens for the individual too. Just think about all the things that you do, right? You'd like to be able to get that back in some cases. You don't ever know exactly when you want to get it back. I mean everybody's had the feeling of, "I just deleted something and now I need it back. Or I was just looking for something I did last year and I can't find it anymore." There's no reason that can't just be online. It's difficult for a single human being to generate more than we'll be able to put in storage, and that opens up a lot of ways of really helping people do their jobs, helping people to organize their lives, that we didn't really have available to us before.

I think that's very exciting. And then from a software perspective it raises a lot of issues like, how do I manage all of this? How do I find the thing I really want instead of all the other stuff? How do I protect it, from a privacy perspective? There's just a lot of interesting, exciting computer science issues this raises.

Koman: Another question it raises for me is will the software be robust enough to move through so much stuff? If you never throw out an email, you reach a point where Outlook isn't really able to move through, say, 10,000 email messages.

Rashid: Sure. Because it was not architected to do that. I mean I think one of the things that you really have to do, as we move forward, as we think about this kind of change that storage is going to bring to us, is to say, we have to now think about architecting our systems and our applications without those kinds of limits.

I think you have to start thinking in terms of, what if I had a petabyte [1,024 terabytes] of information, that I needed to manage; how am I going to manage that and how am I going to make sure that my applications can get the things that they need? It opens up questions of what are the search strategies that I'm going to use? A lot of things will be there, the question is how do you find them? How do you index them? How do you manage them?

But it just opens up a lot of new core opportunities too. I mean, once you have that much information, you can correlate across sets of data and discover things that you didn't otherwise know. One of the things that one of our researchers has been looking at is ways of using the huge amount of data in the Internet, not just to answer questions by giving you back articles to read, but to answer questions by actually answering your question ... using the redundancy and using some linguistic techniques to extract the actual answer that you wanted, as opposed to just pointing you at a bunch of articles. And I think again as you collect more data, you can begin to use these data-mining techniques to effectively take advantage of the fact that there's redundancy, and that redundancy usually means something is important, so that may be the thing that you're really looking for.
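
The redundancy idea can be illustrated with a toy Python sketch. This is not the Microsoft researchers' method, which the interview does not describe in detail. It simply assumes that the answer mentioned in the most independent snippets is the likeliest one. The snippets, the question, and the year-matching pattern are all made up for illustration.

```python
import re
from collections import Counter

def answer_by_redundancy(snippets, pattern):
    """Return (candidate, count) for the candidate found in the most snippets, or None."""
    votes = Counter()
    for text in snippets:
        # Count each candidate once per snippet so one repetitive page cannot dominate.
        candidates = {m.lower() for m in re.findall(pattern, text)}
        votes.update(candidates)
    if not votes:
        return None
    return votes.most_common(1)[0]

# Made-up snippets standing in for search results to "When did Apollo 11 land on the Moon?"
snippets = [
    "Apollo 11 landed on the Moon in 1969.",
    "The 1969 Moon landing was watched by millions.",
    "Some sources wrongly say the landing was in 1968.",
    "In 1969, Apollo 11 touched down in the Sea of Tranquility.",
]

# Hypothetical answer pattern for a "when" question: a four-digit year.
print(answer_by_redundancy(snippets, r"\b(?:19|20)\d{2}\b"))
```

The one wrong snippet is outvoted by the three that agree. A real system would also need the linguistic techniques Rashid mentions to work out what kind of answer a question is asking for.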

Koman: How quickly are we likely to see an Internet that actually answers your question?

Rashid: Again, there's a lot of research that has to be done, but I think the results from work that some of our people and people in other places are doing are beginning to be somewhat promising there. I'd say probably in five years you can expect to have something like that -- probably not for every question -- but for a lot of basic factual questions, yes, you ought to see that in the next five years.

Trend #2: Human/Computer Interaction

Koman: So the first trend is the mushrooming of storage. What's next?

Rashid: I think another area where we're seeing things starting to change fairly rapidly is in the human/computer interface. In devices, we're suddenly starting to see much more intelligent devices able to gather a lot more information. So for example in MEMS (Micro-Electro-Mechanical Systems) technology, you can now make very tiny chips that can be gyroscopes or accelerometers or that can have the ability to access networks and do things like triangulation.

That means you can now start thinking about your pen, your shoes, as being able to collect a lot of information. So from a computer/human interface perspective, I can now just write on a piece of paper and have that information gathered up and take it into my computer in a useful way. I mean not just a bunch of scribbles, but where the computer can actually interpret that.

You're seeing things like the Tablet PC that we're bringing out with a bunch of partners later this year. We have an integration of handwriting recognition with the inking process. And inking in some sense is almost as important as anything else because what we find is people really just want to write handwritten notes and then they would like the notes to be made to look better but not necessarily be taken out of handwriting, and they want to be able to edit them as handwriting in the way they edit computer text. They want to be able to add things and remove things and add annotations and so forth.

If we integrate that kind of inking with handwriting recognition, with gesture recognition, and potentially with voice recognition, suddenly you have a whole new range of ways to think about someone interacting with a computer.

And we're getting much better as time goes on in terms of being able to manage the diverse modalities of input, and think about how they integrate with traditional forms of input. So, for example, I'd like to be able to search my handwriting in the same way I search text. And I might be able to search my voice annotations in the same way that I search text. I think we're getting to a point now where we'll be able to do that.

And even with things like speech recognition -- which, historically, some people can use while others can't -- we'll reach a point over the next few years where we're crossing a threshold of usability. For example, with our Chinese speech recognition, many people are telling us that they can actually input speech much faster and have it be recognized than they can type.

Koman: Wow.

Rashid: Partly that's because it's really hard to type in Chinese -- you type phonetically and then the system has to recognize what character that might be, based on the sound. That's already linguistic processing, so there's an error associated with that. But people can actually speak faster in Chinese than they can type. That's not really true in English. For most of us that are halfway decent at typing, we're probably better at typing than dictating.

But still it's an indication that the technology really is getting to a point where it becomes a usable tool -- in Asia in particular, but I think it's becoming more usable in the United States and Europe as well. Although there it's probably more because people really can't type, either because they're not able to use a keyboard for some reason, or because the setting that they're in doesn't really admit it.

Koman: Well, there's a very real human toll in keyboarding: carpal tunnel syndrome, repetitive stress injury ...

Rashid: Unfortunately you can also say the same things about voice. There are a number of vocal cord problems that can occur when people talk too much. I've had a little bit of that myself. It's evidently quite common in people that are my age that do a lot of speaking events, so I've been given some training as to how to try to avoid overuse syndrome for my vocal cords. I guess the good news is that if we have many different modalities then perhaps you don't have to overuse one of them.

Koman: Right. Can you say anything more specific about the Tablet PC?

Rashid: It will be coming out later this year. Bill has shown it at COMDEX and CES. I think the interesting part is that it's really melding the traditional PC with the handheld, pen-based portable devices. It has a very high recognition of the pen movement. So you can write with it, it looks good, you can preserve the ink, and manage it in the same way that you manage handwritten or typed material.

But at the same time it's just a PC and you have all the things a PC has, except perhaps a keyboard, and you can always dock it with a keyboard. Some models will actually come with keyboards that are basically folded on the inside when you want to use it as a tablet, and folded on the outside when you don't. It will give people a lot of options in terms of the way they think about using their computer. We see a tremendous amount of enthusiasm among a number of different communities of users for this type of a device, and so hopefully it will do well when it comes out.

Trend #3: Graphics

Koman: So, we have storage and interfaces.

Rashid: Another area that I think is very exciting is graphics. I've always been interested in graphics and computer games. I did a computer game back when I was a graduate student in the 1970s, and even since being at Microsoft, I've done a computer game here, so it's kind of a spare-time occupation for me.

What's exciting about graphics is that there's been just a tremendous change in such a short period of time. I mean, over the last three years we've seen a factor of 100 increase in the performance of real-time 3D graphics that you can put in your PC or a game console. To put that in perspective, consider triangles per second, which is a measure of the complexity of the scenes you can draw, because software represents the surface of an object with triangles. We can now do more triangles per second with a high-end PC or with something like an Xbox console than was needed to render the original Toy Story, where it was two to 12 million triangles per frame.
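
One way to read the Toy Story comparison is to ask what triangle rate would be needed to draw frames of that complexity in real time. The short Python sketch below does that arithmetic. The per-frame range comes from the interview, while the choice of the standard 24 frames-per-second film rate is an assumption about the comparison intended.

```python
def triangles_per_second(triangles_per_frame: int, frames_per_second: int) -> int:
    """Triangle throughput needed to draw frames of a given complexity at a given rate."""
    return triangles_per_frame * frames_per_second

FILM_FPS = 24  # standard film frame rate (assumed basis for the comparison)
LOW, HIGH = 2_000_000, 12_000_000  # triangles per frame, as quoted in the interview

print(triangles_per_second(LOW, FILM_FPS), triangles_per_second(HIGH, FILM_FPS))
```

On this reading, real-time playback would need roughly 48 million to 288 million triangles per second. Raw triangle counts say nothing about shading or lighting quality, so this is only a measure of geometric complexity.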

That means we're now able to do in real time a lot of things that historically people could only do on extraordinarily expensive equipment or over very long periods of time. It also means that we can now use computer graphics technologies to create realities that were really never there before. Like in the movie Gladiator, Rome wasn't there, the Coliseum wasn't there. Those were all just graphics. It wasn't a science fiction movie but it was a special effects movie in a very significant way.

In the game area people are able to produce things now that are beginning to approximate some notion of reality, although usually it's a skewed reality where they're trying to make it look a little bit artificial. Over the next five to 10 years, we're going to be able to put on your computer screen something that looks a lot like what you see out your picture window. We'll be able to visualize many different kinds of information in a way that is much more compelling for people. We'll be able to take everyday objects and quickly and easily bring them into your computer.

Let's say you want a 3D version of your children. Or say you're a doctor and you want to analyze in detail the way that someone is performing some function. I think we'll be able to do that. You're already seeing some of those things happen in sports medicine and various professional settings. But I think we'll be able to bring that to a much larger number of people as time goes on.

Koman: So, do you think this could radically impact the cost of medical technology, for example?

Rashid: It appears to have had an impact, already. There are a lot of things now that you can compute instantaneously that before you had to send off to a laboratory and have it analyzed for long periods of time. And as time goes on that will change the way people think about medicine. I think it's already the case that just having huge databases of information about proteins, about various kinds of molecules and chemicals and their interactions, that's already having a big impact on things like the production of pharmaceuticals.

One of the things the pharmaceutical companies tell us is that a lot of what they're doing is computation. They're doing a tremendous amount of simulation trying to find particular molecular combinations that can match up with a particular protein. Computers allow them to create targeted drugs in a way that they were never able to do before. And I think again over the next five to 10 years, our ability to specifically target diseases or the products of diseases will continue to improve.

Koman: Right, I was going to mention genomics and proteomics.

Rashid: Oh, it's very exciting. And again a lot of this is a combination of both increases in computing power, which of course is important, and increases in storage, because now you're really talking about enormous amounts of data that have to be collected in order to be able to do those kinds of analyses. The other side of it is that our knowledge of how to do things like this in software has dramatically improved. We're not doing the same old algorithms on faster computers; people are really devising new techniques for managing enormous amounts of data and storage, new theoretical techniques for performing certain types of computation. I think again, that's a part of it.

I mean one of the research teams we have here is in data mining. And one of the key things that they're looking at is when I have these enormous petabyte databases that I want to analyze, how can I do that in a reasonable amount of time. It's very important to be able to understand how to sample a database appropriately based on the kinds of queries you make into it -- how do you get the right kind of information back.
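
The simplest form of the sampling idea is to answer an aggregate query from a uniform random sample instead of scanning every row. The Python sketch below illustrates that baseline only; it is not the research team's technique, which would tailor the sample to the queries. The table is synthetic and all names are hypothetical.

```python
import random

def estimate_mean(rows, sample_size, seed=0):
    """Estimate the mean of a column from a uniform random sample of its rows."""
    rng = random.Random(seed)
    sample = rng.sample(rows, sample_size)
    return sum(sample) / len(sample)

# Synthetic stand-in for one numeric column of a very large table.
_gen = random.Random(42)
table = [_gen.uniform(0, 100) for _ in range(200_000)]

exact = sum(table) / len(table)        # full scan
approx = estimate_mean(table, 10_000)  # reads only 5% of the rows
print(exact, approx, abs(exact - approx))
```

Uniform sampling works well for averages over evenly spread data but poorly for rare values or highly selective queries, which is why matching the sample to the kinds of queries matters.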

Trend #4: Distributed Computing

Koman: What about distributed computing over networks?

Rashid: I think there we're really seeing a revolution. I'm extremely excited about what's happening with XML and the notion that we can now create self-describing databases of information and exchange data that is self-describing. We can put on the Net descriptions of the interfaces to servers, so that you can literally program against a server without initially knowing anything about it. By simply pointing your development environment to the directory information that exists out there, you pull down the description, or the schema, of the server that you're talking to, and you can bring that into your development environment and just work with it right there. That is just a tremendous new ability. We've been building distributed computing systems for a long time and people have worked around these kinds of issues in the past. Now we're actually starting to see real commercial enterprises doing it, and large databases being put online in this form.
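
A small Python sketch can show what "self-describing" buys you. The loader below knows nothing about the fields in advance; it discovers names and types from the document itself. The XML document, its element names, and the `type` attribute convention are all invented for illustration and do not correspond to any real schema or service description format.

```python
import xml.etree.ElementTree as ET

# Hypothetical self-describing data: each field carries its own name and type.
DOC = """
<places>
  <place>
    <name type="string">Seattle</name>
    <latitude type="float">47.61</latitude>
    <longitude type="float">-122.33</longitude>
  </place>
  <place>
    <name type="string">Pittsburgh</name>
    <latitude type="float">40.44</latitude>
    <longitude type="float">-80.0</longitude>
  </place>
</places>
"""

CASTS = {"string": str, "float": float, "int": int}

def load_records(xml_text):
    """Turn each child of the root into a dict, using the types declared in the data."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for item in root:
        record = {}
        for field in item:
            cast = CASTS.get(field.get("type", "string"), str)
            record[field.tag] = cast(field.text)
        records.append(record)
    return records

for rec in load_records(DOC):
    print(rec)
```

Adding a new field to the document requires no change to the loader. Service descriptions of the kind Rashid mentions apply the same principle to a server's operations rather than to its data.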

One of the things we did, to start my research group a few years ago, is we put one of the very first terabyte databases out, which was the TerraServer, a database of images of the surface of the Earth. Over the years we added topographical information and lots of data about the areas that we have images of. We put in map information, and so forth.

Last year the researcher that originally put the site together took that site and turned it into a Web service, making it available through XML and through SOAP and all of the various protocols that allow you to sort of talk to this database as a programmatic component. Almost immediately we saw people using it. It's been used in courses that people are teaching about writing distributed applications. The USDA is using it to build applications to help farmers do soil analysis. And it's very cool just to see that happen all of a sudden.
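
To show what talking to a database "as a programmatic component" looks like on the wire, the Python sketch below builds and reads back a SOAP 1.1 request envelope using only the standard library. The envelope namespace is the standard SOAP 1.1 one. The service namespace, the `GetImage` operation, and its parameters are hypothetical and are not the actual TerraServer interface.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:imagery"                     # hypothetical service namespace

def build_request(operation, params):
    """Wrap an operation call and its parameters in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")

def parse_request(xml_text):
    """Recover the operation name and parameters, as a server would."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    call = body[0]
    operation = call.tag.split("}", 1)[1]
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params

# Hypothetical call: ask for imagery at a given coordinate.
request = build_request("GetImage", {"latitude": 47.61, "longitude": -122.33})
print(request)
print(parse_request(request))
```

In practice a toolkit generates this plumbing from the service description, so the programmer sees an ordinary function call rather than XML.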

We're now working with the National Science Foundation and Caltech and Johns Hopkins University on a national virtual observatory. We're basically putting online a database looking out towards the sky, and making available to astronomers this type of information where you can both get the images but you can also program against it and access the information in a programmatic way.

And that's really exciting for everybody because it means that suddenly we can think about having in some sense on the Internet the world's best telescope that's always on -- it's just accumulating data from real telescopes, but it's there whenever anybody wants to find a bit of astronomical data. That's something that many people are working on together in the astronomy community and we're part of that.

Koman: So it seems like an additional theme is using the Internet to cull together really massive databases you can program against.

Rashid: The way I look at it is, a lot of the very early Internet systems were in fact distributed systems where you had protocols for communicating back and forth between computers for the purposes of writing software, not just for people to sort of visualize information and bring up Web pages and things of that sort. And now I think we're finally reaching a point where those kinds of ideas are really going to see fruit. We're going to see people be able to build large-scale distributed applications using the tremendous resources that exist on the Internet as a database, if you can think of it that way, for those computations.

Koman: What do you make of the peer-to-peer networking paradigm?

Rashid: Well, again -- I hate to say this, back in the old days when I was growing up ... ah, kids never really believe those stories anyway, but -- when I was doing my thesis work, we were doing peer-to-peer networking. This is an old idea. It's been around for a long time. It's called distributed computing.

I mean, we went through this period, I think, where client-server became kind of a very special case that everybody was handling particularly well for the Internet, and in corporations and enterprises as well. If you go back historically it was all peer-to-peer communication. So I think what we're seeing now is really, people getting back to that, and it takes many forms. Sometimes people talk about file sharing, that's a kind of peer-to-peer. But I think the most interesting is where you really start talking about building -- literally building systems that are survivable, that are fault-tolerant because the information's distributed, and because the applications themselves are distributed.

Koman: So do you think client-server doesn't necessarily have to be around forever?

Rashid: Well, I think client-server will be around forever, because if nothing else it's a special case of peer-to-peer. It's just where you've got a big peer and a lot of little peers. I think the reality there is that we will move toward more distributed implementations. It's already the case if you look at big Web sites that they're not typically implemented by having one computer that your client actually talks to. They're often implemented now, both for purposes of scalability and for purposes of fault tolerance, as a distributed-computing component, which you may think of as being a single thing, a single server, but that's not actually how they're implemented.
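
The point about a "single server" really being many machines can be sketched in a few lines of Python. The front end below presents one `handle` call to the client while rotating requests across replicas for scalability and skipping failed ones for fault tolerance. It is a toy model with made-up class names, not a description of how any particular Web site is built.

```python
import itertools

class Replica:
    """One back-end machine; raises ConnectionError when it is down."""
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"

class FrontEnd:
    """Looks like a single server to the client, but spreads work over replicas."""
    def __init__(self, replicas):
        self.replicas = replicas
        self._next = itertools.cycle(range(len(replicas)))

    def handle(self, request):
        # Try each replica at most once, starting from the next one in rotation.
        for _ in range(len(self.replicas)):
            replica = self.replicas[next(self._next)]
            try:
                return replica.handle(request)
            except ConnectionError:
                continue
        raise RuntimeError("no healthy replicas")

site = FrontEnd([Replica("a"), Replica("b", healthy=False), Replica("c")])
for path in ["/home", "/news", "/search"]:
    print(site.handle(path))
```

The client never learns that replica "b" is down; its request is simply answered by the next machine in the rotation.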

And I think increasingly that's just going to be pushed out to the leaf nodes, if you want to think of it that way, in the network. It's not just about computing in the center of the network, but it's also going to be about computing in the leaf nodes, the individual personal computers and devices that plug in. And I really think it will be much more of a truly distributed system, where you'll increasingly not be able to tell where a particular computation was performed. And you probably won't care. And, in fact, you'll be very happy about the fact that if I go from one device to another, the same computations appear to be performed, even though I don't really know exactly where or how.

If I've got a small personal device that's doing something for me, I don't really want to know whether the device is doing it or whether the information is being processed somewhere in the network and I'm simply watching it or looking at it.

And likewise on my PC -- the distinction shouldn't be that important. It should just be a question of how do we provide computing in a most effective way for users that makes the best use of the network, that gives them the best level of reliability, of scalability, and that protects their information in the best possible way?

Koman: Uh-huh. So classes of software like the SETI@Home module, for example, you would see more of that.

Rashid: Yeah, you're definitely going to see more of that. I think increasingly you're going to see people taking advantage of the fact that there's an enormous amount of computing capacity out there and beginning to think about it as one big, gigantic, world supercomputer.

Maybe not exactly like those science fiction stories ... I mean I love science fiction, but I sometimes do cringe when they show these people kind of flying through cyberspace in these kind of weird, psychedelic outfits and so forth. It may not be quite like that, but certainly you're going to see computation distributed in a way that people dreamed about back when I was earning my stripes. Now you're really seeing it begin to happen.

Koman: So, just to stay on the science fiction theme for a second, on Star Trek there was this one supercomputer on board and you just spoke out and it would do your command. It was sort of like a really mega-centralized computer.

Rashid: Yeah, well that evolved through the various Star Trek shows over the years. It's always interesting to see how the technology of the day seems to influence the science fiction of the day.

Koman: Right. That was sort of the Sixties ...

Rashid: And by the time you got to things like Star Trek: The Next Generation, they were already having nanotechnology, nanobots, and distributed intelligence. It's just interesting to see how things changed as time went on.

The original Star Trek series, I think, reflected people's notions of computing of that particular day which was large, big, single computers that were so massive and so expensive that you couldn't have more than one of them. Although they did introduce the concept of the tablet PC, I think. They all had these sort of computing tablets they would draw on or make their notes on or talk through. And we're only now getting around to actually building those things.

Koman: Well, I guess this was a tangent.

Rashid: It's a great tangent. I mean I was a big Star Trek fan. Whenever a new Star Trek movie would come out I'd always take the people that worked for me out to see it, so I have a personal financial stake in the quality of each Star Trek movie.

Koman: A lot of disappointments. So, will we see a Holodeck in our lifetimes?

Rashid: Oh gosh, when are you going to see a Holodeck? Well, ah, the Holodeck was very complicated in Star Trek. There were many different technologies associated with that. So it's going to be hard to produce exactly what they have.

When are we going to be able to create something that looks on a computer screen as though it's completely real? I think that's probably within the next five years, maybe 10. I might be off, but we're moving very rapidly in that direction.

Don't miss the O'Reilly Emerging Technology Conference, May 13-16, in Santa Clara, Calif. This year, we'll explore how P2P and Web services are coming together in a new Internet operating system. Register by March 22 and save up to $695.

Copyright © 2009 O'Reilly Media, Inc.