Engineering News
November 10, 2003, Vol. 74, No. 12F

PERFECT RECALL: CS Ph.D. student Sean Rhea is working to fine-tune the query function of a peer-to-peer backup and file sharing system that is under construction at Berkeley.

CS Ph.D. student may make peer-to-peer storage and backup a reality

Though it may sound like it, “Perfect Recall” is not the title of Governor-elect Schwarzenegger’s next movie. Rather, it’s the thesis work of CS Ph.D. student Sean Rhea. When his work is done, people will be able to use a peer-to-peer file sharing network like Gnutella to back up their work.

Rhea’s research is part of the OceanStore project, which is dedicated to building a global file system that runs on a peer-to-peer network.

The idea is to plug into a network of computers and send encrypted files out into the network to be saved on someone else’s hard drive. The system would connect computers the way the Friendster Web site connects people. Friendster connects friends of friends of friends, so you only have to know one person to get access to a network of thousands.

“With Friendster you don’t have to know a lot of people to meet a lot of people, and it’s the same with this computer network,” says Rhea.

With the OceanStore system, as long as you have the IP address of just one computer, you’ll be connected to every computer it is connected to, and so on and so on. This creates an enormous community of computers that can swap and store information. Many technical glitches must be ironed out before the system will function smoothly.

Right now, peer-to-peer file sharing systems are often used to share music, movies, and animation. Recall is a measure of the quality of a search: a system with high recall is one that can successfully find all the files users are sharing. Most peer-to-peer networks only have high recall for popular items. Rhea is working on a system with recall so high it’s almost perfect, one that could find even items that only one user would want. For peer-to-peer backup, recall is much more important than it is in simple file sharing.
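Recall as described here can be written as a simple ratio: the number of shared files a search finds, divided by the number of files that were shared. The Python sketch below is only an illustration of that ratio; the function name and file names are made up for the example and do not come from OceanStore.

```python
def recall(found, shared):
    """Fraction of the shared files that a search actually located."""
    shared = set(shared)
    if not shared:
        return 1.0  # nothing to find, so nothing was missed
    return len(shared & set(found)) / len(shared)


# Hypothetical example: five files are shared and the search finds four.
shared_files = {"thesis.tex", "notes.txt", "song.mp3", "movie.avi", "photo.jpg"}
search_hits = {"song.mp3", "movie.avi", "photo.jpg", "notes.txt"}
print(recall(search_hits, shared_files))  # 0.8
```

For music sharing, a recall of 0.8 may be tolerable. For backup, the one missed file could be the one the user needs.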

“In 2001, the Distributed Hash Table (DHT) was invented. A DHT is a peer-to-peer network with theoretically very high recall and Bamboo, my latest work, is an actual implementation of a DHT that demonstrates that such high recall is achievable in practice,” says Rhea.
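To give a feel for why a DHT can have high recall, here is a toy sketch in Python. It is not Bamboo's code or its routing algorithm. It assumes one common textbook design in which node names and file keys are hashed onto the same ring of identifiers, and each key is kept on the first node at or after its identifier. The class and node names are invented for the example, and everything runs in one process, whereas a real DHT spreads this across many machines that each know only a few others.

```python
import bisect
import hashlib


def ident(name):
    """Map a string to a point on a 160-bit identifier ring."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


class ToyDHT:
    def __init__(self, node_names):
        # Sorted (identifier, name) pairs form the ring.
        self.ring = sorted((ident(n), n) for n in node_names)
        self.store = {name: {} for name in node_names}

    def owner(self, key):
        """Return the first node at or after the key's identifier, wrapping around."""
        ids = [i for i, _ in self.ring]
        pos = bisect.bisect_left(ids, ident(key)) % len(self.ring)
        return self.ring[pos][1]

    def put(self, key, value):
        self.store[self.owner(key)][key] = value

    def get(self, key):
        return self.store[self.owner(key)].get(key)


dht = ToyDHT(["node-a", "node-b", "node-c", "node-d"])
dht.put("thesis.tex", b"encrypted bytes")
print(dht.get("thesis.tex"))  # b'encrypted bytes'
```

Because every key maps to exactly one responsible node, a lookup goes to the same place the file was stored, whether the file is popular or wanted by only one user.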

With perfect recall, files can be stored randomly on far-away machines to guard against correlated failure.

“If you place a file on five different computers in the same building and the building burns down you lose all five. We place files randomly to minimize that risk,” says Rhea.

To retrieve files, users must query the network. To help them find their files, Rhea’s Bamboo leaves digital breadcrumbs. He says there is a fine balance between leaving enough digital breadcrumbs to make a file easy to find and leaving so many that most of the storage space in the system is taken up by breadcrumbs.

“Also, people sometimes leave the system, or just turn their computers off at night, and when they do, they take breadcrumbs with them. The system has to notice, and add more breadcrumbs to make up for this loss,” he explains.
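The repair step Rhea describes can be pictured with a small Python sketch. The article does not say how Bamboo actually does this, so the approach here is an assumption: keep a fixed target number of breadcrumbs per file, forget any held by computers that have left, and hand out new ones to randomly chosen computers that are still online. The names `repair` and `TARGET` and the node labels are hypothetical.

```python
import random

TARGET = 3  # hypothetical number of breadcrumbs to keep per file


def repair(breadcrumbs, live_nodes, target=TARGET, rng=random):
    """Drop breadcrumbs held by departed nodes, then add new ones up to target."""
    live = set(live_nodes)
    repaired = {}
    for file_id, holders in breadcrumbs.items():
        surviving = set(holders) & live
        candidates = sorted(live - surviving)
        # Never ask for more new holders than there are candidates.
        missing = min(max(target - len(surviving), 0), len(candidates))
        repaired[file_id] = surviving | set(rng.sample(candidates, missing))
    return repaired


breadcrumbs = {"thesis.tex": {"n1", "n2", "n3"}}
live = ["n1", "n4", "n5", "n6"]  # n2 and n3 went offline overnight
fixed = repair(breadcrumbs, live)
print(len(fixed["thesis.tex"]))  # 3
```

Raising `TARGET` makes a file easier to find after computers leave, but every extra breadcrumb uses storage, which is the balance described above.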

Rhea’s work is just one intricate component of OceanStore. But once the system is operational, it will make backing up your files as safe and easy as downloading a song.

For more go to oceanstore.cs.berkeley.edu/



Send comments to editnews@coe.berkeley.edu   © 2003 UC Regents