Project Status

The WCG computations have completed. Analysis of the resultant structures is underway.

News

April 29, 2015 - We have completed the move and recovered the data, at least the data related to rice, and everything should be working as before.

April 10, 2015 - The servers crashed and are being recovered!

January 9, 2015 - The compbio.org servers (including the rice server) are physically moving as our PI moves from the University of Washington to the State University of New York. Please bear with any downtime and broken links.

January 4, 2014 - We have created a straightforward web interface to download the top-scoring models of any rice protein we modelled, along with a tar archive, compressed with bzip2, of the models for all the protein sequences.

May 3, 2011 - Unfortunately, we did not get funded for the analyses of Nutritious Rice for the World (NRW). However, Ram Samudrala has received a Pioneer grant for re-purposing existing drugs (http://protinfo.org/cando). We have been in the process of re-tooling our local computational cluster.

Thus NRW is still alive and kicking. We are using our updated cluster to assess the accuracy of the predictions, focusing on the 1% of rice protein sequences that have at least some sequence similarity to proteins with experimentally determined structures. Comparing the predicted models created by World Community Grid volunteers with these experimentally determined structures gives us a measure of accuracy.

A paper on the GPU acceleration of the clustering step of the analysis has been published in BMC Research Notes: http://www.biomedcentral.com/1756-0500/4/97

We will be applying again for funding for the NRW analyses and 1KP shortly.

May 19, 2010 - Ram Samudrala answers questions related to distributed computing as part of the "A Better Rice for the World" article by Alexander Janssen.

Apr 2, 2010 - We have begun to analyze the terabytes of results that have been generated through the generous efforts of the volunteers.

Now comes the difficult part of sifting through the data to find the best models. The folding algorithm is noisy, so there will be many inaccurate models. We need to find the best models among the almost 7 billion generated. This should take approximately 3-6 months using our fastest methods. After identifying the most accurate models, we will use them to figure out what functions these proteins perform in the rice organism. This involves comparing each structure and sequence to known proteins and is also a time-consuming process. Plant genomes are not nearly as well studied as the human and other mammalian genomes, which makes the process all the more difficult.

We are also developing faster and more accurate technologies to examine the data. As we have mentioned in the forums, a GPU-accelerated version of the simulation process has already been developed which is several orders of magnitude faster and more accurate. We have extended that technology to the analyses of the model structures and are continuing to do so. We have also developed sophisticated techniques that recognise structure and sequence patterns, or signatures, to identify the function of a protein.

We are applying for funding to support these and other efforts to analyze the mountain of data that has been generated during this process. We too are volunteers, and it is our hope that our combined efforts in the NRW project will help develop rice strains that will make a difference in fighting malnutrition and feeding the world’s people. Finally, as the project comes to an end, we want to thank everyone for their generous contributions to this endeavor, especially those who volunteered their computers and time to generate the data. We really appreciate it.

Tentative future plans are to resubmit an application to IBM to apply the Protinfo algorithm to proteins encoded by 1000 plant transcriptomes generated by the 1KP Project. This is a work in progress. Thus the efforts of the WCG volunteers and the results of this study will have a broader impact beyond rice proteomics.

Sep 15, 2009 - Most of our efforts in the past few months have been spent trying to tease out more domains from the rice proteome to increase the size of the project. These domains have been packaged into work units and are now crunching. This raises the number of protein structure predictions from roughly 40,000 initially to about 65,000 once all the larger sequences have been processed. Of these, roughly 35,000 are completed, so we still have about 30,000 to go (it looks as though we're about halfway done now).

The logic and goal here is that the more comprehensive our picture of the individual protein domains in rice, the more we can use it to inform us about the structures of other unknown domains in rice as well as in other food crops. That is, partial information is much better than zero information. This enables us to obtain a better understanding of the pathways involved at atomic-level detail.

Apr 8, 2009 - We have begun to analyze the protein models generated by Nutritious Rice for the World volunteers. The next step is to use sophisticated methods to select the top protein models for each gene. This will let us focus on a more manageable number of protein structures from the billions generated so far. Rice proteins are very different from what has been previously studied, and only 1% of the proteins we're working on have segments which are significantly similar to proteins of known structure. That is why computer modelling is necessary and why this project is important. It also means that we have a lot of hard work ahead of us still!

In general, when proteins have similar amino acid sequences, they also have similar structures. The small number of cases where at least part of the protein sequence is similar to one where the structure is known are thus very useful. We have a good idea what those regions of the protein structure model should look like and this allows us to optimise and validate the tools that we use to pick the best models. That is what we are currently doing. Once we finish this, we will start processing the data and publish the best structures for each gene online.

Nov 12, 2008 - We continue to receive excellent results from you! Storing the predicted structures of 100 proteins requires about 10GB of bz2 compressed files. So far we have amassed over a terabyte of this data, and there's a lot more to be done. We are in the process of making room for storing this, and adjusting our clustering code to deal with this large number of results. Stay tuned for more as this develops.
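As a rough back-of-the-envelope check on these storage figures, the sketch below scales the quoted rate of about 10 GB per 100 proteins. It assumes storage grows linearly with the number of proteins and takes a terabyte as 1000 GB; the function names are illustrative only and are not part of the project's tooling.

```python
# Rate quoted in the update: about 10 GB of bz2-compressed files per 100 proteins.
GB_PER_100_PROTEINS = 10


def storage_gb(num_proteins: int) -> float:
    """Estimated compressed storage in GB, assuming linear scaling (an assumption)."""
    return num_proteins * GB_PER_100_PROTEINS / 100


def proteins_for(storage_in_gb: float) -> int:
    """Approximate number of proteins whose results fit in the given storage."""
    return int(storage_in_gb * 100 / GB_PER_100_PROTEINS)


if __name__ == "__main__":
    # Proteins covered by about one terabyte (taken here as 1000 GB).
    print(proteins_for(1000))
    # Storage in GB for the roughly 40,000 proteins mentioned in other updates.
    print(storage_gb(40_000))
```

Under these assumptions, a terabyte corresponds to results for roughly 10,000 proteins, and the roughly 40,000 proteins mentioned elsewhere on this page would need on the order of 4 TB.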

The National Science Foundation, a significant source of funding for us, and World Community Grid separately interviewed us about our research and this project. Take a look!

In addition, we've modified the project status image above to reflect progress in the form of an animation. Each frame represents a moment in time when some significant number of workunits was submitted or results were returned. It gives you an idea of how things are moving along.

August 28, 2008 - Since the project began, you've sampled a space of about 3 billion potential structures for each protein and been credited for turning in the best ones. We've received structures for 6800 proteins so far -- that's over a billion structures. There are about 40,000 proteins to generate structures for, so we've got a while to go still!

While you continue to generate these structures, we'll be looking through them in more detail using clustering techniques. This will reveal to us those structures that resemble real proteins. We've applied this iterative process at smaller scales with success, and this larger pool of data for clustering will improve the accuracy of identifying good structure predictions.
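To give a flavour of what clustering can mean here, the sketch below picks the model that has the most neighbours within a distance cutoff, on the idea that a structure many independent predictions agree on is more likely to be realistic. This is a generic neighbour-counting illustration, not necessarily the method our pipeline uses. It assumes the models are already superimposed, and the function names, toy coordinates, and cutoff are all hypothetical.

```python
import math


def rmsd(a, b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumes the two models are already superimposed (no alignment is done here).
    """
    total = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(total / len(a))


def largest_cluster_center(models, cutoff):
    """Return (index, neighbour_count) of the model with the most neighbours within cutoff."""
    best_index, best_count = 0, -1
    for i, mi in enumerate(models):
        count = sum(
            1 for j, mj in enumerate(models)
            if i != j and rmsd(mi, mj) <= cutoff
        )
        if count > best_count:
            best_index, best_count = i, count
    return best_index, best_count


if __name__ == "__main__":
    # Toy "models" with two points each; the second one is an obvious outlier.
    models = [
        [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (0.9, 0.0, 0.0)],
    ]
    print(largest_cluster_center(models, cutoff=0.1))
```

In this toy set the third model (index 2) sits between two close neighbours and is selected, while the outlier has no neighbours at all. Comparing every pair of models grows quadratically with the number of models, which is part of why analysing billions of structures is so demanding.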

And yes, the status image has been updated. :)

May 14, 2008 - The response to this project has been fantastic. Many people are excited and asking lots of good questions on the forums. We thank you all for an enthusiastic welcome to World Community Grid!

Our project has been mentioned in the press, which you can read about in the Press section of this site.

Finally, we've designed an image to keep you updated on the technical progress of this project.

May 12, 2008 - World Community Grid volunteers began processing the first work units of the project.