Computational analysis of novel drug opportunities (CANDO) results

This page is being developed.

Prospective predictions

This is the list of indications to which our CANDO methodology has been applied thus far, for which we have made prospective predictions and performed detailed analyses, and for which the predictions are in the process of being verified by bench and clinical experiments. It represents a subset of indications from our more comprehensive collaborations pipeline.

Benchmarking

The individual components used in the CANDO pipeline (structure prediction, binding site prediction and analysis, docking, potential functions) have been extensively benchmarked and published by us and by others.

The CANDO pipeline has generated matrices of predicted interactions between 3733 compounds and 42,223 protein structures. The vast majority of the protein structures are from the PDB and the rest are modelled to high confidence. The utility of the compound-protein matrix generated by the CANDO pipeline has been preliminarily benchmarked using a leave-one-out approach for all indications with two or more FDA approved drugs (compounds). In this in virtuale experiment, for every indication with two or more compounds, each of the compounds is compared to all the other compounds in our library, which are then ranked by similarity to it. The rank of the most similar compound with the same indication is used to determine the performance of our approach. The lower the rank, the better the performance. As a control, fully randomised CANDO matrices (where each of the rows and columns, representing compounds and proteins respectively, is moved to a randomly selected location) are used to perform the same benchmarking analysis to determine the type of results that could be obtained by pure chance.
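The leave-one-out procedure described above can be illustrated with a minimal sketch. It is not the CANDO implementation: the function names, the toy data, and the choice of cosine similarity as the comparison metric are all assumptions made for illustration, since the text does not specify a particular metric.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length score vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def leave_one_out_ranks(matrix, indications):
    """For each indication with two or more compounds, hold out each compound
    in turn, rank all other compounds in the library by similarity to it, and
    record the rank of the most similar compound sharing the indication.

    matrix:      dict mapping compound name -> list of per-protein scores
    indications: dict mapping indication name -> set of compound names
    Returns a dict mapping indication name -> list of best ranks (1 = best).
    """
    results = {}
    for indication, members in indications.items():
        if len(members) < 2:
            continue  # nothing to recover with a single approved compound
        ranks = []
        for query in sorted(members):
            others = [c for c in matrix if c != query]
            others.sort(
                key=lambda c: cosine_similarity(matrix[query], matrix[c]),
                reverse=True,
            )
            best = min(
                rank for rank, c in enumerate(others, start=1) if c in members
            )
            ranks.append(best)
        results[indication] = ranks
    return results


# Hypothetical toy library: 5 compounds scored against 3 proteins.
toy_matrix = {
    "A": [1.0, 0.0, 0.0],
    "B": [0.9, 0.1, 0.0],
    "C": [0.0, 1.0, 0.0],
    "D": [0.0, 0.9, 0.1],
    "E": [0.0, 0.0, 1.0],
}
toy_indications = {"ind1": {"A", "B"}, "ind2": {"C", "D"}, "ind3": {"E"}}

toy_ranks = leave_one_out_ranks(toy_matrix, toy_indications)
```

In this toy case each held-out compound has its partner as nearest neighbour, so every recorded rank is 1, and the single-compound indication is skipped.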

The following is a brief summary of the benchmarking done thus far. The results indicate that the real CANDO matrices enrich predictive ability, and that a large-scale holistic approach has value in enriching the signal for individual protein-compound interaction prediction. This compound-centric protocol can be carried out automatically on any indication with at least one approved compound, simultaneously predicting cures/treatments for hundreds of indications. Indication-specific protocols are created based on a consensus of protein-centric and compound-centric approaches for prospective predictions.

1000+ indications with two or more compounds, with ~30 indications having 50+ approved compounds.
~200/1000+ indications where predictions identify a related compound with the same indication in the top 10 ranks (on average). Consistent sets of predictions are produced regardless of the sophistication of the comparison method, metric, or matrix used.
10-20/1000+ indications “work” by chance using fully randomised compound-proteome matrices with largely inconsistent sets of predictions.
While using fully randomised matrices as controls and examining the corresponding results automatically accounts for normalisation issues (i.e., the method is more likely to work by chance on indications with 50+ approved compounds than on those with only a few approved compounds), the vast majority of indications for which this method works have fewer than 10 approved compounds, and many of these have only two approved compounds.
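The randomised control and the "top 10 ranks (on average)" count summarised above might be sketched as follows. This is a hedged illustration, not the group's code: `randomise_matrix`, `count_working`, the cutoff parameter, and the example ranks are hypothetical, and the interpretation of "works" as a mean best rank of 10 or lower is an assumption drawn from the summary wording.

```python
import random
from statistics import mean


def randomise_matrix(matrix, seed=None):
    """Return a shuffled copy of a compound-protein matrix.

    matrix: dict mapping compound name -> list of per-protein scores.
    Rows are reassigned to randomly chosen compound labels and the column
    order is permuted, so the values are preserved but their positions
    no longer carry meaning.
    """
    rng = random.Random(seed)
    compounds = list(matrix)
    rows = [matrix[c] for c in compounds]
    rng.shuffle(rows)  # each row moves to a random compound label
    n_cols = len(rows[0]) if rows else 0
    col_order = list(range(n_cols))
    rng.shuffle(col_order)  # each column moves to a random position
    return {c: [row[j] for j in col_order] for c, row in zip(compounds, rows)}


def count_working(best_ranks, cutoff=10):
    """Count indications whose mean best rank is within the cutoff.

    best_ranks: dict mapping indication -> list of best ranks, one per
    held-out compound (as produced by a leave-one-out benchmark).
    """
    return sum(
        1 for ranks in best_ranks.values() if ranks and mean(ranks) <= cutoff
    )


# Hypothetical matrix and ranks, for illustration only.
example_matrix = {"A": [1.0, 0.2], "B": [0.3, 0.8], "C": [0.5, 0.5]}
shuffled = randomise_matrix(example_matrix, seed=0)

example_ranks = {"ind1": [1, 3], "ind2": [40, 2], "ind3": [10, 10]}
n_working = count_working(example_ranks)
```

Running the same leave-one-out benchmark on the real matrix and on several shuffled copies, then comparing the two counts, gives the kind of real-versus-chance comparison the summary reports.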

CANDO || Protinfo || Bioverse || Samudrala Computational Biology Research Group || admin@compbio.org