CANDO preliminary predictions for Ebola

Last update of predictions: August 8, 2014.


Interpreting this data without understanding the context and how the platform works could be hazardous to your health (literally and figuratively)! These are initial, preliminary predictions of human-approved drugs and compounds likely to inhibit the Ebola virus. The predictions were made using the CANDO platform. CANDO works well in expert hands, but these predictions are a first-pass application across multiple Ebola proteomes and error rates are high. Refinement of the predictions through careful consideration of the protein targets is still in progress, and attempts are being made to validate the predictions at the bench.

The accuracy of the CANDO platform is between 12% and 35%, depending on the number of compounds evaluated, in both retrospective benchmarking and prospective in vitro validation (see publication below). Thus most predictions will be incorrect. However, in 10 of 11 studies to date, covering 9 indications, we've obtained one or more hits that are comparable to or better than an existing treatment (if available), or that are micromolar or nanomolar inhibitors of the pathogenic system (in this case, the Ebola virus) in vitro. The logic is that since these compounds are generally approved for human use, predictions that are verified in vitro are good candidates for human studies of off-label use to evaluate efficacy and, ultimately, represent potential new therapies for a particular indication.
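Why a 12%-35% accuracy can still yield hits is a matter of simple arithmetic. The sketch below assumes, purely for illustration, that each tested prediction validates independently with a probability equal to the quoted accuracy. Real predictions are not independent, and the function names are hypothetical rather than part of CANDO.

```python
# Back-of-the-envelope sketch. ASSUMPTION: each prediction is an independent
# trial that validates with probability `accuracy`; this is not a CANDO property.


def expected_hits(n_tested: int, accuracy: float) -> float:
    """Expected number of validated hits among n_tested predictions."""
    return n_tested * accuracy


def prob_at_least_one_hit(n_tested: int, accuracy: float) -> float:
    """Probability that at least one of n_tested predictions validates."""
    return 1.0 - (1.0 - accuracy) ** n_tested


if __name__ == "__main__":
    for p in (0.12, 0.35):  # ends of the reported accuracy range
        print(
            f"accuracy={p:.2f}: expected hits in 10 tested = {expected_hits(10, p):.1f}, "
            f"P(at least one hit) = {prob_at_least_one_hit(10, p):.2f}"
        )
```

Under this simplified model, even at the low end of the range, testing ten predictions gives roughly a 72% chance of at least one hit, while most individual predictions still fail.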


The manifest is available via the current directory listing. The current best set of predictions is in the files matching the pattern refined_ebola_v1_top_hits*.
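Selecting the current best files from the directory listing is a plain shell-style pattern match. This is a minimal sketch that assumes the listing has been saved locally as a list of filenames, or that the directory has been copied to disk. Only the file named in the text below is real; the other name is a hypothetical example.

```python
from fnmatch import fnmatchcase
from pathlib import Path
from typing import Iterable, List

# Pattern quoted in the text for the current best set of predictions.
CURRENT_BEST_PATTERN = "refined_ebola_v1_top_hits*"


def current_best_files(listing: Iterable[str]) -> List[str]:
    """Filter filenames (e.g. a saved directory listing) by the pattern, case-sensitively."""
    return sorted(name for name in listing if fnmatchcase(name, CURRENT_BEST_PATTERN))


def current_best_in_dir(directory: str) -> List[str]:
    """Apply the same pattern to a local copy of the directory."""
    return sorted(p.name for p in Path(directory).glob(CURRENT_BEST_PATTERN))


if __name__ == "__main__":
    listing = [
        "refined_ebola_v1_top_hits_sorted_uniq_cutoff_1.9.txt",  # named in the text
        "ebola_v1_top_hits_sorted.txt",  # hypothetical unrefined file name
    ]
    print(current_best_files(listing))
```

`fnmatchcase` is used instead of `fnmatch` so that the match is case-sensitive on every operating system.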

The files matching *sorted_uniq_cutoff*.txt are the unrefined predictions. The higher the interaction score cutoff, the better the quality of the predictions (but the fewer there are). We normally use a cutoff of 1.1 for binary matrices. The first column is the number of protein interactions predicted and the second column is the corresponding compound name. For example, refined_ebola_v1_top_hits_sorted_uniq_cutoff_1.9.txt shows that enfuvirtide had predicted interactions with 7/64 Ebola protein structures at a cutoff of 1.9.
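The two-column summary files can be read with a few lines of code. This is a sketch under assumed layout details. It assumes the columns are separated by whitespace, possibly with leading padding. It also assumes compound names may contain spaces, so only the first field is split off. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def parse_summary(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Parse '<count> <compound name>' rows from a *sorted_uniq_cutoff*.txt file.

    ASSUMPTION: whitespace-separated columns; the name is everything after the count.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        count, name = line.split(None, 1)
        rows.append((int(count), name))
    # Most frequently predicted binders first; ties broken by name.
    rows.sort(key=lambda row: (-row[0], row[1]))
    return rows


def hit_fraction(count: int, n_structures: int = 64) -> float:
    """Fraction of the Ebola protein structures a compound is predicted to hit."""
    return count / n_structures


if __name__ == "__main__":
    # The row described in the text: 7 predicted interactions at cutoff 1.9.
    count, name = parse_summary(["      7 enfuvirtide"])[0]
    print(f"{name}: {count}/64 structures ({hit_fraction(count):.0%})")
```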

The files matching *_top_hits_sorted.txt contain the raw predictions and are provided for completeness. The error rate for individual predictions is high, but these files show which protein structures particular drugs are predicted to target. The columns are: the UniProt ID of the protein where the interaction is predicted to occur (which could be anywhere on the protein), the score, the compound ID, the compound's column in the matrix, and the compound name.
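To see which proteins a given drug is predicted to hit, the raw rows can be grouped by compound. The sketch below makes several assumptions. The five columns are taken to be whitespace-separated, with the compound name last (it may contain spaces). Higher scores are taken to be stronger interactions, as the cutoff description implies. A score equal to the cutoff is counted as a hit. The names `Prediction`, `parse_raw`, and `targets_by_compound` are hypothetical. Counting the distinct proteins per compound might reproduce the first column of the summary files, but whether those files were built this way is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Prediction(NamedTuple):
    uniprot_id: str     # protein on which the interaction is predicted
    score: float        # predicted interaction score
    compound_id: str    # compound identifier
    matrix_column: int  # compound's column in the interaction matrix
    compound_name: str


def parse_raw(lines: Iterable[str]) -> List[Prediction]:
    """Parse rows of a *_top_hits_sorted.txt file (ASSUMED whitespace-separated)."""
    preds = []
    for line in lines:
        fields = line.split(None, 4)  # the name is the last field and may contain spaces
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        uid, score, cid, col, name = fields
        preds.append(Prediction(uid, float(score), cid, int(col), name.strip()))
    return preds


def targets_by_compound(preds: Iterable[Prediction], cutoff: float) -> Dict[str, List[str]]:
    """Map each compound name to the distinct proteins it hits with score >= cutoff."""
    hits = defaultdict(set)
    for p in preds:
        if p.score >= cutoff:
            hits[p.compound_name].add(p.uniprot_id)
    return {name: sorted(ids) for name, ids in hits.items()}
```

Calling `len()` on each value of the returned dictionary gives the per-compound count of predicted protein targets at that cutoff.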

The predictions are based on generating interactions between known drugs and Ebola proteins in a holistic manner: the feature being predicted is really the binding, or "stickiness", of small molecules to Ebola proteins. Some of these binders are expected to be inhibitors (of both the proteins and the pathogen), which in turn could lead to clinical efficacy. Some of the predictions are obviously incorrect or inappropriate, but we provide the unedited output of the software; it may be possible to refine the predictions further manually.


The prefixes of the output filenames and their meanings are given below.

ebola - proteins encoded by five Ebola genomes compiled from
        UniProt; aka ebola_v0.

1eboF - crystal structure of the Ebola virus membrane fusion subunit,
        gp2, from the envelope glycoprotein ectodomain, described in
        Weissenhorn W, Carfi A, Lee KH, Skehel JJ, Wiley DC. Mol Cell
        2:605-?.

ebola_v1 - ebola (ebola_v0) + 1eboF. 

refined - refined versions of the above predictions created by
          eliminating redundancies. This work is in progress; these files
          represent the current best set of preliminary predictions.
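The prefix list above can also be applied mechanically when sorting through many files. This is a minimal sketch that assumes filenames take the form `[refined_]<prefix>_..._[cutoff_<value>].txt`, as in the example file named earlier. The decoder and its names are hypothetical. Longer prefixes are tried first so that `ebola_v1` is not mistaken for `ebola`.

```python
import re
from typing import NamedTuple, Optional

# Meanings of the dataset prefixes, paraphrased from the list above.
DATASETS = {
    "ebola_v1": "ebola (ebola_v0) plus the 1eboF structure",
    "ebola": "proteins encoded by five Ebola genomes from UniProt (ebola_v0)",
    "1eboF": "crystal structure of the Ebola virus GP2 fusion subunit",
}

# ASSUMED naming: an optional trailing "_cutoff_<number>.txt".
CUTOFF_RE = re.compile(r"_cutoff_(\d+(?:\.\d+)?)\.txt$")


class FileInfo(NamedTuple):
    refined: bool
    dataset: str
    cutoff: Optional[float]  # None for files without a cutoff (e.g. raw predictions)


def decode_filename(name: str) -> FileInfo:
    """Split a prediction filename into refined flag, dataset prefix, and cutoff."""
    refined = name.startswith("refined_")
    rest = name[len("refined_"):] if refined else name
    # Longest prefix first, so "ebola_v1_..." is not read as "ebola".
    for prefix in sorted(DATASETS, key=len, reverse=True):
        if rest.startswith(prefix + "_"):
            match = CUTOFF_RE.search(rest)
            cutoff = float(match.group(1)) if match else None
            return FileInfo(refined, prefix, cutoff)
    raise ValueError(f"unrecognised prefix in {name!r}")


if __name__ == "__main__":
    info = decode_filename("refined_ebola_v1_top_hits_sorted_uniq_cutoff_1.9.txt")
    print(info, "-", DATASETS[info.dataset])
```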


The citation for the paper that describes the CANDO platform in further detail is given below. The platform is based on a large number of methods developed by us and by others for protein structure, function, and interaction prediction. In this specific case, 64 proteins encoded by five Ebola genomes were modelled to obtain tertiary structures. Modelling was accomplished by inferring homology to structures in the PDB determined by X-ray diffraction. The structure with the PDB identifier 1ebo-F was also used, giving 65 structures in total. The CANDO pipeline was used to obtain predicted interaction scores between these 65 structures and a library of 3733 human ingestible drugs and compounds. The compounds with the strongest predicted interaction scores, and those that bound most frequently to these 65 protein structures, were selected as the top hits.
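The two selection criteria can be illustrated on a compound-by-structure score matrix: strongest single interaction score, and binding frequency across the structures. This is a sketch of the idea only, not the CANDO pipeline's actual selection code. The following are assumptions: the dictionary layout of the matrix, the inclusive cutoff, the `top_k` parameter, and the union used to combine the two lists. The toy scores are made up.

```python
from typing import Dict, List, Sequence, Tuple


def rank_compounds(
    scores: Dict[str, Sequence[float]], cutoff: float, top_k: int
) -> Tuple[List[str], List[str], List[str]]:
    """Rank compounds by strongest score and by binding frequency (hypothetical helper).

    `scores` maps a compound name to its predicted interaction scores against
    every protein structure (65 in this study). Returns the top_k compounds by
    strongest single score, the top_k by number of structures scored at or above
    `cutoff`, and their combined shortlist (first-seen order, no duplicates).
    """
    # Strongest single predicted interaction; ties broken alphabetically.
    by_strongest = sorted(scores, key=lambda c: (-max(scores[c]), c))[:top_k]
    # Number of structures each compound is predicted to bind at or above the cutoff.
    frequency = {c: sum(1 for s in row if s >= cutoff) for c, row in scores.items()}
    by_frequency = sorted(scores, key=lambda c: (-frequency[c], c))[:top_k]
    # Union of both lists, preserving the order in which compounds first appear.
    shortlist = list(dict.fromkeys(by_strongest + by_frequency))
    return by_strongest, by_frequency, shortlist


if __name__ == "__main__":
    # Made-up scores against three structures, for illustration only.
    toy = {"a": [2.5, 0.1, 0.2], "b": [1.2, 1.3, 1.4], "c": [0.5, 0.4, 1.15]}
    print(rank_compounds(toy, cutoff=1.1, top_k=1))
```

In the toy data, compound "a" has one very strong score but binds only one structure, while "b" binds all three moderately. This is why the two criteria can surface different compounds.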


The primary citation for the version of CANDO used to make these predictions is:

See also all our publications related to therapeutic discovery as well as a comprehensive list of all our publications.

Further reading

CANDO || Protinfo || Bioverse || Samudrala Computational Biology Research Group ||