5.1 Influence of the Size of the List of Top-Ranked Compounds in the LBVS-Shape Method
As previously mentioned, LBVS-Shape bases its predictions on a pre-selection of the N best compounds in terms of superimposition score. This subsection studies how the value of N affects the final results from the point of view of electrostatic similarity. In particular, LBVS-Shape has been run on the 50 selected queries for five different values of N, namely 175, 438, 876, 1313 and 1751 compounds. That is, for each query, either $10\% $, $25\% $, $50\% $, $75\% $ or $100\% $ of the ranked compounds are retained in the pre-selection phase.
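The five values of N follow from the stated percentages of the 1751 compounds in the FDA set. The following minimal Python sketch reproduces them, assuming that fractional counts are rounded to the nearest integer; the rounding rule is not stated in the text, and the helper name `cutoff` is hypothetical.

```python
FDA_SIZE = 1751                      # compounds in the FDA set (from the text)
PERCENTAGES = (10, 25, 50, 75, 100)  # fractions of the ranked list that are kept


def cutoff(total: int, percent: int) -> int:
    """Nearest-integer value of total * percent / 100 (assumed rounding rule)."""
    return (total * percent + 50) // 100


n_values = [cutoff(FDA_SIZE, p) for p in PERCENTAGES]
print(n_values)  # [175, 438, 876, 1313, 1751]
```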
Figure 3 illustrates a toy example of the main steps of the LBVS-Shape method for the $\mathit{Query}$ DB01213 and $N=1751$, i.e. the total number of compounds in the FDA set. Initially, the $\mathit{Query}$ is compared to each compound ${\mathit{Target}_{S}}$ in the database to obtain its optimal pose and the corresponding shape similarity value $T{c_{S}}$. As previously mentioned, this stage is carried out with ROCS. Afterwards, the compounds are ranked ($R{k_{S}}$) in decreasing order of $T{c_{S}}$. The N best compounds are then selected, and their electrostatic similarity $T{c_{E}^{\mathit{Eval}}}$ is evaluated. Notice that this evaluation uses the pose obtained from the shape similarity optimization. The compound with the highest $T{c_{E}^{\mathit{Eval}}}$, called BestComp throughout this paper, is selected as the best prediction. Finally, as an additional stage that is not part of the LBVS-Shape method, we have computed the optimized superposition between the BestComp and the $\mathit{Query}$ by using OptiPharm_ES. The corresponding $T{c_{E}}$ value is then provided.
Fig. 3
Toy example of the performance of the LBVS-Shape method for a particular case where $\textit{Query}=DB01213$ and $N=1751$ using the FDA database.
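The selection logic of LBVS-Shape can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `shape_scores` stands in for the $T{c_{S}}$ values that ROCS would produce, `evaluate_electrostatic` is a hypothetical callable returning $T{c_{E}^{\mathit{Eval}}}$ at the shape-optimized pose, the query is left out of the ranking as described in Section 5.2, and the compound names and scores are invented toy data.

```python
from typing import Callable, Dict, Tuple


def lbvs_shape(
    query: str,
    shape_scores: Dict[str, float],
    evaluate_electrostatic: Callable[[str, str], float],
    n: int,
) -> Tuple[str, int, float, float]:
    """Return (BestComp, Rk_S, Tc_S, Tc_E_eval) for one query."""
    # Rank targets by decreasing Tc_S; the query itself is left out.
    ranked = sorted(
        ((target, tc_s) for target, tc_s in shape_scores.items() if target != query),
        key=lambda item: item[1],
        reverse=True,
    )
    best = None
    # Only the N top-ranked compounds are evaluated electrostatically.
    for rank, (target, tc_s) in enumerate(ranked[:n], start=1):
        tc_e_eval = evaluate_electrostatic(query, target)
        if best is None or tc_e_eval > best[3]:
            best = (target, rank, tc_s, tc_e_eval)
    if best is None:
        raise ValueError("no candidate compounds to evaluate")
    return best


# Invented toy data: names and scores are NOT taken from the paper.
toy_shape = {"Q": 1.0, "A": 0.9, "B": 0.8, "C": 0.7, "D": 0.6}
toy_elec = {"A": 0.3, "B": 0.5, "C": 0.9, "D": 0.4}


def toy_eval(query: str, target: str) -> float:
    """Hypothetical stand-in for the electrostatic evaluation."""
    return toy_elec[target]


print(lbvs_shape("Q", toy_shape, toy_eval, n=2))  # ('B', 2, 0.8, 0.5)
print(lbvs_shape("Q", toy_shape, toy_eval, n=4))  # ('C', 3, 0.7, 0.9)
```

In the toy data, enlarging N from 2 to 4 lets the method reach a compound that is ranked lower in shape but scores higher electrostatically, which is the effect that the study of N is designed to measure.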
To get an overview of the results, the values associated with the BestComp found for the 50 queries have been averaged for each value of N and are shown in Table 1. In particular, the table reports the average position $Av(R{k_{S}})$ of the BestComp in the sorted list, their mean number of atoms $Av({N_{S}})$, their average shape similarity $Av(T{c_{S}})$, their average electrostatic similarity when evaluated at the shape-optimized pose, $Av(T{c_{E}^{\mathit{Eval}}})$, and, finally, their mean electrostatic similarity after optimization, $Av(T{c_{E}})$.
As can be seen, the predictions tend to improve in terms of electrostatic similarity as the number N of selected molecules in the sorted list increases (see columns $Av(T{c_{E}^{\mathit{Eval}}})$ and $Av(T{c_{E}})$), although the gain levels off for the largest values of N: $Av(T{c_{E}^{\mathit{Eval}}})$ rises from 0.451 for $N=175$ to 0.497 for both $N=1313$ and $N=1751$. In accordance with these results, the subsequent comparison between the LBVS-Shape and LBVS-Electrostatic methods has been carried out with $N=1751$.
Table 1
Influence of the parameter N on the results obtained by the LBVS-Shape method. For each value of N, the following averages over the 50 queries are shown: position in the shape ranking ($Av(R{k_{S}})$), number of atoms ($Av({N_{S}})$), shape similarity score ($Av(T{c_{S}})$), evaluated electrostatic similarity score ($Av(T{c_{E}^{\mathit{Eval}}})$) and optimized electrostatic similarity score ($Av(T{c_{E}})$).
N | $Av(R{k_{S}})$ | $Av({N_{S}})$ | $Av(T{c_{S}})$ | $Av(T{c_{E}^{\mathit{Eval}}})$ | $Av(T{c_{E}})$ |
175 | 73 | 53 | 0.627 | 0.451 | 0.559 |
438 | 162 | 50 | 0.587 | 0.486 | 0.568 |
876 | 287 | 51 | 0.564 | 0.495 | 0.569 |
1313 | 324 | 50 | 0.559 | 0.497 | 0.570 |
1751 | 362 | 49 | 0.554 | 0.497 | 0.569 |
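To make the trend in Table 1 explicit, the change in the two electrostatic averages between consecutive values of N can be computed directly from the tabulated figures. The sketch below only re-uses the numbers of Table 1; the variable names are illustrative.

```python
# (N, Av(Tc_E^Eval), Av(Tc_E)) copied from Table 1.
TABLE_1 = [
    (175, 0.451, 0.559),
    (438, 0.486, 0.568),
    (876, 0.495, 0.569),
    (1313, 0.497, 0.570),
    (1751, 0.497, 0.569),
]

# Difference of each average with respect to the previous value of N.
gains = [
    (n_next, round(eval_next - eval_prev, 3), round(opt_next - opt_prev, 3))
    for (_, eval_prev, opt_prev), (n_next, eval_next, opt_next)
    in zip(TABLE_1, TABLE_1[1:])
]

for n, d_eval, d_opt in gains:
    print(f"N={n}: dAv(TcE_Eval)={d_eval:+.3f}, dAv(TcE)={d_opt:+.3f}")
```

Most of the improvement in $Av(T{c_{E}^{\mathit{Eval}}})$ is obtained when going from $N=175$ to $N=438$ (+0.035); beyond $N=876$ the changes are at most 0.002.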
5.2 Performance Comparison Between LBVS-Shape and LBVS-Electrostatic Methods
To analyse the performance of both methods, we have conducted a study in which the 50 selected molecular queries are compared against the FDA database. Notice that comparing a query with itself always yields the maximum similarity value, both for electrostatic potential and for shape. Consequently, these self-comparisons were removed when ranking the compounds. In other words, the compound reported for each query is not the top entry of the full ranked list (the query itself), but the second one. Additionally, as previously mentioned, the traditional method has been run considering the total number of compounds in the database, $N=1751$, so as to increase the probability of finding better predictions.
To illustrate how the summary tables presented later are generated, we first study a sample of the results obtained by both methods when a single query is compared to the molecules in the dataset. In particular, the instance $\mathit{Query}=DB01213$ is analysed. Notice that this is the example used to illustrate the stages of the LBVS-Shape method in Fig. 3. The same instance is then used to exemplify the LBVS-Electrostatic method (see Fig. 4). This $\mathit{Query}$ has been selected because it is small, which makes the main ideas of the paper easy to convey through figures. However, the conclusions drawn from the associated results can be extrapolated to any other $\mathit{Query}$. As can be observed, the LBVS-Electrostatic technique solves an optimization problem to determine the electrostatic similarity, $T{c_{E}}$, between the pharmaceutical $\mathit{Query}$ and every ${\mathit{Target}_{E}}$ in the database. Afterwards, the list of compounds is sorted by $T{c_{E}}$ and the one in first position, $R{k_{E}}=1$, is selected as the best prediction. Finally, to complete the study, a shape optimization is carried out to calculate the shape similarity $T{c_{S}}$ between the chosen compound and the $\mathit{Query}$.
Fig. 4
An example of the performance of the LBVS-Electrostatic method for a particular case where $\mathit{Query}=DB01213$ is compared to the FDA database.
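The LBVS-Electrostatic procedure described above can be sketched along the same lines. The two callables are hypothetical placeholders: `optimize_electrostatic` stands for the OptiPharm_ES optimization returning $T{c_{E}}$, and `optimize_shape` for the final shape optimization returning $T{c_{S}}$; neither reflects an actual API, and the data are invented.

```python
from typing import Callable, Iterable, Tuple


def lbvs_electrostatic(
    query: str,
    database: Iterable[str],
    optimize_electrostatic: Callable[[str, str], float],
    optimize_shape: Callable[[str, str], float],
) -> Tuple[str, float, float]:
    """Return (best target, Tc_E, Tc_S) for one query."""
    # Optimize the electrostatic similarity against every target except the query.
    scored = [
        (target, optimize_electrostatic(query, target))
        for target in database
        if target != query
    ]
    if not scored:
        raise ValueError("the database contains no compound other than the query")
    # Sort by decreasing Tc_E and keep the compound with Rk_E = 1.
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    best_target, tc_e = ranked[0]
    # Shape similarity is computed only for the selected compound.
    tc_s = optimize_shape(query, best_target)
    return best_target, tc_e, tc_s


# Invented toy data: names and scores are NOT taken from the paper.
toy_tc_e = {"A": 0.3, "B": 0.5, "C": 0.9}
toy_tc_s = {"A": 0.9, "B": 0.8, "C": 0.7}

result = lbvs_electrostatic(
    "Q",
    ["Q", "A", "B", "C"],
    lambda q, t: toy_tc_e[t],
    lambda q, t: toy_tc_s[t],
)
print(result)  # ('C', 0.9, 0.7)
```

Unlike LBVS-Shape, no shape-based pre-selection takes place here: every compound in the database is a candidate, so a compound with a modest shape score can still be returned.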
For the sake of clarity and comparison, the results shown in Figs. 3 and 4 are summarized in Table 2. The meaning of the columns, as well as the particular values in the table, are those previously explained and shown in each figure. The last row corresponds to the values associated with the best predictions. As can be observed, each method obtains a different compound as its top solution. LBVS-Shape provides the DB00184 molecule, with $T{c_{S}}=0.621$ and $T{c_{E}^{\mathit{Eval}}}=0.500$, whereas LBVS-Electrostatic proposes the DB03255 compound as the most similar to the query, with $T{c_{E}}=0.810$ and $T{c_{S}^{\mathit{Eval}}}=0.880$. Thus, the LBVS-Electrostatic method has obtained a compound that is more similar to the query not only in terms of electrostatic potential, but also in shape. Fig. 5 shows the final position for each case.
Table 2
Summary of the results obtained by the LBVS-Shape and LBVS-Electrostatic methods for the query compound DB01213. The column notation, the colours and the corresponding results come from Figs. 3 and 4, i.e. they keep the meaning given in those figures. The last row indicates the results associated with the top solution selected by each method.
Fig. 5
Summary of results of LBVS-Shape and LBVS-Electrostatic where $\mathit{Query}=DB01213$. The Query compound is coloured green. Query electrostatic fields are coloured deep blue and red. Best compounds are shown in grey and their electrostatic potential fields, in light blue and pink.
Table 3
Rows are sorted by the number of atoms of the queries. For each query, the same procedure explained for Table 2 is followed. The last row summarizes the average values for each column.
$\mathit{Query}$ | ${N_{Q}}$ | LBVS-Shape | | | | | | LBVS-Electrostatic | | | | |
| | $R{k_{S}}$ | ${\mathit{Target}_{S}}$ | ${N_{S}}$ | $T{c_{S}}$ | $T{c_{E}^{\mathit{Eval}}}$ | $T{c_{E}}$ | ${\mathit{Target}_{E}}$ | ${N_{E}}$ | $T{c_{E}}$ | $T{c_{S}^{\mathit{Eval}}}$ | $T{c_{S}}$ |
DB00529 | 10 | 316 | DB05266 | 35 | 0.496 | 0.437 | 0.593 | DB00818 | 31 | 0.720 | 0.468 | 0.614 |
DB01213 | 12 | 182 | DB00184 | 26 | 0.621 | 0.500 | 0.609 | DB03255 | 13 | 0.810 | 0.880 | 0.963 |
DB00173 | 15 | 102 | DB00851 | 23 | 0.792 | 0.546 | 0.536 | DB01119 | 21 | 0.834 | 0.777 | 0.830 |
DB00172 | 17 | 24 | DB00128 | 16 | 0.881 | 0.469 | 0.561 | DB00677 | 25 | 0.699 | 0.690 | 0.769 |
DB00331 | 20 | 380 | DB00961 | 40 | 0.598 | 0.599 | 0.697 | DB01018 | 24 | 0.790 | 0.559 | 0.649 |
DB01119 | 21 | 513 | DB00828 | 15 | 0.655 | 0.519 | 0.613 | DB00173 | 15 | 0.832 | 0.779 | 0.829 |
DB02513 | 25 | 27 | DB01275 | 20 | 0.872 | 0.526 | 0.569 | DB06637 | 13 | 0.915 | 0.745 | 0.805 |
DB00915 | 28 | 125 | DB00160 | 13 | 0.684 | 0.404 | 0.543 | DB00478 | 34 | 0.946 | 0.673 | 0.924 |
DB01352 | 29 | 1 | DB00306 | 32 | 0.926 | 0.947 | 0.983 | DB00306 | 32 | 0.983 | 0.901 | 0.926 |
DB01365 | 30 | 180 | DB01191 | 33 | 0.738 | 0.902 | 0.960 | DB01626 | 26 | 0.964 | 0.628 | 0.824 |
DB00657 | 33 | 47 | DB06770 | 16 | 0.788 | 0.396 | 0.517 | DB01043 | 34 | 0.979 | 0.609 | 0.861 |
DB00478 | 34 | 30 | DB00752 | 21 | 0.787 | 0.508 | 0.637 | DB01043 | 34 | 0.957 | 0.615 | 0.879 |
DB01043 | 34 | 27 | DB00945 | 21 | 0.765 | 0.400 | 0.478 | DB00657 | 33 | 0.973 | 0.711 | 0.861 |
DB00380 | 35 | 601 | DB00731 | 50 | 0.620 | 0.380 | 0.407 | DB08971 | 56 | 0.505 | 0.435 | 0.655 |
DB00693 | 37 | 1034 | DB04575 | 59 | 0.525 | 0.362 | 0.429 | DB00692 | 40 | 0.454 | 0.391 | 0.783 |
DB09185 | 37 | 243 | DB01233 | 43 | 0.722 | 0.839 | 0.506 | DB09021 | 39 | 0.916 | 0.429 | 0.650 |
DB07615 | 40 | 71 | DB04552 | 28 | 0.704 | 0.861 | 0.866 | DB09218 | 28 | 0.892 | 0.610 | 0.574 |
DB09219 | 40 | 123 | DB00321 | 44 | 0.698 | 0.347 | 0.329 | DB00316 | 20 | 0.450 | 0.249 | 0.462 |
DB00674 | 42 | 279 | DB00575 | 23 | 0.688 | 0.505 | 0.653 | DB00514 | 45 | 0.662 | 0.415 | 0.695 |
DB00887 | 45 | 209 | DB00232 | 31 | 0.642 | 0.401 | 0.454 | DB01127 | 39 | 0.662 | 0.378 | 0.576 |
DB01198 | 45 | 273 | DB00209 | 59 | 0.648 | 0.748 | 0.768 | DB00123 | 25 | 0.894 | 0.334 | 0.491 |
DB01155 | 48 | 1 | DB01165 | 46 | 0.858 | 0.671 | 0.818 | DB01208 | 50 | 0.899 | 0.385 | 0.835 |
DB00246 | 50 | 467 | DB00268 | 44 | 0.542 | 0.843 | 0.852 | DB05271 | 48 | 0.877 | 0.391 | 0.604 |
DB00381 | 53 | 525 | DB00573 | 32 | 0.577 | 0.285 | 0.278 | DB00630 | 27 | 0.377 | 0.397 | 0.524 |
DB00876 | 54 | 576 | DB01002 | 49 | 0.516 | 0.395 | 0.505 | DB00774 | 28 | 0.532 | 0.276 | 0.524 |
DB09237 | 54 | 380 | DB09092 | 44 | 0.580 | 0.759 | 0.824 | DB08998 | 40 | 0.902 | 0.447 | 0.596 |
DB00254 | 55 | 1100 | DB00271 | 28 | 0.521 | 0.626 | 0.836 | DB00271 | 28 | 0.836 | 0.219 | 0.521 |
DB01268 | 57 | 902 | DB09014 | 54 | 0.518 | 0.792 | 0.765 | DB01409 | 48 | 0.883 | 0.421 | 0.564 |
DB01196 | 60 | 7 | DB00783 | 44 | 0.741 | 0.397 | 0.385 | DB08797 | 17 | 0.527 | 0.195 | 0.385 |
DB01621 | 66 | 274 | DB00268 | 44 | 0.552 | 0.821 | 0.845 | DB04861 | 55 | 0.867 | 0.330 | 0.454 |
DB09236 | 66 | 459 | DB00607 | 51 | 0.509 | 0.406 | 0.438 | DB00449 | 54 | 0.664 | 0.439 | 0.551 |
DB00632 | 69 | 537 | DB00511 | 123 | 0.348 | 0.067 | 0.246 | DB00898 | 9 | 0.997 | 0.126 | 0.137 |
DB08903 | 69 | 6 | DB01433 | 58 | 0.621 | 0.840 | 0.867 | DB01359 | 51 | 0.888 | 0.307 | 0.464 |
DB01419 | 70 | 380 | DB09209 | 61 | 0.431 | 0.854 | 0.879 | DB01611 | 51 | 0.933 | 0.291 | 0.423 |
DB00320 | 80 | 204 | DB00438 | 59 | 0.515 | 0.367 | 0.396 | DB00120 | 23 | 0.563 | 0.245 | 0.278 |
DB00728 | 91 | 1383 | DB06204 | 40 | 0.399 | 0.688 | 0.761 | DB09131 | 3 | 0.874 | 0.068 | 0.101 |
DB00503 | 98 | 655 | DB00206 | 84 | 0.371 | 0.256 | 0.243 | DB01144 | 22 | 0.401 | 0.180 | 0.280 |
DB01232 | 100 | 639 | DB06480 | 52 | 0.389 | 0.691 | 0.741 | DB09089 | 58 | 0.791 | 0.290 | 0.387 |
DB00309 | 110 | 385 | DB01603 | 45 | 0.455 | 0.241 | 0.297 | DB00319 | 63 | 0.467 | 0.267 | 0.534 |
DB04786 | 120 | 4 | DB09158 | 82 | 0.377 | 0.424 | 0.708 | DB09159 | 18 | 0.910 | 0.108 | 0.120 |
DB09114 | 130 | 117 | DB00595 | 57 | 0.376 | 0.273 | 0.506 | DB00583 | 26 | 0.876 | 0.183 | 0.190 |
DB06439 | 137 | 657 | DB01628 | 39 | 0.383 | 0.336 | 0.425 | DB00878 | 64 | 0.488 | 0.274 | 0.423 |
DB01078 | 140 | 34 | DB00204 | 56 | 0.424 | 0.201 | 0.259 | DB01085 | 31 | 0.540 | 0.169 | 0.211 |
DB01590 | 151 | 1037 | DB01193 | 53 | 0.265 | 0.248 | 0.358 | DB00653 | 6 | 0.529 | 0.070 | 0.100 |
DB04894 | 152 | 82 | DB01199 | 87 | 0.361 | 0.348 | 0.484 | DB09131 | 3 | 0.662 | 0.006 | 0.040 |
DB00403 | 167 | 325 | DB04855 | 84 | 0.261 | 0.325 | 0.395 | DB06335 | 49 | 0.575 | 0.120 | 0.198 |
DB00732 | 169 | 640 | DB08967 | 52 | 0.222 | 0.236 | 0.353 | DB00653 | 6 | 0.508 | 0.051 | 0.069 |
DB00050 | 194 | 7 | DB01369 | 141 | 0.349 | 0.238 | 0.383 | DB00516 | 19 | 0.385 | 0.059 | 0.080 |
DB06699 | 221 | 1465 | DB01245 | 56 | 0.119 | 0.365 | 0.513 | DB09131 | 3 | 0.642 | 0.013 | 0.029 |
DB06219 | 229 | 69 | DB01369 | 141 | 0.293 | 0.277 | 0.394 | DB09131 | 3 | 0.670 | 0.009 | 0.021 |
Mean | 74 | 362 | – | 49 | 0.554 | 0.497 | 0.569 | – | 31 | 0.738 | 0.372 | 0.505 |
Now that the specific case of DB01213 has been explained in detail, the results for the 50 queries are summarized in Table 3. Columns $R{k_{E}^{\mathit{Eval}}}$ and $R{k_{E}}$ have been omitted from this table because their values are always 1. The last row gives the average of each column.
As evidenced, LBVS-Electrostatic obtains on average $T{c_{E}}=0.738$, which is higher than the value given by LBVS-Shape, $T{c_{E}^{\mathit{Eval}}}=0.497$. The same conclusion holds when comparing the average optimized $T{c_{E}}$ values of both methods (0.738 versus 0.569). Additionally, when the results are analysed individually, we can see that LBVS-Electrostatic provides solutions with higher $T{c_{E}}$ values than those achieved by LBVS-Shape. In fact, in 48 out of 50 cases, LBVS-Electrostatic obtains a different compound from the one selected by LBVS-Shape.
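These summary figures can be recomputed from Table 3. The sketch below copies four columns of the table, in row order, into plain Python lists (the list names are illustrative) and derives the two averages and the number of queries for which the methods select different compounds.

```python
from statistics import mean

# Columns of Table 3, one entry per query, in the row order of the table.
TARGET_S = """
DB05266 DB00184 DB00851 DB00128 DB00961 DB00828 DB01275 DB00160 DB00306 DB01191
DB06770 DB00752 DB00945 DB00731 DB04575 DB01233 DB04552 DB00321 DB00575 DB00232
DB00209 DB01165 DB00268 DB00573 DB01002 DB09092 DB00271 DB09014 DB00783 DB00268
DB00607 DB00511 DB01433 DB09209 DB00438 DB06204 DB00206 DB06480 DB01603 DB09158
DB00595 DB01628 DB00204 DB01193 DB01199 DB04855 DB08967 DB01369 DB01245 DB01369
""".split()
TARGET_E = """
DB00818 DB03255 DB01119 DB00677 DB01018 DB00173 DB06637 DB00478 DB00306 DB01626
DB01043 DB01043 DB00657 DB08971 DB00692 DB09021 DB09218 DB00316 DB00514 DB01127
DB00123 DB01208 DB05271 DB00630 DB00774 DB08998 DB00271 DB01409 DB08797 DB04861
DB00449 DB00898 DB01359 DB01611 DB00120 DB09131 DB01144 DB09089 DB00319 DB09159
DB00583 DB00878 DB01085 DB00653 DB09131 DB06335 DB00653 DB00516 DB09131 DB09131
""".split()
# Tc_E^Eval of the compound selected by LBVS-Shape.
TC_E_EVAL_SHAPE = [float(x) for x in """
0.437 0.500 0.546 0.469 0.599 0.519 0.526 0.404 0.947 0.902
0.396 0.508 0.400 0.380 0.362 0.839 0.861 0.347 0.505 0.401
0.748 0.671 0.843 0.285 0.395 0.759 0.626 0.792 0.397 0.821
0.406 0.067 0.840 0.854 0.367 0.688 0.256 0.691 0.241 0.424
0.273 0.336 0.201 0.248 0.348 0.325 0.236 0.238 0.365 0.277
""".split()]
# Tc_E of the compound selected by LBVS-Electrostatic.
TC_E_ELEC = [float(x) for x in """
0.720 0.810 0.834 0.699 0.790 0.832 0.915 0.946 0.983 0.964
0.979 0.957 0.973 0.505 0.454 0.916 0.892 0.450 0.662 0.662
0.894 0.899 0.877 0.377 0.532 0.902 0.836 0.883 0.527 0.867
0.664 0.997 0.888 0.933 0.563 0.874 0.401 0.791 0.467 0.910
0.876 0.488 0.540 0.529 0.662 0.575 0.508 0.385 0.642 0.670
""".split()]

# Number of queries for which the two methods propose different compounds.
different = sum(s != e for s, e in zip(TARGET_S, TARGET_E))
mean_shape = round(mean(TC_E_EVAL_SHAPE), 3)
mean_elec = round(mean(TC_E_ELEC), 3)

print(different, mean_shape, mean_elec)  # 48 0.497 0.738
```

The two queries for which both methods return the same compound are DB01352 (DB00306) and DB00254 (DB00271).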
Regarding shape similarity, the methods are, on average, comparable in terms of the accuracy of their predictions: LBVS-Shape obtains an average value of $T{c_{S}}=0.554$, while LBVS-Electrostatic reaches a mean value of $T{c_{S}}=0.505$. Furthermore, analysing the results individually, we can see that in only 2 out of 50 cases does LBVS-Electrostatic offer predictions that are better than or equivalent to those of LBVS-Shape in terms of shape (see columns $T{c_{S}}$ in LBVS-Shape and $T{c_{S}^{\mathit{Eval}}}$ in LBVS-Electrostatic). This shows that two compounds can be very similar in terms of electrostatic potential while being very different in three-dimensional shape. Such solutions could not be obtained with the traditional LBVS-Shape methodology, since it only focuses on the compounds with the highest shape similarity.
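The per-query shape comparison can be checked in the same way. The sketch below copies the $T{c_{S}}$ column of LBVS-Shape and the $T{c_{S}^{\mathit{Eval}}}$ column of LBVS-Electrostatic from Table 3 (list names are illustrative) and lists the queries for which the electrostatic method matches or exceeds the shape method.

```python
from statistics import mean

# Columns of Table 3, one entry per query, in the row order of the table.
QUERIES = """
DB00529 DB01213 DB00173 DB00172 DB00331 DB01119 DB02513 DB00915 DB01352 DB01365
DB00657 DB00478 DB01043 DB00380 DB00693 DB09185 DB07615 DB09219 DB00674 DB00887
DB01198 DB01155 DB00246 DB00381 DB00876 DB09237 DB00254 DB01268 DB01196 DB01621
DB09236 DB00632 DB08903 DB01419 DB00320 DB00728 DB00503 DB01232 DB00309 DB04786
DB09114 DB06439 DB01078 DB01590 DB04894 DB00403 DB00732 DB00050 DB06699 DB06219
""".split()
# Tc_S of the compound selected by LBVS-Shape.
TC_S_SHAPE = [float(x) for x in """
0.496 0.621 0.792 0.881 0.598 0.655 0.872 0.684 0.926 0.738
0.788 0.787 0.765 0.620 0.525 0.722 0.704 0.698 0.688 0.642
0.648 0.858 0.542 0.577 0.516 0.580 0.521 0.518 0.741 0.552
0.509 0.348 0.621 0.431 0.515 0.399 0.371 0.389 0.455 0.377
0.376 0.383 0.424 0.265 0.361 0.261 0.222 0.349 0.119 0.293
""".split()]
# Tc_S^Eval of the compound selected by LBVS-Electrostatic.
TC_S_EVAL_ELEC = [float(x) for x in """
0.468 0.880 0.777 0.690 0.559 0.779 0.745 0.673 0.901 0.628
0.609 0.615 0.711 0.435 0.391 0.429 0.610 0.249 0.415 0.378
0.334 0.385 0.391 0.397 0.276 0.447 0.219 0.421 0.195 0.330
0.439 0.126 0.307 0.291 0.245 0.068 0.180 0.290 0.267 0.108
0.183 0.274 0.169 0.070 0.006 0.120 0.051 0.059 0.013 0.009
""".split()]

# Queries where the electrostatic method is at least as good in shape.
better_or_equal = [
    query
    for query, tc_shape, tc_elec in zip(QUERIES, TC_S_SHAPE, TC_S_EVAL_ELEC)
    if tc_elec >= tc_shape
]
print(better_or_equal)                    # ['DB01213', 'DB01119']
print(round(mean(TC_S_SHAPE), 3))         # 0.554
print(round(mean(TC_S_EVAL_ELEC), 3))     # 0.372
```

Note that the mean of $T{c_{S}^{\mathit{Eval}}}$ (0.372) is the value obtained at the electrostatic pose; the value of 0.505 quoted in the text is the mean of the optimized $T{c_{S}}$ column of LBVS-Electrostatic.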
Looking in more detail at the compounds with up to 50 atoms, i.e. the first 23 queries in the table, there are 5 cases where the difference in shape similarity is less than 0.05 (DB00529, DB00173, DB00331, DB00915 and DB01352) and another 3 cases where the difference is below 0.1 (DB01043, DB07615 and DB01268). Considering the values of these 7 cases in which the shape similarity of LBVS-Electrostatic is smaller than that of LBVS-Shape, the average difference is 0.048, while the mean gain in electrostatic similarity for those 7 compounds is 0.271. Among the large compounds, which comprise 27 queries, there are only two cases with similar characteristics: DB09236, with a difference of 0.07, and DB06699, with a difference of 0.013, both in shape similarity. In view of these results, the LBVS-Electrostatic method seems to be justified when proposing new solutions for small compounds.
However, not all the improvements are related to electrostatic fields. The optimization of the electrostatic potential using OptiPharm_ES may also lead to a better solution in terms of shape. The queries DB01119 and DB01213 in Table 3 are two outstanding examples. For instance, for $\mathit{Query}=DB01119$, the best compound found by LBVS-Shape is DB00828, with $T{c_{S}}=0.655$ and $T{c_{E}^{\mathit{Eval}}}=0.519$. In contrast, the best compound found by LBVS-Electrostatic is DB00173. It has a better $T{c_{E}}$, i.e. 0.832, and the pose obtained after the electrostatic optimization also yields a higher shape similarity, $T{c_{S}^{\mathit{Eval}}}=0.779$.