Virtual Compound Libraries

One of the first steps in in silico drug design is determining the structure of the target protein, which makes it possible to identify sites on that protein that could serve as drug targets. These sites may be active sites, allosteric regulation sites, regions of subunit attachment, or areas known to function in protein-protein or protein-small-molecule interactions.

While protein structures and drug target sites are being determined, the other half of the in silico drug discovery process is carried out. In this phase, chemists create or add to databases of potential drug leads; these virtual drug libraries are then used for virtual library screening (VLS), the process of docking potential drugs into predicted target sites and scoring the resulting fits.

Construction of Virtual Compound Libraries

In traditional drug discovery, a practice known as target-oriented synthesis (TOS) is used. This approach, driven by the cost of synthesizing new compounds, focuses on modifying existing chemicals to find a variant that matches a target site. For the same reason, traditional drug leads have often been chosen for ease of synthesis rather than pharmacological value. However, recent advances in information technology have made it practical to construct virtual compound libraries. By acting as large repositories of molecular structures and their characteristics, these libraries make it possible to carry out a molecular interaction experiment without the sometimes significant expense of wet-lab materials. The in silico approach also shortens the discovery process. As virtual drug libraries and virtual library screening become more widespread, there is a shift toward diversity-oriented synthesis. It is much cheaper to create many new classes of drugs in silico, as only the ones that dock and score adequately need to be synthesized. The savings in cost and time that contemporary methods offer over traditional ones cannot be overstated.

Two main categories of drug libraries exist. The first type consists of a general and diverse range of chemical compounds. These libraries are used for discovering new types of drug leads for specific applications. Having a diverse range of novel compounds allows new drug leads to be found; in some cases, these leads can end up being more effective inhibitors than the ones currently used in marketed drugs (Anderson). The other type of library, rather than being “broad” and “shallow,” is “narrow” and “deep.” These libraries are composed of a much narrower range of compounds; however, many different conformations and variations of this narrow range are included. Stereochemistry is one of the major variables used in creating variations of existing structures to add to such databases; tautomerism is another. For example, a compound that might not otherwise dock effectively to a protein might have a tautomer that docks very well. The vast majority of the compounds in these libraries are small organic molecules, which have always been the mainstay of traditional drug discovery (Anderson). Small organic compounds are widely used in drug discovery, both in silico and in vitro/in vivo, for three reasons: they are much easier to fit into protein drug target sites during VLS; they are much easier to synthesize after successful virtual screening, assuming they do not already exist; and they typically have better ADME (absorption, distribution, metabolism, excretion) profiles than many larger, more complex compounds. It is hypothesized that the total drug space exceeds 10^100 compounds, and that figure does not include stereochemical variants (Anderson). The small organic compounds in virtual libraries therefore represent only a subset of the total chemical space, albeit a subset that is still almost too large to be screened completely against every targeted protein.

As these databases grow exponentially in size, searching them can be expected to become a rate-limiting step because of computational limits. For this reason, new approaches are being developed to make the search process itself more effective and efficient. One of these tools, known as Shape Signatures, can be used to predict the shape similarity or complementarity of a molecule to a specific ligand or receptor by analyzing the volume of the ligand and of the target active site and relating it to variables such as electrostatic factors. A series of probability distributions is generated, and geometric principles are then applied to them to compare drug and target (Meek).
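As a rough illustration of the comparison step only, the sketch below treats each signature as a histogram, normalizes it to a probability distribution, and ranks library compounds by their L1 distance to a query. The bin counts, compound names, and function names are invented for illustration, and the choice of the L1 distance is an assumption; how the distributions are actually derived from the molecular volume (Meek) is not reproduced here.

```python
def normalize(counts):
    """Turn raw bin counts into a probability distribution that sums to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    return [c / total for c in counts]


def signature_distance(sig_a, sig_b):
    """L1 distance between two histograms that share the same binning.

    0.0 means the distributions are identical; 2.0 is the maximum,
    reached when the distributions do not overlap at all.
    """
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must use the same binning")
    pa, pb = normalize(sig_a), normalize(sig_b)
    return sum(abs(x - y) for x, y in zip(pa, pb))


# Hypothetical bin counts, not measured data.
query = [2, 10, 25, 10, 3]
library = {
    "cmpd_A": [3, 9, 24, 11, 3],
    "cmpd_B": [20, 15, 5, 5, 5],
}

# Most similar compound first.
ranked = sorted(library, key=lambda name: signature_distance(query, library[name]))
```

Because each comparison is a single pass over a short list of numbers, a whole library can be ranked against a query far more cheaply than it could be docked.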

Virtual Library Screening

The computational approaches behind the virtual screening of these virtual compound libraries can be ligand-based, relying on similarity to known ligands, as in typical pharmacophore queries, or structure-based, relying on the three-dimensional structure of the target, as in drug-target docking analyses (Lengauer). Quantitative structure-activity relationships (QSAR) can also be used in developing virtual screening algorithms (Kapetanovic).
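One common way to make a ligand-based query concrete is to describe each compound by a set of structural features and rank the library by the Tanimoto coefficient against a known active. The following is a minimal sketch under that assumption; the feature labels, compound names, and the 0.5 threshold are hypothetical and are not taken from the cited sources.

```python
def tanimoto(features_a, features_b):
    """Tanimoto coefficient: shared features divided by all features present."""
    a, b = set(features_a), set(features_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def screen(query, candidates, threshold=0.5):
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(query, feats)) for name, feats in candidates.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical feature sets for a known active and a tiny library.
known_active = {"aromatic_ring", "hbond_donor", "hbond_acceptor", "carboxyl"}
library = {
    "cmpd_1": {"aromatic_ring", "hbond_donor", "hbond_acceptor", "amide"},
    "cmpd_2": {"aliphatic_chain", "halogen"},
    "cmpd_3": {"aromatic_ring", "hbond_donor", "hbond_acceptor", "carboxyl"},
}

hits = screen(known_active, library)
```

In practice the feature sets would come from a fingerprinting scheme in a cheminformatics toolkit rather than from hand-written labels.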

Various algorithmic methods are used to add new compounds to virtual libraries, or to generate hit candidates that can subsequently be tested in the hope of finding a drug lead. These methods may simply assemble atomic groups at random into larger structures, or they may adapt existing structures. Often, parameters of existing chemicals such as side groups, chemical modifications, and stereochemistry are varied, creating new compounds in the process. Incremental construction, in which a ligand is built up from rigid partial structures rather than treated as a complete structure, can also be used to make these algorithms more efficient at computing the potential molecular interactions. In general, the more molecular flexibility is permitted, the more pharmacophores are generated, since a greater range of molecular interactions becomes possible.
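Adapting an existing structure by varying its side groups amounts to a combinatorial enumeration. The sketch below assumes a hypothetical scaffold written as a SMILES-like template with two substitution points and two made-up substituent lists; the strings are not validated by a cheminformatics toolkit, and a real library builder would also check chemical validity and enumerate stereoisomers.

```python
from itertools import product

# Hypothetical scaffold: a benzene ring with two substitution points.
SCAFFOLD = "c1cc({r1})ccc1{r2}"

# Hypothetical substituents for each position.
R1_GROUPS = ["O", "N", "C(=O)O"]
R2_GROUPS = ["F", "Cl", "C"]


def enumerate_variants(scaffold, r1_groups, r2_groups):
    """Build one structure string for every combination of substituents."""
    return [
        scaffold.format(r1=r1, r2=r2)
        for r1, r2 in product(r1_groups, r2_groups)
    ]


variants = enumerate_variants(SCAFFOLD, R1_GROUPS, R2_GROUPS)
```

The number of variants is the product of the number of options at each position (here 3 x 3 = 9), which is why even a narrow library grows quickly as more positions and substituents are allowed.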

Due to the large volume of chemical space contained in even the narrowest of virtual compound libraries, multidimensional methods of applying compounds to targets have started to take hold. These methods work through a series of 'weeding' steps of ever-increasing stringency. The first step applies known limiting factors to the database as a whole. These factors may include unmanageable molecular size, structures known to induce toxicity, cytochrome P450 activation, or factors known to limit solubility. Once these initial limits have removed the first set of unsuitable compounds, the second level of screening is applied. The second dimension uses 'molecular topology', which in its simplest form is a method of assigning functional and behavioral characteristics to a structural element. These characteristics could include solubility, polarity, and the ability to form hydrogen bonds. Selection in this step is based either on comparison with known in vivo ligands or on the functionality of the known target site. The third dimension uses, appropriately enough, three-dimensional models of the molecules to determine whether they could bind to a rough model of the target site. In the fourth and final dimension, the molecules left in the library are docked and scored. This step draws on all the factors from the previous dimensions, but on a much more sophisticated scale. The structure of the target site is compared to each potential hit, and all possible interactions are evaluated mathematically to generate the final list of top-scoring compounds. This method of virtual screening reduces analysis time because the least fit molecules never undergo rigorous screening. Instead, their obvious deficiencies for the drug target are used to eliminate them before the time-consuming scoring methods are applied (Bleicher).
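The staged funnel described above can be sketched as a chain of increasingly expensive filters followed by a final ranking. This is a minimal illustration, not the procedure from the cited review: the compound fields, the cutoffs (500 g/mol, 5 hydrogen-bond donors), and the library entries are all hypothetical, and the precomputed `fits_pocket` and `docking_score` fields stand in for a real 3D shape check and a real docking run.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float      # g/mol
    has_toxicophore: bool  # contains a substructure flagged as toxic
    hbond_donors: int
    fits_pocket: bool      # stand-in for a coarse 3D shape check
    docking_score: float   # stand-in for a docking run; lower is better


def stage1_property_filter(c, max_weight=500.0):
    """Dimension 1: drop oversized or known-toxic structures."""
    return c.mol_weight <= max_weight and not c.has_toxicophore


def stage2_topology_filter(c, max_donors=5):
    """Dimension 2: keep compounds with acceptable functional properties."""
    return c.hbond_donors <= max_donors


def stage3_shape_filter(c):
    """Dimension 3: keep compounds that could fit a rough site model."""
    return c.fits_pocket


def screen(library, top_n=2):
    """Apply the cheap filters in order, then rank the survivors by score."""
    survivors = list(library)
    for stage in (stage1_property_filter, stage2_topology_filter, stage3_shape_filter):
        survivors = [c for c in survivors if stage(c)]
    # Dimension 4: only the survivors reach the expensive scoring step.
    return sorted(survivors, key=lambda c: c.docking_score)[:top_n]


# Hypothetical library entries.
library = [
    Compound("A", 350.0, False, 2, True, -9.1),
    Compound("B", 720.0, False, 3, True, -11.0),  # too heavy
    Compound("C", 410.0, True, 1, True, -8.0),    # toxicophore
    Compound("D", 300.0, False, 8, True, -7.5),   # too many donors
    Compound("E", 280.0, False, 1, False, -6.0),  # does not fit the site
    Compound("F", 390.0, False, 3, True, -7.2),
]

top_hits = [c.name for c in screen(library)]
```

Note that compound B has the best score but is never ranked, because it is removed by the first and cheapest filter; in a real workflow its docking score would not have been computed at all.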

Pharmacophores, which are sets or patterns of chemical groups known to interact in an identical fashion with the same receptor site, are often conserved when deriving new compounds from existing ones, which aids future predictions of drug-target interactions (Lengauer). Furthermore, compounds that belong to classes of known drug leads, meaning that they gave positive signals in in silico and in vivo tests, may be put into focused libraries (Bleicher). These databases modify compounds slightly to produce many similar, yet slightly different, structures in an effort to expand classes of known pharmacological value. The sheer number of possible molecular interactions, given all the variables in the biological world, necessitates the use of modern computation.

References

Bleicher, Konrad H., et al. Hit and lead generation: beyond high-throughput screening. Nature Reviews Drug Discovery. 2003 May; 2: 369-378.

Kapetanovic, I. M. Computer-aided drug discovery and development (CADDD): in silico-chemico-biological approach. Chemico-Biological Interactions. 2007.

Lengauer, Thomas, et al. Novel technologies for virtual screening. Drug Discovery Today. 2004 Jan; 9(1): 27-34.

Meek, Peter J., et al. Shape Signatures: speeding up computer aided drug discovery. Drug Discovery Today. 2006 Oct; 11(19-20): 895-904.
