IBM Visualization
Data Explorer

Application Brief:

Chemistry


Thomas M. Jackman


IBM Research
Imaging Science & Technology
T.J. Watson Research Center
Yorktown Heights, NY 10598


PART 2.

Chemical Informatics [Covariance Matrix in 3D-QSAR]

Datamining, data warehouses, chemical libraries, and statistical inference are some of the important aspects of an emerging area of computer-aided chemistry called Chemical Informatics. The ability to train computers to predict properties from computed and experimental molecular data offers the prospect of automatically screening massive libraries of chemical structures, together with their associated databases of chemical properties, to produce optimal molecular candidates for manufacture. Nevertheless, humans will continue to be the ultimate decision makers and, as such, will require informative visualizations that communicate the results of extensive datamining analyses succinctly. Visualization will necessarily become the principal language by which machines communicate with the humans involved in chemical informatics, as well as in the closely related field of bioinformatics.

The image above is a representation of the covariance matrix for 17 variables in a 3D pharmaceutical structure-activity analysis for a training set of compounds. The covariance between any two of the 17 variables (which include such properties as dipole, quadrupole and moments of inertia) is represented by a circular disk, with radius and color saturation indicative of magnitude. Positive values are coded with red hue, while negative values are color coded with blue hue. The main diagonal of the visual matrix provides a reference of size and color since each variable is maximally correlated with itself. This visualization is representative of the types of information-packed graphics which are required in chemical datamining.
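The mapping from matrix entries to disks can be sketched in a few lines of Python. This is an illustration only, not the network that produced the image: the function names, the three small data columns, and the linear mapping from magnitude to radius are all assumptions. Because the diagonal is described as the maximal reference, the sketch normalizes each covariance to a correlation so that every diagonal entry is 1.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation between every pair of equally long data columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    centered = [[v - m for v in col] for col, m in zip(columns, means)]

    def cov(i, j):
        # Sample covariance between columns i and j.
        return sum(a * b for a, b in zip(centered[i], centered[j])) / (n - 1)

    k = len(columns)
    std = [math.sqrt(cov(i, i)) for i in range(k)]
    return [[cov(i, j) / (std[i] * std[j]) for j in range(k)] for i in range(k)]

def disk_glyph(value, max_radius=0.5):
    """Map one matrix entry to (radius, hue, saturation)."""
    magnitude = min(abs(value), 1.0)
    hue = "red" if value >= 0 else "blue"   # sign selects the hue
    return (max_radius * magnitude, hue, magnitude)

# Hypothetical descriptor values for four compounds (illustrative only).
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.1, 3.9, 6.2, 7.8],
    [4.0, 3.1, 1.9, 1.2],
]
corr = correlation_matrix(columns)
glyphs = [[disk_glyph(v) for v in row] for row in corr]
```

Each entry of `glyphs` holds the radius, hue, and saturation for one disk. A column with zero variance would need special handling, which this sketch omits.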





Computational Chemistry [C60 LOD]

As raw compute power has advanced, so too has the reliability of computed and simulated results. The arsenal of computational chemistry includes ab initio, semiempirical and density functional quantum methods. Other methods include molecular mechanics, molecular dynamics, Brownian dynamics, Monte Carlo, and simulated annealing, to name some of the more popular computational strategies.

Visualization can be used in conjunction with a calculation or simulation in several different ways. For example, using asynchronous outboard modules, output from a simulation can be passed to a running Data Explorer network via a TCP/IP socket so as to track its progress. The resulting sequence of images provides a way to monitor the calculation. In addition to passive tracking with DX, the computation can also be steered through the use of an application programming interface (API) library called DXLink. This library of routines, when incorporated into a user application, allows two-way communication between Data Explorer and the external application through the use of control variables. In this way, the Data Explorer user interface (UI) can be used both to monitor and to adjust the course of a calculation. Likewise, the user application can tailor the execution environment of the DX network. Many computations, such as ab initio quantum calculations, do not lend themselves to tracking and steering. In these cases it is easier to post-process the results. By coding a software filter, output such as a grid of electron density data can be converted to a form readable by Data Explorer. The filter can also be executed from within the Import module of a running DX network. Finally, using the complete suite of DX routines, it is possible to code one's own DX module and thereby launch an application from within Data Explorer.
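As a rough idea of what such a filter might look like, the Python sketch below writes a regular grid of scalar values as text. The layout follows the regular-grid form of the DX native file format as it is commonly written by other programs, and should be checked against the Data Explorer documentation before use. The function name `grid_to_dx`, the field name, and the assumption that the z index varies fastest are all hypothetical choices made for illustration.

```python
def grid_to_dx(values, counts, origin, spacing, name="electron density"):
    """Format a regular 3D scalar grid as DX-style text.

    values  -- flat list of samples (assumed ordering: z index varies fastest)
    counts  -- (nx, ny, nz) number of grid points along each axis
    origin  -- (x, y, z) position of the first grid point
    spacing -- (hx, hy, hz) grid step along each axis
    """
    nx, ny, nz = counts
    if len(values) != nx * ny * nz:
        raise ValueError("values does not match the grid counts")
    ox, oy, oz = origin
    hx, hy, hz = spacing
    lines = [
        f"object 1 class gridpositions counts {nx} {ny} {nz}",
        f"origin {ox:g} {oy:g} {oz:g}",
        f"delta {hx:g} 0 0",
        f"delta 0 {hy:g} 0",
        f"delta 0 0 {hz:g}",
        f"object 2 class gridconnections counts {nx} {ny} {nz}",
        f"object 3 class array type float rank 0 items {len(values)} data follows",
    ]
    # Write the samples three per line.
    for start in range(0, len(values), 3):
        lines.append(" ".join(f"{v:.6e}" for v in values[start:start + 3]))
    lines += [
        'attribute "dep" string "positions"',
        f'object "{name}" class field',
        'component "positions" value 1',
        'component "connections" value 2',
        'component "data" value 3',
        "end",
    ]
    return "\n".join(lines) + "\n"

# Tiny 2 x 2 x 2 example grid with made-up values.
dx_text = grid_to_dx(
    [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7],
    counts=(2, 2, 2), origin=(0.0, 0.0, 0.0), spacing=(0.5, 0.5, 0.5),
)
```

Writing `dx_text` to a file would give something the Import module could be pointed at, assuming the layout above matches the installed version's expectations.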

Used in any such fashion, Data Explorer becomes a true scientific instrument by facilitating the visual exploration of scientific data. Not only is it generally possible to reproduce almost any graphic technique found in other toolkits, but the computational chemist is also free to develop new visual methodologies and do actual data research. This, after all, is the nature of scientific research -- the design and/or investigation of something new.

The tie-dyed buckyball (above) is the result of a quantum calculation in which the electron density data, rho(r), are passed in a DX network to the Grad module and then to the DivCurl module, so as to compute Div·Grad{rho}, the Laplacian of the density. An isosurface of the Laplacian, Laplacian{rho(r)}=0, provides the surface upon which the density itself is mapped. This is the surface which separates the region where charge is locally accreted from the region where charge is locally depleted. This visualization is relatively novel in the sense that few, if any, commercially available software toolkits incorporate such a computational and visual capability.
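The arithmetic behind Div·Grad{rho} can be tried outside DX with a short Python sketch. It does not use the Grad or DivCurl modules. It applies central second differences directly to a model Gaussian density, chosen only because its Laplacian is known in closed form, (4r^2 - 6)exp(-r^2). The model density, the function names, and the step size are assumptions.

```python
import math

def density(x, y, z):
    """Model density rho(r) = exp(-r^2); a stand-in for computed data."""
    return math.exp(-(x * x + y * y + z * z))

def laplacian(f, x, y, z, h=1e-3):
    """Div(Grad f) at one point, by central second differences."""
    center = f(x, y, z)
    total = 0.0
    for dx, dy, dz in ((h, 0.0, 0.0), (0.0, h, 0.0), (0.0, 0.0, h)):
        total += f(x + dx, y + dy, z + dz) - 2.0 * center + f(x - dx, y - dy, z - dz)
    return total / (h * h)

def analytic_laplacian(x, y, z):
    """Closed-form Laplacian of the model density, for comparison."""
    r2 = x * x + y * y + z * z
    return (4.0 * r2 - 6.0) * math.exp(-r2)

inner = laplacian(density, 0.0, 0.0, 0.0)   # negative: charge locally concentrated
outer = laplacian(density, 2.0, 0.0, 0.0)   # positive: charge locally depleted
```

For this model the sign changes on the sphere r^2 = 1.5, which plays the role of the Laplacian{rho(r)}=0 isosurface described above.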

[Comparison of ESP Techniques]

Not every exploratory visualization involves some new graphical representation, however. Often it is the content of the visualization which is novel and not the graphical technique. This is particularly true of correlative visualizations, as might be used to compare and contrast various properties or approximations, such as point charges. Point charges are an indispensable component of most force field calculations, such as those in molecular mechanics and dynamics, and of other comparative studies which require fast, simple representations of the electrostatic properties of molecular systems. The ways of obtaining charges to mimic the overall electrostatic behavior are legion, but the image shows that the various methods of point charge generation are not equivalent. This is evident from the differences in the electrostatic potential colors mapped onto the van der Waals surface. Moreover, while all the methods give nearly identical directions for the dipole moments, they give dramatically different magnitudes.
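The comparison of dipole moments can be made concrete with a small Python sketch. The dipole of a set of point charges is the charge-weighted sum of the atomic positions. The geometry and the two charge sets below are hypothetical numbers chosen for illustration, not results from any of the methods compared in the image.

```python
import math

def dipole(charges, positions):
    """Dipole vector: sum of q_i * r_i over all point charges."""
    return tuple(
        sum(q * r[k] for q, r in zip(charges, positions)) for k in range(3)
    )

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def angle_degrees(a, b):
    """Angle between two vectors, clamped against rounding error."""
    cos_t = sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

# Hypothetical bent triatomic (coordinates in angstroms) and two
# hypothetical neutral charge assignments (in units of e).
positions = [(0.0, 0.0, 0.0), (0.8, 0.6, 0.0), (-0.8, 0.6, 0.0)]
set_a = [-0.8, 0.4, 0.4]
set_b = [-0.4, 0.2, 0.2]

mu_a = dipole(set_a, positions)
mu_b = dipole(set_b, positions)
direction_difference = angle_degrees(mu_a, mu_b)   # same direction
magnitude_ratio = norm(mu_a) / norm(mu_b)          # different magnitude
```

In this toy case the two sets point the dipole along the same axis while differing by a factor of two in magnitude, which is the kind of discrepancy the paragraph above describes.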





Polymer Chemistry [Polyimide Conformational Energy Surface]

Individual strands of a polymer are often modeled as mechanical entities which have rigid, often aromatic, segments interspersed with rotatable bonds of varying degrees of torsional flexibility. Substituents proximal to the rotatable bonds are used to tailor their torsional accessibilities and thereby customize various viscoelastic properties of the polymeric material. Knowledge of the conformational energy surface of one or more rotatable bonds may give information about thermodynamic, rheologic and optical properties of the bulk polymer material.

The image shown is a conformational energy surface for the two central rotatable bonds in the fragment of the fluorinated polyimide shown above the surface [5]. The surface data were derived from an extensive force field calculation. Conventionally such surfaces are displayed as plots of contour lines (shown for contrast in the background of the image) and are called Ramachandran plots. The advantage of the 3-D representation is, however, distinctly evident.
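The data behind such a surface are simply energies sampled on a grid of two torsion angles. The Python sketch below shows the shape of that calculation with a toy energy function. The functional form, its coefficients, and the 15-degree step are assumptions made for illustration and have nothing to do with the force field used for the polyimide fragment.

```python
import math

def toy_energy(phi1, phi2):
    """Illustrative two-torsion energy in arbitrary units (not a force field)."""
    def twofold(phi):
        # Twofold barrier with minima at 0 and 180 degrees.
        return 0.5 * (1.0 - math.cos(2.0 * phi))
    coupling = 0.3 * math.cos(phi1 - phi2)
    return twofold(phi1) + twofold(phi2) + coupling

def scan(step_degrees=15):
    """Sample the energy on a regular grid of both torsion angles."""
    angles = list(range(-180, 180, step_degrees))
    surface = [
        [toy_energy(math.radians(a), math.radians(b)) for b in angles]
        for a in angles
    ]
    return angles, surface

angles, surface = scan()
# (energy, phi1, phi2) of the lowest grid point.
lowest = min(
    (surface[i][j], angles[i], angles[j])
    for i in range(len(angles))
    for j in range(len(angles))
)
```

The same `surface` array could be drawn either as contour lines or as a height field, which is the contrast the image makes.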





Organic Chemistry

[Carbon Allotropes]

Researchers for computer manufacturers have paid disproportionate attention to the chemistry of silicon in preference to its periodic neighbor, carbon. Yet the discovery of new allotropic forms of pure carbon, for which the 1996 Nobel Prize in Chemistry was awarded, gives hope for new technological applications of pure carbon. The image above is an artistic composite of various allotropic forms of elemental carbon: buckminsterfullerene, graphite, diamond, and nanotubules of various lengths, helicities, and concentricities. The image illustrates the diversity of structural motifs available to pure carbon.

[Cyanamide Frontier Orbitals]

Organic chemistry, however, is usually considered to be the chemistry of carbon-based compounds. While much synthetic work can be accomplished with a thorough phenomenological understanding of carbon compounds and their reactivity, concepts derived from theoretical chemistry have increasingly pervaded the literature, particularly in mechanistic organic chemistry. The most prevalent concept borrowed from theory would have to be molecular orbitals, with frontier molecular orbitals generally the most relevant and useful with regard to molecular reactivity. Frontier orbitals are the Highest Occupied Molecular Orbital (HOMO) and the Lowest Unoccupied Molecular Orbital (LUMO). The frontier orbitals of cyanamide are shown for the low energy structure (left orbitals) and for a high energy structure (right orbitals), a transition state for a nitrogen inversion. The most obvious feature discernible from such a comparison of orbitals is that the relatively modest perturbation of the nuclear framework results in a rather profound redistribution of electrons.
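For a closed-shell molecule, picking out the frontier orbitals from a list of orbital energies is a matter of counting electron pairs. The Python sketch below shows the bookkeeping. The function name is hypothetical and the energies are invented placeholder values, not computed results for cyanamide.

```python
def frontier_orbitals(energies, n_electrons):
    """Return (homo_index, lumo_index) for a closed-shell molecule.

    energies    -- orbital energies in any order
    n_electrons -- total electron count (must be even; two per orbital)
    """
    if n_electrons % 2 != 0:
        raise ValueError("closed-shell sketch needs an even electron count")
    n_occupied = n_electrons // 2
    if not 0 < n_occupied < len(energies):
        raise ValueError("need at least one occupied and one empty orbital")
    # Orbital indices sorted from lowest to highest energy.
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    return order[n_occupied - 1], order[n_occupied]

# Placeholder energies (arbitrary units) for a six-electron example.
energies = [-1.2, -0.9, -0.4, 0.1, 0.3]
homo, lumo = frontier_orbitals(energies, n_electrons=6)
gap = energies[lumo] - energies[homo]
```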



Biochemistry [Active site: ACHE]

Though it is unquestionably the world's best chemist, it will never win a Nobel Prize. Though it has developed specialized catalysts to dramatically reduce activation energies for thousands of complex synthetic reactions, all under physical and chemical conditions which are extraordinarily mild, it is loath to divulge its chemical secrets, guarding them with the tenacity of a corporate patent attorney. It is the living cell.

Because the many years of evolution have allowed cells to optimize their reaction mechanisms through that most discerning of methodologies -- repeated trial and error -- it is worthwhile to understand and mimic biological chemistry. There are, however, many impediments to such investigations, not the least of which is the submicroscopic scale. Visualization becomes especially effective in such a context as it is the means by which the ultra-small can be extrapolated to the level of human comprehension.

[HOMO of Protonated Serine] [HOMO of Deprotonated Serine]

Consider the enzyme acetylcholinesterase. It is responsible for the hydrolysis of acetylcholine into acetate and choline after a neural synaptic transmission. It is an extremely efficient enzyme, as it must be for the neural membrane to rapidly repolarize in preparation for the next neural signal. The active site of the enzyme (shown above) involves an activated serine residue. Semiempirical quantum calculations show that the nucleophilic oxygen is not activated unless it is deprotonated. This is evident from the HOMO, which is the relevant orbital for a nucleophile. Little of the orbital is spread over the hydroxyl oxygen of Ser-200 when it is protonated (left), but the oxygen gets significant population from the orbital when it is deprotonated (right). Such an ionization would be unlikely in aqueous solution, but the structure of the active site shows that neighboring residues assist and stabilize the proton's removal. The visualizations substantiate the well known mechanism of acetylcholinesterase as proceeding via a reversible activation step.





Environmental Chemistry

Environmental chemistry is essentially a problem of fluid transport where the fluid might be atmosphere, freshwater, ocean, pollutant, gas deposit, or oil reservoir. Though they are non-trivial problems, computational methods are available to simulate models of transport. Visualization provides the means of tracking the course of computational fluid dynamics simulations and exploring the data visually.

[Scalar Field in Hydrodynamic Simulation] [Vorticity Field in Hydrodynamic Simulation]

The images shown (left and right) result from a hydrodynamic simulation of a pollutant using pseudo-spectral methods to solve the Navier-Stokes equation [6]. The left image represents a scalar field, such as temperature or concentration, while the right image is the vorticity field at a corresponding point in time. The model is started from an initial set of conditions and evolves in time under periodic boundary conditions. Both images use an isosurface to track a specific scalar value during the course of the simulation and volume rendering to get a more encompassing depiction of the data. Both the rendering and the calculations were performed on multiple processors of IBM SP/1 and SP/2 supercomputers.
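Vorticity is the curl of the velocity field, and periodic boundary conditions mean that grid indices wrap around at the edges. The Python sketch below illustrates both ideas in two dimensions. It uses plain central differences rather than the pseudo-spectral method of the actual simulation, and the test flow (a Taylor-Green-style vortex pattern with exact vorticity 2 sin(x) sin(y)), the grid size, and the function name are assumptions.

```python
import math

N = 64                       # grid points per side (assumed)
h = 2.0 * math.pi / N        # grid spacing on the periodic box [0, 2*pi)
xs = [i * h for i in range(N)]

# Test velocity field: u = sin(x) cos(y), v = -cos(x) sin(y).
u = [[math.sin(x) * math.cos(y) for y in xs] for x in xs]
v = [[-math.cos(x) * math.sin(y) for y in xs] for x in xs]

def vorticity(u, v, h):
    """z-component of curl(u, v) = dv/dx - du/dy on a periodic grid."""
    n = len(u)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # The modulo makes neighbors wrap around the box edges.
            dv_dx = (v[(i + 1) % n][j] - v[(i - 1) % n][j]) / (2.0 * h)
            du_dy = (u[i][(j + 1) % n] - u[i][(j - 1) % n]) / (2.0 * h)
            w[i][j] = dv_dx - du_dy
    return w

w = vorticity(u, v, h)
```

In three dimensions the vorticity has three components, and it is typically a scalar derived from them, such as the magnitude, that is passed to isosurface or volume rendering.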





Analytical Chemistry

Analytical chemistry is now usually associated with instrumental methods in chemistry. In the past, visualization in analytical chemistry was limited to what could be produced by the chart pen-recorder. Instrumentation, however, has changed and so have the visual requirements. Two-dimensional and three-dimensional versions of infrared spectroscopy and nuclear magnetic resonance spectroscopy now exist, and these have correspondingly higher-dimensional visual requirements. In fact, the use of three-dimensional visualization facilitates the natural extension of dimensionality to many instruments.

[3D Lissajous Figure]

Consider the oscilloscope; a standard version allows up to two signal inputs. Computers enabled for analog-to-digital conversion have, in theory, no such restrictions. Since Data Explorer allows asynchronous input through the use of asynchronous modules, it is possible to envisage a trace for three signal inputs. If the three signal inputs are harmonic and correspond to three mutually perpendicular directions, the trace would appear as a three-dimensional Lissajous figure, an illustration of which is shown here. The use of projections (faux shadows) onto the 2D Cartesian planes aids in the visual interpretation of the trace shown on the "instrument".
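The geometry of such a trace is easy to generate. The Python sketch below samples three sinusoidal signals along perpendicular axes and builds a shadow by flattening the trace onto a wall plane. The frequencies, phases, sample count, and wall position are arbitrary choices made for illustration.

```python
import math

def lissajous_3d(freqs=(3, 2, 5),
                 phases=(0.0, math.pi / 2, math.pi / 4),
                 samples=1000):
    """Points (x, y, z) traced by three unit-amplitude harmonic signals."""
    points = []
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        points.append(tuple(math.sin(f * t + p) for f, p in zip(freqs, phases)))
    return points

def project(points, axis, wall=-1.5):
    """Faux shadow: replace one coordinate by the position of a wall plane."""
    shadows = []
    for p in points:
        q = list(p)
        q[axis] = wall
        shadows.append(tuple(q))
    return shadows

trace = lissajous_3d()
shadow = project(trace, axis=2)   # shadow on the plane z = -1.5
```

Projecting with `axis=0` and `axis=1` gives the shadows on the other two walls.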



Chemical Education [Surface Zonal Harmonics]

Just as laboratory experiments give students the hands-on experience needed to understand how instruments work and how synthesis is performed, numerical computer experiments offer students the opportunity to explore chemical data interactively and to appreciate theoretical and mathematical concepts. For instance, it is often hard to appreciate concepts such as the special mathematical functions used in orthogonal expansions until one actually manipulates them and visualizes the results. Using Data Explorer, educators can easily devise numerical protocols for students to perform for just such purposes. While this could be done with just paper and a calculator, a visual computer exploration of the subject allows for more elaborate exercises. Moreover, basis sets which depend on two variables require a three-dimensional computer visualization package such as Data Explorer.

[]


Spherical harmonics, for example, are an important but sometimes mystifying concept which appears in many areas of chemistry and physics. In order to appreciate their significance, it is valuable for students to see how functions such as these (shown above as real combinations of spherical harmonics) can be added together in superposition to reproduce such things as the angular variation of the electrostatic potential of a molecule (left). With Data Explorer, such numerical student activities are readily devised.
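A student exercise of this kind might look like the following Python sketch. It sums zonal harmonics (Legendre polynomials in cos(theta)) to reproduce the potential of a unit point charge displaced a distance d along the z axis, using the standard expansion valid for r > d. The function names and the choice of a single off-center charge as the "molecule" are assumptions. Units are chosen so that the potential of a unit charge is 1/distance.

```python
import math

def legendre(l_max, x):
    """P_0(x) .. P_l_max(x) from Bonnet's recurrence."""
    values = [1.0, x]
    for l in range(1, l_max):
        values.append(((2 * l + 1) * x * values[l] - l * values[l - 1]) / (l + 1))
    return values[: l_max + 1]

def potential_series(r, cos_theta, d, l_max):
    """Sum over l of d^l / r^(l+1) * P_l(cos theta), valid for r > d."""
    p = legendre(l_max, cos_theta)
    return sum(d ** l / r ** (l + 1) * p[l] for l in range(l_max + 1))

def potential_exact(r, cos_theta, d):
    """1 / distance from the field point to the charge at (0, 0, d)."""
    return 1.0 / math.sqrt(r * r + d * d - 2.0 * r * d * cos_theta)

r, cos_theta, d = 2.0, 0.3, 0.5
exact = potential_exact(r, cos_theta, d)
# Error of the truncated sum for increasing numbers of harmonics.
errors = [abs(potential_series(r, cos_theta, d, l_max) - exact)
          for l_max in (0, 1, 2, 6, 20)]
```

Watching the error shrink as harmonics are added is the numerical counterpart of seeing the superposition converge visually.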

[g orbital] [Isosurface of g orbital]

Spherical harmonics also appear as the angular part of the solution of the Schroedinger equation for a particle in a central potential; together with the radial part, they give rise to hydrogenic orbitals. Hydrogenic atomic orbitals are such an important and useful concept in chemistry that they warrant the extensive discussions among educators as to what they are and how best to represent them. For example, volumetric depictions of orbitals, such as the g orbital with l=4, m=0 (left), go a long way toward reinforcing the notion that orbitals are not bounded objects, but rather unbounded distributions. Alternatively, colored isosurfaces (or contours of constant value) of an orbital, such as those used to depict the g orbital (right), can be useful in showing the positive and negative regions of an orbital, provided it is understood that these are representative surfaces and not bounding surfaces. Isosurfaces of orbitals often appear slightly different, a little less elongated, than customary hand drawings of orbitals because such drawings are really polar plots of one of the angular variables of the spherical harmonic and ignore the radial contribution. Isosurfaces are probably the more meaningful and the more faithful depictions of the "shape" of an orbital, but require more artistry from instructors or a 3D visualization package such as Data Explorer.
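The values behind either kind of picture come from evaluating the orbital on a grid. The Python sketch below evaluates an unnormalized hydrogen-like g orbital with l=4, m=0 at a few points. The principal quantum number n=5 (the lowest shell that has g orbitals), unit nuclear charge, atomic units, and the function names are assumptions, since the text does not specify them.

```python
import math

def p4(x):
    """Legendre polynomial P_4(x), the angular shape for l=4, m=0."""
    return (35.0 * x ** 4 - 30.0 * x ** 2 + 3.0) / 8.0

def g_orbital(x, y, z):
    """Unnormalized 5g (m=0) amplitude: r^4 * exp(-r/5) * P_4(cos theta)."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return 0.0
    return r ** 4 * math.exp(-r / 5.0) * p4(z / r)

on_axis = g_orbital(0.0, 0.0, 20.0)      # cos(theta) = 1.0, positive lobe
off_axis = g_orbital(16.0, 0.0, 12.0)    # cos(theta) = 0.6, negative lobe
equator = g_orbital(20.0, 0.0, 0.0)      # cos(theta) = 0.0, positive ring
far_away = g_orbital(0.0, 0.0, 100.0)    # small but nonzero: no boundary
```

The alternating signs are what a colored isosurface distinguishes, and the nonzero value far from the nucleus illustrates why such a surface is representative rather than bounding.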






Conclusion

This brief has provided examples of the use of Data Explorer in chemical contexts. The examples chosen are by no means exhaustive of the types of visualizations that can be done nor of the chemical disciplines that exist. Rather, the examples, which are decidedly biased toward manufacturing development, reflect areas relevant to a corporate research center. That there are other chemical disciplines which have been neglected or visual techniques which have been overlooked simply means that there are innumerable other ways in which 3D visualization can be a valuable adjunct to the activities of scientists, science educators and students involved in chemistry or one of its allied disciplines. To contemporize a statement made by August Kekulé: let us learn to visualize, ladies and gentlemen, so that we might learn the truth.








References

[1] D.B. Mitzi, C.A. Feild, W.T.A. Harrison, and A.M. Guloy, Nature 369 (1994) 467.
[2] K.P. Rodbell, J.L. Hurd, and P.W. DeHaven, MRS Proc. (Fall 1995).
[3] P. Avouris and I.-W. Lyo, Science 264 (1994) 942.
[4] B.D. Silverman and D.E. Platt, J. Med. Chem. (1996).
[5] G. Hougham and T.M. Jackman, Polymer Preprints 37 (1996) 162.
[6] N. Cao and S. Chen, to be submitted to Physical Review Letters.


