
Simplified molecular input line entry specification


The simplified molecular input line entry specification, or SMILES, is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings. Most molecule editors can import SMILES strings and convert them back into two-dimensional drawings or three-dimensional models of the molecules.

The SMILES specification was developed by David Weininger in the late 1980s. It has since been modified and extended by others, most notably by Daylight Chemical Information Systems Inc. Other 'linear' notations include the Wiswesser Line Notation (WLN), ROSDAL and SLN (Tripos Inc).

1 Graph-based definition

In terms of a graph-based computational procedure, a SMILES string is obtained by printing the symbol nodes encountered in a depth-first tree traversal of a chemical graph. The chemical graph is first trimmed to remove hydrogen atoms, and cycles are broken to turn it into a spanning tree. Where a cycle has been broken, matching numeric suffix labels are attached to the two atoms that the removed bond connected. Parentheses are used to indicate points of branching in the tree.
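The traversal described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the writer used by Daylight or any toolkit: the function name `to_smiles`, the input format (a list of element symbols plus `(i, j, order)` bond tuples, hydrogens already removed), and the choice to print a ring bond's symbol next to the first of its two digits are all hypothetical choices made here. It handles one connected molecule with single, double and triple bonds and at most nine rings open at once.

```python
def to_smiles(atoms, bonds, root=0):
    """Write a SMILES string for one connected, hydrogen-suppressed graph.

    atoms: element symbols, e.g. ["C", "C", "O"]
    bonds: (i, j, order) tuples with order 1, 2 or 3
    root:  index of the atom where the depth-first traversal starts
    (Hypothetical helper for illustration only.)
    """
    adj = {i: {} for i in range(len(atoms))}
    for i, j, order in bonds:
        adj[i][j] = adj[j][i] = order
    bond_symbol = {1: "", 2: "=", 3: "#"}  # single bonds stay implicit

    # Pass 1: depth-first search. Edges to unvisited atoms form the spanning
    # tree; an edge back to an already-visited atom (other than the parent)
    # closes a cycle and is broken into a pair of ring-closure labels.
    children = {i: [] for i in adj}
    ring_bonds, seen = [], set()

    def build(u, parent):
        seen.add(u)
        for v in adj[u]:
            if v == parent:
                continue
            if v in seen:
                rb = (min(u, v), max(u, v))
                if rb not in ring_bonds:
                    ring_bonds.append(rb)
            else:
                children[u].append(v)
                build(v, u)

    build(root, None)
    rings_at = {i: [rb for rb in ring_bonds if i in rb] for i in adj}

    # Pass 2: print atoms in the same depth-first order.
    open_labels = {}  # ring bond -> digit while the ring is open
    out = []

    def emit(u):
        out.append(atoms[u])
        opening = [rb for rb in rings_at[u] if rb not in open_labels]
        closing = [rb for rb in rings_at[u] if rb in open_labels]
        for rb in opening:  # first endpoint: take the lowest free digit
            d = min(set(range(1, 10)) - set(open_labels.values()))
            open_labels[rb] = d
            out.append(bond_symbol[adj[rb[0]][rb[1]]] + str(d))
        for rb in closing:  # second endpoint: repeat the digit, freeing it
            out.append(str(open_labels.pop(rb)))
        kids = children[u]
        for k, v in enumerate(kids):
            is_branch = k < len(kids) - 1  # all but the last child go in parentheses
            out.append("(" if is_branch else "")
            out.append(bond_symbol[adj[u][v]])
            emit(v)
            out.append(")" if is_branch else "")

    emit(root)
    return "".join(out)
```

For cyclohexane, `to_smiles(["C"] * 6, [(i, (i + 1) % 6, 1) for i in range(6)])` gives `C1CCCCC1`. Starting the same graph from a different root changes the output: for fluoroform entered as `["F", "C", "F", "F"]` with bonds `(0, 1, 1), (1, 2, 1), (1, 3, 1)`, `root=0` gives `FC(F)F` and `root=1` gives `C(F)(F)F`.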

2 Examples

Atoms are represented by the standard abbreviations of the chemical elements, in square brackets, such as [Au] for gold. The hydroxide anion is [OH-]. The brackets may be omitted for the common organic elements (B, C, N, O, P, S, F, Cl, Br and I), in which case the proper number of implicit hydrogen atoms is assumed; for instance, the SMILES for water is simply O and that for ethanol is CCO. Double bonds are written with = and triple bonds with #, so carbon dioxide is represented as O=C=O and hydrogen cyanide as C#N. Cyclohexane is represented as C1CCCCC1: the two 1 digits mark the two atoms joined by the ring-closing bond, forming a ring of six carbons. Branches are described with parentheses, as in CCC(=O)O for propionic acid and FC(F)F, or alternatively C(F)(F)F, for fluoroform.
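To make the implicit-hydrogen rule concrete, the sketch below parses only the small subset of SMILES used in these examples and returns a molecular formula in Hill order (C first, then H, then the other elements alphabetically). The function name `formula_from_smiles` and the `VALENCES` table are assumptions of this sketch: the table lists the lowest normal valences commonly used for bracket-free organic atoms, and an implicit-hydrogen count is taken as the smallest listed valence that covers the bonds already drawn. Aromatic lowercase atoms, isotopes, stereo marks and `%nn` ring labels are rejected, and bracket-atom charges are parsed but ignored.

```python
import re
from collections import Counter

# Assumed lowest normal valences for atoms that may appear without brackets.
VALENCES = {"B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
            "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,)}
BOND_ORDERS = {"-": 1, "=": 2, "#": 3}
TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[-=#()]|\d")
BRACKET = re.compile(r"\[([A-Z][a-z]?)(H\d*)?([+-]\d*)?\]")


def formula_from_smiles(smiles):
    """Return the Hill-order formula of a simple SMILES string (hypothetical helper)."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("unsupported SMILES syntax: " + smiles)

    atoms = []  # [element, explicit H count or None, sum of bond orders]
    prev, order = None, 1
    branch_points, open_rings = [], {}
    for tok in tokens:
        if tok in BOND_ORDERS:
            order = BOND_ORDERS[tok]
        elif tok == "(":
            branch_points.append(prev)  # the branch hangs off the current atom
        elif tok == ")":
            prev = branch_points.pop()  # resume from the atom the branch left
        elif tok.isdigit():
            if tok in open_rings:  # second occurrence: close the ring bond
                other, ring_order = open_rings.pop(tok)
                bond = max(order, ring_order)
                atoms[other][2] += bond
                atoms[prev][2] += bond
            else:  # first occurrence: remember where the ring opened
                open_rings[tok] = (prev, order)
            order = 1
        else:  # an atom, bracketed or from the organic subset
            if tok.startswith("["):
                m = BRACKET.fullmatch(tok)
                if m is None:
                    raise ValueError("unsupported bracket atom: " + tok)
                h = m.group(2)
                atoms.append([m.group(1), 0 if h is None else int(h[1:] or 1), 0])
            else:
                atoms.append([tok, None, 0])
            if prev is not None:
                atoms[prev][2] += order
                atoms[-1][2] += order
            prev, order = len(atoms) - 1, 1

    counts = Counter()
    for element, explicit_h, bond_sum in atoms:
        counts[element] += 1
        if explicit_h is None:
            # Implicit H fills up to the lowest listed valence >= bonds used.
            target = next((v for v in VALENCES[element] if v >= bond_sum), bond_sum)
            counts["H"] += target - bond_sum
        else:  # bracket atoms carry their hydrogen count explicitly
            counts["H"] += explicit_h

    present = [e for e in counts if counts[e] > 0]
    if "C" in present:  # Hill order: C, then H, then the rest alphabetically
        head = ["C"] + (["H"] if "H" in present else [])
        ordered = head + sorted(e for e in present if e not in ("C", "H"))
    else:
        ordered = sorted(present)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in ordered)
```

With this sketch, `formula_from_smiles("CCC(=O)O")` returns `C3H6O2`, and both `FC(F)F` and `C(F)(F)F` give `CHF3`, since the two strings describe the same graph.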

3 Extensions

SMARTS is a modification of SMILES that allows, in addition to the SMILES elements, the specification of wildcard atoms and bonds. It is used to specify search structures and is widely used in chemical database search applications. This practice has led to a common misconception that chemical substructure search is achieved computationally by matching SMILES/SMARTS strings, when in fact it is achieved by the computationally more intensive search for subgraph isomorphism in the graphs reconstructed from the SMILES representations.
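The difference between string matching and graph matching can be illustrated with a brute-force backtracking search that maps pattern atoms onto molecule atoms. This is a toy sketch, not SMARTS and not how production search engines work (they use much faster algorithms and far richer atom and bond queries). The function name `substructure_matches` and the graph format are assumptions; a pattern atom `"*"` stands in for a wildcard atom and a bond order of `None` for a wildcard bond. Every pattern bond must exist in the molecule, while extra molecule bonds between matched atoms are allowed.

```python
def substructure_matches(pattern, molecule):
    """Yield every mapping {pattern atom: molecule atom} that embeds the pattern.

    Each graph is (atoms, bonds) with bonds as (i, j, order). A pattern atom
    "*" matches any element and a pattern bond order of None matches any bond.
    (Hypothetical helper for illustration only.)
    """
    p_atoms, p_bonds = pattern
    m_atoms, m_bonds = molecule

    def adjacency(n, bonds):
        adj = [{} for _ in range(n)]
        for i, j, order in bonds:
            adj[i][j] = adj[j][i] = order
        return adj

    p_adj = adjacency(len(p_atoms), p_bonds)
    m_adj = adjacency(len(m_atoms), m_bonds)

    def compatible(p, m, mapping):
        if p_atoms[p] not in ("*", m_atoms[m]):
            return False
        # Every pattern bond to an already-mapped atom must exist in the
        # molecule with a matching order; extra molecule bonds are allowed.
        for q, order in p_adj[p].items():
            if q in mapping:
                target = mapping[q]
                if target not in m_adj[m]:
                    return False
                if order is not None and m_adj[m][target] != order:
                    return False
        return True

    def extend(mapping):
        if len(mapping) == len(p_atoms):
            yield dict(mapping)
            return
        p = len(mapping)  # map pattern atoms in index order 0, 1, 2, ...
        used = set(mapping.values())
        for m in range(len(m_atoms)):
            if m not in used and compatible(p, m, mapping):
                mapping[p] = m
                yield from extend(mapping)
                del mapping[p]  # backtrack and try the next molecule atom

    yield from extend({})
```

For example, the carboxylic acid pattern `(["C", "O", "O"], [(0, 1, 2), (0, 2, 1)])` matches propionic acid exactly once, mapping the carboxyl carbon and both oxygens. Because only the graph is compared, the result would be the same whether the molecule had originally been written CCC(=O)O or OC(=O)CC.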

Since SMILES is generated by tree traversal, the string can vary depending on the root node chosen as well as the order in which nodes are encountered; fluoroform, for instance, can be written either FC(F)F or C(F)(F)F. A unique or 'canonical' form of the SMILES representation can be generated by applying rules to preprocess the tree before traversal. Unique SMILES are commonly used for exact matching of two structures and for ensuring that each molecule appears only once in a database.
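A minimal sketch of the idea, restricted to acyclic molecules: try every atom as the root, write the branches at each atom in sorted order so that the input atom numbering cannot affect the result, and keep the lexicographically smallest string. The function name `canonical_smiles` and this particular ordering rule are assumptions made for illustration; it is not the canonicalization algorithm of Daylight or any other toolkit, and rings are not handled.

```python
def canonical_smiles(atoms, bonds):
    """Canonical SMILES for an acyclic, hydrogen-suppressed molecular graph.

    Tries every atom as the root, writes the branches at each atom in sorted
    order, and keeps the lexicographically smallest string. Rings are not
    handled: the bonds must form a tree. (Hypothetical helper.)
    """
    # Rough guard against rings: a connected acyclic graph has n - 1 bonds.
    if len(bonds) != len(atoms) - 1:
        raise ValueError("this sketch only handles acyclic (tree) graphs")
    adj = {i: {} for i in range(len(atoms))}
    for i, j, order in bonds:
        adj[i][j] = adj[j][i] = order
    bond_symbol = {1: "", 2: "=", 3: "#"}

    def write(u, parent):
        # Each branch is written recursively, prefixed by its bond symbol,
        # then the branches are sorted so the input atom order cannot matter.
        branches = sorted(bond_symbol[adj[u][v]] + write(v, u)
                          for v in adj[u] if v != parent)
        if not branches:
            return atoms[u]
        # All but the last branch are parenthesized, as in any SMILES string.
        return atoms[u] + "".join("(" + b + ")" for b in branches[:-1]) + branches[-1]

    return min(write(root, None) for root in adj)
```

Two different atom numberings of fluoroform both yield `C(F)(F)F`. The winning string is reproducible but not necessarily the one a chemist would write: with this rule, propionic acid comes out as `C(=O)(CC)O`.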

Important enhancements to SMILES include extensions to store information on stereochemistry, such as @ and @@ for tetrahedral chirality and / and \ for the configuration around double bonds.


