Introduction to SMILES
The full name is almost as catchy as "SMILES" itself. I used to think it was a strange way of representing molecules for computers, but it actually seems more straightforward than IUPAC nomenclature. It's also shorter, and it's possible to generate a unique name for each molecule. (It's more difficult to pronounce, though.)
SMILES stands for "Simplified Molecular Input Line Entry Specification". It is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings. SMILES strings can be imported by most molecule editors for conversion back into two-dimensional drawings or three-dimensional models of the molecules. The SMILES specification was developed by David Weininger in the late 1980s. It has since been modified and extended by others, most notably by Daylight Chemical Information Systems Inc. Other 'linear' notations include the Wiswesser Line Notation (WLN), ROSDAL and SLN (Tripos Inc).
The term SMILES refers to a line notation for encoding molecular structures, and specific instances should strictly be called SMILES strings. However, the term SMILES is also commonly used to refer to both a single SMILES string and a number of SMILES strings; the exact meaning is usually apparent from the context. Typically, a number of equally valid SMILES can be written for a molecule. For example, CCO, OCC and C(O)C all specify the structure of ethanol. Algorithms have been developed to ensure the same SMILES is generated for a molecule regardless of the order of atoms in the structure.
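To see for myself that CCO, OCC and C(O)C really describe the same connectivity, here is a minimal Python sketch. It is only an illustration under loud assumptions: the function names are my own (hypothetical, not from any toolkit), it understands nothing beyond the atoms C, N and O, single bonds, parentheses and ring digits, and the "signature" it compares is a crude neighbour summary, not a real canonicalisation algorithm.

```python
def parse_simple_smiles(smiles):
    """Parse a tiny SMILES subset: atoms C/N/O, implicit single bonds,
    branches in parentheses and single-digit ring closures.

    Returns (atoms, bonds): a list of element symbols and a set of
    frozensets holding the two atom indices of each bond.
    """
    atoms, bonds = [], set()
    prev = None        # index of the atom the next atom will attach to
    stack = []         # attachment points remembered at each "("
    open_rings = {}    # ring digit -> atom index waiting for its partner
    for ch in smiles:
        if ch in "CNO":
            atoms.append(ch)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.add(frozenset((prev, idx)))
            prev = idx
        elif ch == "(":
            stack.append(prev)
        elif ch == ")":
            prev = stack.pop()
        elif ch.isdigit():
            if ch in open_rings:
                bonds.add(frozenset((open_rings.pop(ch), prev)))
            else:
                open_rings[ch] = prev
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return atoms, bonds


def neighbour_signature(smiles):
    """Order-independent summary: each atom's element together with the
    sorted elements of its neighbours. This is NOT a true canonical form;
    different molecules can share a signature."""
    atoms, bonds = parse_simple_smiles(smiles)
    neighbours = [[] for _ in atoms]
    for bond in bonds:
        a, b = tuple(bond)
        neighbours[a].append(atoms[b])
        neighbours[b].append(atoms[a])
    return sorted((atoms[i], "".join(sorted(n))) for i, n in enumerate(neighbours))


# The three ethanol spellings agree; dimethyl ether (COC) does not.
print(neighbour_signature("CCO") == neighbour_signature("C(O)C"))
print(neighbour_signature("CCO") == neighbour_signature("COC"))
```

The first comparison comes out equal and the second does not, which is all this toy check is meant to show. A real canonicaliser has to rank the atoms unambiguously, which is a much harder job.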
There are two types of SMILES:
1. Canonical
2. Isomeric
Canonical SMILES refers to the version of the SMILES specification that includes rules for ensuring that each distinct chemical molecule has a single unique SMILES representation. This SMILES is unique for each structure, although it depends on the canonicalisation algorithm used to generate it.
Isomeric SMILES refers to the version of the SMILES specification that includes extensions to support the specification of isotopes, chirality, and configuration about double bonds. These structural features cannot be specified by connectivity alone, and SMILES that encode this information are called isomeric SMILES.
In terms of a graph-based computational procedure, SMILES is a string obtained by printing the symbol nodes encountered in a depth-first tree traversal of a chemical graph. The chemical graph is first trimmed to remove hydrogen atoms, and cycles are broken to turn it into a spanning tree. Where cycles have been broken, numeric suffix labels are included to indicate the connected nodes. Parentheses are used to indicate points of branching on the tree. Aliphatic (non-aromatic) atoms are written with uppercase symbols such as C, while atoms in an aromatic ring are written in lowercase, such as c. Ring closures are designated with pairs of matching digits.
Here are some structure images that I've drawn using ACD/ChemSketch, with the SMILES notation below each structure:
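The traversal described above can be sketched in a few lines of Python. This is my own simplified illustration, not the algorithm of any particular toolkit: the function name write_smiles and the graph layout (a list of symbols plus an adjacency dictionary) are assumptions, hydrogens are already left out, every bond is treated as single, there is no aromaticity or canonical ordering, and at most nine ring closures are supported.

```python
def write_smiles(symbols, adjacency, start=0):
    """Write a SMILES string by depth-first traversal (single bonds only).

    symbols   -- list of element symbols, one per heavy atom
    adjacency -- dict mapping each atom index to a list of neighbour indices
    """
    parent = {start: None}                      # also serves as the visited set
    children = {atom: [] for atom in adjacency}
    ring_digits = {atom: [] for atom in adjacency}
    ring_bonds = set()

    def explore(atom):
        # Build a spanning tree; any bond to an already visited atom
        # (other than the one we arrived by) is a broken cycle.
        for nbr in adjacency[atom]:
            if nbr == parent[atom]:
                continue
            if nbr in parent:
                bond = frozenset((atom, nbr))
                if bond not in ring_bonds:
                    ring_bonds.add(bond)
                    digit = len(ring_bonds)
                    if digit > 9:
                        raise ValueError("this sketch supports at most 9 ring closures")
                    ring_digits[atom].append(digit)
                    ring_digits[nbr].append(digit)
            else:
                parent[nbr] = atom
                children[atom].append(nbr)
                explore(nbr)

    def emit(atom):
        # Atom symbol, then its ring-closure digits, then branches in
        # parentheses, with the last child continuing the main chain.
        text = symbols[atom] + "".join(str(d) for d in ring_digits[atom])
        kids = children[atom]
        for kid in kids[:-1]:
            text += "(" + emit(kid) + ")"
        if kids:
            text += emit(kids[-1])
        return text

    explore(start)
    return emit(start)


# Ethanol, written from each of its three atoms.
ethanol = {0: [1], 1: [0, 2], 2: [1]}
print([write_smiles(["C", "C", "O"], ethanol, start=s) for s in (0, 1, 2)])

# Methylcyclopentane: ring atoms 0-4, methyl carbon 5 on atom 1.
ring = {0: [1, 4], 1: [0, 5, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0], 5: [1]}
print(write_smiles(["C"] * 6, ring))
```

Starting from different atoms gives different but equally valid strings, which is exactly why a canonicalisation step is needed if you want one unique SMILES per molecule.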
[Figure: Branched Structures]
[Figure: Cyclic Structures]
[Figure: Aromatic Structures]
[Figure: Branched and Aromatic Structures]
You can check for further examples of SMILES notation here.
You can try out SMILES strings at this page; it's kind of fun. How to do it is described on Wikipedia, for example.
Ethane is just CC.
Add double and triple bonds like this:
C#CC=C for butenyne.
Add a branch in parentheses:
CC(C)CCC for 2-methylpentane
If you want a ring, add the same number after each of the two atoms to be joined together:
C1C(C)CCC1 for methylcyclopentane
Add a pyridyl group to the C next to the methyl group (aromatic atoms are written in lower case, and you have to include a second ring closure):
C1(c2ncccc2)C(C)CCC1 for 1-(2-pyridyl)-2-methylcyclopentane
You can add an extra oxirane ring:
C1(c2ncccc2)C(C)CCC13OC3
You can mess with stereochemistry (using @ and @@)
C1(c2ncccc2)[C@@H](C)CC[C@]13OC3
If you still haven't had enough, you can add a double bond in E configuration to the pyridyl ring:
C1(c2nc(/C=C(Cl)\C)ccc2)[C@@H](C)CC[C@]13OC3
or Z configuration
C1(c2nc(/C=C(Cl)/C)ccc2)[C@@H](C)CC[C@]13OC3
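Once the strings get this long it helps to split them into their pieces. Below is a small tokenizer sketch in Python. The regular expression and the function names (tokenize, is_atom) are my own assumptions and cover only the syntax used in this post: bracket atoms, the usual one- and two-letter atoms, bond symbols, parentheses and single-digit ring closures. It is not a full implementation of the SMILES grammar.

```python
import re

# One alternative per kind of token; "Cl" and "Br" come before the
# one-letter atoms so that "Cl" is not read as C followed by l.
SMILES_TOKEN = re.compile(r"""
    \[[^\]]+\]        # bracket atom, e.g. [C@@H]
  | Cl | Br           # two-letter atoms
  | [BCNOPSFI]        # one-letter aliphatic atoms
  | [bcnops]          # aromatic atoms (lower case)
  | [-=#:/\\]         # bond symbols, including the two direction marks
  | [()]              # branch open / close
  | \d                # single-digit ring closure
""", re.VERBOSE)


def tokenize(smiles):
    """Split a SMILES string into tokens, failing loudly on anything unknown."""
    tokens, pos = [], 0
    while pos < len(smiles):
        match = SMILES_TOKEN.match(smiles, pos)
        if match is None:
            raise ValueError(f"cannot tokenize {smiles!r} at position {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


def is_atom(token):
    """True for bracket atoms and plain element symbols."""
    return token[0] == "[" or token[0].isalpha()


# Raw string, because the E isomer contains a backslash.
e_isomer = r"C1(c2nc(/C=C(Cl)\C)ccc2)[C@@H](C)CC[C@]13OC3"
tokens = tokenize(e_isomer)
print(tokens)
print(sum(is_atom(t) for t in tokens), "heavy atoms")
```

Note that the "13" near the end comes out as two separate ring-closure digits, 1 and 3, which is how that atom manages to close the cyclopentane ring and open the oxirane ring at the same time.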
SMILES Bonds
Bond symbols marked with * can be omitted.
Single* | - |
Double | = |
Triple | # |
Aromatic* | : |
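As a rough illustration of the table, here is a Python sketch that reads the bond orders along a simple chain. The names BOND_ORDER and chain_bond_orders are hypothetical, the value 1.5 for an aromatic bond is just a common convention, and the function assumes an unbranched, ring-free string of one-letter atoms. It also assumes that two neighbouring lower-case atoms sit in the same aromatic ring, which is not always true in real molecules.

```python
# Bond symbols from the table above, mapped to bond orders.
# 1.5 for aromatic bonds is a convention, not part of the SMILES syntax.
BOND_ORDER = {"-": 1, "=": 2, "#": 3, ":": 1.5}


def chain_bond_orders(smiles):
    """Return the bond orders between consecutive atoms of a simple chain.

    Assumes one-letter atoms only, no branches and no ring closures.
    An omitted bond is read as aromatic between two lower-case atoms
    (assumed to share a ring) and as single otherwise.
    """
    orders = []
    prev_atom = None   # symbol of the previous atom, if any
    pending = None     # explicit bond order waiting for the next atom
    for ch in smiles:
        if ch in BOND_ORDER:
            pending = BOND_ORDER[ch]
        elif ch.isalpha():
            if prev_atom is not None:
                if pending is None:
                    aromatic = prev_atom.islower() and ch.islower()
                    pending = 1.5 if aromatic else 1
                orders.append(pending)
            prev_atom = ch
            pending = None
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return orders


print(chain_bond_orders("C#CC=C"))   # butenyne
print(chain_bond_orders("CC"), chain_bond_orders("C-C"))
```

For butenyne this gives a triple bond, an implicit single bond and a double bond, in that order. CC and C-C give the same result, which is what the asterisk in the table means.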
For more references:
Go SMILES!
Wassalam..