OpenSMILES Specification


5. Nonstandard Forms of SMILES

Several SMILES-generating systems in use either generate incorrect SMILES or interpret some of the ambiguous features of the original SMILES specification in different ways. Although these SMILES are illegal according to this formal OpenSMILES specification, it is often useful to parse them in order to make use of the information that accompanies them.

These "relaxed" SMILES rules should only be applied when the user (presumably after thinking about the consequences) requests them. A SMILES parser that supports any or all of these "relaxed" rules must not enable them by default; the user must specifically request them before the parser can accept such SMILES.
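One way to honor the opt-in requirement is to give every relaxed rule its own switch that defaults to off. The following is a minimal sketch, not part of the specification: the class name `RelaxedRules`, its field names (chosen to mirror the rule names in the table in this section), and the constant `STRICT` are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RelaxedRules:
    """Hypothetical option set: every relaxed rule is off unless requested."""

    extra_parentheses: bool = False
    misplaced_dots: bool = False
    mismatched_ring_bonds: bool = False
    invalid_cis_trans: bool = False
    conflicting_cis_trans: bool = False
    d_and_t: bool = False
    lowercase_as_sp2: bool = False

    def enabled(self):
        """Return the names of the rules the user switched on, in field order."""
        return [f.name for f in fields(self) if getattr(self, f.name)]


# The default a parser would use: strict OpenSMILES, no relaxed rules.
STRICT = RelaxedRules()
```

A parser built this way would take `STRICT` as its default argument, so a caller has to construct something like `RelaxedRules(d_and_t=True)` explicitly before any nonstandard SMILES is accepted.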

The following table lists "relaxed" rules that SMILES parsers may accept.

| Rule | Example | Interpreted as... | Details |
|---|---|---|---|
| Extra parentheses | C((C))O | C(C)O | Extra parentheses are ignored in places where there is no ambiguity as to the meaning. Note that the form "(CO)N" is never allowed, since it isn't clear which atom the nitrogen should connect to. |
| | (N1CCCCC1) | N1CCCCC1 | |
| Misplaced dots | [Na+]..[Cl-] | [Na+].[Cl-] | Two or more dot-bonds in a row are condensed into one. A leading or trailing dot-bond is ignored. Note that a dot that starts a branch is legal in strict SMILES; for example, C1CC(.[Na+])CC1[O-] is a legal (though strange) SMILES. |
| | .CCO | CCO | |
| | CCO. | CCO | |
| Mismatched ring bonds | C1CCC | CCCC | Mismatched ring bonds are ignored. Note that this is almost always a bad idea. For example, "C1CCCCC2" is almost certainly supposed to be cyclohexane "C1CCCCC1", but with "relaxed" parsing it would be interpreted as hexane. |
| Invalid cis/trans specification | C/C=C | CC=C | Mismatched or incomplete cis/trans bonds are ignored. |
| | C/C=CC | CC=CC | |
| | CC/=C/C | CC=CC | |
| Conflicting cis/trans specification | C/C(\F)=C/C | CC(F)=CC | Conflicting cis/trans bonds are ignored. (In this case, both the methyl and fluorine on the left are shown as trans to the methyl on the right, an impossible configuration.) |
| D and T | D[CH3] | [2H][CH3] | The symbols "D" and "T" are treated as synonyms for [2H] and [3H]. |
| | T[CH3] | [3H][CH3] | |
| Lowercase as sp2 | CccccC | CC=CC=CC (2,4-hexadiene) | Lowercase letters are interpreted as sp2, even outside of ring systems. |
| | Ccc | CC=C (propene) | |
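Two of the rules in the table, "Misplaced dots" and "D and T", can be approximated by rewriting the input string before it reaches a strict parser. The sketch below is illustrative only: the function names `relax_dots` and `relax_d_and_t` are hypothetical, and it relies on the assumption that bracket atoms never contain a dot and that the letters D and T never appear outside brackets in strict SMILES.

```python
import re

# A bracket atom such as [Na+] or [CH3]; the capture group makes
# re.split keep these pieces in its result.
_BRACKET = re.compile(r"(\[[^\]]*\])")


def relax_dots(smiles):
    """Condense runs of dot-bonds and drop a leading or trailing dot-bond."""
    condensed = re.sub(r"\.{2,}", ".", smiles)
    return condensed.strip(".")


def relax_d_and_t(smiles):
    """Rewrite bare D and T as [2H] and [3H], leaving bracket atoms alone."""
    parts = _BRACKET.split(smiles)
    out = []
    for index, part in enumerate(parts):
        # Even indices are the text between bracket atoms; odd indices
        # are the bracket atoms themselves (e.g. [Dy]), kept as-is.
        if index % 2 == 0:
            part = part.replace("D", "[2H]").replace("T", "[3H]")
        out.append(part)
    return "".join(out)
```

A single dot that starts a branch, as in C1CC(.[Na+])CC1[O-], passes through `relax_dots` unchanged, which matches the note that this form is already legal in strict SMILES. The remaining rules (ring bonds, cis/trans, lowercase atoms) depend on the molecular graph and are not attempted here.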