OpenSMILES Specification


2. Formal Grammar

2.1 Syntax versus Semantics

This SMILES specification is divided into two distinct parts: a syntactic specification that describes how atoms, bonds, parentheses, digits, and so forth are written, and a semantic specification that describes how those symbols are interpreted as a sensible molecule. For example, the syntax specifies how ring-closure digits are written, but the semantics requires that they occur in pairs. Likewise, the syntax specifies how aromatic elements are written, but the semantics determines whether a particular ring system is actually aromatic.

In this specification, syntax and semantics are explained separately, although in practice they are usually intertwined in the code that implements a SMILES parser. This chapter is concerned only with syntax.
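To make the distinction concrete, the Python sketch below applies one semantic rule, ring-bond pairing, to strings that are assumed to be syntactically well formed already. `unpaired_ring_closures` is a hypothetical helper written for illustration, not part of any library. The grammar happily accepts C1CCCCC, but this check rejects it because ring bond 1 is opened and never closed.

```python
import re

# Digits inside brackets (isotopes, hydrogen counts, charges, atom classes)
# are not ring bonds, so bracket atoms are stripped first.
BRACKET_ATOM = re.compile(r"\[[^\]]*\]")
# Outside brackets a ring-bond number is either '%' followed by two digits
# or a single digit; a bond symbol in front of it does not affect pairing.
RING_NUMBER = re.compile(r"%(\d\d)|(\d)")


def unpaired_ring_closures(smiles: str) -> list[int]:
    """Return the ring-bond numbers that are opened but never closed."""
    text = BRACKET_ATOM.sub("", smiles)
    open_rings: set[int] = set()
    for m in RING_NUMBER.finditer(text):
        # Assumption: '%nn' and a single digit with the same value
        # name the same ring bond.
        number = int(m.group(1) or m.group(2))
        # The first occurrence opens a ring bond and the next one closes
        # it, after which the number may be reused.
        if number in open_rings:
            open_rings.remove(number)
        else:
            open_rings.add(number)
    return sorted(open_rings)


print(unpaired_ring_closures("C1CCCCC1"))    # []
print(unpaired_ring_closures("C1CCCCC"))     # [1]
print(unpaired_ring_closures("C1CC1C1CC1"))  # [] (ring number 1 reused)
```

A full parser would record which atoms each ring bond joins rather than only counting occurrences; the counting version is enough to show that the rule lives outside the grammar.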

2.2 Grammar

Section Formal Grammar
  ATOMS
3.1 atom ::= bracket_atom | aliphatic_organic | aromatic_organic | '*'
  ORGANIC SUBSET ATOMS
3.1.5 aliphatic_organic ::= 'B' | 'C' | 'N' | 'O' | 'S' | 'P' | 'F' | 'Cl' | 'Br' | 'I'
3.5 aromatic_organic ::= 'b' | 'c' | 'n' | 'o' | 's' | 'p'
  BRACKET ATOMS
3.1.1 bracket_atom ::= '[' isotope? symbol chiral? hcount? charge? class? ']'
3.1.1 symbol ::= element_symbols | aromatic_symbols | '*'
3.1.4 isotope ::= NUMBER
3.1.1 element_symbols ::= 'H'| 'He' |'Li'|'Be'| 'B' |'C' |'N' |'O' |'F' |'Ne' |'Na'|'Mg'| 'Al'|'Si'|'P' |'S' |'Cl'|'Ar' |'K' |'Ca'|'Sc'|'Ti'|'V' |'Cr'|'Mn'|'Fe'|'Co'|'Ni'|'Cu'|'Zn'|'Ga'|'Ge'|'As'|'Se'|'Br'|'Kr' |'Rb'|'Sr'|'Y' |'Zr'|'Nb'|'Mo'|'Tc'|'Ru'|'Rh'|'Pd'|'Ag'|'Cd'|'In'|'Sn'|'Sb'|'Te'|'I' |'Xe' |'Cs'|'Ba'| 'Hf'|'Ta'|'W' |'Re'|'Os'|'Ir'|'Pt'|'Au'|'Hg'|'Tl'|'Pb'|'Bi'|'Po'|'At'|'Rn' |'Fr'|'Ra'| 'Rf'|'Db'|'Sg'|'Bh'|'Hs'|'Mt'|'Ds'|'Rg' |'La'|'Ce'|'Pr'|'Nd'|'Pm'|'Sm'|'Eu'|'Gd'|'Tb'|'Dy'|'Ho'|'Er'|'Tm'|'Yb'|'Lu' |'Ac'|'Th'|'Pa'|'U' |'Np'|'Pu'|'Am'|'Cm'|'Bk'|'Cf'|'Es'|'Fm'|'Md'|'No'|'Lr'
3.5 aromatic_symbols ::= 'b' | 'c' | 'n' | 'o' | 'p' | 's' | 'se' | 'as'
  CHIRALITY
3.9 chiral ::= '@' | '@@' | '@TH1' | '@TH2' | '@AL1' | '@AL2' | '@SP1' | '@SP2' | '@SP3' | '@TB1' | '@TB2' | '@TB3' | ... | '@TB19' | '@TB20' | '@OH1' | '@OH2' | '@OH3' | ... | '@OH29' | '@OH30'
  HYDROGENS
3.1.2 hcount ::= 'H' | 'H' DIGIT
  CHARGE
3.1.3 charge ::= '-' | '-' DIGIT | '+' | '+' DIGIT | '--' *deprecated* | '++' *deprecated*
  ATOM CLASS
3.1.7 class ::= ':' NUMBER
  BONDS AND CHAINS
3.2, 3.9.3 bond ::= '-' | '=' | '#' | '$' | ':' | '/' | '\'
3.4 ringbond ::= bond? DIGIT | bond? '%' DIGIT DIGIT
3.3 branched_atom ::= atom ringbond* branch*
  branch ::= '(' chain ')' | '(' bond chain ')' | '(' dot chain ')'
  chain ::= branched_atom | chain branched_atom | chain bond branched_atom | chain dot branched_atom
3.7 dot ::= '.'
  SMILES STRINGS
3.10 smiles ::= chain terminator
  terminator ::= SPACE | TAB | LINEFEED | CARRIAGE_RETURN | END_OF_STRING
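The grammar maps almost rule for rule onto a recursive-descent recognizer. The Python sketch below is one such mapping, written for illustration only: the names `Recognizer`, `is_valid_syntax`, and `SmilesSyntaxError` are hypothetical, and the code answers only whether a string matches the grammar. It builds no molecule and checks no semantics. Two choices in it are assumptions rather than text taken from the table: aromatic boron 'b' is accepted inside brackets to match aromatic_organic, and trigonal-bipyramidal classes are limited to @TB1 through @TB20. Bracket atoms are matched with a single regular expression, while chains and branches are handled by mutually recursive methods, as in the chain and branch rules.

```python
import re

# element_symbols, in the order the rule lists them.
ELEMENTS = """H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe
Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I
Xe Cs Ba Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Rf Db Sg Bh Hs Mt Ds
Rg La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Ac Th Pa U Np Pu Am Cm Bk Cf
Es Fm Md No Lr""".split()
# aromatic_symbols; 'b' is an assumption added to match aromatic_organic.
AROMATIC = ["b", "c", "n", "o", "p", "s", "se", "as"]
# Longest alternatives first so that, e.g., 'Cl' is tried before 'C'.
SYMBOL = "|".join(re.escape(s) for s in
                  sorted(ELEMENTS + AROMATIC + ["*"], key=len, reverse=True))

BRACKET_ATOM = re.compile(
    r"\[\d*"                                   # isotope?
    rf"(?:{SYMBOL})"                           # symbol
    r"(?:@(?:TH[12]|AL[12]|SP[123]|TB(?:1\d|20|[1-9])"
    r"|OH(?:[12]\d|30|[1-9])|@)?)?"            # chiral? (@TB1-@TB20 assumed)
    r"(?:H\d?)?"                               # hcount?
    r"(?:--|\+\+|[+-]\d?)?"                    # charge?
    r"(?::\d+)?\]"                             # class?
)
ORGANIC_ATOM = re.compile(r"Cl|Br|[BCNOSPFI]|[bcnosp]|\*")
RING_BOND = re.compile(r"[-=#$:/\\]?(?:%\d\d|\d)")
BONDS = set("-=#$:/\\")


class SmilesSyntaxError(ValueError):
    pass


class Recognizer:
    def __init__(self, text: str):
        self.text, self.pos = text, 0

    def peek(self) -> str:
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def atom(self) -> None:
        m = (BRACKET_ATOM.match(self.text, self.pos)
             or ORGANIC_ATOM.match(self.text, self.pos))
        if not m:
            raise SmilesSyntaxError(f"expected an atom at offset {self.pos}")
        self.pos = m.end()

    def branched_atom(self) -> None:
        self.atom()
        # ringbond* must come before branch*, exactly as in the rule.
        while (m := RING_BOND.match(self.text, self.pos)):
            self.pos = m.end()
        while self.peek() == "(":
            self.pos += 1
            if self.peek() in BONDS or self.peek() == ".":
                self.pos += 1                  # '(' bond chain ')' or '(' dot chain ')'
            self.chain()
            if self.peek() != ")":
                raise SmilesSyntaxError(f"expected ')' at offset {self.pos}")
            self.pos += 1

    def chain(self) -> None:
        self.branched_atom()
        while True:
            c = self.peek()
            if c and (c in BONDS or c == "."):
                self.pos += 1                  # chain bond/dot branched_atom
                self.branched_atom()
            elif c == "[" or ORGANIC_ATOM.match(self.text, self.pos):
                self.branched_atom()           # chain branched_atom
            else:
                return


def is_valid_syntax(text: str) -> bool:
    """True if text matches smiles ::= chain terminator."""
    r = Recognizer(text)
    try:
        r.chain()
    except SmilesSyntaxError:
        return False
    return r.pos == len(text) or text[r.pos] in " \t\n\r"


print(is_valid_syntax("CC(=O)O"))          # True
print(is_valid_syntax("[C@@H](F)(Cl)Br"))  # True
print(is_valid_syntax("C(C"))              # False
```

Because the recognizer is purely syntactic, C1CCCCC passes even though its ring bond is never closed; pairing ring bonds, like deciding aromaticity, belongs to the semantic layer.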