About the mapping data
This page explains the data format for "Arita, M. Proceedings of the National Academy of Sciences USA, 101(6) 1543-1547, 2004". Since September 2008, a new version of the metabolite information has been released at http://metabolomics.jp/.
01. Compound Entry
The Compound Definition File links MDL MOLfiles ('ENTRY' field) to English names ('NAME' field). It also contains a 'FORMULA' field, which is used to calculate the molecular mass.
The format is compatible with the LIGAND database.
ENTRY C00002 (filename of its MOLfile)
NAME ATP; Adenosine 5'-triphosphate (names separated by ';')
FORMULA C10H16N5O13P3 (molecular formula)
MASS 506.9957... (molecular mass)
///
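As a rough illustration of the record layout above, the following Python sketch reads one compound record and recomputes a mass from the 'FORMULA' field. The function names `parse_compound` and `formula_mass` are hypothetical, and the use of monoisotopic masses is an assumption: it happens to give a value close to the 506.9957 shown for ATP, but the page does not state which mass definition is used. Only a handful of elements are included in the table.

```python
import re

# Monoisotopic masses of the most abundant isotopes.
# Assumption: the MASS field is a monoisotopic mass (not stated on this page).
MONOISOTOPIC = {
    "C": 12.0, "H": 1.00782503, "N": 14.00307400,
    "O": 15.99491462, "P": 30.97376163, "S": 31.97207100,
}

def parse_compound(record):
    """Parse one compound record (terminated by '///') into a dict."""
    entry = {}
    for raw in record.splitlines():
        line = raw.strip()
        if not line or line == "///":
            continue
        key, _, value = line.partition(" ")
        entry[key] = value.strip()
    # Names are separated by ';' within the NAME field.
    if "NAME" in entry:
        entry["NAME"] = [n.strip() for n in entry["NAME"].split(";") if n.strip()]
    return entry

def formula_mass(formula):
    """Sum element masses for a simple formula such as C10H16N5O13P3."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total

record = """ENTRY C00002
NAME ATP; Adenosine 5'-triphosphate
FORMULA C10H16N5O13P3
///"""

compound = parse_compound(record)
mass = formula_mass(compound["FORMULA"])
print(compound["ENTRY"], compound["NAME"], round(mass, 4))
```

The parser assumes one field per line, as in the sample record; elements missing from the table raise a `KeyError` rather than being silently ignored.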
All MOLfiles are manually curated and checked for correctness. In addition to removing redundancies and standardizing compound names, the following curations are applied.
- Supply chirality information.
- Omit explicitly described hydrogens.
- Omit property lines that describe polymers. (The ARM software does not support polymeric structures.)
02. Enzyme Entry
The Enzyme Definition File lists reaction formulas ('REACTION' field) for each EC entry ('ENTRY' field).
The format is compatible with the LIGAND database.
ENTRY EC 1.1.1.3
NAME L-homoserine:NAD(P) oxidoreductase
homoserine dehydrogenase
HSDH
REACTION NAD+ + L-Homoserine = NADH + L-Aspartate 4-semialdehyde;
NADP+ + L-Homoserine = NADPH + L-Aspartate 4-semialdehyde;
///
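The enzyme record above spreads the 'NAME' and 'REACTION' fields over several lines. A minimal Python sketch for reading such a record is given below. The function name `parse_enzyme` is hypothetical, and two assumptions are made: any line that does not start with one of the three keywords shown in the sample is a continuation of the previous field, and each reaction occupies a single line with reactants joined by " + " and sides separated by " = ".

```python
KEYWORDS = ("ENTRY", "NAME", "REACTION")  # fields shown in the sample record

def parse_enzyme(record):
    """Parse one enzyme record (terminated by '///') into a dict."""
    fields = {}
    current = None
    for raw in record.splitlines():
        line = raw.strip()
        if not line or line == "///":
            continue
        head, _, rest = line.partition(" ")
        if head in KEYWORDS:
            current = head
            fields[current] = [rest.strip()]
        elif current is not None:
            # Assumed continuation line of the previous field.
            fields[current].append(line)

    reactions = []
    for text in fields.get("REACTION", []):
        left, _, right = text.rstrip(";").strip().partition(" = ")
        reactions.append((
            [s.strip() for s in left.split(" + ")],
            [s.strip() for s in right.split(" + ")],
        ))
    return {
        "ENTRY": " ".join(fields.get("ENTRY", [])),
        "NAME": fields.get("NAME", []),
        "REACTION": reactions,
    }

record = """ENTRY EC 1.1.1.3
NAME L-homoserine:NAD(P) oxidoreductase
homoserine dehydrogenase
HSDH
REACTION NAD+ + L-Homoserine = NADH + L-Aspartate 4-semialdehyde;
NADP+ + L-Homoserine = NADPH + L-Aspartate 4-semialdehyde;
///"""

enzyme = parse_enzyme(record)
for substrates, products in enzyme["REACTION"]:
    print(substrates, "=", products)
```

Splitting on " + " with surrounding spaces keeps names such as NAD+ intact, since the trailing plus sign of the name is not followed by a space-delimited separator on its own.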
All reactions are manually curated with reference to the Roche Biochemical Pathways Chart, the KEGG database, the BioCyc databases, and the IUBMB Nomenclature. In addition to removing redundancies and standardizing compound names, the following curations are applied.
- Replace generic names with concrete ones (e.g. Alcohol with Ethanol).
- Reorder reactants so that the structures on the two sides of the equation correspond (roughly) one to one.
03. Mapping Entry
The Mapping Definition File is a machine-generated file of one-to-one atomic correspondences between metabolites. It consists of 3 parts.
Part 1. Mapping Information
<C00249:C02588:0> (unique label)
C02588 C00249 0 38 0 (mapping info)
0 6 ; 1 7 ; 2 8 ; 3 9 ; 4 10 ; 5 11 ; 6 12 ; 7 13 ;
8 14 ; 9 15 ; 12 5 ; 13 4 ; 14 3 ; 15 2 ; 16 1 ; 17 0 ;
<C00632:C03986:0> (next label)
C00632 C03986 0 12 0 (next mapping)
0 0 ; 1 1 ; 2 2 ; 3 3 ; 4 4 ; 5 5 ; 6 6 ; 9 9 7 ;
Each entry begins with a unique label indicating the substrate-product relationship. The line after the label is the mapping header. It gives the substrate ID (C*****), the product ID (C*****), a serial number for mappings between these reactants (integer), the mapping size (integer), and the direction (integer). The mapping direction is 0 (reversible), 1 (left to right), or 2 (right to left). The lines after the header are the actual mapping, i.e. atomic position pairs between the reactants, separated by semicolons. The integers are line numbers in the corresponding MOL files. A pair of integers indicates carbon atoms. In a triplet of integers, the last integer is the atomic number (7 = nitrogen, 16 = sulphur).
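The layout of a Part 1 entry can be sketched in Python as follows. The function name `parse_mapping` and the dictionary keys are hypothetical, and the inputs are assumed to be the bare file lines without the parenthesized annotations shown in the sample. The sketch does not compare the stated mapping size with the number of listed pairs, because the page does not spell out how the two relate.

```python
import re

DIRECTIONS = {0: "reversible", 1: "left to right", 2: "right to left"}
CARBON = 6  # atomic number implied when an item is a plain pair

def parse_mapping(label_line, header_line, pair_text):
    """Parse one Part 1 entry: label, header line, and mapping lines."""
    label = re.search(r"<([^>]+)>", label_line).group(1)
    substrate, product, serial, size, direction = header_line.split()[:5]

    atoms = []
    for chunk in pair_text.split(";"):
        numbers = [int(n) for n in chunk.split()]
        if not numbers:
            continue  # text after the final ';' or a blank chunk
        if len(numbers) == 2:
            numbers.append(CARBON)  # a pair means carbon atoms
        if len(numbers) != 3:
            raise ValueError(f"unexpected mapping item: {chunk!r}")
        atoms.append(tuple(numbers))  # (position, position, atomic number)

    return {
        "label": label,
        "substrate": substrate,
        "product": product,
        "serial": int(serial),
        "size": int(size),
        "direction": DIRECTIONS[int(direction)],
        "atoms": atoms,
    }

mapping = parse_mapping(
    "<C00632:C03986:0>",
    "C00632 C03986 0 12 0",
    "0 0 ; 1 1 ; 2 2 ; 3 3 ; 4 4 ; 5 5 ; 6 6 ; 9 9 7 ;",
)
print(mapping["direction"], mapping["atoms"][-1])
```

Because the mapping text is split on semicolons and then on whitespace, a mapping that continues over several lines can be passed in as one string with embedded newlines.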
Part 2. Reaction Information
EC 2.7.1.11: ATP + D-Sedoheptulose 7-phosphate =
ADP + D-Sedoheptulose 1,7-bisphosphate
<C00002:C00008:0> <C00447:C05382:0> <C00002:C00447:0>
EC 3.5.4.9: 5,10-Methenyl-tetrahydro-folic acid + H2O =
10-Formyl-tetrahydro-folic acid
<C00234:C00445:0> <C00001:C00234:0>
After the first separator (///), reaction information follows. This section shows which atomic mappings are contained in each reaction formula. In the above example, the EC 2.7.1.11 reaction contains three mappings.
NOTE: The set of binary relationships in Part 2 MAY NOT represent pathways of a metabolic network. Graph paths often do not conserve molecular moieties, and therefore do not correspond to any metabolic pathway. (Please read "Arita, M. Proceedings of the National Academy of Sciences USA, 101(6) 1543-1547, 2004" for more details.)
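To show how the Part 2 layout might be read, here is a small Python sketch that collects, for each reaction, its formula and the mapping labels listed under it. The function name `parse_reactions` is hypothetical. It assumes that a reaction starts on a line beginning with "EC ", that label lines begin with "<", that any other line continues the formula, and that compound IDs are "C" followed by five digits.

```python
import re

# Assumption: labels look like <C00002:C00008:0>.
LABEL = re.compile(r"<C\d{5}:C\d{5}:\d+>")

def parse_reactions(text):
    """Collect EC number, formula, and mapping labels for each reaction."""
    reactions = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("EC "):
            ec, _, formula = line.partition(":")
            reactions.append({"ec": ec, "formula": formula.strip(), "labels": []})
        elif line.startswith("<") and reactions:
            reactions[-1]["labels"] += LABEL.findall(line)
        elif reactions:
            # Assumed continuation of a formula broken across lines.
            reactions[-1]["formula"] += " " + line
    return reactions

sample = """EC 2.7.1.11: ATP + D-Sedoheptulose 7-phosphate =
ADP + D-Sedoheptulose 1,7-bisphosphate
<C00002:C00008:0> <C00447:C05382:0> <C00002:C00447:0>
EC 3.5.4.9: 5,10-Methenyl-tetrahydro-folic acid + H2O =
10-Formyl-tetrahydro-folic acid
<C00234:C00445:0> <C00001:C00234:0>"""

reactions = parse_reactions(sample)
for reaction in reactions:
    print(reaction["ec"], len(reaction["labels"]))
```

The labels returned here are the same strings that open each Part 1 entry, so they can serve as keys for looking up the atom mappings of a reaction.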
Part 3. Symmetry Information
C00007 2 0 1 ; 1 0 ;
C00094 4 1 3 ; 3 1 ;
The last section is the symmetry information. For each symmetric compound (C*****), the first integer is the mapping size, followed by pairs of positions that are topologically equivalent. When more than two positions are equivalent, only a partial list is registered in this file.
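A symmetry line can be split in the same way as a mapping line. The Python sketch below uses a hypothetical function name, `parse_symmetry`, and assumes that each line holds the compound ID, the mapping size, and then semicolon-separated position pairs, as in the two sample lines.

```python
def parse_symmetry(line):
    """Split a Part 3 line into (compound ID, mapping size, position pairs)."""
    tokens = line.split()
    compound = tokens[0]
    size = int(tokens[1])
    rest = " ".join(tokens[2:])
    pairs = [
        tuple(int(n) for n in chunk.split())
        for chunk in rest.split(";")
        if chunk.strip()
    ]
    return compound, size, pairs

symmetry = [
    parse_symmetry("C00007 2 0 1 ; 1 0 ;"),
    parse_symmetry("C00094 4 1 3 ; 3 1 ;"),
]
for compound, size, pairs in symmetry:
    print(compound, size, pairs)
```

Since only a partial list is registered when more than two positions are equivalent, the returned pairs should not be treated as the complete symmetry group of a compound.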
Part 4. MOL-format Files for compounds
The definition of the MOL format is here (in Japanese). The file is a gzipped tar archive.