
About the mapping data

by member last modified 2008-08-30 01:04

This page explains the data format for "Arita, M. Proceedings of the National Academy of Sciences USA, 101(6) 1543-1547, 2004". Since September 2008, a new version of the metabolite information has been released at http://metabolomics.jp/.

01.  Compound Entry

The Compound Definition File links MDL MOLfiles ('ENTRY' field) to English names ('NAME' field). It also contains a 'FORMULA' field, which is used to calculate the molecular mass.
The format is compatible with the LIGAND database.

ENTRY    C00002                          (filename of its MOLfile)
NAME     ATP; Adenosine 5'-triphosphate  (names separated by ';')
FORMULA  C10H16N5O13P3                   (molecular formula)
MASS     506.9957...                     (molecular mass)
///
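
As an illustration of how the 'MASS' field relates to the 'FORMULA' field, the following minimal sketch sums atomic masses over a simple formula string. It assumes that the mass is monoisotopic (the example value 506.9957 for ATP is consistent with that, but this page does not say so explicitly). The table `MONOISOTOPIC` and the function `formula_mass` are hypothetical names introduced here, and only a few elements are listed.

```python
import re

# Monoisotopic masses of the most abundant isotopes (standard values).
# Assumption: the MASS field is monoisotopic; only a few elements are listed.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "P": 30.9737620,
}


def formula_mass(formula):
    """Sum element masses for a simple formula such as 'C10H16N5O13P3'."""
    total = 0.0
    # Each match is (element symbol, optional count); a missing count means 1.
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[symbol] * (int(count) if count else 1)
    return total


print(round(formula_mass("C10H16N5O13P3"), 4))  # 506.9957
```

Formulas containing an element that is missing from the table raise a `KeyError`, which makes gaps in the table visible instead of silently returning a wrong mass.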

All MOLfiles are manually curated and checked for correctness. In addition to removing redundancies and standardizing compound names, the following curations are applied.

  • Supply chirality information.
  • Omit explicitly described hydrogens.
  • Omit property lines that describe polymers. (The ARM software does not support polymeric structures.)

02.  Enzyme Entry

The Enzyme Definition File lists reaction formulas ('REACTION' field) for each EC entry ('ENTRY' field).
The format is compatible with the LIGAND database.

ENTRY     EC 1.1.1.3
NAME      L-homoserine:NAD(P) oxidoreductase
          homoserine dehydrogenase
          HSDH
REACTION  NAD+ + L-Homoserine = NADH + L-Aspartate 4-semialdehyde;
          NADP+ + L-Homoserine = NADPH + L-Aspartate 4-semialdehyde;
///
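
A reaction formula from the 'REACTION' field can be split into its two sides with plain string operations. The sketch below assumes that " = " separates the sides and " + " separates reactants, as in the example above. The function name `parse_reaction` is hypothetical and is not part of any released tool.

```python
def parse_reaction(line):
    """Split 'A + B = C + D;' into (left-hand names, right-hand names)."""
    equation = line.strip().rstrip(";")
    left, right = equation.split(" = ")
    # Splitting on ' + ' (with spaces) keeps names such as 'NAD+' intact.
    return left.split(" + "), right.split(" + ")


left, right = parse_reaction(
    "NAD+ + L-Homoserine = NADH + L-Aspartate 4-semialdehyde;"
)
print(left)   # ['NAD+', 'L-Homoserine']
print(right)  # ['NADH', 'L-Aspartate 4-semialdehyde']
```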

All reactions are manually curated with reference to the Roche Biochemical Pathways Chart, the KEGG database, the BioCyc databases, and the IUBMB Nomenclature. In addition to removing redundancies and standardizing compound names, the following curations are applied.

  • Replace generic names with concrete ones (e.g. Alcohol with Ethanol).
  • Reorder reactants so that the structures on the two sides of the formula correspond (roughly) one to one.

03.  Mapping Entry

The Mapping Definition File is a machine-generated file of one-to-one atomic correspondences between metabolites. It consists of three parts.

Part 1. Mapping Information

 <C00249:C02588:0>                            (unique label) 
C02588 C00249 0 38 0 (mapping info)
0 6 ; 1 7 ; 2 8 ; 3 9 ; 4 10 ; 5 11 ; 6 12 ; 7 13 ;
8 14 ; 9 15 ; 12 5 ; 13 4 ; 14 3 ; 15 2 ; 16 1 ; 17 0 ;
<C00632:C03986:0> (next label)
C00632 C03986 0 12 0 (next mapping)
0 0 ; 1 1 ; 2 2 ; 3 3 ; 4 4 ; 5 5 ; 6 6 ; 9 9 7 ;

Each entry begins with a unique label indicating the substrate-product relationship. The line after the label carries the mapping information: the substrate ID (C*****), the product ID (C*****), a serial number distinguishing mappings between the same reactants (integer), the mapping size (integer), and the direction (integer). The mapping direction is 0 (reversible), 1 (left to right), or 2 (right to left). The lines that follow hold the actual mapping, i.e. pairs of atomic positions in the two reactants, separated by semicolons. The integers are line numbers in the corresponding MOL files. A pair of integers denotes carbon atoms. In a triplet of integers, the last integer is the atomic number (7 = nitrogen, 16 = sulphur).
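
The entry layout described above can be read with a short parser. This is a minimal sketch, not the ARM implementation. The function `parse_mapping` and the dictionary keys are hypothetical names, and the code assumes that the first position of each pair belongs to the first compound on the mapping information line.

```python
DIRECTIONS = {0: "reversible", 1: "left to right", 2: "right to left"}


def parse_mapping(label, info_line, pair_lines):
    """Parse one Part 1 entry: label, mapping info line, and pair lines."""
    first_id, second_id, serial, size, direction = info_line.split()
    atoms = []
    for chunk in " ".join(pair_lines).split(";"):
        fields = [int(x) for x in chunk.split()]
        if not fields:
            continue  # skip the empty chunk after the final ';'
        # A pair denotes carbon (atomic number 6); a triplet carries the
        # atomic number as its third integer.
        atomic_number = fields[2] if len(fields) == 3 else 6
        atoms.append((fields[0], fields[1], atomic_number))
    return {
        "label": label,
        "compounds": (first_id, second_id),
        "serial": int(serial),
        "size": int(size),
        "direction": DIRECTIONS[int(direction)],
        "atoms": atoms,
    }


entry = parse_mapping(
    "<C00632:C03986:0>",
    "C00632 C03986 0 12 0",
    ["0 0 ; 1 1 ; 2 2 ; 3 3 ; 4 4 ; 5 5 ; 6 6 ; 9 9 7 ;"],
)
print(entry["direction"])  # reversible
print(entry["atoms"][-1])  # (9, 9, 7)
```

The parser does not check the number of listed pairs against the mapping size.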

Part 2. Reaction Information

 EC 2.7.1.11: ATP + D-Sedoheptulose 7-phosphate = 
ADP + D-Sedoheptulose 1,7-bisphosphate
<C00002:C00008:0> <C00447:C05382:0> <C00002:C00447:0>
EC 3.5.4.9: 5,10-Methenyl-tetrahydro-folic acid + H2O =
10-Formyl-tetrahydro-folic acid
<C00234:C00445:0> <C00001:C00234:0>

After the first separator (///), reaction information follows. This section shows which atomic mappings are contained in each reaction formula. In the above example, the EC 2.7.1.11 reaction contains three mappings.
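
The mapping labels attached to a reaction can be extracted with a regular expression. The sketch below assumes the label form `<C*****:C*****:n>` shown in the example. The pattern `LABEL` and the function `parse_labels` are hypothetical names.

```python
import re

# Matches labels such as <C00002:C00008:0>: two compound IDs and a serial number.
LABEL = re.compile(r"<(C\d{5}):(C\d{5}):(\d+)>")


def parse_labels(line):
    """Return (first ID, second ID, serial number) for each label in a line."""
    return [(a, b, int(n)) for a, b, n in LABEL.findall(line)]


labels = parse_labels("<C00002:C00008:0> <C00447:C05382:0> <C00002:C00447:0>")
print(len(labels))  # 3
print(labels[0])    # ('C00002', 'C00008', 0)
```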

NOTE: The set of binary relationships in Part 2 MAY NOT represent pathways of a metabolic network. Graph paths often do not conserve molecular moieties, and therefore do not correspond to any metabolic pathway. (Please read "Arita, M. Proceedings of the National Academy of Sciences USA, 101(6) 1543-1547, 2004" for more details.)

Part 3. Symmetry Information

 C00007 2 0 1 ; 1 0 ;
C00094 4 1 3 ; 3 1 ;

The last section is the symmetry information. For each symmetric compound (C*****), the first integer is the mapping size, followed by pairs of positions that are topologically equivalent. When more than two positions are equivalent, only a partial list is registered in this file.
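
A symmetry line can be read in the same way as the mapping pairs. This is a minimal sketch that assumes one compound per line, as in the example above. The function name `parse_symmetry` is hypothetical.

```python
def parse_symmetry(line):
    """Parse 'C00007 2 0 1 ; 1 0 ;' into (compound ID, size, position pairs)."""
    compound, size, rest = line.split(None, 2)
    pairs = []
    for chunk in rest.split(";"):
        fields = chunk.split()
        if fields:  # skip the empty chunk after the final ';'
            pairs.append((int(fields[0]), int(fields[1])))
    return compound, int(size), pairs


print(parse_symmetry("C00007 2 0 1 ; 1 0 ;"))
# ('C00007', 2, [(0, 1), (1, 0)])
```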

Part 4. MOL-format Files for compounds

The definition of the MOL format is here (in Japanese). The files are distributed as a gzipped tar archive.

