In catalytic sciences, as in all scientific fields, we face a rapidly increasing volume and complexity of research data, which are a challenge for analysis and reuse. A team led by Prof. Jürgen Pleiss from the Institute of Biochemistry and Technical Biochemistry at the University of Stuttgart has introduced EnzymeML as a data exchange format in journal recent article in "Nature Methods". EnzyemML serves as a format to comprehensively report the results of an enzymatic experiment and stores the data in a structured way and makes it traceable and reusable.
While more and more data is generated by an increasing number of researchers and increasing research expenditure worldwide, this data is hardly manageable by our scholarly practice of communicating scientific results. Even managing your own data manually is time-consuming and error-prone, but accessing and re-analyzing data from other research groups is almost impossible. The lack of standards, incomplete metadata, and missing original data make it nearly impossible to reproduce published results. More and more researchers feel like they are drowning in a tsunami of data.
This also applies to studies on the catalytic activity, selectivity and stability of enzymes and enzymatic networks, a field of research that is equally important for industrial biotechnology and biomedicine. What also complicates matters in this area is the fact that data describing enzymatic experiments is particularly complex, because an enzymatic reaction depends on many factors, such as the protein sequence of the enzyme, the recombinant host organism, the reaction conditions, and non-enzymatic secondary reactions. Furthermore, other effects such as inactivation or inhibition of the enzyme, or evaporation of the medium affect the results.
The new, standardized data exchange format "EnzymeML", presented by 23 authors from 14 different research institutions in the scientific journal "Nature Methods" gives hope in this respect. EnzymeML can completely record the results of an enzymatic experiment, from the reaction conditions to the measured data, as well as the kinetic model used to analyze experimental data and the estimated kinetic parameters. The format thus provides a seamless communication channel between experimental platforms, electronic lab notebooks, enzyme kinetics modeling tools, publication platforms, and enzymatic reaction databases. "We demonstrate the feasibility and usefulness of the EnzymeML toolbox using six scenarios where data and metadata from various enzymatic reactions is collected, analyzed, and uploaded to public databases for future use," explains first author Simone Lauterbach.
EnzymeML documents are structured and standardized, therefore the experimental results encoded in an EnzymeML document are interoperable and reusable by other groups. Because an EnzymeML document is machine-readable, it can be used in an automated workflow to store, visualize, and analyze data, as well as reanalyze previously published data, with no restrictions of the size of each data set, or the number of experiments.
"The digitalization of biocatalysis increases the efficiency of data management, visualization and analysis," emphasizes Prof. Jürgen Pleiss, corresponding author, and project coordinator. Furthermore, digitalization improves the reproducibility of experiments and data analyses, thus promoting trust in scientific results. "The EnzymeML toolbox makes best use of rapidly growing enzymatic data and is a useful tool that allows researchers to surf the research data wave."
EnzymeML is also used in research projects within the Collaborative Research Center "Molecular Heterogeneous Catalysts in Confined Geometries" (SBF 1333) and the Cluster of Excellence "Data-Integrated Simulation Science" (SimTech) at the University of Stuttgart, and is also incorporated into the German National Research Data Infrastructures NFDI4Cat and NFDI4Chem.
Prof. Dr. Jürgen Pleiss, University of Stuttgart, Institute of Biochemistry and Technical Biochemistry, Tel.: +49 711 685 63191, E-Mail
Simone Lauterbach, et al.: EnzymeML: seamless data flow and modeling of enzymatic data", Nature Methods 2023, DOI 10.1038/s41592-022-01763-1