Research database DaRUS: The 39 Dataverses of the University of Stuttgart

Researchers at the University of Stuttgart are successfully using the central research database DaRUS. In February, the 1,000th dataset was published.

What do the radio noise measurements of the Flying Laptop, nanoscopic argon droplets and the role of density-driven dissolution of CO2 in karstification have in common? Apart from the fact that many people will understand only a few words of them, they are three titles among the now more than 1,000 datasets in DaRUS, the large research database of the University of Stuttgart.

DaRUS, short for "Data Repository of the University of Stuttgart", passed the memorable mark of 1,000 published entries on February 4, 2022. The interdisciplinary team of the Competence Center for Research Data Management (FoKUS), drawn from the University Library (UB) and the Technical Information and Communication Services (TIK), is taking the milestone as an occasion to make its service known across the university. DaRUS is available to all units of the University of Stuttgart and their research partners.

Data and metadata make a Dataverse

DaRUS runs on the open-source software Dataverse, which was developed at Harvard University. It allows researchers to store datasets and to share them within their group and with others. Many interfaces are available for uploading raw data.

To ensure that the datasets in DaRUS remain sorted and that radio noise measurements from space do not get mixed up with data on karstification, for example, the overall repository is divided into individual "Dataverses", a term that roughly means data universes or simply containers. Each unit of the University of Stuttgart - for example, a department, an institute, a collaborative research center or a cluster of excellence - receives a container. So far there are 39 Dataverses, in which further sub-Dataverses and datasets are collected. A dataset is a collection of files, usually compilations of numbers, that have been combined into a package. Labeled with metadata, this package is then placed in the respective container.
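
The nesting described above can be pictured as a simple tree. The following is only an illustrative sketch, not the data model of the Dataverse software: the class names, field names and example titles are assumptions made for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A package of files labeled with metadata (hypothetical model)."""
    title: str
    files: list[str] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class Dataverse:
    """A container that holds datasets and further sub-Dataverses."""
    name: str
    datasets: list[Dataset] = field(default_factory=list)
    children: list["Dataverse"] = field(default_factory=list)

    def count_datasets(self) -> int:
        # Count the datasets in this container and, recursively, in all sub-containers.
        return len(self.datasets) + sum(c.count_datasets() for c in self.children)


# Invented example: an institute container with one project sub-container.
institute = Dataverse(
    "Example Institute",
    datasets=[Dataset("Measurement series A")],
    children=[
        Dataverse(
            "Example Project",
            datasets=[Dataset("Simulation run 1"), Dataset("Simulation run 2")],
        )
    ],
)
print(institute.count_datasets())
```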

First quality check, then publication

If researchers want to publish their dataset and make it accessible outside their Dataverse, the data goes through a validation process. As an optional first step, the subject administrators or the principal investigators can validate whether the data is correct. SFB1313, for instance, uses this form of quality assurance. In all cases, the FoKUS team, together with the publication service team of the UB, checks the datasets against formal criteria. Some of these are general: titles, short descriptions and keywords are required for all datasets. Further information on the methods, instruments, and software tools used, as well as a description of the dataset structure, improves the comprehensibility, reproducibility, and usability of the data. Legal issues such as copyright or data protection law also play a role in the review. For the engineering disciplines, representatives worked together with the UB team to develop their own metadata standards during the DIPL-ING project.
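
The general part of the formal check (title, short description and keywords for every dataset) could be sketched roughly as follows. This is a minimal illustration only: the field names and the function are hypothetical and do not reflect the actual checks or metadata schema used by the FoKUS team.

```python
# Hypothetical field names; the article only states that a title, a short
# description and keywords are required for every dataset.
REQUIRED_FIELDS = ("title", "description", "keywords")


def missing_required_fields(metadata: dict) -> list[str]:
    """Return the names of required fields that are absent or empty."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = metadata.get(name)
        if isinstance(value, str):
            value = value.strip()  # treat whitespace-only text as empty
        if not value:
            missing.append(name)
    return missing


# Invented draft record with an empty keyword list.
draft = {
    "title": "Example measurement series",
    "description": "Raw data of an example experiment.",
    "keywords": [],
}
print(missing_required_fields(draft))
```

A result with no missing fields would only mean that the formal minimum is met; the correction loops described in the article go well beyond such a check.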

The two-step publishing process in DaRUS ensures that the data with its metadata can be found very easily.

When someone publishes in DaRUS for the first time, there are usually several correction loops with suggestions for the metadata, as FoKUS manager Dr. Dorothea Iglezakis from the UB explains. But the more experience those responsible have, the faster things go: "Usually someone still has to do a lot the first time they publish data, but by the second and third time our users already know what to look for."

The researchers themselves find the quality assurance helpful despite the correction loops, as Dr. Elisabeth Rüthlein from SFB1333 emphasizes: "The workflows in DaRUS during publishing ensure that our data are easily verifiable and meet the FAIR criteria."

Data should be "FAIR"

Research results should be reproducible and traceable. This basis for scientific work was also the reason for the introduction of a data repository. The formula for good data is "FAIR". This stands for "findable, accessible, interoperable, reusable".

As the central research data platform of the University of Stuttgart with a storage guarantee of ten years for published data sets, DaRUS guarantees the first two criteria. To date, IT has provided 300 terabytes of storage space. The FoKUS team fulfills the other FAIR criteria with the scientists by setting metadata and checking for quality when publishing.

FAIR means that it's not enough to keep files with measurement series in the institute's library: "The pressure on researchers to do something with FAIR data is increasing. Many are happy to have a service that already exists and that they don't have to set up themselves," says Iglezakis. This is one of the main reasons why research groups have decided to publish in DaRUS.

The principles of FAIR Data are intended to make research data sustainably usable.

DaRUS gives data an afterlife

Prof. Bernd Flemisch from the IWS, who is jointly responsible for data and software management in SFB1313 and in the Cluster of Excellence SimTech, comments accordingly: "Publishing research today is more than just publishing scientific articles. Data and software must be integral parts of our research output. DaRUS enables our working group and all Stuttgart researchers to integrate data into our scientific output in a quality-assured way and to make it available to other researchers and the general public in a findable and reusable way within the framework of the FAIR principles."

Descriptions and keywords allow interested parties to search for and retrieve the published datasets easily on the DaRUS platform. A digital object identifier (DOI) also ensures that each dataset can be found again. For instance, this makes it easy for researchers to make the results of measurements and simulations available for the peer-review process of specialist journals. The published dataset remains retrievable via its DOI in the future.
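
How a DOI leads back to a dataset can be shown in a few lines: the public resolver at https://doi.org/ forwards a DOI to the current landing page of the object. The sketch below only builds the resolver link; the function name is hypothetical and the DOI in the example is an invented placeholder, not a real DaRUS identifier.

```python
def doi_to_url(doi: str) -> str:
    """Turn a DOI such as 'doi:10.1234/abc' into a resolver link."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]  # drop the optional 'doi:' prefix
    return "https://doi.org/" + doi


# Placeholder DOI for illustration only.
print(doi_to_url("doi:10.1234/example-dataset"))
```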

Share and reuse data

The stock of data and metadata is growing continuously. Around 500 datasets have not yet been published; the more than 600 users and their groups are still actively checking or adding to this data. Some also use the repository for sharing within groups and for their own data backup. In the social sciences, for example, there is a Dataverse that is used for internal exchange.

Among others, the "DataStewards" (as the administrators are called in Dataverse) from the Institute of Applied Mechanics (MIB), Prof. Holger Steeb and Matthias Ruf, focus on joint data evaluation: "We currently use DaRUS primarily for the publication of large, experimental research datasets in order to make them accessible to researchers who do not have the corresponding facilities and expertise." Metadata, they add, make "the datasets transparent and sustainable for use."

"Because of the citability of the shared research data, I can recommend the use of DaRUS to anyone who wants to increase the impact of their own research results," says Jonas Steigerwald of the Institute of Aerospace Thermodynamics (ITLR), who oversees DaRUS as data officer for the SFB-TRR75. DaRUS, on the other hand, is not intended for publishing literature or for saving individual work steps. "There are other systems for that, and it's no fun," Iglezakis assures.

Many advantages for small institutions

Screenshot from DaRUS: The Gyrolog project also collected photos of the surveyed objects.

The more than 1,000 datasets are unevenly distributed among the 39 Dataverses. The front-runner is the BMBF project Gyrolog, whose researchers have arranged more than 400 different objects relating to gyroscopic and inertial technology, scanned them by computed tomography, and recorded them photogrammetrically. The project's funding approval included the condition that the raw data "be made publicly available," explains Prof. Jörg Wagner of the Chair of Flight Measurement Technology. "DaRUS provides an excellent platform for this, without having to create a separate infrastructure for Gyrolog."

Each Gyrolog object now has its own dataset, which is quite "FAIRly" available for interoperable use and reuse. Wagner states: "The exchange of data with scientific working groups interested in subsequent use of the data has become much easier." The central repository also serves well in terms of sustainability: "My rather small professorship cannot ensure the professional long-term care and maintenance of these data," says Wagner. DaRUS represents "a very good solution."

Other intensive users are SFB1333 with about 60, SFB1313 with 34, and MIB with 22 datasets. The MIB already uses an interface, primarily to feed in experimental data. Other Dataverses, such as those of SFB1333, IAG, and SimTech, are preparing their systems for automatic data transfer. Once the metadata is in the right format, the institutes save themselves the effort of copying it by hand.

Productive pilot operation

Officially, the system, which started in October 2019 as part of a third-party funded project, is in "productive pilot operation". However, this is only pro forma; in practice, everything works excellently and the data is secure: computers at two sites store both data and metadata redundantly. Thanks to the conscientious work of the IT specialists, the system also does well when it comes to downtime, which can be caused, for example, by the sudden upload of an enormous batch of 13,000 files. Something like this might happen once every three months. But the team works fast: "Within 15 to 20 minutes, we have restarted DaRUS. Then the system is back," reports Iglezakis. "Only our fail-safe monitoring still has room for improvement." So the fact that it still says "pilot operation" is due to perfectionism.

Currently, there is still sufficient storage space for further data servers and data sets. To ensure that DaRUS remains capable of storing more than 300 terabytes, the University of Stuttgart, together with the University of Hohenheim, has submitted the "FairDataStorage" application to the Ministry of Science and the Arts (MWK) as part of the "State Major Instrumentation" program in order to expand the storage capacities in Stuttgart.

DaRUS for all

If you are a member of the University of Stuttgart and would like to deposit data in DaRUS yourself, you can register with the FoKUS team. After setting up a Dataverse and a short introduction, the service is available. It takes approximately 30 to 60 minutes to create a dataset. During the publishing process, a person from the FoKUS team spends about the same amount of time on quality assurance.

Contact

Ulrich Fries
Science Manager
Hochschulkommunikation
Keplerstraße 7, 70174 Stuttgart
