From e495db514c8e772e4a2b9d02cf5526f175392d58 Mon Sep 17 00:00:00 2001 From: Giacomo Lanza <37865804+Zack-83@users.noreply.github.com> Date: Mon, 1 Jun 2026 14:13:32 +0200 Subject: [PATCH 1/8] Update 30_data_organisation.mdx --- docs/30_data/30_data_organisation.mdx | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/docs/30_data/30_data_organisation.mdx b/docs/30_data/30_data_organisation.mdx index 9aa22c08..1efcff40 100644 --- a/docs/30_data/30_data_organisation.mdx +++ b/docs/30_data/30_data_organisation.mdx @@ -60,10 +60,11 @@ If you need to rename a multiple files, take a look at: - [Thunar Bulk Rename](https://docs.xfce.org/xfce/thunar/bulk-renamer/start) (Linux, GUI) - [command line: mv, mmv, rename](https://linuxconfig.org/how-to-rename-multiple-files-on-linux) (Linux, CLI) - [Bulk Rename Utility](https://www.bulkrenameutility.co.uk/) (Windows, free) +- [A.F.5 Rename your files](http://fauland.com/download.htm) (Windows, free) - [TotalCommander](https://www.ghisler.com/advanced.htm#tutorial_rename) (windows, Shareware) - [Renamer4Mac](https://renamer.com/) (Mac). -For some special file formats there are tools for adapting the file name to metadata. For example, to create a file name that fits your scheme and takes date and time information from the EXIF data of a jpg file. Some also allow adding an offset - this helps sort photos into timestamps that run on different clocks. +For some special file formats there are tools for adapting the file name to the metadata. For example, to create a file name that fits your scheme and takes date and time information from the EXIF data of a jpg file. Some also allow adding an offset - this helps sort photos into timestamps that run on different clocks. ## Files: versioning @@ -72,7 +73,7 @@ Stash snapshots or simply track changes and allow to find something that existed #### Possible Solutions - Low number of requirements: manage manually e.g. 
by keeping a log where the changes for each respective file is documented, version by version. -- For automatic management of versioning, conflict resolution and back-tracing capabilities, use a proper version control software such as [Git](https://git-scm.com/), hosted by e.g. [GitHub](https://github.com/) or your home institution. Very strong with uncompressed, readable and comparable files such as text files or csv. +- For automatic management of versioning, conflict resolution and back-tracing capabilities, use a proper version control software such as [Git](https://git-scm.com/), hosted either remotely on a portal such as Codeberg, Forgejo, SourceHut, or at your home institution (most usually GitLab). Very strong with uncompressed, readable and comparable files such as text files or csv. - Use a Cloud Storage service (see [Data Storage and Archiving](/docs/data_storage)) that provides automatic file versioning. Very strong on spreadsheets, text files and slides. ## Files: types of metadata @@ -84,11 +85,11 @@ Consider the way data and [Metadata](/docs/metadata/) can be stored together as - technical metadata - structural metadata -An FDO encapsulates data and metadata in one file and can be saved as an [HDF5](https://www.hdfgroup.org/solutions/hdf5/), for example. See [Data Format Standard](/docs/format_standards/) for more information. +An FDO encapsulates data and metadata in one file and can be saved as an [HDF5](https://www.hdfgroup.org/solutions/hdf5/), for example. See [Data Format Standards](/docs/format_standards/) for more information. ## Files: formats -Different disciplines use established standards, see [Data Format Standard](/docs/format_standards/). Also consider beyond the duration of the project: +Different disciplines use established standards, see [Data Format Standards](/docs/format_standards/). 
Also consider beyond the duration of the project: - usage of proprietary or open file formats - exchange within and outside of the working group From 3e9f0425e7b71fc446891b74d5999721b1fec34d Mon Sep 17 00:00:00 2001 From: Giacomo Lanza <37865804+Zack-83@users.noreply.github.com> Date: Mon, 1 Jun 2026 14:19:04 +0200 Subject: [PATCH 2/8] Update 00_format_standards.mdx --- docs/60_topics/62_data_formats/00_format_standards.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/60_topics/62_data_formats/00_format_standards.mdx b/docs/60_topics/62_data_formats/00_format_standards.mdx index f001c0ed..c2438467 100644 --- a/docs/60_topics/62_data_formats/00_format_standards.mdx +++ b/docs/60_topics/62_data_formats/00_format_standards.mdx @@ -4,7 +4,7 @@ slug: "/format_standards" id: "format_standards" --- -# Data format standard {#format_standard} +# Data format standards {#format_standard} Research data is the key to any science. As part of a [FAIR](/docs/fair) practice, these data should be accompanied by valuable [metadata](/docs/metadata) and have to be exchanged in standardised and open formats. In chemistry these data include experimental parameters and measurement results, chemical structures, properties of compounds and descriptions of reactions. Whenever research data are published, stored, shared or reused, chemists have to decide on a format suitable for the purpose and consider the long-term stages of the [data life cycle](/docs/data_life_cycle). [Repositories](/docs/repositories) and databases often accept specific formats to ensure comparability and completeness of data and metadata provided. If chemists are aware of data standards during data acquisition and research documentation, later conversions will be less challenging. [Electronic lab notebooks](/docs/eln) can support the scientist in the early stages of data management. 
From e643d13779d8b5b9be8cd8fab5b66a21c47a1617 Mon Sep 17 00:00:00 2001 From: Giacomo Lanza <37865804+Zack-83@users.noreply.github.com> Date: Mon, 1 Jun 2026 14:22:14 +0200 Subject: [PATCH 3/8] Update 30_data_organisation.mdx --- docs/30_data/30_data_organisation.mdx | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/30_data/30_data_organisation.mdx b/docs/30_data/30_data_organisation.mdx index 1efcff40..48d61bb0 100644 --- a/docs/30_data/30_data_organisation.mdx +++ b/docs/30_data/30_data_organisation.mdx @@ -112,6 +112,7 @@ The top folder should have a README.txt file describing the folder structure and #### An example by [RDMKit](https://rdmkit.elixir-europe.org/data_organisation.html#what-is-the-best-way-to-name-a-file): +``` project/ code/ code needed to go from input files to final results data/ raw and primary data (never edit!) @@ -128,6 +129,7 @@ The top folder should have a README.txt file describing the folder structure and tables/ scratch/ temporary files that can safely be deleted or lost README.txt file and folder description +``` ## Sources and further information From a00dcb8787de7821b201551a8a1fb931b356ff48 Mon Sep 17 00:00:00 2001 From: Giacomo Lanza <37865804+Zack-83@users.noreply.github.com> Date: Mon, 1 Jun 2026 14:26:38 +0200 Subject: [PATCH 4/8] Update 30_data_organisation.mdx --- docs/30_data/30_data_organisation.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/30_data/30_data_organisation.mdx b/docs/30_data/30_data_organisation.mdx index 48d61bb0..09ed4195 100644 --- a/docs/30_data/30_data_organisation.mdx +++ b/docs/30_data/30_data_organisation.mdx @@ -29,7 +29,7 @@ Find a balanced set of elements: Too many make it difficult to grasp quickly, wh - Order the elements from general to specific. - Use meaningful abbreviations instead of long identifiers. -- Use underscore `_`, hyphen `-` or capitalized letters to separate elements in the name. Don’t use spaces or special characters: `?!&,_%#;_()@$^~‘{}[]<>`. 
+- Use underscore `_`, hyphen `-` or capitalized letters to separate elements in the name. Don’t use spaces or special characters: `?!&,%#;()@$^~‘{}[]<>`. - Use date format ISO8601: `YYYYMMDD`, and time if needed `HHMMSS`. - Include a version number if appropriate: minimum two digits (V02) and extend it, if needed for minor corrections (V02-03). The leading zeros, will ensure the files are sorted correctly. From 5d9d1466e55d1ae74d94930dab543113266de7ce Mon Sep 17 00:00:00 2001 From: Giacomo Lanza Date: Fri, 19 Jun 2026 10:02:52 +0200 Subject: [PATCH 5/8] language improvements --- docs/30_data/40_data_documentation.mdx | 30 ++++++++-------- docs/30_data/50_data_storage.mdx | 26 +++++++------- .../00_data_publishing.mdx | 14 ++++---- docs/50_data_publication/10_repositories.mdx | 16 ++++----- .../10_metadata.mdx | 36 +++++++++---------- 5 files changed, 59 insertions(+), 63 deletions(-) diff --git a/docs/30_data/40_data_documentation.mdx b/docs/30_data/40_data_documentation.mdx index 389d745d..565d572b 100644 --- a/docs/30_data/40_data_documentation.mdx +++ b/docs/30_data/40_data_documentation.mdx @@ -25,31 +25,31 @@ _Andreas von der Dunk, Technische Universität Dresden, Service Center Research A clean and comprehensible organisation of data and documents are an important part of good research practice and an important step to realise research data management according to the [FAIR data principles](/docs/fair). -An essential task is to plan the organisation and storage of data, [metadata](/docs/metadata), and documents in advance and to document the relevant measures. +An essential task is to define in advance the organisation and storage of data, documents, and their [metadata](/docs/metadata), and to document the relevant measures. 
Central requirements are the definition of [formal responsibilities, organisational conventions](#formal-responsibilities-and-organisational-conventions) and [technical implementations](#technical-implementations) to organise the data and meta information produced. The information collected for this purpose is recorded in the [Data Management Plan](/docs/dmp). Note that a good data organisation also estimates the costs (see [costing tool and checklist](https://ukdataservice.ac.uk/learning-hub/research-data-management/plan-to-share/costing/) for example) in the early application phase. ### Formal responsibilities and organisational conventions -To make a long story short: It's mainly about describing who will and how to work with documents. First steps could be about: +An essential part of the data documentation is the definition of the actors working with the data and the procedures. These are possible first steps: - Documenting the responsibilities of primary researchers and project staff. -- Creating user roles: Define detailed rights for users / groups / roles to access data and sensitive information. +- Creating user roles: Defining detailed rights for users / groups / roles to access data and sensitive information. - Describing processes of quality assurance including protected storage, sharing, and accessibility in the short term and on the long run. -- Data processing: How, where, how fast. Describe input and output data. Decide how you will name and structure files and folders. +- Data processing: How, where, how fast. Description of input and output data; decisions on naming and structuring conventions for files and folders. 
-The result should be descriptive documents that unambiguously define for files which are used in the course of the daily work routines: +The result should be a set of descriptive documents associated with the files used during the daily work routines, which unambiguously determine: -- on which status (for example original file, temporary work file; Draft, intermediate version, final version), -- where (workstation PC, central file server, database), -- for how long (temporarily, project duration, long-term availability), -- and in which format they are saved. +- the status (for example original file, temporary work file; Draft, intermediate version, final version), +- the location (workstation PC, central file server, database), +- the availability time frame (short, project length, long-term), +- the format in which they are saved. ![small data handout](/img/data/en_data_handout.png) ### Technical implementations -Get an overview of the occurring data and document flows. A short description, easy to understand for every user, should be accessible on a low level and explain the main concepts. The data itself needs to have a bulletproof [Data Organisation](/docs/data_organisation) including [Metadata](/docs/metadata/) and suitable [Data Format Standards](/docs/format_standards/). Find out more about the usual [Best Practice](/docs/best_practice/) at your institute or within your discipline. Plan how documents can be shared between project staff and what needs to be accessed for [Data Publication](/docs/data_publishing/) with [PID](/docs/pid/). +Get an overview of the occurring data and document flows. A short description, easy to understand for every user, should be accessible on a low level and explain the main concepts. The data itself needs to have a bulletproof [Data Organisation](/docs/data_organisation), including [Metadata](/docs/metadata/) and suitable [Data Format Standards](/docs/format_standards/). 
Find out more about the usual [Best Practice](/docs/best_practice/) at your institute or within your discipline. Plan how documents can be shared between project staff and what needs to be accessed for [Data Publication](/docs/data_publishing/) with [PID](/docs/pid/). Data security affects all technical and organisational issues to protect the data from alteration, loss, and destruction. In this context, storage methods, backup procedures, necessary physical resources as well as automated and administrative routines must be planned and put in place. Ask local contacts or external experts about already established technologies for [Data Storage and Archiving](/docs/data_storage) as well as suitable [Repositories](/docs/repositories/). @@ -70,11 +70,11 @@ Good data documentation does not happen over night - take small steps first. The - Which devices or file formats are or have been used? - Are there any special features? - Awareness: Who produces (meta)data, and who continues to use data and how? -- Define internal rules and processes: What are the targets of RDM, and how can it be achieved? -- Apply and evaluate rules, iteratively: Learn, set, follow, repeat. Keep it simple and smart (KISS). -- Develop a suitable technology: Determine specific requirements in the first project phase and continuously adapt them to changing conditions. -- Establish supporting technology: Evaluate and test software like [ELN](/docs/eln/) and [Repositories](/docs/repositories/), train staff. -- Obtain legal advice, include local and higher-level policies and procedures: Contact legal department at your institution or [NFDI Querschnittssektion “Ethik und Recht”](https://www.nfdi.de/einrichtung-von-ersten-sektionen/) +- Define internal rules and processes: What are the targets of RDM, and how can they be achieved? +- Apply and evaluate rules iteratively: Learn, set, follow, repeat. Keep it simple and smart (KISS). 
+- Develop suitable technology: Determine specific requirements in the first project phase and adapt them continuously to changing conditions. +- Establish supporting technology: Evaluate and test software like [ELN](/docs/eln/) and [Repositories](/docs/repositories/); train your staff. +- Obtain legal advice, considering local and higher-level policies and procedures: Contact the legal department at your institution or the [NFDI Querschnittssektion “Ethik und Recht”](https://www.nfdi.de/einrichtung-von-ersten-sektionen/) - Make rules and decisions accessible to everyone at an early stage, for example in the form of a short handout. - Check the concept regularly and update it if necessary. diff --git a/docs/30_data/50_data_storage.mdx b/docs/30_data/50_data_storage.mdx index 9644f0ad..d799e8b4 100644 --- a/docs/30_data/50_data_storage.mdx +++ b/docs/30_data/50_data_storage.mdx @@ -6,25 +6,25 @@ nfdi4chem-tags: [data_organisation, data_storage, repositories] slug: "/data_storage" --- -If you plan to collect data and process it into information, you should consider different types of storage with regard to security, backup, access time and sharing with others. It is also of interest [how to estimate the computational resources for data processing and analysis](https://rdmkit.elixir-europe.org/storage.html#how-do-you-estimate-computational-resources-for-data-processing-and-analysis). There are different requirements for the entire [Data Life Cycle](/docs/data_life_cycle/). Regarding the workflows used in a project, care should also be taken when securing these workflows and tools (software version!) to ensure the reproducibility of results. +If you plan to collect data and process it into information, you should consider different types of storage with regard to security, backup, access time and sharing with others. 
It is also of interest [to estimate the computational resources for data processing and analysis](https://rdmkit.elixir-europe.org/storage.html#how-do-you-estimate-computational-resources-for-data-processing-and-analysis). There are different requirements for the entire [Data Life Cycle](/docs/data_life_cycle/). Regarding the workflows used in a project, care should also be taken when securing these workflows and tools (software version!) to ensure the reproducibility of results. ## Workflow perspective -Let's discuss different storage solutions along a possible workflow. Think of all possible data sources that provide data in your project, such as laboratory equipment (devices), manually collected data or external data from publications or project partners. Some devices may continuously automatically deliver data points while others regularly provide files for collection. Reduce the amount to the data points necessary for your project, consider possible pre-processing and estimate the data that will arise in terms of frequency and size. It is possible that data can already be processed while other data of the same type is still being recorded. At what point in the workflow is the data annotated by further metadata and does this possibly also work automatically? What descriptive documents are provided by human sources and when? +Let's discuss different storage solutions along a possible workflow. Think of all possible data sources that provide data in your project, such as laboratory equipment (devices), manually collected data or external data from publications or project partners. Some devices may continuously automatically deliver data points, while others regularly provide files for collection. Reduce the amount to the data points necessary for your project, consider possible pre-processing and estimate the data that will arise in terms of frequency and size. 
It is possible that a part of the data has already been processed, while other data of the same type is still being recorded. At what point in the workflow is the data annotated by further metadata, and does this possibly also work automatically? What descriptive documents are provided by human sources and when? -When planning [data management](/docs/dmp/), think about storage solutions and request short-term and long-term storage in advance. +In the [planning phase](/docs/dmp/) of a research activity, think about storage solutions and request short-term and long-term storage in advance. #### Necessary requirements when designing a storage system: - space requirements for collection or generation of raw data including temporary files ("fast storage") - space requirements for data that can be permanently accessed over the duration of the project -- access requirements to the data (in case of collaborative projects), how do they expect to access the data and for what purpose +- access requirements to the data (in case of collaborative projects): expected access ways and purpose - transfer speed requirements - sharing opportunities, guidelines for data sharing outside the institute, compliance and rights management -- "read-only" copy of the original raw data in a separate location (not editable) -- how long raw data, as well as data processing pipelines and analysis workflows need to be stored, especially after the end of the project +- "read-only" (not editable) copy of the original raw data in a separate location +- requirements on storage duration of raw data, as well as data processing pipelines and analysis workflows, especially after the end of the project - [metadata](/docs/metadata/): identifier and file description, associated with your data - requirements on version control to keep track of changes, conflict resolution, data mentoring and back-tracing capabilities -Involve the IT team of your home organisation, they can also provide advice on a tiered 
storage system: +Involve the IT team of your home organisation — they can also provide advice on a tiered storage system: - "hot" storage: fast access speed, high access frequency, high value data -> high cost - "cold" storage: low access speed and frequency, usually off-premises -> low cost - preservation solutions (data archiving services) @@ -40,19 +40,19 @@ The 3-2-1-0 rule: Why? Sometimes it's not a technical problem, but a "layer-8"-issue: human error. -### Ok, I'm lost - this is far from my business. +### Ok, I'm lost — this is far from my business. Many of the requirements are often solved by dedicated [repositories](/docs/repositories/). It is also worth taking a look at group drives or cloud services such as NextCloud (on-premises). Your local IT team and computing centre will help you with services that they usually support. But nevertheless: Make sure to generate good documentation (i.e., README file) and metadata together with the data. Check if your institute provides a (meta)data management system, such as iRODS, DataVerse, FAIRDOM-SEEK or OSF. -## Nirvana - your data in FAIR-paradise +## Nirvana — your data in the FAIR-paradise :::info Preservation > Relevant (meta)data (to guarantee reproducibility) should be preserved for a certain amount of time, that is usually defined by funders or institution policy. However, where to preserve data that are not needed for active processing or analysis anymore is a common question in data management. _see [RDMKit](https://rdmkit.elixir-europe.org/preserving)_ ::: -Documentation or conversion of files into long-term backup formats. The data-holding facility must for its part guarantee security, quality and availability. Consider any licence regulations or data protection of personal data when releasing it to the public. +Data documentation is complete; files are converted into long-term backup formats. The data-holding facility must for its part guarantee security, quality and availability. 
Consider any licence regulations or data protection of personal data when releasing it to the public. If you publish your data in public repositories, your data will also be preserved. @@ -61,6 +61,6 @@ If you publish your data in public repositories, your data will also be preserve - https://rdmkit.elixir-europe.org/storage.html - https://www.rdm.kit.edu/index.php - https://www.druva.com/glossary/what-is-data-archiving-definition-and-related-faqs/ -- German: https://www.researchgate.net/publication/221657547_Handbuch_Forschungsdatenmanagement -- German: https://www.degruyter.com/document/doi/10.1515/9783110657807/html -- German: https://handbuch.tib.eu/w/Lehrbuch_Forschungsdatenmanagement/_Druckversion +- https://www.researchgate.net/publication/221657547_Handbuch_Forschungsdatenmanagement (in German) +- https://www.degruyter.com/document/doi/10.1515/9783110657807/html (in German) +- https://handbuch.tib.eu/w/Lehrbuch_Forschungsdatenmanagement/_Druckversion (in German) diff --git a/docs/50_data_publication/00_data_publishing.mdx b/docs/50_data_publication/00_data_publishing.mdx index a8827f2e..f037dd6b 100644 --- a/docs/50_data_publication/00_data_publishing.mdx +++ b/docs/50_data_publication/00_data_publishing.mdx @@ -12,18 +12,18 @@ This page applies to all researchers who want to publish their data. In chemical research, we strive to share results with others, commonly through articles in renowned scientific journals. To be able to actually work with and build upon these results, the scientific community also requires the data that the results were based on. 
-Publishing and therefore sharing these chemistry research data in a [FAIR](/docs/fair) manner by considering aspects such as [persistent identifiers](/docs/pid), rich [metadata](/docs/metadata), [provenance information](/docs/provenance), [data formats standards](/docs/format_standards) for analytical data, information on the [license](/docs/licences) applied, and by providing [machine-readable chemical structures](/docs/machine-readable_chemical_structures) adds value to the research results and enables discovery and reuse. +Publishing and therefore sharing these chemistry research data in a [FAIR](/docs/fair) manner adds value to the research results and enables discovery and reuse. For this purpose it is important to consider aspects such as rich [metadata](/docs/metadata), [provenance information](/docs/provenance), information on the [license](/docs/licences) applied, [persistent identifiers](/docs/pid), [data formats standards](/docs/format_standards) for analytical data, and [machine-readable chemical structures](/docs/machine-readable_chemical_structures). -To publish data is essential to ensure that findings are transparent and reproducible. Moreover, it prevents duplicate efforts to generate data, hence, data publishing is also a measure of sustainability. +To publish data is essential to ensure that findings are transparent and reproducible. Moreover, it prevents duplicate efforts to generate data; hence, data publishing is also a measure of (ecological) sustainability. ## Benefits of data publishing -There are direct benefits for researchers who publish their data. Data publications increase your career recognition, enable new collaborations, and provide a citation advantage compared to articles without an associated and linked dataset in a research data repository. +There are direct benefits for researchers who publish their data. 
A data publication in a research data repository increases your career recognition, enables new collaborations, and provides a citation advantage compared to an article without an associated and linked dataset. ![Data Publication Brian Hole CC BY 4.0](/img/data_pub/data_publication_brian_hole_CC_BY_40.png) (Source: Brian Hole, [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/), [Slideshare](https://www.slideshare.net/brianhole/the-journal-of-open-archaeology-data-and-prime-incentivising-open-data-archiving)) -Additionally, the research community benefits here as, for example, new research is made possible and efficiency in research increases. Also the public sector will profit by enhanced public trust in science, as data publishing allows for the validation of research results. Nonetheless, the publication of data is also of economic benefit if the results can be reused by the private sector. +Additionally, the research community benefits here, as new research is made possible and efficiency in research increases. Also the public sector can profit by enhanced public trust in science, since data publishing allows for the validation of research results. Nonetheless, the publication of data also brings an economic benefit, if the results can be reused by the private sector. ## How to start @@ -33,12 +33,12 @@ There are two main ways to publish research data: - publish a separate [data article](/docs/data_articles/) with the corresponding dataset published in a [research data repository](/docs/repositories/) :::info Info: -Field-specific repositories should be the first choice as these repositories enhance the FAIRness of data on behalf of the submitters. To retain the same level of FAIRness, data publishing in generic repositories requires manual FAIRification. +Discipline-specific repositories should be the first choice, as these repositories enhance the FAIRness of data on behalf of the submitters. 
To retain the same level of FAIRness, data publishing in generic repositories requires manual FAIRification. ::: -[Smart Lab](/docs/smartlab) solutions, such as [Chemotion ELN](/docs/chemotion), can offer built-in workflows to assist researchers in publishing data. If the data is already documented in a structured way in Chemtotion ELN, then the data can also be published in this structured way via [Chemotion Repository](https://www.chemotion-repository.net). +[Smart Lab](/docs/smartlab) solutions, such as the [Chemotion ELN](/docs/chemotion), can offer built-in workflows to assist researchers in publishing data. If the data is already documented in a structured way in Chemotion ELN, then the data can also be published in this structured way via the [Chemotion Repository](https://www.chemotion-repository.net). -Further information on repositories, including a list of recommended chemistry-friendly repositories, is provided on our pages on [repositories](/docs/repositories) and guide on [how to choose the right repository](/docs/choose_repository). +Further information on repositories, including a list of recommended chemistry-friendly repositories, is provided on our pages on [repositories](/docs/repositories) and in the guide on [how to choose the right repository](/docs/choose_repository). A [data availability statement](/docs/data_availability_statement) in the back matter of a manuscript communicates how the data has been shared and how it can be accessed by others. Datasets and scientific publications should be interlinked using [persistent identifiers](/docs/pid). 
diff --git a/docs/50_data_publication/10_repositories.mdx b/docs/50_data_publication/10_repositories.mdx index 331b0e67..b58622ea 100644 --- a/docs/50_data_publication/10_repositories.mdx +++ b/docs/50_data_publication/10_repositories.mdx @@ -13,21 +13,21 @@ Research data repositories are locations where digital objects are stored and ma Repositories can be classified mainly according to: - the type of objects to be stored (e.g. scientific articles or research data) and -- the domain of the data contained (field-specific or generic repositories), +- the domain of the data contained (discipline-specific or generic repositories), Repositories can be hosted at institutional servers or are provided by broader organisations or consortia such as NFDI4Chem. The use of repositories is essential for data deposition according to the [FAIR Data Principles](/docs/fair). ## How do repositories work? -A repository is constituted by a repository software and a database. Researchers transfer their data to the repository typically via a browser-based user interface, and/or the repository operators **harvest** the data from other platforms via appropriate protocols and interfaces. +A repository consists of a repository software and an underlying database. Researchers transfer their data to the repository typically via a browser-based user interface, or the repository operators **harvest** the data from other platforms via appropriate protocols and interfaces. -Some, but not all, repositories curate and review the data before **ingestion** with regard to their content and quality, sometimes also regarding legal aspects (copyright, data protection, [licences](/docs/licences)). +Some, but not all, repositories curate and review the data before **ingestion** with regard to their content and quality, sometimes also regarding legal aspects (data protection, copyright/[licences](/docs/licences)). 
-In order to allow data reuse by other researchers, [metadata](/docs/metadata), including [provenance information](/docs/provenance/), are required beside the actual data. Metadata describe the research data and provide information about its creation, the methods or software used as well as legal aspects. Metadata can be either added manually via a metadata editor or can be provided through other applications. The process to manually add metadata via a metadata editor can be compared to the process of submitting a manuscript to a publisher via the publishers submission system. +In order to allow data reuse by other researchers, [metadata](/docs/metadata), including [provenance information](/docs/provenance/), are required beside the actual data. Metadata describe the research data and provide information about its creation, the methods or software used, as well as legal aspects. Metadata can be either added manually via a metadata editor, or can be provided through other applications. The process to manually add metadata via a metadata editor can be compared to the process of submitting a manuscript to a publisher via the publisher's submission system. -One main function of repositories is to provide a search function, with which users and machines can find, view, and download data. In order to ensure that data are permanently referenced and can be [linked and cited](/docs/best_practice/), repositories assign unique [persistent identifiers](/docs/pid) (PIDs). This also enhances the findability and accessibility of research data. +One main function of repositories is to provide a search function, with which human users and machines can find, view, and download data. In order to ensure that data can be permanently referenced, [linked and cited](/docs/best_practice/), repositories assign unique [persistent identifiers](/docs/pid) (PIDs). This also enhances the findability and accessibility of research data. -Repositories can also be certified (e.g. CoreTrustSeal). 
Such certification ensures that the data is citable, preserved in the long run, and may also cover aspects of data curation and data quality. +Repositories can also be certified (e.g. CoreTrustSeal). Such a certification ensures that the data is citable, preserved in the long run, and may also cover aspects of data curation and data quality. ## Finding the right repository @@ -41,11 +41,11 @@ To ease the selection of a suitable research data repository for chemistry resea ## Sources and further information -- [Repository Platforms for Research Data IG](https://www.rd-alliance.org/groups/repository-platforms-research-data/activity/) +- [RDA Repository Platforms for Research Data interest group](https://www.rd-alliance.org/groups/repository-platforms-research-data/activity/) - [FAIRsFAIR Repository Support Series: Using registries to improve the visibility of your repository service](https://www.dcc.ac.uk/events/fairsfair-repository-support-series-using-registries-improve-visibility-your-repository) - [The Repository Chemotion: infrastructure for sustainable research in chemistry](https://doi.org/10.1002/anie.202007702) - [Chemotion ELN: an open source electronic lab notebook for chemists in academia](https://doi.org/10.1186/s13321-017-0240-0) -- German: [Was ist ein Repositorium? 
Forschungsdaten.info](https://www.forschungsdaten.info/themen/veroeffentlichen-und-archivieren/repositorien/) +- [Was ist ein Repositorium?](https://www.forschungsdaten.info/themen/veroeffentlichen-und-archivieren/repositorien/) on Forschungsdaten.info (in German) _This page is licensed under a Creative Commons Universal ([CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/deed.en)) Public Domain Dedication International License._ diff --git a/docs/60_topics/63_data_description_annotation/10_metadata.mdx b/docs/60_topics/63_data_description_annotation/10_metadata.mdx index 94540f8c..d5135a1b 100644 --- a/docs/60_topics/63_data_description_annotation/10_metadata.mdx +++ b/docs/60_topics/63_data_description_annotation/10_metadata.mdx @@ -6,50 +6,46 @@ slug: "/metadata" # Metadata and Minimum Information ## Metadata and their schemas -Metadata can be described as "data about data", meaning, it is data that describes data, like content of a dataset or file, or the context of this data. -More specific examples could be the title, keywords, acquisition method with a certain analytical technique, and the list continues. Metadata should be supported by controlled vocabularies (ideally [ontologies](/docs/ontology)), and/or [data formats](/docs/format_standards). +Metadata can be described as "data about data", i.e. structured information that describes data, like the content of a dataset or file, or the context of its generation. +Some exemplary metadata fields are: title, keywords, acquisition method / analytical technique, and the list continues. Metadata should be supported by controlled vocabularies (ideally [ontologies](/docs/ontology)), and/or [data formats](/docs/format_standards).
-Metadata gets more specialized as the domain it describes does, where the hierarchy of domains can correspond to a hierarchical metadata structure, enabling layers of multiple standards from more generic metadata, where it is completely domain-independent, moving to more specific ones. +Metadata gets more specialized as the domain it describes does, where the hierarchy of domains can correspond to a hierarchical metadata structure: from more generic, completely domain-independent metadata layer, to the most method- and application-specific ones. ### Domain-Independent Metadata: Metadata can be domain-independent, focusing mostly on citation details, such as the title, the keywords, the people and institutions involved, or references to other data. Domain-independent metadata standards can be complemented by more domain-specific metadata. * [Dublin Core](https://www.dublincore.org/specifications/dublin-core/dces/) is a more general set of fifteen elements describing networked resources. This set has been adapted and extended by other standards since its first publication in 1995. - * [DataCite](https://datacite.org/) is a DOI provider that provides a [schema](https://schema.datacite.org/) of core metadata for research data. The standard is community driven and tries to integrate with other standards such as Dublin Core and [ORCID Record Schema](https://info.orcid.org/documentation/integration-guide/orcid-record/). -* [OpenAIRE Guidelines for Data Archive Managers](https://guidelines.openaire.eu/en/latest/) provides an infrastructure, which facilitates interoperability between repositories adhering to those guidelines, which enhance data exposure and visibility. OpenAIRE has already adopted the DataCite [schema](https://schema.datacite.org/) but with some minor adjustments, such as accepting other persistent identifier schemes rather than the DOI, and some changes in the obligations of properties. 
+* The [OpenAIRE Guidelines for Data Archive Managers](https://guidelines.openaire.eu/en/latest/) provide an infrastructure which facilitates interoperability between repositories adhering to those guidelines, which enhance data exposure and visibility. OpenAIRE has already adopted the DataCite [schema](https://schema.datacite.org/) but with some minor adjustments, such as accepting other persistent identifier schemes rather than the DOI, and some changes in the obligations of properties. * [PROV](https://www.w3.org/TR/prov-overview/): The W3C standard for provenance information can be used to provide information on the origin of scientific data. -* [Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)](http://www.openarchives.org/OAI/openarchivesprotocol.html) is a framework for harvesting metadata and can be applied to a wide variety of metadata formats. These should always include Dublin Core metadata. +* The [Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)](http://www.openarchives.org/OAI/openarchivesprotocol.html) is a framework for harvesting metadata and can be applied to a wide variety of metadata formats. These should always include Dublin Core metadata. ### Domain-Specific Metadata: -Metadata can be domain-specific, such as acquisition method with a certain analytical technique, or the pH for a certain reaction, which don’t apply to most other domains rather than chemistry. +Metadata can be domain-specific, i.e. related to a specific acquisition method with a certain analytical technique (such as a pH measurement in the context of a certain reaction), which doesn’t apply to most other domains rather than chemistry. -* [Core Scientific Metadata Model (CSMD)](http://icatproject-contrib.github.io/CSMD/) is a model for scientific studies, and it includes entity classes for facilities, users, investigations, instruments, datafiles, datasets, and samples. 
Within these classes most of the experimental parameters and results can be captured. Additionally, there are classes for e.g. publications, data formats, and sample types. Beside a publication of the specification as UML (Unified Modeling language) classes model definition, there is also a representation as an ontology. Future releases will focus on the integration of the [PROV](https://www.w3.org/TR/prov-overview/) model. -* [ISA (Investigation Study Assay)](https://isa-specs.readthedocs.io/en/latest/index.html) is also a metadata framework focusing on biological investigations, and it has schemas for the representation in data formats (ISA-Tab and JSON). It can be applied to many methods and allows the inclusion of ontology references for the entities. -* [IUPAC - FAIRSpec](https://github.com/IUPAC/IUPAC-FAIRSpec): It covers spectroscopic data including NMR spectroscopy. But this project is still preliminary and under development. +* The [Core Scientific Metadata Model (CSMD)](http://icatproject-contrib.github.io/CSMD/) is a model for scientific studies, which includes entity classes for facilities, users, investigations, instruments, datafiles, datasets, and samples. Within these classes most of the experimental parameters and results can be captured. There are additionally classes for e.g. publications, data formats, and sample types. Beside a publication of the specification as UML (Unified Modeling language) classes model definition, there is also a representation as an ontology. Future releases will focus on the integration of the [PROV](https://www.w3.org/TR/prov-overview/) model. +* The [Investigation Study Assay (ISA)](https://isa-specs.readthedocs.io/en/latest/index.html) is also a metadata framework focusing on biological investigations, which defines schemas for the data representation in machine-readable formats (ISA-Tab and JSON). It can be applied to many methods and allows the inclusion of ontology references for the entities. 
+* [IUPAC - FAIRSpec](https://github.com/IUPAC/IUPAC-FAIRSpec) is a framework under development at IUPAC, which aims to cover spectroscopic data including NMR spectroscopy. ## Minimum information standards (MI) -Minimum information standards (MI) are guidelines regarding which metadata is required when reporting data. Furthermore, these guidelines outline which format to use for both this information as well as for the data itself. The set of MI depends on the type of data and is established to ensure that data are deposited following the FAIR principles. Therefore, minimum information is a subset of rich metadata which can accompany data. +Minimum information standards (MI) are guidelines regarding which metadata is required when reporting data. Furthermore, these guidelines outline which format should be used for both this information as well as for the data itself. The set of MI depends on the type of data and is established to ensure that data are deposited following the FAIR principles. Therefore, minimum information is a subset of rich metadata which can accompany data. ### Minimum Information for Chemical Investigations (MIChI) -Due to the increasing amount of data produced by omics, biology and related disciplines, such as bioinformatics and biochemistry, have developed a large set of [minimum information guidelines](https://fairsharing.org/search/?q=minimum+information) for different methods. These were promoted by the [Minimum Information for Biological and Biomedical Investigations (MIBBI)](https://doi.org/10.1038/nbt.1411) project. +Due to the increasing amount of data produced by biology and related disciplines, such as omics, bioinformatics and biochemistry, a large set of [minimum information guidelines](https://fairsharing.org/search/?q=minimum+information) for different methods has been developed. These were promoted by the [Minimum Information for Biological and Biomedical Investigations (MIBBI)](https://doi.org/10.1038/nbt.1411) project. 
-Although the explored part of the chemical space along with the chemical data produced is increasing rapidly, there are only a few attempts to define guidelines for minimum information in chemistry, e.g. [Metabolomics Standards Initiative (MSI)](https://dx.doi.org/10.1007%2Fs11306-007-0082-2) or [Collaboratory for the Multi-scale Chemical Sciences (CMCS)](https://www.researchgate.net/publication/228602526_Metadata_in_the_collaboratory_for_multi-scale_chemical_science). The NFDI4Chem will address this issue and is working on Minimum Information for Chemical Investigations (MIChI), which includes standards for methods such as mass spectrometry, nuclear magnetic resonance and other spectroscopic methods. International workshops are already being carried out in order to start the needed discussion about the MIChI. +Although the explored part of the chemical space along with the chemical data produced is increasing rapidly, there are only a few attempts to define guidelines for minimum information in chemistry, e.g. the [Metabolomics Standards Initiative (MSI)](https://dx.doi.org/10.1007%2Fs11306-007-0082-2) or the [Collaboratory for the Multi-scale Chemical Sciences (CMCS)](https://www.researchgate.net/publication/228602526_Metadata_in_the_collaboratory_for_multi-scale_chemical_science). NFDI4Chem will address this issue by preparing recommendations on **Minimum Information for Chemical Investigations (MIChI)**, which include standards for methods such as mass spectrometry, nuclear magnetic resonance and optical spectroscopic methods. International workshops are already being carried out in order to start the needed discussion about the MIChI. Software projects such as [electronic lab notebooks](/docs/eln) or [repositories](/docs/repositories) often define their own layer of specific minimum metadata for chemical experiments which are based on existing standards, e.g. for metabolomics, or defined by the [data formats](/docs/format_standards) they import. 
-Existing [ontologies](/docs/ontology) are a good starting point to identify the information necessary to describe a method, results, samples, or other entities. Furthermore, controlled vocabularies and ontologies define what additional metadata is allowed in order to create rich metadata, in turn improving the data's FAIRness. Examples for formats with corresponding ontologies or a controlled vocabularies are [mzML](https://www.psidev.info/mzML), [CIF](https://www.iucr.org/resources/cif), [NeXus](https://www.nexusformat.org/), -and the -[Allotrope Data Format (ADF)](https://docs.allotrope.org/Allotrope%20Data%20Format.html). +Existing [ontologies](/docs/ontology) are a good starting point to identify the information necessary to describe a method, results, samples, or other entities. Furthermore, controlled vocabularies and ontologies define what additional metadata is allowed in order to create rich metadata, in turn improving the data's FAIRness. Examples for formats with corresponding ontologies or a controlled vocabularies are [mzML](https://www.psidev.info/mzML), [CIF](https://www.iucr.org/resources/cif), [NeXus](https://www.nexusformat.org/), and the [Allotrope Data Format (ADF)](https://docs.allotrope.org/Allotrope%20Data%20Format.html). -[The Chemical Analysis Metadata Platform (ChAMP)](https://champ.stuchalk.domains.unf.edu/) is a project, which focuses on defining a framework for chemical analysis methods. +[The Chemical Analysis Metadata Platform (ChAMP)](https://champ.stuchalk.domains.unf.edu/) is a project which focuses on defining a framework for chemical analysis methods. ## Metadata and the FAIR Principles The [FAIR Guiding Principles](https://doi.org/10.1038/sdata.2016.18) do not only apply to data but also the associated metadata. More information can be found in the [FAIR article](/docs/fair) or on [GoFair](https://www.go-fair.org/how-to-go-fair/). 
Metadata, as well as the data itself, should be assigned unique [persistent identifiers (PID)](/docs/pid) to be referenced in publications and other datasets. By organizing these PIDs hierarchically, each parameter in the metadata can be referenced individually. -Machine-readable metadata should be provided in a standardized format, while the metadata entities should be well-documented regarding semantics and the relations between the entities and the actual data. This can be achieved by defining the metadata as an ontology or schema, e.g. as XML or JSON. Schemas help in indexing metadata for search engines, repositories, or other data registries, and also help improve interoperability—the I in FAIR. Most of the other FAIR guidelines also apply to metadata. - +Machine-readable metadata should be provided in a standardized format, while the metadata entities should be well-documented regarding semantics and the relations between the entities and the actual data. This can be achieved by defining the metadata as an ontology, or a schema in a machine-readable serialization, such as XML or JSON. Schemas help in indexing metadata for search engines, repositories, or other data registries, and also help improve interoperability (the I in FAIR). Most of the other FAIR guidelines also apply to metadata. ## Sources and further information -A short introductory video to Metadata can be found [here](https://www.youtube.com/embed/TnpDnflK66I) (in German): +A short introductory video to Metadata (in German) can be found [here](https://www.youtube.com/embed/TnpDnflK66I). 
From 03ef3b03cf9a46c6653ca52ed5cfc2594ff54771 Mon Sep 17 00:00:00 2001 From: Johannes Liermann Date: Fri, 19 Jun 2026 15:41:05 +0200 Subject: [PATCH 6/8] fix: link Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --- docs/60_topics/63_data_description_annotation/10_metadata.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/60_topics/63_data_description_annotation/10_metadata.mdx b/docs/60_topics/63_data_description_annotation/10_metadata.mdx index 1e3fca3e..d64895f8 100644 --- a/docs/60_topics/63_data_description_annotation/10_metadata.mdx +++ b/docs/60_topics/63_data_description_annotation/10_metadata.mdx @@ -7,7 +7,7 @@ slug: "/metadata" ## Metadata and their schemas Metadata can be described as "data about data", i.e. structured information that describes data, like the content of a dataset or file, or the context of its generation. -Some exemplary metadata fields are: title, keywords, acquisition method / analytical technique, and the list continues. Metadata should be supported by controlled vocabularies (ideally [ontologies](/docs/ontology)), and/or [data formats](/docs/format_standards). +Some exemplary metadata fields are: title, keywords, acquisition method / analytical technique, and the list continues. Metadata should be supported by controlled vocabularies (ideally [ontologies](/docs/ontology)), and/or [data formats](/docs/data_formats). Metadata gets more specialized as the domain it describes does, where the hierarchy of domains can correspond to a hierarchical metadata structure: from more generic, completely domain-independent metadata layer, to the most method- and application-specific ones. 
From 9b73b9afb5f483751a599f4489da43ea3fc0d7ff Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 19 Jun 2026 13:45:30 +0000 Subject: [PATCH 7/8] Initial plan From d7b0340614b0474bbd58e4e30ef11c7f66f96adf Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 19 Jun 2026 13:51:24 +0000 Subject: [PATCH 8/8] fix: misc. typos and grammatical issues from PR #520 review --- docs/30_data/30_data_organisation.mdx | 8 ++++---- docs/30_data/40_data_documentation.mdx | 12 ++++++------ docs/50_data_publication/00_data_publishing.mdx | 2 +- docs/50_data_publication/10_repositories.mdx | 6 +++--- .../63_data_description_annotation/10_metadata.mdx | 8 ++++---- 5 files changed, 18 insertions(+), 18 deletions(-) diff --git a/docs/30_data/30_data_organisation.mdx b/docs/30_data/30_data_organisation.mdx index 602b1a22..43062422 100644 --- a/docs/30_data/30_data_organisation.mdx +++ b/docs/30_data/30_data_organisation.mdx @@ -55,13 +55,13 @@ A good file name such as `20180211_ELI5_TEMP_BH01_RAW_03.csv` can easily be sort - **type of data:** RAW = raw data from measuring device - **number of file:** containing data for that measurement series -If you need to rename a multiple files, take a look at: +If you need to rename multiple files, take a look at: - [Thunar Bulk Rename](https://docs.xfce.org/xfce/thunar/bulk-renamer/start) (Linux, GUI) - [command line: mv, mmv, rename](https://linuxconfig.org/how-to-rename-multiple-files-on-linux) (Linux, CLI) - [Bulk Rename Utility](https://www.bulkrenameutility.co.uk/) (Windows, free) - [A.F.5 Rename your files](http://fauland.com/download.htm) (Windows, free) -- [TotalCommander](https://www.ghisler.com/advanced.htm#tutorial_rename) (windows, Shareware) +- [TotalCommander](https://www.ghisler.com/advanced.htm#tutorial_rename) (Windows, Shareware) - [Renamer4Mac](https://renamer.com/) (Mac). 
For some special file formats there are tools for adapting the file name to the metadata. For example, to create a file name that fits your scheme and takes date and time information from the EXIF data of a jpg file. Some also allow adding an offset - this helps sort photos into timestamps that run on different clocks. @@ -85,11 +85,11 @@ Consider the way data and [Metadata](/docs/metadata/) can be stored together as - technical metadata - structural metadata -An FDO encapsulates data and metadata in one file and can be saved as an [HDF5](https://www.hdfgroup.org/solutions/hdf5/), for example. See [Data Format Standards](/docs/format_standards/) for more information. +An FDO encapsulates data and metadata in one file and can be saved as an [HDF5](https://www.hdfgroup.org/solutions/hdf5/), for example. See [Data Format Standards](/docs/data_formats/) for more information. ## Files: formats -Different disciplines use established standards, see [Data Format Standards](/docs/format_standards/). Also consider beyond the duration of the project: +Different disciplines use established standards, see [Data Format Standards](/docs/data_formats/). Also consider beyond the duration of the project: - usage of proprietary or open file formats - exchange within and outside of the working group diff --git a/docs/30_data/40_data_documentation.mdx b/docs/30_data/40_data_documentation.mdx index ba126662..6dda2a0d 100644 --- a/docs/30_data/40_data_documentation.mdx +++ b/docs/30_data/40_data_documentation.mdx @@ -25,7 +25,7 @@ _Andreas von der Dunk, Technische Universität Dresden, Service Center Research A clean and comprehensible organisation of data and documents are an important part of good research practice and an important step to realise research data management according to the [FAIR data principles](/docs/fair). 
-An essential task is defining in advance the organisation and storage of data, documents, and their [metadata](/docs/metadata), and to document the relevant measures. +An essential task is defining in advance the organisation and storage of data, documents, and their [metadata](/docs/metadata), and documenting the relevant measures. Central requirements are the definition of [formal responsibilities, organisational conventions](#formal-responsibilities-and-organisational-conventions) and [technical implementations](#technical-implementations) to organise the data and meta information produced. The information collected for this purpose is recorded in the [Data Management Plan](/docs/dmp). Note that a good data organisation also estimates the costs (see [costing tool and checklist](https://ukdataservice.ac.uk/learning-hub/research-data-management/plan-to-share/costing/) for example) in the early application phase. @@ -38,18 +38,18 @@ An essential part of the data documentation is the definition of the actors work - Describing processes of quality assurance including protected storage, sharing, and accessibility in the short term and on the long run. - Data processing: How, where, how fast. Description of input and output data; decisions on naming and structuring conventions for files and folders. -The result should be a set of descriptive documents associated to the files used during the daily work routines, which unambiguously determine: +The result should be a set of descriptive documents associated with the files used during the daily work routines, which unambiguously determine: - the status (for example original file, temporary work file; Draft, intermediate version, final version), - the location (workstation PC, central file server, database), - the availability time frame (short, project length, long-term), -- the format they are saved. +- the format in which they are saved. 
![small data handout](/img/data/en_data_handout.png) ### Technical implementations -Get an overview of the occurring data and document flows. A short description, easy to understand for every user, should be accessible on a low level and explain the main concepts. The data itself needs to have a bulletproof [Data Organisation](/docs/data_organisation), including [Metadata](/docs/metadata/) and suitable [Data Format Standards](/docs/format_standards/). Find out more about the usual [Best Practice](/docs/best_practice/) at your institute or within your discipline. Plan how documents can be shared between project staff and what needs to be accessed for [Data Publication](/docs/data_publishing/) with [PIDs](/docs/pid/). +Get an overview of the occurring data and document flows. A short description, easy to understand for every user, should be accessible on a low level and explain the main concepts. The data itself needs to have a bulletproof [Data Organisation](/docs/data_organisation), including [Metadata](/docs/metadata/) and suitable [Data Format Standards](/docs/data_formats/). Find out more about the usual [Best Practice](/docs/best_practice/) at your institute or within your discipline. Plan how documents can be shared between project staff and what needs to be accessed for [Data Publication](/docs/data_publishing/) with [PIDs](/docs/pid/). Data security affects all technical and organisational issues to protect the data from alteration, loss, and destruction. In this context, storage methods, backup procedures, necessary physical resources as well as automated and administrative routines must be planned and put in place. Ask local contacts or external experts about already established technologies for [Data Storage and Archiving](/docs/data_storage) as well as suitable [Repositories](/docs/repositories/). @@ -71,10 +71,10 @@ Good data documentation does not happen over night - take small steps first. The - Are there any special features? 
- Awareness: Who produces (meta)data, and who continues to use data and how? - Define internal rules and processes: What are the targets of RDM, and how can they be achieved? -- Apply and evaluate iteratively rules: Learn, set, follow, repeat. Keep it simple and smart (KISS). +- Apply and evaluate rules iteratively: Learn, set, follow, repeat. Keep it simple and smart (KISS). - Develop suitable technology: Determine specific requirements in the first project phase and adapt them continuously to changing conditions. - Establish supporting technology: Evaluate and test software like [ELN](/docs/eln/) and [Repositories](/docs/repositories/); train your staff. -- Obtain legal advice, considering local and higher-level policies and procedures: Contact legal department at your institution or [NFDI Querschnittssektion “Ethik und Recht”](https://www.nfdi.de/einrichtung-von-ersten-sektionen/) +- Obtain legal advice, considering local and higher-level policies and procedures: Contact the legal department at your institution or [NFDI Querschnittssektion "Ethik und Recht"](https://www.nfdi.de/einrichtung-von-ersten-sektionen/) - Make rules and decisions accessible to everyone at an early stage, for example in the form of a short handout. - Check the concept regularly and update it if necessary. diff --git a/docs/50_data_publication/00_data_publishing.mdx b/docs/50_data_publication/00_data_publishing.mdx index f037dd6b..ae264c73 100644 --- a/docs/50_data_publication/00_data_publishing.mdx +++ b/docs/50_data_publication/00_data_publishing.mdx @@ -12,7 +12,7 @@ This page applies to all researchers who want to publish their data. In chemical research, we strive to share results with others, commonly through articles in renowned scientific journals. To be able to actually work with and build upon these results, the scientific community also requires the data that the results were based on. 
-Publishing and therefore sharing these chemistry research data in a [FAIR](/docs/fair) manner adds value to the research results and enables discovery and reuse. For this purpose it is important to consider aspects such as rich [metadata](/docs/metadata), [provenance information](/docs/provenance), information on the [license](/docs/licences) applied, [persistent identifiers](/docs/pid), [data formats standards](/docs/format_standards) for analytical data, and [machine-readable chemical structures](/docs/machine-readable_chemical_structures). +Publishing and therefore sharing these chemistry research data in a [FAIR](/docs/fair) manner adds value to the research results and enables discovery and reuse. For this purpose it is important to consider aspects such as rich [metadata](/docs/metadata), provenance information, information on the [license](/docs/licences) applied, [persistent identifiers](/docs/pid), [data format standards](/docs/data_formats) for analytical data, and [machine-readable chemical structures](/docs/machine-readable_chemical_structures). To publish data is essential to ensure that findings are transparent and reproducible. Moreover, it prevents duplicate efforts to generate data; hence, data publishing is also a measure of (ecological) sustainability. diff --git a/docs/50_data_publication/10_repositories.mdx b/docs/50_data_publication/10_repositories.mdx index 870c2276..afa46395 100644 --- a/docs/50_data_publication/10_repositories.mdx +++ b/docs/50_data_publication/10_repositories.mdx @@ -19,11 +19,11 @@ Repositories can be hosted at institutional servers or are provided by broader o ## How do repositories work? -A repository consists of a repository software and an underlying database. Researchers transfer their data to the repository typically via a browser-based user interface, or the repository operators **harvest** the data from other platforms via appropriate protocols and interfaces.
+A repository consists of repository software and an underlying database. Researchers transfer their data to the repository typically via a browser-based user interface, or the repository operators **harvest** the data from other platforms via appropriate protocols and interfaces. -Some, but not all, repositories curate and review the data before **ingestion** with regard to their content and quality, sometimes also regarding legal aspects (data protection, copyright/[licences](/docs/licences)). +Some, but not all, repositories curate and review the data before **ingestion** with regard to their content and quality, sometimes also regarding legal aspects (data protection, copyright and [licences](/docs/licences)). -In order to allow data reuse by other researchers, [metadata](/docs/metadata), including [provenance information](/docs/provenance/), are required beside the actual data. Metadata describe the research data and provide information about its creation, the methods or software used, as well as legal aspects. Metadata can be either added manually via a metadata editor, or can be provided through other applications. The process to manually add metadata via a metadata editor can be compared to the process of submitting a manuscript to a publisher via the publisher's submission system. +In order to allow data reuse by other researchers, [metadata](/docs/metadata), including provenance information, are required beside the actual data. Metadata describe the research data and provide information about its creation, the methods or software used, as well as legal aspects. Metadata can be either added manually via a metadata editor, or can be provided through other applications. The process to manually add metadata via a metadata editor can be compared to the process of submitting a manuscript to a publisher via the publisher's submission system. 
One main function of repositories is to provide a search function, with which human users and machines can find, view, and download data. In order to ensure that data can be permanently referenced, [linked and cited](/docs/best_practice/), repositories assign unique [persistent identifiers](/docs/pid) (PIDs). This also enhances the findability and accessibility of research data. diff --git a/docs/60_topics/63_data_description_annotation/10_metadata.mdx b/docs/60_topics/63_data_description_annotation/10_metadata.mdx index d64895f8..f9608105 100644 --- a/docs/60_topics/63_data_description_annotation/10_metadata.mdx +++ b/docs/60_topics/63_data_description_annotation/10_metadata.mdx @@ -9,7 +9,7 @@ slug: "/metadata" Metadata can be described as "data about data", i.e. structured information that describes data, like the content of a dataset or file, or the context of its generation. Some exemplary metadata fields are: title, keywords, acquisition method / analytical technique, and the list continues. Metadata should be supported by controlled vocabularies (ideally [ontologies](/docs/ontology)), and/or [data formats](/docs/data_formats). -Metadata gets more specialized as the domain it describes does, where the hierarchy of domains can correspond to a hierarchical metadata structure: from more generic, completely domain-independent metadata layer, to the most method- and application-specific ones. +Metadata gets more specialized as the domain it describes does, where the hierarchy of domains can correspond to a hierarchical metadata structure: from a more generic, completely domain-independent metadata layer, to the most method- and application-specific ones. ### Domain-Independent Metadata: @@ -17,12 +17,12 @@ Metadata can be domain-independent, focusing mostly on citation details, such as * [Dublin Core](https://www.dublincore.org/specifications/dublin-core/dces/) is a more general set of fifteen elements describing networked resources. 
This set has been adapted and extended by other standards since its first publication in 1995. * [DataCite](https://datacite.org/) is a DOI provider that provides a [schema](https://schema.datacite.org/) of core metadata for research data. The standard is community driven and tries to integrate with other standards such as Dublin Core and [ORCID Record Schema](https://info.orcid.org/documentation/integration-guide/orcid-record/). -* The [OpenAIRE Guidelines for Data Archive Managers](https://guidelines.openaire.eu/en/latest/) provide an infrastructure which facilitates interoperability between repositories adhering to those guidelines, which enhance data exposure and visibility. OpenAIRE has already adopted the DataCite [schema](https://schema.datacite.org/) but with some minor adjustments, such as accepting other persistent identifier schemes rather than the DOI, and some changes in the obligations of properties. +* The [OpenAIRE Guidelines for Data Archive Managers](https://guidelines.openaire.eu/en/latest/) provide an infrastructure which facilitates interoperability between repositories adhering to those guidelines and enhances data exposure and visibility. OpenAIRE has already adopted the DataCite [schema](https://schema.datacite.org/) but with some minor adjustments, such as accepting other persistent identifier schemes rather than the DOI, and some changes in the obligations of properties. * [PROV](https://www.w3.org/TR/prov-overview/): The W3C standard for provenance information can be used to provide information on the origin of scientific data. * The [Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)](http://www.openarchives.org/OAI/openarchivesprotocol.html) is a framework for harvesting metadata and can be applied to a wide variety of metadata formats. These should always include Dublin Core metadata. ### Domain-Specific Metadata: -Metadata can be domain-specific, i.e. 
related to a specific acquisition method with a certain analytical technique (such as a pH measurement in the context of a certain reaction), which doesn’t apply to most other domains rather than chemistry. +Metadata can be domain-specific, i.e. related to a specific acquisition method with a certain analytical technique (such as a pH measurement in the context of a certain reaction), which doesn't apply to most domains other than chemistry. * The [Core Scientific Metadata Model (CSMD)](http://icatproject-contrib.github.io/CSMD/) is a model for scientific studies, which includes entity classes for facilities, users, investigations, instruments, datafiles, datasets, and samples. Within these classes most of the experimental parameters and results can be captured. There are additionally classes for e.g. publications, data formats, and sample types. Beside a publication of the specification as UML (Unified Modeling language) classes model definition, there is also a representation as an ontology. Future releases will focus on the integration of the [PROV](https://www.w3.org/TR/prov-overview/) model. * The [Investigation Study Assay (ISA)](https://isa-specs.readthedocs.io/en/latest/index.html) is also a metadata framework focusing on biological investigations, which defines schemas for the data representation in machine-readable formats (ISA-Tab and JSON). It can be applied to many methods and allows the inclusion of ontology references for the entities. @@ -32,7 +32,7 @@ Metadata can be domain-specific, i.e. related to a specific acquisition method w Minimum information standards (MI) are guidelines regarding which metadata is required when reporting data. Furthermore, these guidelines outline which format should be used for both this information as well as for the data itself. The set of MI depends on the type of data and is established to ensure that data are deposited following the FAIR principles.
Therefore, minimum information is a subset of rich metadata which can accompany data. ### Minimum Information for Chemical Investigations (MIChI) -Due to the increasing amount of data produced by biology and related disciplines, such as omics, bioinformatics and biochemistry, a large set of [minimum information guidelines](https://fairsharing.org/search/?q=minimum+information) for different methods has been developed. These were promoted by the [Minimum Information for Biological and Biomedical Investigations (MIBBI)](https://doi.org/10.1038/nbt.1411) project. +Due to the increasing amount of data produced by biology and related disciplines, such as omics, bioinformatics and biochemistry, a large set of [minimum information guidelines](https://fairsharing.org/search/?q=minimum+information) for different methods has been developed. These were promoted by the [Minimum Information for Biological and Biomedical Investigations (MIBBI)](https://doi.org/10.1038/nbt.1411) project. Although the explored part of the chemical space along with the chemical data produced is increasing rapidly, there are only a few attempts to define guidelines for minimum information in chemistry, e.g. the [Metabolomics Standards Initiative (MSI)](https://dx.doi.org/10.1007%2Fs11306-007-0082-2) or the [Collaboratory for the Multi-scale Chemical Sciences (CMCS)](https://www.researchgate.net/publication/228602526_Metadata_in_the_collaboratory_for_multi-scale_chemical_science). NFDI4Chem will address this issue by preparing recommendations on **Minimum Information for Chemical Investigations (MIChI)**, which include standards for methods such as mass spectrometry, nuclear magnetic resonance and optical spectroscopic methods. International workshops are already being carried out in order to start the needed discussion about the MIChI.