37 changes: 20 additions & 17 deletions docs/30_data/30_data_organisation.mdx
@@ -27,11 +27,11 @@ Find a balanced set of elements: Too many make it difficult to grasp quickly, wh

:::note General basics for naming files:

- Order the elements from general to specific.
- Use meaningful abbreviations instead of long identifiers.
- Use underscore `_`, hyphen `-` or capitalized letters to separate elements in the name. Don’t use spaces or special characters: `?!&,_%#;_()@$^~‘{}[]<>`.
- Use date format ISO8601: `YYYYMMDD`, and time if needed `HHMMSS`.
- Include a version number if appropriate: minimum two digits (V02) and extend it, if needed for minor corrections (V02-03). The leading zeros, will ensure the files are sorted correctly.
- Order the elements from general to specific.
- Use meaningful abbreviations instead of long identifiers.
- Use underscore `_`, hyphen `-` or capitalized letters to separate elements in the name. Don’t use spaces or special characters: `?!&,%#;()@$^~‘{}[]<>`.
- Use date format ISO8601: `YYYYMMDD`, and time if needed `HHMMSS`.
- Include a version number if appropriate: use at least two digits (V02) and extend it if needed for minor corrections (V02-03). The leading zeros ensure that the files are sorted correctly.

(by [RDMKit](https://rdmkit.elixir-europe.org/data_organisation.html#what-is-the-best-way-to-name-a-file))
:::
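
The naming rules above can be sketched in a few lines of Python. This is only an illustration, not part of the RDMKit guidance: the function name `build_file_name`, its parameters, and the element values are assumptions chosen for the example.

```python
import re
from datetime import date

# Assumption for this sketch: an element may contain letters and digits only,
# so that "_" stays reserved as the separator between elements.
_ELEMENT = re.compile(r"[A-Za-z0-9]+")


def build_file_name(day, elements, version, extension):
    """Join a date, the elements (general to specific) and a version number."""
    for element in elements:
        if not _ELEMENT.fullmatch(element):
            raise ValueError(f"invalid element: {element!r}")
    # YYYYMMDD date first, then the elements, then a zero-padded version.
    parts = [day.strftime("%Y%m%d"), *elements, f"V{version:02d}"]
    return "_".join(parts) + "." + extension


print(build_file_name(date(2018, 2, 11), ["ELI5", "TEMP", "BH01", "RAW"], 3, "csv"))
# prints 20180211_ELI5_TEMP_BH01_RAW_V03.csv
```

Rejecting spaces and special characters at creation time is cheaper than cleaning up file names afterwards.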
@@ -55,25 +55,26 @@ A good file name such as `20180211_ELI5_TEMP_BH01_RAW_03.csv` can easily be sort
- **type of data:** RAW = raw data from measuring device
- **number of file:** containing data for that measurement series

If you need to rename a multiple files, take a look at:
If you need to rename multiple files, take a look at:

- [Thunar Bulk Rename](https://docs.xfce.org/xfce/thunar/bulk-renamer/start) (Linux, GUI)
- [command line: mv, mmv, rename](https://linuxconfig.org/how-to-rename-multiple-files-on-linux) (Linux, CLI)
- [Bulk Rename Utility](https://www.bulkrenameutility.co.uk/) (Windows, free)
- [TotalCommander](https://www.ghisler.com/advanced.htm#tutorial_rename) (windows, Shareware)
- [Renamer4Mac](https://renamer.com/) (Mac).
- [Thunar Bulk Rename](https://docs.xfce.org/xfce/thunar/bulk-renamer/start) (Linux, GUI)
- [command line: mv, mmv, rename](https://linuxconfig.org/how-to-rename-multiple-files-on-linux) (Linux, CLI)
- [Bulk Rename Utility](https://www.bulkrenameutility.co.uk/) (Windows, free)
- [A.F.5 Rename your files](http://fauland.com/download.htm) (Windows, free)
- [TotalCommander](https://www.ghisler.com/advanced.htm#tutorial_rename) (Windows, Shareware)
- [Renamer4Mac](https://renamer.com/) (Mac).
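
If none of these tools is at hand, a bulk rename can also be scripted. The following is a minimal sketch using only the Python standard library; the helper names `plan_renames` and `apply_renames` and the example file names are hypothetical. It builds the list of renames first so that it can be reviewed before anything is changed, and the demo runs inside a temporary folder.

```python
import tempfile
from pathlib import Path


def plan_renames(folder, old, new):
    """Return (source, target) pairs for files whose name contains `old`."""
    plan = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and old in path.name:
            plan.append((path, path.with_name(path.name.replace(old, new))))
    return plan


def apply_renames(plan):
    """Rename the files, refusing to overwrite an existing target."""
    for source, target in plan:
        if target.exists():
            raise FileExistsError(target)
        source.rename(target)


# Demo in a throwaway folder, so no real data is touched.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("20180211_TMP_01.csv", "20180211_TMP_02.csv", "notes.txt"):
        (Path(tmp) / name).touch()
    plan = plan_renames(tmp, "_TMP_", "_TEMP_")
    for source, target in plan:
        print(source.name, "->", target.name)  # review step (dry run)
    apply_renames(plan)
```

Separating the planning step from the renaming step makes a dry run possible, which is advisable before renaming files in bulk.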

For some special file formats there are tools for adapting the file name to metadata. For example, to create a file name that fits your scheme and takes date and time information from the EXIF data of a jpg file. Some also allow adding an offset - this helps sort photos into timestamps that run on different clocks.
For some special file formats there are tools for adapting the file name to the metadata. For example, to create a file name that fits your scheme and takes date and time information from the EXIF data of a jpg file. Some also allow adding an offset - this helps sort photos into timestamps that run on different clocks.
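
The offset idea can be illustrated as follows. This sketch assumes the EXIF timestamp has already been read from the image with a tool of your choice and is available as a string in the usual EXIF form `YYYY:MM:DD HH:MM:SS`; the function name `name_from_exif` and the resulting name pattern are assumptions for the example.

```python
from datetime import datetime, timedelta

# Text form in which EXIF stores date and time.
EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"


def name_from_exif(timestamp, offset=timedelta(0), suffix="jpg"):
    """Build a sortable file name from an EXIF timestamp plus a clock offset."""
    taken = datetime.strptime(timestamp, EXIF_FORMAT) + offset
    return taken.strftime("%Y%m%d_%H%M%S") + "." + suffix


# A camera whose clock runs three minutes behind the reference clock.
print(name_from_exif("2018:02:11 23:58:30", timedelta(minutes=3)))
# prints 20180212_000130.jpg
```

Applying one offset per camera puts photos from several devices into a common chronological order.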

## Files: versioning

Versioning stashes snapshots or simply tracks changes, and makes it possible to find something that existed in a previous version but was later deleted or changed. When files are processed in a clear chronological order, one after the other, no further tools are normally required. Even if that seems to be the case at first, supporting tools quickly prove useful and become established in collaborative work.

#### Possible Solutions

- Low number of requirements: manage manually e.g. by keeping a log where the changes for each respective file is documented, version by version.
- For automatic management of versioning, conflict resolution and back-tracing capabilities, use a proper version control software such as [Git](https://git-scm.com/), hosted by e.g. [GitHub](https://github.com/) or your home institution. Very strong with uncompressed, readable and comparable files such as text files or csv.
- Use a Cloud Storage service (see [Data Storage and Archiving](/docs/data_storage)) that provides automatic file versioning. Very strong on spreadsheets, text files and slides.
- Low number of requirements: manage manually, e.g. by keeping a log where the changes for each respective file are documented, version by version.
- For automatic management of versioning, conflict resolution and back-tracing capabilities, use a proper version control software such as [Git](https://git-scm.com/), hosted either remotely on a portal such as Codeberg, Forgejo, SourceHut, or at your home institution (most usually GitLab). Very strong with uncompressed, readable and comparable files such as text files or csv.
- Use a Cloud Storage service (see [Data Storage and Archiving](/docs/data_storage)) that provides automatic file versioning. Very strong on spreadsheets, text files and slides.

## Files: types of metadata

@@ -84,11 +85,11 @@ Consider the way data and [Metadata](/docs/metadata/) can be stored together as
- technical metadata
- structural metadata

An FDO encapsulates data and metadata in one file and can be saved as an [HDF5](https://www.hdfgroup.org/solutions/hdf5/), for example. See [Data Format Standard](/docs/data_formats/) for more information.
An FDO encapsulates data and metadata in one file and can be saved as an [HDF5](https://www.hdfgroup.org/solutions/hdf5/), for example. See [Data Format Standards](/docs/data_formats/) for more information.

## Files: formats

Different disciplines use established standards, see [Data Format Standard](/docs/data_formats/). Also consider beyond the duration of the project:
Different disciplines use established standards, see [Data Format Standards](/docs/data_formats/). Also consider beyond the duration of the project:

- usage of proprietary or open file formats
- exchange within and outside of the working group
@@ -111,6 +112,7 @@ The top folder should have a README.txt file describing the folder structure and

#### An example by [RDMKit](https://rdmkit.elixir-europe.org/data_organisation.html#what-is-the-best-way-to-name-a-file):

```
project/
code/ code needed to go from input files to final results
data/ raw and primary data (never edit!)
@@ -127,6 +129,7 @@ The top folder should have a README.txt file describing the folder structure and
tables/
scratch/ temporary files that can safely be deleted or lost
README.txt file and folder description
```
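
A folder skeleton of this kind can be created with a short script. The sketch below covers only the top-level folders visible in the example (`code`, `data`, `scratch`) plus the `README.txt`; subfolders are left out, and the function name `create_skeleton` and the README placeholder text are assumptions.

```python
import tempfile
from pathlib import Path

# Top-level folders taken from the example above; extend as needed.
FOLDERS = ["code", "data", "scratch"]


def create_skeleton(root):
    """Create the project folders and a README.txt placeholder under `root`."""
    root = Path(root)
    for name in FOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    readme = root / "README.txt"
    if not readme.exists():  # never overwrite an existing description
        readme.write_text("Describe the folder structure here.\n", encoding="utf-8")
    return root


# Demo in a temporary location.
with tempfile.TemporaryDirectory() as tmp:
    project = create_skeleton(Path(tmp) / "project")
    print(sorted(p.name for p in project.iterdir()))
```

Running the function twice is harmless: existing folders are kept and an existing README is not replaced.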

## Sources and further information

30 changes: 15 additions & 15 deletions docs/30_data/40_data_documentation.mdx
@@ -25,31 +25,31 @@ _Andreas von der Dunk, Technische Universität Dresden, Service Center Research

A clean and comprehensible organisation of data and documents is an important part of good research practice and an important step towards realising research data management according to the [FAIR data principles](/docs/fair).

An essential task is to plan the organisation and storage of data, [metadata](/docs/metadata), and documents in advance and to document the relevant measures.
An essential task is defining in advance the organisation and storage of data, documents, and their [metadata](/docs/metadata), and documenting the relevant measures.

Central requirements are the definition of [formal responsibilities, organisational conventions](#formal-responsibilities-and-organisational-conventions) and [technical implementations](#technical-implementations) to organise the data and meta information produced. The information collected for this purpose is recorded in the [Data Management Plan](/docs/dmp). Note that a good data organisation also estimates the costs (see [costing tool and checklist](https://ukdataservice.ac.uk/learning-hub/research-data-management/plan-to-share/costing/) for example) in the early application phase.

### Formal responsibilities and organisational conventions

To make a long story short: It's mainly about describing who will and how to work with documents. First steps could be about:
An essential part of the data documentation is the definition of the actors working with the data and the procedures. These are possible first steps:

- Documenting the responsibilities of primary researchers and project staff.
- Creating user roles: Define detailed rights for users / groups / roles to access data and sensitive information.
- Creating user roles: Defining detailed rights for users / groups / roles to access data and sensitive information.
- Describing processes of quality assurance, including protected storage, sharing, and accessibility in the short and the long term.
- Data processing: How, where, how fast. Describe input and output data. Decide how you will name and structure files and folders.
- Data processing: How, where, how fast. Description of input and output data; decisions on naming and structuring conventions for files and folders.

The result should be descriptive documents that unambiguously define for files which are used in the course of the daily work routines:
The result should be a set of descriptive documents associated with the files used during the daily work routines, which unambiguously determine:

- on which status (for example original file, temporary work file; Draft, intermediate version, final version),
- where (workstation PC, central file server, database),
- for how long (temporarily, project duration, long-term availability),
- and in which format they are saved.
- the status (for example original file, temporary work file; Draft, intermediate version, final version),
- the location (workstation PC, central file server, database),
- the availability time frame (short, project length, long-term),
- the format in which they are saved.

![small data handout](/img/data/en_data_handout.png)

### Technical implementations

Get an overview of the occurring data and document flows. A short description, easy to understand for every user, should be accessible on a low level and explain the main concepts. The data itself needs to have a bulletproof [Data Organisation](/docs/data_organisation) including [Metadata](/docs/metadata/) and suitable [Data Format Standards](/docs/data_formats/). Find out more about the usual [Best Practice](/docs/best_practice/) at your institute or within your discipline. Plan how documents can be shared between project staff and what needs to be accessed for [Data Publication](/docs/data_publishing/) with [PID](/docs/pid/).
Get an overview of the occurring data and document flows. A short description, easy to understand for every user, should be accessible on a low level and explain the main concepts. The data itself needs to have a bulletproof [Data Organisation](/docs/data_organisation), including [Metadata](/docs/metadata/) and suitable [Data Format Standards](/docs/data_formats/). Find out more about the usual [Best Practice](/docs/best_practice/) at your institute or within your discipline. Plan how documents can be shared between project staff and what needs to be accessed for [Data Publication](/docs/data_publishing/) with [PIDs](/docs/pid/).

Data security affects all technical and organisational issues to protect the data from alteration, loss, and destruction. In this context, storage methods, backup procedures, necessary physical resources as well as automated and administrative routines must be planned and put in place. Ask local contacts or external experts about already established technologies for [Data Storage and Archiving](/docs/data_storage) as well as suitable [Repositories](/docs/repositories/).

@@ -70,11 +70,11 @@ Good data documentation does not happen over night - take small steps first. The
- Which devices or file formats are or have been used?
- Are there any special features?
- Awareness: Who produces (meta)data, and who continues to use data and how?
- Define internal rules and processes: What are the targets of RDM, and how can it be achieved?
- Apply and evaluate rules, iteratively: Learn, set, follow, repeat. Keep it simple and smart (KISS).
- Develop a suitable technology: Determine specific requirements in the first project phase and continuously adapt them to changing conditions.
- Establish supporting technology: Evaluate and test software like [ELN](/docs/eln/) and [Repositories](/docs/repositories/), train staff.
- Obtain legal advice, include local and higher-level policies and procedures: Contact legal department at your institution or [NFDI Querschnittssektion Ethik und Recht](https://www.nfdi.de/einrichtung-von-ersten-sektionen/)
- Define internal rules and processes: What are the targets of RDM, and how can they be achieved?
- Apply and evaluate rules iteratively: Learn, set, follow, repeat. Keep it simple and smart (KISS).
- Develop suitable technology: Determine specific requirements in the first project phase and adapt them continuously to changing conditions.
- Establish supporting technology: Evaluate and test software like [ELN](/docs/eln/) and [Repositories](/docs/repositories/); train your staff.
- Obtain legal advice, considering local and higher-level policies and procedures: Contact the legal department at your institution or [NFDI Querschnittssektion "Ethik und Recht"](https://www.nfdi.de/einrichtung-von-ersten-sektionen/)
- Make rules and decisions accessible to everyone at an early stage, for example in the form of a short handout.
- Check the concept regularly and update it if necessary.
