Duplicate entries in YAML mappings (dicts) get implicitly overwritten #272

Description

@albestro

This is the example that caused me problems and it was not immediately clear to me why it was problematic.

my-uenv:
  ...
  views:
    default:
      link: run
      uenv:
        env_vars:
          prepend_path:
            - PATH: /user-environment/paraview/bin
          set:
            - PARAVIEW_PLUGINS_DIR: /user-environment/paraview-plugins
          prepend_path:
            - LD_LIBRARY_PATH: /user-environment/paraview/lib
            - LD_LIBRARY_PATH: /user-environment/paraview/lib64

Moreover, stackinator didn't complain about it: it went on and actually produced a uenv, but the env_vars were not fully set as expected. The problem is the duplicate prepend_path entry: the second one replaces the first, so the PATH entry is lost.

IMHO this should raise an error (or at least a warning) about this problem.

YAML Spec

The PyYAML library, which stackinator uses for reading YAML files, is not fully compliant with the YAML spec, which states (starting from YAML 1.0):

A mapping is an unordered set of key/value node pairs, with the restriction that each of the keys is unique.

And, in partial defense of PyYAML, this section of the YAML spec adds

This restriction has non-trivial implications [...] Since YAML mappings require key uniqueness, representations must include a mechanism for testing the equality of nodes. This is non-trivial since YAML presentations allow various ways to write a given scalar.

PyYAML currently handles this problem by silently accepting duplicate keys and letting the last value overwrite the earlier ones.
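
The overwriting behaviour can be reproduced outside stackinator with a few lines. This is a minimal sketch that assumes PyYAML is installed and uses only the env_vars part of the recipe above; the variable names doc and env_vars are illustrative.

```python
import yaml

# Reduced version of the recipe above: prepend_path appears twice
# inside the same mapping.
doc = """
env_vars:
  prepend_path:
    - PATH: /user-environment/paraview/bin
  set:
    - PARAVIEW_PLUGINS_DIR: /user-environment/paraview-plugins
  prepend_path:
    - LD_LIBRARY_PATH: /user-environment/paraview/lib
    - LD_LIBRARY_PATH: /user-environment/paraview/lib64
"""

env_vars = yaml.safe_load(doc)["env_vars"]

# No error or warning is emitted; only the second prepend_path survives,
# so the PATH entry from the first one is silently dropped.
print(env_vars["prepend_path"])
```

Note that the two LD_LIBRARY_PATH items are not duplicates in the YAML sense: each one is a separate single-key mapping inside a sequence.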

Solutions

As said, I think that the YAML spec should be enforced. I don't think there are, and IMHO there shouldn't be, cases where duplicate entries are useful.

The solutions I see to enforce the YAML spec at the moment are:

  • customize PyYAML to raise an error for duplicate entries: see the approach proposed in a comment on yaml/pyyaml#165 ("Duplicate keys are not handled properly"), which might just work for our use case
  • explore other libraries that might be conformant with the YAML spec. I read about:
    • ruamel.yaml
    • ruyaml
    • I am not sure about the differences between them, but they describe themselves as derived from PyYAML, where "many of the bugs filed against PyYAML, but that were never acted upon, have been fixed in".
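
To make the first option more concrete, here is a rough sketch of a PyYAML loader that rejects duplicate keys. It is written in the spirit of the pyyaml#165 discussion but is not a copy of the snippet posted there, and it is not how stackinator currently loads recipes. The names UniqueKeyLoader and load_strict are hypothetical. Merge keys (<<) are skipped by the check so that overriding a merged value keeps working.

```python
import yaml
from yaml.constructor import ConstructorError


class UniqueKeyLoader(yaml.SafeLoader):
    """SafeLoader variant that rejects duplicate mapping keys (hypothetical name)."""

    def construct_mapping(self, node, deep=False):
        seen = set()
        for key_node, _ in node.value:
            # Merge keys ("<<") are expanded by the base class; skip them here.
            if key_node.tag == "tag:yaml.org,2002:merge":
                continue
            key = self.construct_object(key_node, deep=deep)
            try:
                duplicate = key in seen
                seen.add(key)
            except TypeError:
                # Unhashable key: leave the error reporting to the base class.
                continue
            if duplicate:
                raise ConstructorError(
                    "while constructing a mapping", node.start_mark,
                    f"found duplicate key {key!r}", key_node.start_mark,
                )
        return super().construct_mapping(node, deep=deep)


def load_strict(text):
    """Parse YAML text, raising ConstructorError on duplicate keys."""
    return yaml.load(text, Loader=UniqueKeyLoader)
```

With such a loader, the recipe at the top of this issue would fail with an error pointing at the second prepend_path instead of producing an incomplete uenv. The check compares constructed Python keys, which is a simplification of the node equality discussed in the spec.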

I opened this issue to decide if/how we would like to proceed.
