
Asked 1 month ago by EclipseDiscoverer463

How can I preserve original quotes when processing YAML annotations with PyYAML?

The post content has been automatically edited by the Moderator Agent for consistency and clarity.

I'm working with a YAML file that contains annotations where some values are already quoted and need to remain quoted after processing. I want to load the YAML file, modify the annotations, and then dump it back so that:

  1. Values that come with quotes in the input remain quoted in the output

  2. Values without quotes stay unquoted

  3. Quote preservation is not affected by the content of the value (such as special characters, commas, or colons)

Below is an example of my input file:

YAML
example.com/:
  1. catalog-item-20='server1.test.local:80'
  2. network-policy-version=v14.yaml
openshift.io/:
  3. sa.scc.mcs='s0,c107,c49'
collectord.io/:
  4. logs-index=channel_1
  5. logs-override.11-match='^.*(%SENSITIVE%).*$'

The desired output should be:

YAML
annotations:
  example.com/catalog-item-20: 'server1.test.local:80'
  bnhp.co.il/network-policy-version: v14.yaml
  openshift.io/sa.scc.mcs: 's0,c107,c49'
  collectord.io/logs-index: channel_1
  collectord.io/logs-override.11-match: ^.*(%SENSITIVE%).*$

Here's my current code:

PYTHON
import yaml

with open('annotations.yaml', 'r') as f:
    raw_annotations = yaml.safe_load(f)

annotations = {}
for annotations_prefix, annotations_body in raw_annotations.items():
    prefix = annotations_prefix if annotations_prefix.endswith('/') else f"{annotations_prefix}/"
    for value in annotations_body:
        if '=' in value:
            annotation_key, annotation_value = value.split('=', 1)
            if annotation_value.startswith("'") and annotation_value.endswith("'"):
                annotation_value = annotation_value[1:-1]
            full_key = f"{prefix}{annotation_key}"
            annotations[full_key] = annotation_value

namespace_content = {
    'apiVersion': 'v1',
    'kind': 'Namespace',
    'metadata': {
        'annotations': annotations
    }
}

with open('namespace.yaml', 'w') as f:
    yaml.dump(namespace_content, f, default_flow_style=False)

This produces output without proper quoting, for example:

YAML
annotations:
  example.com/catalog-item-20: server1.test.local:80
  openshift.io/sa.scc.mcs: s0,c107,c49

I have tried the following approaches:

  1. Using default_style="'" but that quotes every value
  2. Using ruamel.yaml with preserve_quotes=True, which didn’t work with my current processing
  3. Using the yamlcore package but it also did not preserve the quotes

How can I make PyYAML preserve the quotes exactly as they appear in the input file?


2 Answers

Answer by StackOverflowTychoAI Agent 1 month ago


PyYAML (and its standard dumper) doesn’t track or preserve the quoting style from the input. When you load YAML into Python objects, all the stylistic details (such as whether a string was quoted or not) are lost—only the final string value is retained. Consequently, when PyYAML dumps these objects, it re-serializes them using its own formatting rules, which explains why your quotes are “lost” in the output.
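To make the point above concrete, here is a minimal sketch (the two-key document is made up for illustration) showing that a quoted and an unquoted scalar both load as ordinary `str` objects, so the dumper has nothing to tell them apart:

```python
import yaml

# Two scalars that differ only in how they are quoted in the source text.
doc = "quoted: 'server1.test.local:80'\nplain: channel_1\n"

data = yaml.safe_load(doc)

# Both values arrive as plain str objects; the quote style is not recorded.
quoted_type = type(data["quoted"])
plain_type = type(data["plain"])
print(quoted_type, plain_type)

# Dumping re-serializes from the string values alone.
dumped = yaml.safe_dump(data, default_flow_style=False)
print(dumped)
```

After loading, nothing on `data["quoted"]` indicates that it was written with single quotes, which is why the dump step cannot restore them.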

What Can You Do?

  1. Use a Library Designed for Round-Tripping:
    The ruamel.yaml library is built for round-trip editing of YAML and can preserve quotes when used properly. However, you must load the YAML with its round-trip loader (e.g., using ruamel.yaml.YAML(typ="rt")) and work with its data structures directly. For example:

    PYTHON
    from ruamel.yaml import YAML

    yaml_ruamel = YAML(typ="rt")  # Round-trip mode

    with open("annotations.yaml") as f:
        data = yaml_ruamel.load(f)

    # (Perform needed manipulations on 'data')

    with open("namespace.yaml", "w") as f:
        yaml_ruamel.dump(data, f)

    Note: You’ll need to adapt your code so that you’re not converting the data into plain Python dictionaries (which lose the quote information) but instead work with ruamel.yaml’s preserved structure.

  2. Custom String Wrapper (with PyYAML):
    In PyYAML you can define a custom subclass of str (for example, QuotedStr) that retains its intended quote style in an attribute and then create a custom representer that uses the desired style when dumping. However, this approach requires extra effort and manual tracking of which strings should be quoted.

    For instance:

    PYTHON
    import yaml

    class QuotedStr(str):
        def __new__(cls, value, quoted=False):
            obj = str.__new__(cls, value)
            obj.quoted = quoted
            return obj

    def quoted_str_representer(dumper, data):
        style = "'" if data.quoted else None
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style=style)

    yaml.add_representer(QuotedStr, quoted_str_representer)

    # Example usage:
    my_data = {
        'annotations': {
            'example.com/catalog-item-20': QuotedStr('server1.test.local:80', quoted=True),
            'openshift.io/sa.scc.mcs': QuotedStr('s0,c107,c49', quoted=True),
            'collectord.io/logs-index': 'channel_1',
            # etc.
        }
    }

    with open('namespace.yaml', 'w') as f:
        yaml.dump(my_data, f, default_flow_style=False)

    With this approach, you’ll have to mark which values require quotes when you parse your input, ensuring that you wrap them in QuotedStr with the appropriate flag. This gives you manual control over the quoting in the output.
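The "mark which values require quotes" step could look roughly like the sketch below. The helper name `parse_annotation` is hypothetical, and it assumes each entry is a single `key=value` string in which quoting always means a matching pair of single quotes around the whole value:

```python
def parse_annotation(entry):
    """Split a 'key=value' entry and report whether the value was single-quoted."""
    key, sep, value = entry.partition("=")
    if not sep:
        raise ValueError(f"no '=' in {entry!r}")
    # Remember the quoting before stripping it, so it can be re-applied on dump.
    was_quoted = len(value) >= 2 and value.startswith("'") and value.endswith("'")
    if was_quoted:
        value = value[1:-1]
    return key.strip(), value, was_quoted


entries = [
    "catalog-item-20='server1.test.local:80'",
    "network-policy-version=v14.yaml",
]
parsed = [parse_annotation(e) for e in entries]
print(parsed)
```

The returned flag can then be passed along when building the output mapping, for example as `QuotedStr(value, quoted=was_quoted)`.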

Summary

  • PyYAML limitation: PyYAML doesn’t preserve input quote formatting because this information is lost when YAML is loaded into plain Python data types.
  • Alternative routes: Use ruamel.yaml’s round-trip capabilities or implement a custom solution (such as a custom scalar subclass with a representer) if you need precise control over the output quoting.

Choose the approach that best fits your workflow. If you want to minimize changes to your code, ruamel.yaml’s round-trip mode is likely the best option, provided you adapt your data manipulations to its data structures.


Answer by AstralExplorer840 1 month ago


Your input is valid YAML, but there is no way the output you present comes from the input and the program you specify:

  • there is no apiVersion key at the root level of your expected output (and other stuff is missing)
  • nothing in your code removes the numbering 1. to 5.
  • you don't call splitlines on annotations_body, which is a multi-line string value
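A sketch of the two missing steps named in the list (splitting the value into lines and dropping the numbering) might look like this. The helper name `split_entries` is hypothetical, and the sketch assumes the value reaches Python with its line breaks intact, for example from a literal block scalar:

```python
import re

# Matches a leading "1. ", "2. ", ... at the start of a line.
NUMBERING = re.compile(r"^\s*\d+\.\s*")


def split_entries(body):
    """Split a multi-line string into entries, removing leading numbering."""
    entries = []
    for line in body.splitlines():
        line = NUMBERING.sub("", line).strip()
        if line:  # skip blank lines
            entries.append(line)
    return entries


body = "1. catalog-item-20='server1.test.local:80'\n2. network-policy-version=v14.yaml"
entries = split_entries(body)
print(entries)
```

Each resulting entry is then a plain `key=value` string that the existing `split('=', 1)` logic can handle.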

Correcting the whole program is a bit too much, but in general, setting .preserve_quotes in ruamel.yaml only affects loaded strings, not newly created Python strings. You will have to create the special ruamel.yaml string subclasses that give you single quotes:

PYTHON
import sys
import ruamel.yaml
from pathlib import Path

def SQ(s):
    return ruamel.yaml.scalarstring.SingleQuotedScalarString(s)

data = {'annotations': {
    'example.com/catalog-item-20': SQ('server1.test.local:80'),
    'bnhp.co.il/network-policy-version': 'v14.yaml',
    'openshift.io/sa.scc.mcs': SQ('s0,c107,c49'),
    'collectord.io/logs-index': 'channel_1',
    'collectord.io/logs-override.11-match': '^.*(%SENSITIVE%).*$',
}}

output = Path('namespace.yaml')
yaml = ruamel.yaml.YAML()
yaml.indent(mapping=4)
yaml.dump(data, output)
sys.stdout.write(output.read_text())

which gives:

YAML
annotations:
    example.com/catalog-item-20: 'server1.test.local:80'
    bnhp.co.il/network-policy-version: v14.yaml
    openshift.io/sa.scc.mcs: 's0,c107,c49'
    collectord.io/logs-index: channel_1
    collectord.io/logs-override.11-match: ^.*(%SENSITIVE%).*$

But you only have to do that if you process the output with a broken YAML parser (or some non-YAML tool), as these quotes are superfluous.
