What is YAML?
YAML is a human-friendly data-serialization language. Its main goals are simplicity and readability. You can think of YAML as JSON without the ugly parts. Because it is easy to read and write, YAML is widely used for configuration files and other structured data that is meant to be edited by hand.
Basic Rules
First, let's discuss some basic rules for working with YAML files:
- Whitespace and indentation matter: they define the structure of the data, so take special care to stay consistent.
- Never use tabs for indentation. YAML forbids tab characters in indentation, so always indent with spaces.
- The exact number of spaces doesn't matter as long as child nodes are indented more than their parent, but it is good practice to use the same number of spaces at every level.
- YAML is case sensitive.
- YAML comments start with the # character and continue to the end of the line.
With that out of the way, let's see the basic YAML syntax.
Indentation
YAML uses indentation to indicate the structure and hierarchy of the data, in a manner reminiscent of Python's indentation rules.
The recommended indentation for YAML files is two spaces per level, but YAML accepts any number of spaces as long as sibling nodes line up and child nodes are indented further than their parent. Whatever you choose, be consistent and don't mix indentation styles in the same file.
Building Microservices:
  author: Sam Newman
  language: English
  publication-year: 2021
  pages: 586
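Indentation alone determines the structure. The minimal Python sketch below shows this with an indentation-based reader. The function parse_nested is hypothetical. It only handles simple "key: value" and "key:" lines: no comments, sequences, quoting or type conversion, so every value stays a string. It is not how a real YAML parser is implemented.

When you feed it the book example with and without indentation, the difference is clear. Without indentation, author and pages become siblings of Building Microservices, and Building Microservices gets a null value.

```python
def parse_nested(text: str) -> dict:
    """Turn 'key: value' and 'key:' lines into nested dicts using indentation alone.

    Simplified sketch: no comments, sequences, quoting or type resolution.
    """
    root: dict = {}
    stack = [(-1, root)]  # (indentation of the key that owns the mapping, mapping)
    pending = None        # (indent, mapping, key) for a "key:" line awaiting children
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        indent = len(raw) - len(raw.lstrip(" "))
        # A line indented deeper than a pending "key:" opens a nested mapping.
        if pending is not None and indent > pending[0]:
            owner_indent, mapping, key = pending
            mapping[key] = {}
            stack.append((owner_indent, mapping[key]))
        pending = None
        # Close every mapping whose owning key sits at this indentation or deeper.
        while indent <= stack[-1][0]:
            stack.pop()
        key, _, value = raw.strip().partition(":")
        value = value.strip()
        current = stack[-1][1]
        current[key] = value or None  # a bare "key:" is null until children appear
        if not value:
            pending = (indent, current, key)
    return root


nested = """\
Building Microservices:
  author: Sam Newman
  pages: 586
"""
flat = "Building Microservices:\nauthor: Sam Newman\npages: 586\n"

print(parse_nested(nested))  # {'Building Microservices': {'author': 'Sam Newman', 'pages': '586'}}
print(parse_nested(flat))    # {'Building Microservices': None, 'author': 'Sam Newman', 'pages': '586'}
```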
Comments
YAML only supports single-line comments. To add one, start the line with #. You can also place a comment after content on the same line, as long as the # is preceded by a space. Everything from the # to the end of the line is ignored.
# This is a single line comment
foo: bar # this is an inline comment
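Inline comments have one subtlety. A # only starts a comment when it begins the line or follows whitespace, and never when it sits inside a quoted string. That is why a URL with a fragment, such as http://example.com/#top, survives intact.

The hypothetical strip_comment function below is a simplified, single-line sketch of that rule. It assumes quotes only open at the start of a token, and it ignores multi-line scalars.

```python
def strip_comment(line: str) -> str:
    """Remove a trailing comment from a single YAML line (simplified sketch).

    A '#' starts a comment only at the start of a line or after whitespace,
    and never inside a quoted scalar.
    """
    quote = None  # the quote character we are currently inside, if any
    i = 0
    while i < len(line):
        ch = line[i]
        prev = line[i - 1] if i > 0 else " "
        if quote == '"' and ch == "\\":
            i += 2  # skip an escaped character inside double quotes
            continue
        if quote is not None:
            if ch == quote:
                if quote == "'" and line[i + 1:i + 2] == "'":
                    i += 2  # '' is an escaped single quote; still inside the string
                    continue
                quote = None
        elif ch in "'\"" and prev in " \t[{,":
            quote = ch  # a quote only opens a scalar at the start of a token
        elif ch == "#" and prev in " \t":
            return line[:i].rstrip()
        i += 1
    return line.rstrip()


print(strip_comment("foo: bar # this is an inline comment"))  # foo: bar
print(strip_comment("url: http://example.com/#top"))         # url: http://example.com/#top
print(strip_comment("title: 'C# # is not a comment here'"))  # title: 'C# # is not a comment here'
```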
Scalars
Scalars are the most basic of all data types: single values such as strings, integers, floats and booleans.
Integer scalars are numeric values that represent whole numbers, such as 1, 42, -7, etc. Integer scalars do not require quotes, and are typically treated as numeric types.
count: 3 # This is an integer scalar
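How does a parser decide that 3 is an integer? Under the YAML 1.2 Core schema, an untagged plain scalar becomes an integer if it matches one of three patterns:
- decimal,
- octal with a 0o prefix,
- hexadecimal with a 0x prefix.

The sketch below implements those patterns with Python's re module, using a hypothetical resolve_int_core helper. Note that YAML 1.1 loaders treat a plain leading zero as octal, so 0123 may come back as 83 there rather than 123.

```python
import re

# YAML 1.2 Core schema integer forms: decimal, 0o octal and 0x hexadecimal.
DECIMAL = re.compile(r"[-+]?[0-9]+")
OCTAL = re.compile(r"0o[0-7]+")
HEX = re.compile(r"0x[0-9a-fA-F]+")


def resolve_int_core(plain: str):
    """Return the int a YAML 1.2 Core-schema parser would produce, or None."""
    if DECIMAL.fullmatch(plain):
        return int(plain, 10)  # leading zeros are allowed and ignored: "0123" -> 123
    if OCTAL.fullmatch(plain):
        return int(plain[2:], 8)
    if HEX.fullmatch(plain):
        return int(plain[2:], 16)
    return None  # not an integer; a parser would go on to try other types


for text in ["3", "-7", "0x123", "0o123", "0123", "3.14"]:
    print(text, "->", resolve_int_core(text))
```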
Float scalars are numeric values that represent fractional or decimal numbers, such as 3.14, -0.5, etc. Float scalars do not require quotes, and are typically treated as numeric types.
# This is a float scalar
pi: 3.14
# This is also a float scalar
area: 19.625
# And yet another scalar, in scientific notation
mass: 1.67e-27
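Floats follow the same idea. The sketch below encodes the YAML 1.2 Core schema float patterns, including the special values .inf, -.inf and .nan; resolve_float_core is a hypothetical name. The float pattern also matches whole numbers, which is why the Core schema tries integers first.

YAML 1.1 loaders can be stricter. PyYAML, for example, requires a decimal point, so it loads 1e+12 as a string.

```python
import math
import re

# YAML 1.2 Core schema float forms.
FLOAT = re.compile(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?")
INFINITY = re.compile(r"[-+]?\.(inf|Inf|INF)")
NOT_A_NUMBER = re.compile(r"\.(nan|NaN|NAN)")


def resolve_float_core(plain: str):
    """Return the float a YAML 1.2 Core-schema parser would produce, or None."""
    if INFINITY.fullmatch(plain):
        return -math.inf if plain.startswith("-") else math.inf
    if NOT_A_NUMBER.fullmatch(plain):
        return math.nan
    if FLOAT.fullmatch(plain):
        return float(plain)  # Python's float() accepts every string this pattern matches
    return None


for text in ["3.14", "-0.5", "1.67e-27", "1e+12", ".5", "-.inf", ".NaN", "pi"]:
    print(text, "->", resolve_float_core(text))
```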
Bool scalars are boolean values that represent true or false, such as true, false, yes, no, on and off. Bool scalars do not require quotes, and are typically treated as logical values.
# These are true/false boolean scalars
active: true
enabled: false
# These are yes/no boolean scalars (booleans in YAML 1.1 only)
email-consent: yes
sms-consent: no
# These are on/off boolean scalars (booleans in YAML 1.1 only)
switch: on
light: off
Be aware that different versions of YAML interpret bool scalars differently. In YAML 1.1, yes, no, on and off (in several capitalizations) are valid bool scalars, but in YAML 1.2 they are plain strings. For that reason, use true and false for bool scalars to avoid confusion and compatibility issues.
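The difference between the two versions is easy to model. In the sketch below, resolve_bool is a hypothetical helper that applies either the YAML 1.1 boolean forms (as listed in the YAML 1.1 type repository) or the YAML 1.2 Core schema forms.

The classic pitfall is a value like NO, the country code for Norway, which a YAML 1.1 loader turns into false. Real 1.1 loaders do not all agree on every form (some leave out the single-letter y and n), so treat the sets as an approximation.

```python
# Plain scalars that resolve to booleans under each version's rules.
YAML11_TRUE = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
YAML11_FALSE = {"n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}
CORE12_TRUE = {"true", "True", "TRUE"}
CORE12_FALSE = {"false", "False", "FALSE"}


def resolve_bool(plain: str, version: str = "1.2"):
    """Return True/False if `plain` is a boolean in the given YAML version, else the string."""
    if version == "1.1":
        true_forms, false_forms = YAML11_TRUE, YAML11_FALSE
    else:
        true_forms, false_forms = CORE12_TRUE, CORE12_FALSE
    if plain in true_forms:
        return True
    if plain in false_forms:
        return False
    return plain  # not a boolean: stays a string


for word in ["true", "yes", "on", "NO"]:
    print(f"{word!r}: 1.1 -> {resolve_bool(word, '1.1')!r}, 1.2 -> {resolve_bool(word)!r}")
```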
Strings can be written in different ways, depending on the syntax and style we are using. There are two main types of formats that YAML supports for strings.
Block String Scalars
Block scalars are a way of representing strings that can span multiple lines. A block scalar starts with an indicator at the end of the key's line: either a pipe character (|) or a greater-than sign (>). The content follows on the next lines, indented further than the key. The parser strips that common indentation, so only indentation beyond that of the first content line is kept in the value. Block scalars are useful for long or complex strings, including strings that contain newline characters.
Literal blocks are introduced by a pipe character (|). Their content is not folded, so every line break is preserved in the resulting string.
key1: |
  This is a
  literal string
  with line breaks
  and spaces.
If we convert this to JSON we will get the following:
{
  "key1": "This is a\nliteral string\nwith line breaks\nand spaces.\n"
}
Folded blocks are introduced by a greater-than sign (>). Their content is folded: single line breaks between lines are replaced with spaces, while blank lines become newline characters.
key1: >
  This is a folded block.
  It can span multiple lines, but newlines
  will be replaced with spaces.
If we convert this to JSON we will get the following:
{
  "key1": "This is a folded block. It can span multiple lines, but newlines will be replaced with spaces.\n"
}
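The two block styles differ only in how they treat line breaks. The sketch below makes this concrete. It assumes the parser has already removed the block's indentation and split the content into lines.

The helpers literal_value and folded_value are hypothetical. Both apply the default "clip" chomping, which keeps a single trailing newline. The folding rules are simplified: more-indented lines and leading blank lines are not handled.

```python
def literal_value(lines: list[str]) -> str:
    """'|' style: keep every line break, then clip to a single trailing newline."""
    body = "\n".join(lines).rstrip("\n")
    return body + "\n" if body else ""


def folded_value(lines: list[str]) -> str:
    """'>' style: a single line break becomes a space, each blank line becomes a newline.

    Simplified: more-indented lines and leading blank lines are not handled.
    """
    parts: list[str] = []
    blank_run = 0  # blank lines seen since the last content line
    for line in lines:
        if line == "":
            blank_run += 1
            continue
        if parts:
            parts.append("\n" * blank_run if blank_run else " ")
        parts.append(line)
        blank_run = 0
    return "".join(parts) + "\n" if parts else ""  # clip: one trailing newline


literal_lines = ["This is a", "literal string", "with line breaks", "and spaces."]
folded_lines = [
    "This is a folded block.",
    "It can span multiple lines, but newlines",
    "will be replaced with spaces.",
]
print(repr(literal_value(literal_lines)))
print(repr(folded_value(folded_lines)))
```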
Flow String Scalars
Flow scalars are written inline, right after the key's colon and a space. They come in three styles: plain, single-quoted and double-quoted. They are more compact than block scalars. They can also continue onto further lines: in that case each line break is folded into a space, and the leading indentation of the continuation lines is discarded. Only the double-quoted style supports escape sequences.
Plain flow scalars are the simplest type. They use no quotes and support no escape sequences, so the content is taken literally. However, they cannot start with indicator characters such as [, {, &, * or #, and they cannot contain ": " or " #".
key: This is a plain flow scalar.
Single-quoted flow scalars wrap the content in single quotes ('). Everything inside is taken literally: backslashes have no special meaning. The only escape is two single quotes ('') for a literal single quote.
key: 'This is a single-quoted string with ''single quotes''.'
Double-quoted flow scalars wrap the content in double quotes ("). This is the only style that supports backslash escape sequences, such as \n for a newline, \t for a tab, \" for a double quote and \\ for a backslash.
key: "This is a
  flow string scalar
  that becomes a
  single line."
escaped: "tab:\t newline:\n quote:\" backslash:\\"
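The flow styles boil down to a few text transformations, sketched below in Python. The helpers fold_flow_lines, unescape_single and unescape_double are hypothetical.

Two simplifications apply. The double-quoted decoder covers only a subset of YAML's escape sequences. The folding step trims every line, which approximates the specification's whitespace rules rather than reproducing them exactly.

```python
import re

# A subset of the escapes YAML allows in double-quoted scalars; the spec defines
# more (for example \e, \N, \_, \L and \P).
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "0": "\0", '"': '"', "\\": "\\", "/": "/"}
ESCAPE = re.compile(r'\\(x[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8}|[ntr0"\\/])')


def fold_flow_lines(lines: list[str]) -> str:
    """Join the lines of a multi-line flow scalar (simplified: trims every line)."""
    parts: list[str] = []
    blank_run = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            blank_run += 1  # an empty line is kept as a newline
            continue
        if parts:
            parts.append("\n" * blank_run if blank_run else " ")
        parts.append(line)
        blank_run = 0
    return "".join(parts)


def unescape_single(body: str) -> str:
    """Single quotes have exactly one escape: '' stands for a literal '."""
    return body.replace("''", "'")


def unescape_double(body: str) -> str:
    """Decode backslash escapes in a double-quoted scalar (quotes already removed)."""

    def replace(match: re.Match) -> str:
        code = match.group(1)
        if code[0] in "xuU":
            return chr(int(code[1:], 16))  # \xXX, \uXXXX or \UXXXXXXXX
        return SIMPLE_ESCAPES[code]

    return ESCAPE.sub(replace, body)


print(fold_flow_lines(["This is a", "  flow string scalar", "  that becomes a", "  single line."]))
print(unescape_single("This is a single-quoted string with ''single quotes''."))
print(unescape_double(r"tab:\t quote:\" superscript:\u00B2"))
```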
Sequences
Sequences are collections of items that are ordered and indexed. The items in a sequence can be of any type, including strings, numbers, mappings or other sequences. There are two main ways of writing sequences supported by YAML.
Block Sequences
Block sequence format uses a dash followed by a space (- ) to indicate each item in the sequence. Items are usually indented under their parent key, and an item can span multiple lines. Block sequences are more readable and flexible than flow sequences, but they take up more space.
# A block sequence of strings
fruits:
  - apple
  - banana
  - cherry
# A block sequence of dictionaries
users:
  - name: Alice
    age: 35
    hobbies:
      - reading
      - writing
      - horseback riding
  - name: Bob
    age: 33
    hobbies:
      - coding
      - gaming
      - miniature painting
Flow Sequences
Flow sequences use square brackets ([]) to enclose the items in the sequence. The items are separated by commas (,), and the whole sequence can sit on the same line as its parent key. Flow sequences are more compact than block sequences, but plain scalars inside them cannot contain the flow indicators , [ ] { }, so such values must be quoted.
# A flow sequence of strings
fruits: [apple, banana, cherry]
# A flow sequence of dictionaries
users: [{name: Alice, age: 35, hobbies: [reading, writing, horseback riding]}, {name: Bob, age: 33, hobbies: [coding, gaming, miniature painting]}]
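The restriction on commas and brackets comes from how a flow sequence is split into items. The hypothetical split_flow_items function below is a simplified sketch for a single-line flow sequence. It splits on commas that are outside quotes and outside nested brackets or braces, and leaves each item as raw text. It does not handle escape sequences or comments.

```python
def split_flow_items(flow: str) -> list[str]:
    """Split a one-line flow sequence such as "[a, b, [c, d]]" into its top-level items.

    Simplified sketch: handles nesting and quotes, but not escapes or comments.
    """
    text = flow.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("expected a flow sequence in square brackets")
    inner = text[1:-1]
    items: list[str] = []
    depth, quote, start = 0, None, 0
    for i, ch in enumerate(inner):
        if quote is not None:
            if ch == quote:
                quote = None  # closing quote
        elif ch in "'\"":
            quote = ch  # commas inside quotes do not separate items
        elif ch in "[{":
            depth += 1
        elif ch in "]}":
            depth -= 1
        elif ch == "," and depth == 0:
            items.append(inner[start:i].strip())
            start = i + 1
    last = inner[start:].strip()
    if last:
        items.append(last)  # a trailing comma does not create an empty item
    return items


print(split_flow_items("[apple, banana, cherry]"))
print(split_flow_items('[a, "b, c", [d, e], {f: g}]'))
```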
Mappings
A YAML mapping is a collection of key-value pairs where each key is associated with a value. Mappings are similar to dictionaries, objects or associative arrays. In a mapping, each key must be unique and is associated with exactly one value, although that value can itself be a sequence or another mapping. The YAML data model treats keys as unordered, so don't rely on key order to carry meaning.
map:
  key1: value1
  key2: value2
  key3: value3
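The uniqueness rule matters in practice because many loaders do not enforce it and silently keep the last value. For practical purposes JSON is a subset of YAML 1.2, so Python's standard json module makes a convenient stand-in to show both behaviors:
- json.loads keeps the last duplicate by default,
- its object_pairs_hook parameter lets you reject duplicates instead.

The reject_duplicates helper below is hypothetical.

```python
import json


def reject_duplicates(pairs: list[tuple[str, object]]) -> dict:
    """object_pairs_hook for json.loads that refuses repeated keys."""
    result: dict = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


doc = '{"key1": "value1", "key2": "value2", "key1": "oops"}'
print(json.loads(doc))  # {'key1': 'oops', 'key2': 'value2'}  (last value silently wins)
try:
    json.loads(doc, object_pairs_hook=reject_duplicates)
except ValueError as err:
    print(err)  # duplicate key: 'key1'
```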
Block Mappings
Block mappings use indentation and a colon followed by a space (: ) to represent key-value pairs, and are typically used without any explicit delimiters such as curly brackets. Block mappings are more human-readable, but also more verbose.
Block mappings can contain nested block mappings by increasing the level of indentation for nested key-value pairs.
user:
  username: cinnamon
  name: John
  surname: Doe
  email: cinnamonroll@example.com
  billing-address:
    street: Some Street
    number: 32
    zip-code: 17288
Flow Mappings
Flow mappings use curly brackets ({}) to enclose the data. The key and value are separated by a colon (:) and each key-value pair within a mapping is separated by a comma (,). Flow mappings are often used when brevity is important.
user: {username: cinnamon, name: John, surname: Doe, email: cinnamonroll@example.com, billing-address: {street: Some Street, number: 32, zip-code: 17288}}
Below is an example YAML file with all the elements we have discussed so far.
# SCALAR TYPES
# Our root object will be a map
key: value
another-key: another string value that goes on and on
a-number-value: 100
a-number-in-scientific-notation: 1e+12 # a float in YAML 1.2; some YAML 1.1 loaders need 1.0e+12
a-hex-value: 0x123 # this will evaluate to 291
an-octal-value: 0123 # 83 in YAML 1.1; YAML 1.2 reads this as 123 and writes octal as 0o123
# And now some boolean values
booleanTrue: true
booleanFalse: false
yesValue: yes # true in YAML 1.1, the string "yes" in YAML 1.2
noValue: no # false in YAML 1.1, the string "no" in YAML 1.2
# Strings don't need to be quoted but they can be
a-simple-string: Does not require quotes
single-quotes: 'have ''one'' escape pattern'
double-quotes: "have many \", \0, \t, \u223A, \r escape patterns"
# Unicode escape sequences only work inside double quotes
utf-superscript: "\u00B2"
# Special characters must be enclosed in quotes
special-characters: "[ John ] & { Jane } - <Doe>"
# Multiple-line strings can be written as a literal block
literal_block: |
  This entire block of text will have
  its value preserved
  with line breaks being preserved.
  The literal continues until de-dented, and the leading indentation is
  stripped.
# Or in a folded block
folded_style: >
  This entire block of text will have its values preserved with line
  breaks being converted into spaces.

  Blank lines, like above, are converted to a newline character.

      'More-indented' lines keep their newlines, too -
      this text will appear over two lines.
# COLLECTION TYPES
# Nesting uses indentation. Two-space indentation is recommended
a_nested_map:
  key: value
  another_key: Another Value
  another_nested_map:
    hello: world
# Maps don't require string keys by the way
0.25: a float key
# Sequences look like this
a_sequence:
  - Item 1
  - Item 2
  - 0.5
  - Item 4
  - key: value
    another_key: another_value
  - - This is a sequence
    - inside another sequence
  - - - Sequence-ception
# Since YAML 1.2 is a superset of JSON, we can use JSON-style maps and sequences
# also quotes are optional
json-map: {key: value}
json-sequence: [1, 2, 3, 4, 5]
Documents
Up to this point we have worked only with a single YAML document, but a single YAML file can contain more than one. Each document is parsed independently, as if it were a separate file, which means different documents can contain the same keys.
A YAML file with multiple documents would look like the example below, where each new document is indicated by a --- line.
---
# document 1
name: John Doe
age: 30
---
# document 2
pets:
  - name: Spot
    breed: Labrador Retriever
  - name: Whiskers
    breed: Siamese
---
# document 3
address:
  street: 123 Main Street
  city: Anytown
  state: CA
  zipcode: 12345
The three documents are separated by three dashes (---). This tells the parser that each document is a separate unit of data.
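Splitting a stream into documents can be sketched with a line-based scan. In a well-formed stream, a --- marker at the start of a line cannot appear inside a document's content, so every such line is a boundary. The hypothetical split_documents function below returns each document's raw text. It ignores "..." end markers and directives such as %YAML, which a real parser also has to handle.

```python
import re

# '---' at the start of a line, followed by whitespace or end of line, starts a document.
DOC_START = re.compile(r"---(?=\s|$)")


def _has_content(lines: list[str]) -> bool:
    """True if any line is something other than blank space or a comment."""
    return any(line.strip() and not line.lstrip().startswith("#") for line in lines)


def split_documents(stream: str) -> list[str]:
    """Split a multi-document YAML stream on '---' markers (simplified sketch)."""
    docs: list[str] = []
    current: list[str] = []
    seen_marker = False
    for line in stream.splitlines():
        if DOC_START.match(line):
            # Text before the first marker only counts if it holds real content.
            if seen_marker or _has_content(current):
                docs.append("\n".join(current))
            rest = line[3:].strip()
            current = [rest] if rest else []  # e.g. "--- hello" starts a document
            seen_marker = True
        else:
            current.append(line)
    if seen_marker or _has_content(current):
        docs.append("\n".join(current))
    return docs


stream = """\
---
# document 1
name: John Doe
age: 30
---
# document 2
pets:
  - name: Spot
---
# document 3
address:
  city: Anytown
"""
for number, doc in enumerate(split_documents(stream), start=1):
    print(f"document {number}:\n{doc}\n")
```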
Schemas and Tags
In the YAML specification, a schema is a set of tags together with the rules a parser uses to decide which tag an untagged value gets. For example, the schema decides whether the plain scalar 30 is an integer or a string. The specification defines three default schemas:
- Failsafe schema: understands only mappings, sequences and strings, and is guaranteed to work with any YAML document.
- JSON schema: understands all types supported by JSON, including boolean, null, int and float, in addition to the ones in the Failsafe schema.
- Core schema: an extension of the JSON schema that is more human-friendly. It supports the same types but accepts them in more forms. For example, null, Null, NULL and ~ all resolve to the same null value.
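The practical difference between the schemas shows up in how plain scalars are typed. The sketch below contrasts two of them:
- the Failsafe schema, where every scalar is a string;
- the Core schema's null rule, under which null, Null, NULL, ~ and an empty value all mean null.

The function names are hypothetical.

```python
import re

# Core schema: these plain scalars, and an empty value, all resolve to null.
CORE_NULL = re.compile(r"null|Null|NULL|~|")


def resolve_failsafe(plain: str) -> str:
    """Failsafe schema: there is no null type, so every scalar stays a string."""
    return plain


def resolve_core_null(plain: str):
    """Core schema null check: None for the null forms, otherwise the original string."""
    return None if CORE_NULL.fullmatch(plain) else plain


for text in ["null", "Null", "NULL", "~", "", "nil"]:
    print(repr(text), "-> failsafe:", repr(resolve_failsafe(text)), "| core:", repr(resolve_core_null(text)))
```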
Separately from these built-in schemas, you can describe the expected structure of a document with a validation schema such as JSON Schema, which can itself be written in YAML. For example, the following schema defines a person:
type: object
properties:
  name:
    type: string
  age:
    type: integer
This schema specifies that a person is an object with two properties: a name and an age. The name property must be a string, and the age property must be an integer.
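Keywords such as type and properties come from JSON Schema. Validators interpret them after the YAML has been loaded into ordinary data. The sketch below is a deliberately tiny, hypothetical validator that covers only those two keywords, applied to the person schema expressed as Python data. A real project would use a full JSON Schema implementation instead.

```python
# The person schema above, as it would look after loading.
PERSON_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
}

TYPES = {"object": dict, "string": str, "integer": int}


def validate(instance, schema: dict, path: str = "$") -> list[str]:
    """Check `instance` against the 'type' and 'properties' keywords only."""
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(instance, TYPES[expected])
        if expected == "integer" and isinstance(instance, bool):
            ok = False  # bool is a subclass of int in Python, but not a JSON Schema integer
        if not ok:
            return [f"{path}: expected {expected}, got {type(instance).__name__}"]
    errors: list[str] = []
    if isinstance(instance, dict):
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:  # properties are optional unless listed in "required"
                errors += validate(instance[name], subschema, f"{path}.{name}")
    return errors


print(validate({"name": "John Doe", "age": 30}, PERSON_SCHEMA))    # []
print(validate({"name": "John Doe", "age": "30"}, PERSON_SCHEMA))  # ['$.age: expected integer, got str']
```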
This leads us to the next question. What if I wanted to explicitly parse a value in a specific way?
This is where tags come into the picture. A YAML tag explicitly specifies the type of the data being represented.

None of the YAML snippets we've seen so far mentioned a tag or type explicitly, yet every node has one: the YAML parser infers it automatically. For instance, mappings have the tag tag:yaml.org,2002:map.

Tags are written with an exclamation mark in front of the value. Two exclamation marks (!!) are shorthand for the standard tag:yaml.org,2002: prefix, so !!int means tag:yaml.org,2002:int. A single exclamation mark (as in !animal) marks a local, application-specific tag. Below is a snippet that works perfectly even when we specify tags.
name: John Doe
age: !!int 30
pets:
  - name: Spot
    breed: Labrador Retriever
    species: !animal dog
  - name: Whiskers
    breed: Siamese
    species: !animal cat
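This file uses two tags:
- !!int: specifies that the value of the age key is an integer. It is shorthand for tag:yaml.org,2002:int, one of the standard types.
- !animal: a local tag specifying that the value of the species key has an application-defined animal type. It is not defined by the YAML specification, so the application loading the file must know how to construct it. Some loaders reject unknown tags unless you register them.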
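Tag shorthands are expanded before the parser looks up types, and the default handles are simple enough to sketch. The hypothetical expand_tag function below applies the two default handles from the YAML 1.2 specification: !! for tag:yaml.org,2002: and ! for local tags. It also passes verbatim tags (!<...>) through unchanged.

The last example shows what happens when a full tag URI is written after !!: the prefix is added twice.

```python
import re

# Default tag handles from the YAML 1.2 spec; a %TAG directive can add or override them.
DEFAULT_HANDLES = {"!!": "tag:yaml.org,2002:", "!": "!"}
NAMED_HANDLE = re.compile(r"![0-9A-Za-z-]+!")


def expand_tag(tag: str) -> str:
    """Expand a tag shorthand using only the default handles (simplified sketch)."""
    if tag.startswith("!<") and tag.endswith(">"):
        return tag[2:-1]  # verbatim tag: used exactly as written
    if NAMED_HANDLE.match(tag):
        raise NotImplementedError("named handles such as !e!foo need a %TAG directive")
    if tag.startswith("!!"):
        return DEFAULT_HANDLES["!!"] + tag[2:]  # secondary handle: standard types
    if tag.startswith("!"):
        return DEFAULT_HANDLES["!"] + tag[1:]  # primary handle: local tag, e.g. !animal
    raise ValueError(f"not a tag: {tag!r}")


for t in ["!!int", "!!map", "!animal", "!<tag:yaml.org,2002:str>", "!!tag:yaml.org,2002:animal"]:
    print(t, "->", expand_tag(t))
```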
Anchors and Aliases
In YAML files, anchors (&) and aliases (*) are used to avoid duplication. When writing large configurations in YAML, it is common for specific configurations to be repeated. In the following example, the service configuration is repeated for all three services.
service-configuration:
  service1:
    tasks:
      - name: Install Apache
        apt:
          name: apache2
          state: present
      - name: Start Apache
        service:
          name: apache2
          state: started
          enabled: true
  service2:
    tasks:
      - name: Install Apache
        apt:
          name: apache2
          state: present
      - name: Start Apache
        service:
          name: apache2
          state: started
          enabled: true
  service3:
    tasks:
      - name: Install Apache
        apt:
          name: apache2
          state: present
      - name: Start Apache
        service:
          name: apache2
          state: started
          enabled: true
As we add more and more settings for large configuration files, this will quickly become tedious. Also if you want to make a change in the configuration, you will have to find every entry in the config and change it.
Anchors and aliases allow us to rewrite the same snippet without repeating any configuration. An anchor (&) marks a chunk of configuration and gives it a name, and an alias (*) refers to that chunk from a different part of the configuration.
service-configuration:
  service1:
    tasks: &task-configuration
      - name: Install Apache
        apt:
          name: apache2
          state: present
      - name: Start Apache
        service:
          name: apache2
          state: started
          enabled: true
  service2:
    tasks: *task-configuration
  service3:
    tasks: *task-configuration
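One consequence of aliases is easy to miss. A loader may hand you the same object for the anchor and for every alias, rather than independent copies; PyYAML does this, for example.

The Python sketch below models the loaded data for the anchored example by hand, using one shared list for all three tasks entries. It then shows how to copy that list before changing a single service. The state value latest is a hypothetical change for illustration.

```python
import copy

# Hand-built model of the loaded document: the anchor and both aliases share one list.
task_configuration = [
    {"name": "Install Apache", "apt": {"name": "apache2", "state": "present"}},
    {"name": "Start Apache",
     "service": {"name": "apache2", "state": "started", "enabled": True}},
]
services = {
    "service1": {"tasks": task_configuration},  # &task-configuration
    "service2": {"tasks": task_configuration},  # *task-configuration
    "service3": {"tasks": task_configuration},  # *task-configuration
}

print(services["service2"]["tasks"] is services["service1"]["tasks"])  # True

# Give one service its own deep copy before changing it, so the others are unaffected.
services["service3"]["tasks"] = copy.deepcopy(task_configuration)
services["service3"]["tasks"][0]["apt"]["state"] = "latest"
print(services["service1"]["tasks"][0]["apt"]["state"])  # present
```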
Congratulations on finishing the article! You are now well on your way to becoming a YAML master. YAML is a popular data-serialization language used practically anywhere configuration must be written by hand. You can see its popularity in tools such as Kubernetes, Ansible, docker-compose and GitHub Actions.
Hope you enjoyed reading!