Dyck language


In the theory of formal languages of computer science, mathematics, and linguistics, a Dyck word is a balanced string of square brackets . The set of Dyck words forms the Dyck language.
Dyck words and language are named after the mathematician Walther von Dyck. They have applications in the parsing of expressions that must have a correctly nested sequence of brackets, such as arithmetic or algebraic expressions.

Formal definition

Let be the alphabet consisting of the symbols . Let denote its Kleene closure.
The Dyck language is defined as:

Context-free grammar

It may be helpful to define the Dyck language via a context-free grammar in some situations.
The Dyck language is generated by the context-free grammar with a single non-terminal, and the production:
That is, S is either the empty string or is "", and an element of the Dyck language.
An alternative context-free grammar for the Dyck language is given by the production:
That is, S is zero or more occurrences of the combination of "", where multiple elements of the Dyck language on the right side of the production are free to differ from each other.

Alternative definition

In yet other contexts it may instead be helpful to define the Dyck language by splitting into equivalence classes, as follows.
For any element of length, we define partial functions and by
with the understanding that is undefined for and is undefined if. We define an equivalence relation on as follows: for elements we have if and only if there exists a sequence of zero or more applications of the and functions starting with and ending with. That the sequence of zero operations is allowed accounts for the reflexivity of. Symmetry follows from the observation that any finite sequence of applications of to a string can be undone with a finite sequence of applications of. Transitivity is clear from the definition.
The equivalence relation partitions the language into equivalence classes. If we take to denote the empty string, then the language corresponding to the equivalence class is called the Dyck language.

Properties

We can define an equivalence relation on the Dyck language. For we have if and only if, i.e. and have the same length. This relation partitions the Dyck language where. Note that is empty for odd.
Having introduced the Dyck words of length, we can introduce a relationship on them. For every we define a relation on ; for we have if and only if can be reached from by a series of proper swaps. A proper swap in a word swaps an occurrence of ']'.
For each the relation makes into a partially ordered set. The relation is reflexive because an empty sequence of proper swaps takes to. Transitivity follows because we can extend a sequence of proper swaps that takes to by concatenating it with a sequence of proper swaps that takes to forming a sequence that takes into. To see that is also antisymmetric we introduce an auxiliary function defined as a sum over all prefixes of :
The following table illustrates that is strictly monotonic with respect to proper swaps.
partial sums of
]
partial sums of
Difference of partial sums0200

Hence so when there is a proper swap that takes into. Now if we assume that both and, then there are non-empty sequences of proper swaps such is taken into and vice versa. But then which is nonsensical. Therefore, whenever both and are in, we have, hence is antisymmetric.
The partial ordered set is shown in the illustration accompanying the introduction if we interpret a as going down.

Generalizations

There exist variants of the Dyck language with multiple delimiters, e.g., on the alphabet "", "". The words of such a language are the ones which are well-parenthesized for all delimiters, i.e., one can read the word from left to right, push every opening delimiter on the stack, and whenever we reach a closing delimiter then we must be able to pop the matching opening delimiter from the top of the stack.