ZPath & ZTemplate

ZPath is a programmer-friendly syntax for searching structured objects. If you can navigate a filesystem and code an if-statement in C, Java or JavaScript, you already know 95% of the syntax.

The grammar is universal and can be applied to any tree structure: JSON, CBOR, XML, etc.

zpath	Description
/body	all nodes called body that are children of the root node
.	the context node. Evaluation is always relative to a context
*	all children of the context node
**	the context node and all its descendants
..	the parent node
..*	all ancestor nodes from the parent node up to the root node
#2	the third child of the context node (indices start at zero)
*/td	all grandchildren of the context node called td
../tr#0	the first child called tr of the context node's parent
../tr[is-first()]	the same
../tr[index() == 0]	the same
../tr[index() == 0]/td	all td children of the same
**/tr[count(td) == 2]	all nodes at or below the context called tr with two td children
body[**/tr[count(td) == 2]]	all body children that match the above description
car[age]	all car children with an age child of any value
car[!age || type(age) == "null"]	all car children where age is missing or set to null
car[!!age]	all car children with an age child that is neither null nor false
[key() != ix]	all children whose key in their parent differs from the value of their ix child
list/*[index() % 2 == 0]	every even-numbered child of the list child
list/[index() % 2 == 0]	the same; this is the only situation where a `*` is implied
car[is-first() ? "na" : age]	for all car children, "na" if it is the first, otherwise its age child
count(tr/td)	if evaluated on a table, the number of cells in the table (a number)
tr/count(td)	the number of cells in each row of the table (a list of numbers)
table[@class == "defn"]	all table children with a class attribute of "defn" (XML only)
[@* == "defn"]	all children where any attribute equals "defn" (XML only)
table/@class	the class attribute of any table children, as a string (XML only)
*/bowl/, **/fruit	all descendants whose parent is called bowl, or that are called fruit
union(*/bowl/, **/fruit)	the same, but with duplicate nodes merged
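The basic path segments in the table can be illustrated with a small evaluator. This is a minimal sketch, assuming JSON input already parsed into Python dicts and lists; Node, build, step and evaluate are hypothetical names, not part of any ZPath implementation. It handles only plain segments (., .., *, **, name, #n and name#n) with no qualifiers, functions or escapes, and it merges duplicate nodes by identity as ZPath requires.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(eq=False)  # eq=False keeps identity hashing, so duplicates merge by node, not value
class Node:
    key: Any                      # key in the parent (str for maps, int for lists)
    value: Any                    # the underlying JSON value
    parent: Optional["Node"] = field(default=None, repr=False)
    children: list = field(default_factory=list)


def build(value: Any, key: Any = None, parent: Optional[Node] = None) -> Node:
    """Wrap a parsed JSON value in Node objects that remember their parent."""
    node = Node(key, value, parent)
    if isinstance(value, dict):
        node.children = [build(v, k, node) for k, v in value.items()]
    elif isinstance(value, list):
        node.children = [build(v, i, node) for i, v in enumerate(value)]
    return node


def descendants(node: Node):
    """The node itself followed by every node below it (the ** segment)."""
    yield node
    for child in node.children:
        yield from descendants(child)


def step(nodes: list, segment: str) -> list:
    """Apply one path segment to a node set, merging duplicate nodes."""
    out: dict = {}  # insertion-ordered; keys are Node objects compared by identity
    for node in nodes:
        if segment == ".":
            matches = [node]
        elif segment == "..":
            matches = [node.parent] if node.parent is not None else []
        elif segment == "*":
            matches = node.children
        elif segment == "**":
            matches = descendants(node)
        else:
            # "name", "#n" or "name#n"; an empty name matches any child
            name, _, index = segment.partition("#")
            matches = [c for c in node.children if not name or c.key == name]
            if index:
                matches = matches[int(index):int(index) + 1]
        for match in matches:
            out.setdefault(match)
    return list(out)


def evaluate(root: Node, context: Node, path: str) -> list:
    """Evaluate a simple path: a leading / starts at the root, otherwise at context."""
    nodes = [root] if path.startswith("/") else [context]
    for segment in path.split("/"):
        if segment:  # skip the empty segment produced by a leading /
            nodes = step(nodes, segment)
    return nodes
```

Because step collects results in a dict keyed by node, a path such as list/*/.. returns the list node once, not once per child.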

Parsing

an expression may be a function, a path, a string (in quotes), a number, or a combination of these using normal C-like operators
binary operators + - * / % && || ^ & | == != >= <= > < require whitespace on either side
the ternary operator ? : requires whitespace on either side of ? and :. The unary operators ~ and ! do not.
a path is constructed like a UNIX file path: a list of segments separated with /
a path segment can be *, **, .., ..*, a name, an index (#integer), both (name#integer), a function or a qualifying expression
each segment may be followed by zero or more qualifying expressions (an expression inside square brackets)
a function is a name immediately followed by (, zero or more arguments separated by commas and optional whitespace, then a )
arguments depend on the function, but are typically expressions
the characters \n \r \t ( ) [ ] / , = & | ! < > # and space in a name must be backslash-escaped
finally, only the top-level expression may be a comma-separated list of expressions
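The segment rules above imply that a / only separates segments when it is unescaped and not inside a qualifier, a function call or a string. The sketch below shows one way to split a path under those rules. It is an illustration only: split_path is a hypothetical helper, it assumes strings are delimited by double quotes, and it leaves backslash escapes in place for a later unescaping step.

```python
def split_path(path: str) -> list[str]:
    """Split a path on unescaped / outside brackets, parentheses and strings.

    A leading / yields an empty first segment, marking a path anchored at the root.
    """
    segments: list[str] = []
    current: list[str] = []
    depth = 0          # nesting depth of [...] and (...)
    in_string = False  # inside a "..." string literal (assumed delimiter)
    i = 0
    while i < len(path):
        ch = path[i]
        if ch == "\\" and i + 1 < len(path):
            # keep the escape sequence intact; an escaped / is part of the name
            current.append(path[i:i + 2])
            i += 2
            continue
        if in_string:
            if ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "[(":
            depth += 1
        elif ch in "])":
            depth -= 1
        elif ch == "/" and depth == 0:
            segments.append("".join(current))
            current = []
            i += 1
            continue
        current.append(ch)
        i += 1
    segments.append("".join(current))
    return segments
```

For example, ../tr[index() == 0]/td splits into "..", "tr[index() == 0]" and "td", while count(tr/td) stays a single function segment.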

Structural and Type Functions

Function	Description
count()	the number of nodes in the current node set
count(expression)	the number of nodes matched by expression
index()	the index of this node in the current node set
index(expression)	for each node matched by expression, its index into its parent
key()	the key to retrieve this node from its parent (an integer for lists, typically a string for maps)
key(expression)	for each node matched by expression, the key to retrieve it from its parent
union(expression, ...)	the set union of each expression (no duplicates)
intersection(expression, ...)	the set intersection of each expression
is-first()	shorthand for index() == 0
is-last()	shorthand for index() == count() - 1
next()	if this node can be retrieved from its parent with an index, the node at the next index
prev()	the same, but returns the node at the previous index
string()	the string value of the current node (an empty set if no string value exists)
string(expression)	for each node matched by expression, its string value
number()	the number value of the current node (an empty set if no number value exists)
number(expression)	for each node matched by expression, its number value
value()	the primitive value of the current node
value(expression)	for each node matched by expression, its primitive value
type()	the type of this node as a string (the range of values depends on what's being evaluated)
type(expression)	the type of each node matched by expression, or "undefined" if it is an empty set
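A few of the structural functions can be sketched over plain Python lists and dicts. This is a hedged illustration, not the ZPath implementation: zp_next, zp_prev and positions are hypothetical names, and returning an empty set when there is no neighbouring index (or when the parent is a map) is an assumption.

```python
from typing import Any


def _sibling(parent: Any, key: Any, offset: int) -> list:
    # Only list parents retrieve children by index. For any other parent (e.g. a map),
    # or when the neighbouring index is out of range, return an empty set (assumption).
    if not isinstance(parent, list) or isinstance(key, bool) or not isinstance(key, int):
        return []
    i = key + offset
    return [parent[i]] if 0 <= i < len(parent) else []


def zp_next(parent: Any, key: Any) -> list:
    """next(): the node at the next index of a list parent."""
    return _sibling(parent, key, 1)


def zp_prev(parent: Any, key: Any) -> list:
    """prev(): the node at the previous index of a list parent."""
    return _sibling(parent, key, -1)


def positions(node_set: list) -> list[dict]:
    # index(), is-first() and is-last() are relative to the current node set,
    # so they depend only on each node's position within that set.
    count = len(node_set)
    return [
        {"index": i, "is-first": i == 0, "is-last": i == count - 1}
        for i in range(count)
    ]
```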

Math Functions

For the functions below, if expression is specified it is evaluated and the result is used as the input of the function. If expression is not specified, the current node set is used as the input. Any nodes in the input that are not number nodes generate no output.

Function	Description
ceil(expression)	for each number node in the input set, its ceiling value
floor(expression)	for each number node in the input set, its floor value
round(expression)	for each number node in the input set, its rounded value
sum(expression)	the sum of all the number nodes in the input set
min(expression)	the minimum value of all the number nodes in the input set
max(expression)	the maximum value of all the number nodes in the input set
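The rule that non-number nodes generate no output can be sketched as a filter applied before each function. The names below are hypothetical, and returning an empty set from sum, min and max when the input holds no number nodes is an assumption. round() is left out because the table does not say how halves are rounded.

```python
import math


def number_nodes(values: list) -> list:
    # bool is a subclass of int in Python, so exclude it explicitly
    return [v for v in values if isinstance(v, (int, float)) and not isinstance(v, bool)]


def zp_ceil(values: list) -> list:
    return [math.ceil(v) for v in number_nodes(values)]


def zp_floor(values: list) -> list:
    return [math.floor(v) for v in number_nodes(values)]


def zp_sum(values: list) -> list:
    nums = number_nodes(values)
    return [sum(nums)] if nums else []  # assumption: no number nodes -> empty set


def zp_min(values: list) -> list:
    nums = number_nodes(values)
    return [min(nums)] if nums else []


def zp_max(values: list) -> list:
    nums = number_nodes(values)
    return [max(nums)] if nums else []
```

ceil and floor produce one output per number node, while sum, min and max collapse the whole input into a single value.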

String Functions

Function	Description
escape(expression ...)	XML escape any string values in the input set
unescape(expression ...)	XML unescape any string values in the input set
format(format, expression)	use `printf` to format each node in the input set
index-of(search, expression)	the first index in each input string of the specified search string, or -1 if not found
last-index-of(search, expression)	the last index in each input string of the specified search string, or -1 if not found
string-length(expression)	the length of each string in the input set
upper-case(expression)	the upper-case version of each string in the input set
lower-case(expression)	the lower-case version of each string in the input set
substring(expression, start, length)	for each string in the input set, the substring of length characters beginning at index start
match(pattern, expression)	for each input node, true if its string value matches the supplied regular expression, otherwise false
replace(pattern, replace, expression)	apply a regex search/replace to the string value of each node in the input set
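Several of the string functions map closely onto Python's string and re modules. The sketch below is an approximation under stated assumptions: only nodes whose value is already a string take part, match requires the whole string to match, and the replacement syntax is Python's, which may differ from the regex dialect a real ZPath implementation uses. The function names mirror the table but are hypothetical Python helpers.

```python
import re


def string_nodes(values: list) -> list:
    # assumption: only nodes whose value is already a string take part
    return [v for v in values if isinstance(v, str)]


def index_of(search: str, values: list) -> list:
    # str.find returns -1 when search is absent, matching the description
    return [s.find(search) for s in string_nodes(values)]


def last_index_of(search: str, values: list) -> list:
    return [s.rfind(search) for s in string_nodes(values)]


def substring(values: list, start: int, length: int) -> list:
    return [s[start:start + length] for s in string_nodes(values)]


def match(pattern: str, values: list) -> list:
    # assumption: the whole string must match (re.fullmatch), not just a part of it
    return [re.fullmatch(pattern, s) is not None for s in string_nodes(values)]


def replace(pattern: str, replacement: str, values: list) -> list:
    # uses Python's re.sub replacement syntax, which may differ from ZPath's
    return [re.sub(pattern, replacement, s) for s in string_nodes(values)]
```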

Functions for XML objects

Function	Description
url()	the namespace URL of any nodes in the current node set, or an empty string if the node has no namespace defined
url(expression)	for each node matched by expression, its namespace URL
local-name()	the local name of any nodes in the current node set, or their name if the node has no namespace defined
local-name(expression)	for each node matched by expression, its local-name
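As an illustration of url() and local-name(), Python's standard xml.etree.ElementTree stores a namespaced tag as {namespace-url}local-name, so both values can be read from the tag string. The helper names below are hypothetical; this is a sketch of the semantics, not how any ZPath implementation is written.

```python
import xml.etree.ElementTree as ET


def url(elem: ET.Element) -> str:
    # ElementTree stores a namespaced tag as "{namespace-url}local-name";
    # a node with no namespace yields an empty string
    tag = elem.tag
    return tag[1:tag.index("}")] if tag.startswith("{") else ""


def local_name(elem: ET.Element) -> str:
    # without a namespace, the local name is just the tag itself
    tag = elem.tag
    return tag.split("}", 1)[1] if tag.startswith("{") else tag
```

For example, for <a xmlns="urn:example"><b/></a> both elements report the URL urn:example, with local names a and b.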

Functions for CBOR objects

Function	Description
tag()	the tag for any nodes in the current node set, or an empty set if they have no tag
tag(expression)	for each node matched by expression that has a tag, its tag
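For background on where tag() values come from: in the CBOR encoding (RFC 8949) a tag is a data item of major type 6 whose header carries the tag number. The sketch below reads the outermost tag of an encoded item and returns an empty list when the item is untagged, mirroring the "empty set" rule. read_tag is a hypothetical helper; a real implementation would more likely take tags from its decoded CBOR object model.

```python
def read_tag(data: bytes) -> list:
    """Return [tag] if the CBOR item at the start of data is tagged, else [].

    A tag is major type 6; the low 5 bits of the first byte either hold
    the tag number (0-23) or say how many following bytes encode it.
    """
    if not data:
        return []
    major, info = data[0] >> 5, data[0] & 0x1F
    if major != 6:
        return []  # not a tag: the node has no tag
    if info < 24:
        return [info]
    widths = {24: 1, 25: 2, 26: 4, 27: 8}
    if info not in widths or len(data) < 1 + widths[info]:
        raise ValueError("malformed CBOR tag header")
    n = widths[info]
    return [int.from_bytes(data[1:1 + n], "big")]
```

Only the outermost tag is read; nested tags on the same item would need the header to be skipped and the next byte inspected.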

Details

Whitespace is required around binary operators to avoid ambiguity with paths: * * 2 is a valid expression (multiply all children by 2), and representing it unambiguously requires whitespace. Requiring whitespace around every binary operator is much simpler than trying to list the places where it is needed.
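The effect of the whitespace rule can be shown with a deliberately naive tokenizer for flat expressions (no quoted strings or qualifiers containing spaces). classify and BINARY_OPERATORS are hypothetical names; the point is only that a symbol such as * is read as a binary operator when it stands alone between two whitespace-separated operands, and as part of a path when it is glued to other characters.

```python
BINARY_OPERATORS = {
    "+", "-", "*", "/", "%", "&&", "||", "^", "&", "|",
    "==", "!=", ">=", "<=", ">", "<",
}


def classify(expr: str) -> list:
    """Label whitespace-separated tokens of a flat expression as operands or operators.

    A symbol counts as a binary operator only when it is a whole token with tokens
    on both sides; glued to other characters (as in */a) it stays part of a path.
    """
    tokens = expr.split()
    labelled = []
    for pos, tok in enumerate(tokens):
        is_op = tok in BINARY_OPERATORS and 0 < pos < len(tokens) - 1
        labelled.append(("operator" if is_op else "operand", tok))
    return labelled
```

For * * 2, the first * is a path (all children) and the second is multiplication.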

Some tree structures merge primitive types in the tree: for example in JavaScript, a JSON list ["string", "string"] will use the same string object for each array entry. Within these structures, the key and index functions are only required to return the key or index of the first item that matches: for this example list, #0/index() == 0 && #1/index() == 0. The behaviour of the parent path segment .. in this situation is undefined.
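One way this situation arises is an implementation that stores primitive values directly and recovers a node's key or index by searching its parent. The hypothetical helper below behaves that way: every equal value reports the position of the first copy, which is exactly the relaxed behaviour allowed above.

```python
def naive_key(parent, node):
    # Searching the parent finds the first entry equal to node, so duplicated
    # primitive values all report the key or index of the first copy.
    if isinstance(parent, list):
        return parent.index(node)  # first position holding an equal value
    return next(k for k, v in parent.items() if v == node)  # first key with an equal value
```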

Some functions and paths depend on the type of object being traversed. For example, @name is defined as the value of the attribute "name" in XML. If ZPath is applied to other objects with a concept of attributes, the same syntax is recommended. In other contexts this syntax has no special meaning: it refers to a child named "@name".

The value of the type() function depends on the object being traversed: for JSON the types are string, number, boolean, list, map and null. For CBOR they are the same plus buffer. For XML they are element, text and attr, and other types such as processing-instruction or comment may be available depending on the implementation. It is recommended that CDATA nodes be treated as text, and it is required that * and ** do not match attributes.
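For JSON, the type names listed above can be derived from the Python values produced by a JSON parser. json_type and type_of are hypothetical helpers; representing type(expression) on an empty set as a one-element list containing "undefined" is an assumption about how results are returned.

```python
def json_type(value) -> str:
    """type() for a parsed JSON value, using the JSON type names."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "list"
    if isinstance(value, dict):
        return "map"
    raise TypeError(f"not a JSON value: {value!r}")


def type_of(values: list) -> list:
    """type(expression): the type of each node, or "undefined" for an empty set."""
    return [json_type(v) for v in values] if values else ["undefined"]
```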

Safety

ZPath and ZTemplates are designed to prevent a malicious template from causing runaway resource use. It should be safe to allow a ZTemplate from an untrusted party to execute without risk of infinite loops or out-of-memory situations.

Paths or functions that return nodes from the tree are required to merge duplicate nodes, so the number of nodes returned can never exceed the number of nodes in the tree. None of the standard functions generates more than one result (string, number, boolean or node) for each input. Together, these two facts make runaway expressions impossible: the number of outputs from any zpath expression is at most the number of nodes in the tree. The one exception is the comma separator at the top level, so /**,/** will generate two copies of every node in the tree. This can lead to large result sets, but they are always O(n) in the size of the tree. There are no loops or recursion in the syntax, and ZTemplates are required to limit the number of includes.
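The difference between the top-level comma and union() explains the size bound. In the sketch below (comma and union are hypothetical helpers), union merges nodes by identity, so its output can never exceed the number of distinct nodes in the tree, while the comma simply concatenates, so k comma-separated expressions return at most k times that many nodes. For example, /**,/** returns 2n nodes for a tree of n nodes.

```python
def comma(*results: list) -> list:
    """Top-level comma: concatenate the result lists, keeping duplicates."""
    return [node for result in results for node in result]


def union(*results: list) -> list:
    """union(): merge the results, keeping the first occurrence of each node.

    Nodes are compared by identity (id), not by value, because two distinct
    nodes in a tree may hold equal values.
    """
    seen, merged = set(), []
    for result in results:
        for node in result:
            if id(node) not in seen:
                seen.add(id(node))
                merged.append(node)
    return merged
```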

ZTemplate

{{ expression }}

{{# expression }} content {{/ expression }}

{{> templatepath }}

Parsing