ZPath is a programmer-friendly syntax for searching structured objects. If you can navigate a filesystem and code an if-statement in C, Java or JavaScript, you already know 95% of the syntax.
The grammar is universal and can be applied to any tree structure: JSON, CBOR, XML, etc.
zpath | Description |
---|---|
/body | all nodes called body that are children of the root node |
. | the context node. Evaluation is always relative to a context |
* | all children of the context node |
** | the context node and all its descendants |
.. | the parent node |
..* | all ancestor nodes from the parent node up to the root node |
#2 | the third child of the context node (indices start at zero) |
*/td | all grandchildren of the context node called td |
../tr#0 | the first node called tr of the context node's parent |
../tr[is-first()] | the same |
../tr[index() == 0] | the same |
../tr[index() == 0]/td | all td children of the same |
**/tr[count(td) == 2] | all nodes at or below the context called tr with two td children |
body[**/tr[count(td) == 2]] | all body children that match the above description |
car[age] | all car children with an age child of any value |
car[!age || type(age) == "null"] | all car children where age is missing or set to null |
car[!!age] | all car children with an age child |
[key() != ix] | all children whose key in their parent differs from the value of their ix child |
list/*[index() % 2 == 0] | every even-numbered child of the list child |
list/[index() % 2 == 0] | the same - this is the only situation where a * is implied |
car[is-first() ? "na" : age] | for all car children, "na" if it's first, or its age child otherwise |
count(tr/td) | if evaluated on a table, the number of cells in the table (a number) |
tr/count(td) | the number of cells in each row of the table (a list of numbers) |
table[@class == "defn"] | all table children with a class attribute of "defn" (XML only) |
[@* == "defn"] | all children where any attribute equals "defn" (XML only) |
table/@class | the class attribute of any table children, as a string (XML only) |
**/bowl/*, **/fruit | all descendants whose parent is bowl, or that are called fruit |
union(**/bowl/*, **/fruit) | the same, but with duplicate nodes merged |
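
To make the basic segments concrete, here is a minimal Python sketch of what `*`, `**`, a name and `#n` select on a JSON-like structure. It is an illustration only, not the ZPath implementation: the helper names (`children`, `descendants_or_self`, `named`, `nth`) and the sample `doc` are hypothetical, and this sketch does not merge duplicate nodes.

```python
def children(node):
    """`*`: the direct children of a map or list; primitives have none."""
    if isinstance(node, dict):
        return list(node.values())
    if isinstance(node, list):
        return list(node)
    return []


def descendants_or_self(node):
    """`**`: the node itself followed by all of its descendants."""
    found = [node]
    for child in children(node):
        found.extend(descendants_or_self(child))
    return found


def named(node, name):
    """`name`: the child stored under `name` (maps only in this sketch)."""
    if isinstance(node, dict) and name in node:
        return [node[name]]
    return []


def nth(node, n):
    """`#n`: the child at zero-based index n, or nothing if out of range."""
    kids = children(node)
    return [kids[n]] if 0 <= n < len(kids) else []


# Hypothetical sample document.
doc = {"list": [10, 20, 30], "car": {"age": 4}}

third = nth(doc["list"], 2)            # like evaluating #2 on the list
everything = descendants_or_self(doc)  # like evaluating ** on the root
```

Each helper returns a list, mirroring the way every path segment yields a set of nodes rather than a single value.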
Parsing
- an expression may be a function, a path, a string (in quotes), a number, or a combination of these using normal C-like operators
- binary operators `+ - * / % && || ^ & | == != >= <= > <` require whitespace on either side
- the ternary operator `? :` requires whitespace on either side of `?` and `:`. Unary operators `~` and `!` do not.
- a path is constructed like a UNIX file path: a list of segments separated with `/`
- a path segment can be `*`, `**`, `..`, `..*`, a name, an index (`#`integer), both (name`#`integer), a function or a qualifying expression
- each segment may be followed by zero or more qualifying expressions (an expression inside square brackets)
- a function is a name immediately followed by `(`, zero or more arguments separated by commas and optional whitespace, then a `)`
- arguments depend on the function, but are typically expressions
- the characters `\n \r \t ( ) [ ] / , = & | ! < > #` and space in a name must be backslash-escaped
- finally, the top-level expression only may be a comma-separated list of expressions
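
The segment rules above can be sketched as a small splitter. This is a hypothetical helper (`split_segments` is not part of any ZPath library) and it makes simplifying assumptions: the input is a single path, it contains no quoted strings, and no `/` is being used as a whitespace-delimited division operator. It splits on `/` only when the slash is not backslash-escaped and not inside a qualifying expression or function call.

```python
def split_segments(path):
    """Split a path on '/' that is neither escaped nor inside [...] or (...)."""
    segments = []
    current = []
    depth = 0  # nesting depth of brackets and parentheses
    i = 0
    while i < len(path):
        ch = path[i]
        if ch == "\\" and i + 1 < len(path):
            # Keep the backslash together with the character it escapes.
            current.append(path[i:i + 2])
            i += 2
            continue
        if ch in "[(":
            depth += 1
        elif ch in "])":
            depth -= 1
        if ch == "/" and depth == 0:
            segments.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    segments.append("".join(current))
    return segments
```

A leading `/` produces an empty first segment, which a fuller parser could treat as "start at the root node".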
Structural and Type Functions
Function | Description |
---|---|
count() | the number of nodes in the current node set |
count(expression) | the number of nodes matched by expression |
index() | the index of this node in the current node set |
index(expression) | for each node matched by expression, its index into its parent |
key() | the key to retrieve this node from its parent (an integer for lists, typically a string for maps) |
key(expression) | for each node matched by expression, the key to retrieve it from its parent |
union(expression, ...) | the set union of each expression (no duplicates) |
intersection(expression, ...) | the set intersection of each expression |
is-first() | shorthand for index() == 0 |
is-last() | shorthand for index() == count() - 1 |
next() | if this node can be retrieved from its parent with an index, the node at the next index |
prev() | the same, but return the node at the previous index |
string() | the string value of the current node (an empty set if no string value exists) |
string(expression) | for each node matched by expression, its string value |
number() | the number value of the current node (an empty set if no number value exists) |
number(expression) | for each node matched by expression, its number value |
value() | the primitive value of the current node |
value(expression) | for each node matched by expression, its primitive value |
type() | the type of this node as a string (the range of values depends on what's being evaluated) |
type(expression) | the type of each node matched by expression, or "undefined" if it is an empty set |
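
The difference between `count(tr/td)` and `tr/count(td)` is easy to miss, so here is a rough Python analogy. The table is modelled as a hypothetical list of rows, each a list of cells. The function names are illustrative only and the real evaluation works on node sets, not Python lists.

```python
# Hypothetical table: three rows with 2, 2 and 1 cells.
table = [["a", "b"], ["c", "d"], ["e"]]


def count_all_cells(rows):
    """Analogue of count(tr/td): one number for the whole table."""
    return sum(len(row) for row in rows)


def count_cells_per_row(rows):
    """Analogue of tr/count(td): one number per row."""
    return [len(row) for row in rows]


def is_first(index):
    """Analogue of is-first(): shorthand for index() == 0."""
    return index == 0


def is_last(index, count):
    """Analogue of is-last(): shorthand for index() == count() - 1."""
    return index == count - 1
```

In the first form the function wraps the whole path, so it is evaluated once. In the second it is a path segment, so it is evaluated once per `tr`.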
Math Functions
For the functions below, if expression is specified it will be evaluated and the result used as the input of the function. If expression is not specified, the current node-set will be used as the input. Any nodes in the input that are not number nodes will generate no output.
Function | Description |
---|---|
ceil(expression) | for each number node in the input set, its ceiling value |
floor(expression) | for each number node in the input set, its floor value |
round(expression) | for each number node in the input set, its rounded value |
sum(expression) | the sum of all the number-nodes in the input set |
min(expression) | the minimum-value of all the number-nodes in the input set |
max(expression) | the maximum-value of all the number-nodes in the input set |
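
The rule that non-number nodes generate no output can be sketched in Python as a filter applied before the function. The names `is_number`, `ceil_each` and `sum_numbers` are hypothetical, and treating booleans as non-numbers is an assumption of this sketch.

```python
import math


def is_number(value):
    """True for ints and floats; bool is excluded even though it subclasses int."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def ceil_each(values):
    """Analogue of ceil(expression): one result per number, others are skipped."""
    return [math.ceil(v) for v in values if is_number(v)]


def sum_numbers(values):
    """Analogue of sum(expression): a single total over the number inputs."""
    return sum(v for v in values if is_number(v))
```

Note that `ceil_each` can return fewer results than it was given inputs, but never more.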
String Functions
For the functions below, if expression is specified it will be evaluated and the result used as the input of the function. If expression is not specified, the current node-set will be used as the input.
Function | Description |
---|---|
escape(expression ...) | XML escape any string values in the input set |
unescape(expression ...) | XML unescape any string values in the input set |
format(format, expression) | use printf to format each node in the input set |
index-of(search, expression) | the first index in each input string of the specified search string, or -1 if not found |
last-index-of(search, expression) | the last index in each input string of the specified search string, or -1 if not found |
string-length(expression) | the length of each string in the input set |
upper-case(expression) | the upper-case version of each string in the input set |
lower-case(expression) | the lower-case version of each string in the input set |
substring(expression, start, length) | a substring of each string in the input set, of length characters starting at start |
match(pattern, expression) | for each input node, true if its string value matches the supplied regular expression, otherwise false |
replace(pattern, replace, expression) | apply a regex search/replace to the string value of each node in the input set |
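
As a rough guide to the per-string behaviour, the following Python sketch mirrors `index-of`, `last-index-of` and `substring` using built-in string methods. The function names are hypothetical, and zero-based string offsets are an assumption of this sketch; the table above does not state the base.

```python
def index_of(search, strings):
    """First offset of `search` in each string, or -1 if it is not found."""
    return [s.find(search) for s in strings]


def last_index_of(search, strings):
    """Last offset of `search` in each string, or -1 if it is not found."""
    return [s.rfind(search) for s in strings]


def substring(strings, start, length):
    """`length` characters of each string, beginning at offset `start`."""
    return [s[start:start + length] for s in strings]
```

Each function produces exactly one result per input string, which matches the "one result per input" property described in the Safety section.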
Functions for XML objects
Function | Description |
---|---|
url() | the namespace URL of any nodes in the current node set, or an empty string if the node has no namespace defined |
url(expression) | for each node matched by expression, its namespace URL |
local-name() | the local-name of any nodes in the current node set, or the node's name if it has no namespace defined |
local-name(expression) | for each node matched by expression, its local-name |
Functions for CBOR objects
Function | Description |
---|---|
tag() | the tag for any nodes in the current node set, or an empty set if they have no tag |
tag(expression) | for any nodes matched by expression that have a tag, the tag for those nodes |
Details
Whitespace is required around binary operators to remove confusion with paths: `* * 2` is a valid expression (multiply all children by 2) and representing this unambiguously requires whitespace. Requiring it everywhere is much simpler than trying to list where it's required.
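
The whitespace rule makes operator detection mechanical. The sketch below is a hypothetical Python helper, not taken from any ZPath implementation, and it ignores quoted strings for brevity. It reports an operator only when it has whitespace on both sides, so the first `*` in `* * 2` is a path segment and the second is multiplication.

```python
import re

# Binary operators from the Parsing section, longest first so that "&&"
# is tried before "&", ">=" before ">", and so on.
BINARY_OPS = ["&&", "||", "==", "!=", ">=", "<=",
              "+", "-", "*", "/", "%", "^", "&", "|", ">", "<"]

# An operator counts as binary only with whitespace immediately before and after.
_BINARY = re.compile(
    r"(?<=\s)(" + "|".join(re.escape(op) for op in BINARY_OPS) + r")(?=\s)"
)


def binary_operators(expr):
    """Return (offset, operator) pairs for whitespace-delimited operators."""
    return [(m.start(), m.group(1)) for m in _BINARY.finditer(expr)]
```

For example, `binary_operators("*/td")` finds nothing, because neither `*` nor `/` is surrounded by whitespace there.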
Some tree structures merge primitive types in the tree: for example in JavaScript, a JSON list `["string", "string"]` will use the same string object for each array entry. Within these structures, the `key` and `index` functions are only required to return the key or index of the first item that matches: for this example list, `#0/index() == 0 && #1/index() == 0`.
The behaviour of the parent path segment `..` in this situation is undefined.
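
The same effect can be reproduced in Python, where a list can hold one string object twice. This sketch uses a hypothetical helper, `first_index_by_identity`, and assumes an implementation that locates a node in its parent by object identity. Under that assumption both entries report index 0.

```python
def first_index_by_identity(items, node):
    """Return the index of the first entry that is the same object as node."""
    for i, item in enumerate(items):
        if item is node:
            return i
    return -1


shared = "string"
items = [shared, shared]  # both entries refer to the same string object

index_of_first = first_index_by_identity(items, items[0])
index_of_second = first_index_by_identity(items, items[1])
```

Because the two entries cannot be told apart, there is also no reliable way to say which position a given entry came from.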
Some functions and paths depend on the type of object being traversed. For example, `@name` is defined as the value of the attribute "name" in XML. If applying zpath to other objects with a concept of attributes, the same syntax is recommended.
In other contexts, this syntax has no special meaning: it means a child with the name "@name".
The value of the `type()` function depends on the object being traversed; for JSON the types are `string`, `number`, `boolean`, `list`, `map` and `null`. For CBOR it's the same plus `buffer`. For XML it's `element`, `text` and `attr`, and other types such as `processing-instruction` or `comment` may be available depending on the implementation. It's recommended that CDATA nodes are treated as `text` and required that `*` and `**` do not match attributes.
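
For JSON data already loaded into Python, the type names listed above could be derived as follows. This is a minimal sketch with a hypothetical function name, `json_type`. The mapping from Python types is an assumption, not something the ZPath text prescribes.

```python
def json_type(value):
    """Map a decoded JSON value to one of the JSON type names listed above."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: it is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "list"
    if isinstance(value, dict):
        return "map"
    raise TypeError(f"not a JSON value: {value!r}")
```

The order of the checks matters: testing for numbers before booleans would report `true` as a number.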
Safety
ZPath and ZTemplates are designed to prevent a malicious template from causing runaway resource use. It should be safe to allow a ZTemplate from an untrusted party to execute without risk of infinite loops or out-of-memory situations.
Paths or functions that return nodes from the tree are required to merge duplicate nodes, so the number of nodes will never expand beyond the number of nodes in the tree. None of the standard functions generate more than one result (string, number, boolean or node) for any input.
These two facts mean runaway expressions are impossible: at most, the number of outputs from any zpath expression will be the number of nodes in the tree. The one exception to this is the comma separator at the top level, so `/**,/**` will generate two copies of every node in the tree. This can lead to large result sets, but they will always be `O(n)`.
There is no loop or recursion in the syntax, and ZTemplates are required to limit the number of includes.
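
The bound on result size can be seen in a small Python sketch. It assumes duplicates are merged by object identity and uses hypothetical helpers and a hypothetical tree made only of containers, so every node is a distinct object. Applying `**` repeatedly never yields more nodes than the tree holds, while a top-level comma list simply concatenates results.

```python
def children(node):
    """The direct children of a map or list; primitives have none."""
    if isinstance(node, dict):
        return list(node.values())
    if isinstance(node, list):
        return list(node)
    return []


def descendants_or_self(nodes):
    """Apply `**` to a node set, merging duplicate nodes by identity."""
    seen = set()
    result = []
    stack = list(reversed(nodes))
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue  # already in the result: merge the duplicate
        seen.add(id(node))
        result.append(node)
        stack.extend(reversed(children(node)))
    return result


# Hypothetical tree with six nodes: the root, a list, two maps, a map and a list.
tree = {"a": [{}, {}], "b": {"c": []}}

once = descendants_or_self([tree])   # like /**
twice = descendants_or_self(once)    # like /**/**, still bounded by the tree size
both = once + once                   # like /**,/** at the top level: two copies
```

Here `twice` has the same six nodes as `once`, and `both` has twelve entries, which is still linear in the size of the tree.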