A Comprehensive Introduction to XML Path Language

XPath stands for XML Path Language. It is a query language for selecting nodes in an XML document. Why is it called a language, and how does it compare with well-known programming languages such as C, C++, Java, or Python? This post explores those questions.

XPath – An Overview

XPath is a language for navigating XML documents. Navigation here means moving through the document in any direction to reach any element, attribute, or text node. XPath is a World Wide Web Consortium (W3C) Recommendation.

Why is XPath considered a query language?

Just as SQL serves as the query language for many databases (for example, MySQL, Oracle, and DB2), XPath is used within various languages and tools: languages such as XSLT, XQuery, XLink, and XPointer, and tools such as MarkLogic and software testing tools like Selenium.

In principle, compiled languages translate source files written in a language such as C++ or Java into binary code that a machine can execute. Query languages, by contrast, are used to write commands that retrieve and manipulate structured data. Where SQL queries relational data, XPath navigates tree-structured documents such as XML and HTML. C and C++ compilers are installed and run as standalone programs on an operating system. XPath queries, on the other hand, are written either in web browser developer tools such as Chrome's Inspect panel or through APIs that implement the XPath specifications (see XPath 3.0 as of this writing) for a specific high-level programming language. Examples include XPath with Selenium in Python or Java, and XPath with lxml in Python.
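As a rough illustration of running an XPath query through a programming-language API, the sketch below uses Python's standard-library xml.etree.ElementTree. This module supports only a limited subset of XPath, so treat it as a minimal stand-in for fuller implementations such as lxml or Selenium. The sample document and its element names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented purely for this illustration.
doc = ET.fromstring(
    "<library>"
    "<book id='b1'><title>XML Basics</title></book>"
    "<book id='b2'><title>XPath in Practice</title></book>"
    "</library>"
)

# Path expression: every <title> child of every <book> child of the root element.
titles = [t.text for t in doc.findall("./book/title")]
print(titles)
```

The query is just a string handed to the API, much as an SQL statement is handed to a database driver.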

Where Can We Use XPath?

XPath is used in both software development and software testing.

If you work in software testing, you can use XPath to write automation scripts in Selenium. If you work in development, most widely used programming languages offer XPath support.

XSLT is used predominantly for XML content conversion and relies on XPath to select the nodes to convert. Besides XSLT, XPath also works closely with other languages such as XQuery and XPointer.

An example of XPath general syntax

Types Of XPath Node

The types of XPath nodes are listed below.

#1) Element Nodes: An element node represents an XML tag and can contain attributes. Elements sit under the root node, either directly or nested inside other elements. In the example below, softwareTester, State, and country are element nodes.

#2) Attribute Nodes: An attribute node defines a property of an element node. The element that carries the attribute is its parent, although the attribute is not counted among that element's children. In the example below, “name” is an attribute node of the softwareTester element. The shortcut for attribute nodes is “@”.

#3) Text Nodes: The text that appears between an element's opening and closing tags is a text node. In the example below, “Delhi”, “India”, and “chennai” are text nodes.

#4) Comment Nodes: A comment is a note that a tester or developer writes to explain the document, and it is not processed as data. Comments are written between these delimiters: <!-- put comments here -->

#5) Namespaces: These are used to remove ambiguity between more than one set of XML element names. For example, XSLT elements are conventionally written with the xsl: prefix.

#6) Processing Instructions: These contain instructions that applications can use during processing. They can appear anywhere in the document and are written between <? and ?>.

#7) Root Node: This is the topmost node of the tree and has no parent. Strictly speaking, in the XPath data model the root node is the document itself, and the topmost element (the document element) is its child. In the XML example below the topmost element is “SoftwareTestersList”. To select the root node, we use a forward slash, i.e. '/'.

The following basic XML document illustrates the terms above:

<SoftwareTestersList>
<!-- Below is the list of Software Testers working in different States in India -->
    <softwareTester name="T1">
        <State>Delhi</State>
        <country>India</country>
    </softwareTester>
    <softwareTester name="T2">
        <State>chennai</State>
        <country>India</country>
    </softwareTester>
</SoftwareTestersList>
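
To make the node types concrete, here is a minimal sketch that loads the document above with Python's standard-library xml.etree.ElementTree and reads its element, attribute, and text nodes. It assumes ElementTree's default parser, which discards comments, so the comment node does not show up in the result. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

XML = """<SoftwareTestersList>
<!-- Below is the list of Software Testers working in different States in India -->
    <softwareTester name="T1">
        <State>Delhi</State>
        <country>India</country>
    </softwareTester>
    <softwareTester name="T2">
        <State>chennai</State>
        <country>India</country>
    </softwareTester>
</SoftwareTestersList>"""

# fromstring returns the topmost element; the default parser drops the comment.
root = ET.fromstring(XML)

# Element nodes: the direct children of the topmost element.
element_names = [child.tag for child in root]

# Attribute nodes: the "name" attribute of each softwareTester element.
name_attributes = [t.get("name") for t in root.findall("softwareTester")]

# Text nodes: the text inside each State element.
state_texts = [s.text for s in root.findall("softwareTester/State")]

print(root.tag, element_names, name_attributes, state_texts)
```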

Atomic Values: Values that have neither children nor a parent, such as a string or a number, are known as atomic values.

Context Node: This is the node in the XML document against which an expression is evaluated. It can be thought of as the current node and is abbreviated with a single period (.).

Context Size: This is the number of nodes in the node list currently being evaluated, of which the context node is one. For example, if the context node is one of five children selected from its parent, the context size is five.

Absolute XPath: An XPath expression that starts at the root node, that is, with '/'. For example, /SoftwareTestersList/softwareTester[@name="T1"]

Relative XPath: An XPath expression that starts from the currently selected context node rather than from the root. For example, if a softwareTester element is the context node, then @name (or ./@name) is a relative XPath expression that selects its name attribute.
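
The role of the context node can be sketched with Python's standard-library xml.etree.ElementTree, which evaluates every path relative to the element it is called on (it rejects paths that begin with '/' on an element, so only relative paths are shown). The document is a compact copy of the sample above, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<SoftwareTestersList>"
    "<softwareTester name='T1'><State>Delhi</State><country>India</country></softwareTester>"
    "<softwareTester name='T2'><State>chennai</State><country>India</country></softwareTester>"
    "</SoftwareTestersList>"
)

# Context node = the topmost element: pick the tester whose name is T1.
t1 = root.find("./softwareTester[@name='T1']")

# Context node = that tester: "./State" now finds its <State> child.
state_of_t1 = t1.find("./State").text

# Same relative path, different context node: <State> is not a direct
# child of <SoftwareTestersList>, so nothing matches and find returns None.
state_from_root = root.find("./State")

print(state_of_t1, state_from_root)
```

The same relative expression gives different results depending on where evaluation starts, which is exactly what separates it from an absolute path.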

Axes In XPath

  • Self axis: Selects the context node itself. The abbreviation is a single period (.), which is equivalent to self::node().
  • Child axis: Selects the children of the context node. Elements, comments, text nodes, and processing instructions can be children. Namespace and attribute nodes are never on the child axis. For example, child::softwareTester.
  • Parent axis: Selects the parent of the context node (if the context node is the root node, the result is an empty node-set). The abbreviation is a double period (..), which is equivalent to parent::node(). The expression parent::State selects the parent only if it is a <State> element, and otherwise results in an empty node-set.
  • Attribute axis: Selects the attributes of the context node. The abbreviation is the at-sign (@), so attribute::name and @name are equivalent. If the context node is not an element node, the result is an empty node-set.
  • Ancestor axis: Selects the parent of the context node, its parent's parent, and so on. This axis includes the root node unless the context node is itself the root node.
  • Ancestor-or-self axis: Selects the context node together with its parent, its parent's parent, and so on. It always includes the root node.
  • Descendant axis: Selects the children of the context node, their children, and so on. Descendants can be elements, comments, processing instructions, and text nodes. Namespace and attribute nodes are never on the descendant axis.
  • Descendant-or-self axis: Selects the context node together with all of its descendants. As with the descendant axis, elements, comments, processing instructions, and text nodes are included, while namespace and attribute nodes are not.
  • Preceding axis: Selects all nodes that come before the context node in document order, excluding its ancestors and any attribute or namespace nodes.
  • Preceding-sibling axis: Selects the nodes that come before the context node and have the same parent. The result is an empty node-set if the context node is an attribute or namespace node.
  • Following axis: Selects all nodes that come after the context node in document order, excluding its descendants and any attribute or namespace nodes.
  • Following-sibling axis: Selects the nodes that come after the context node and have the same parent. The result is an empty node-set if the context node is an attribute or namespace node.
  • Namespace axis: Selects the namespace nodes of the context node. The result is an empty node-set if the context node is not an element node.
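
A few of these axes can be tried out with Python's standard-library xml.etree.ElementTree. This is only a partial sketch: ElementTree understands the abbreviated forms (., .., // and plain child names) but not the full axis::name syntax, which needs a complete XPath engine such as lxml. The document is a compact copy of the sample above.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<SoftwareTestersList>"
    "<softwareTester name='T1'><State>Delhi</State><country>India</country></softwareTester>"
    "<softwareTester name='T2'><State>chennai</State><country>India</country></softwareTester>"
    "</SoftwareTestersList>"
)
t2 = root.find("./softwareTester[@name='T2']")

# Self axis, abbreviated ".": returns the context node itself.
self_node = t2.find(".")

# Child axis: the element children of the context node.
children = [c.tag for c in t2.findall("*")]

# Descendant elements of the topmost element, reached through "//".
descendants = [d.tag for d in root.findall(".//*")]

# Parent axis, abbreviated "..": the parent of every <State> element.
parents_of_state = [p.get("name") for p in root.findall(".//State/..")]

print(children, descendants, parents_of_state)
```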

Datatypes In XPath

Given below are the various Datatypes in XPath.

  • Number: Numbers in XPath 1.0 are IEEE 754 double-precision floating-point numbers. There is no separate integer datatype.
  • Boolean: This represents either true or false.
  • String: This represents zero or more characters.
  • Node-set: This represents a set of zero or more nodes.

Wildcards In XPath

The wildcards in XPath are listed below.

  • An asterisk (*): This selects all the element children of the context node. It does not select text nodes, comments, processing instructions, or attribute nodes.
  • At-sign with an asterisk (@*): This selects all the attribute nodes of the context node.
  • node(): This matches nodes of any type on the axis it is applied to. On the default child axis that means elements, text nodes, comments, and processing instructions. Attribute and namespace nodes are reached only through their own axes (for example, @* for attributes).
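
The asterisk wildcard can be sketched with Python's standard-library xml.etree.ElementTree. ElementTree cannot evaluate @* or node(), so the element's attrib dictionary is used here as a stand-in for @*, which is an assumption of this sketch rather than XPath itself. The document is a compact copy of the sample above.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<SoftwareTestersList>"
    "<softwareTester name='T1'><State>Delhi</State><country>India</country></softwareTester>"
    "<softwareTester name='T2'><State>chennai</State><country>India</country></softwareTester>"
    "</SoftwareTestersList>"
)
t1 = root.find("softwareTester")

# "*" matches every element child of the context node, whatever its name.
child_elements = [e.tag for e in t1.findall("*")]

# "*/*" matches every element grandchild of the topmost element.
grandchildren = [e.tag for e in root.findall("*/*")]

# Stand-in for "@*": all attributes of the context node as a dictionary.
all_attributes = dict(t1.attrib)

print(child_elements, grandchildren, all_attributes)
```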

XPath Operators

Note: In the table below, e stands for any XPath expression.

Operator      Description                                             Example
e1 + e2       Addition (if e1 and e2 are numbers)                     5 + 2
e1 - e2       Subtraction (if e1 and e2 are numbers)                  10 - 4
e1 * e2       Multiplication (if e1 and e2 are numbers)               3 * 4
e1 div e2     Division (if e1 and e2 are numbers, floating-point)     4 div 2
e1 | e2       Union of the node-sets matched by e1 and e2             //State | //country
e1 = e2       Equal                                                   @name = 'T1'
e1 != e2      Not equal                                               @name != 'T1'
e1 < e2       True if e1 is less than e2                              5 < 9 results in true()
e1 > e2       True if e1 is greater than e2                           5 > 9 results in false()
e1 <= e2      True if e1 is less than or equal to e2                  5 <= 9 results in true()
e1 >= e2      True if e1 is greater than or equal to e2               5 >= 9 results in false()
e1 or e2      True if either e1 or e2 is true
e1 and e2     True if both e1 and e2 are true
e1 mod e2     Floating-point remainder of e1 divided by e2            7 mod 2

When an expression is embedded in an XML document, for example in a test attribute, the less-than sign '<' must be escaped as '&lt;', and the greater-than sign '>' is commonly escaped as '&gt;'.
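
The arithmetic rows can be approximated in plain Python to show how they behave on floating-point numbers. The helper names xpath_div and xpath_mod are hypothetical and only emulate the XPath 1.0 operators for finite operands with a non-zero divisor. They are not part of any XPath library.

```python
import math


def xpath_div(a: float, b: float) -> float:
    """Emulate XPath 1.0 `div`: floating-point division (non-zero divisor assumed)."""
    return float(a) / float(b)


def xpath_mod(a: float, b: float) -> float:
    """Emulate XPath 1.0 `mod`: the remainder of a truncating division."""
    return math.fmod(a, b)


print(xpath_div(4, 2))   # 4 div 2
print(xpath_mod(7, 2))   # 7 mod 2
print(xpath_mod(-7, 2))  # -7 mod 2 keeps the sign of the left operand
print(-7 % 2)            # Python's own % operator behaves differently
```

The last two lines show why math.fmod is used: the XPath remainder follows the sign of the left operand, whereas Python's % follows the sign of the right operand.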

Predicates In XPath

Predicates act as filters that restrict the nodes selected by an XPath expression. Each predicate is evaluated to a Boolean value for every candidate node: if it is true, the node is selected, and if it is false, the node is not selected. A numeric predicate such as [1] is treated as a test on the node's position.

Predicates are always written inside square brackets, [ ].

For example, softwareTester[@name="T2"]:

This selects the <softwareTester> element whose name attribute has the value T2.
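
A minimal sketch of predicates in practice, again using Python's standard-library xml.etree.ElementTree, which supports attribute, child-text, and position predicates. The document is a compact copy of the sample above, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<SoftwareTestersList>"
    "<softwareTester name='T1'><State>Delhi</State><country>India</country></softwareTester>"
    "<softwareTester name='T2'><State>chennai</State><country>India</country></softwareTester>"
    "</SoftwareTestersList>"
)

# Attribute predicate: testers whose name attribute equals T2.
by_attribute = [t.get("name") for t in root.findall("softwareTester[@name='T2']")]

# Child-text predicate: testers that have a <State> child with the text Delhi.
by_child_text = [t.get("name") for t in root.findall("softwareTester[State='Delhi']")]

# Position predicates: the first and the last softwareTester element.
first = root.find("softwareTester[1]").get("name")
last = root.find("softwareTester[last()]").get("name")

print(by_attribute, by_child_text, first, last)
```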

Conclusion

In this tutorial, we learned what XPath is, how to use XPath expressions, and which languages and tools support them. We saw that XPath is useful across both software development and software testing.

We also covered the datatypes of XPath, the axes and their usage, the node types, the operators and predicates, the difference between relative and absolute XPath, and the wildcards.

