To select HTML elements from your page, you can use XPath selectors: expressions that extract the nodes you need. Nodes are found by following a path through the HTML document, either downwards from a known node (towards its descendants) or upwards (towards its ancestors). The most common XPath patterns are listed below:
- Select a node by its name: nodename. Example: if the node is an h3 element, the selector is h3 (or //h3 to match h3 elements anywhere in the document).
- Select all the nodes that have a certain attribute: //nodename[@attribute]. Example: select all h3 elements that have a class attribute: //h3[@class].
- Select all the nodes for which a specified attribute has a specified value: //nodename[@attribute='value']. Example: select all h3 elements whose class attribute has the value 'aClass': //h3[@class='aClass'].
- Select the nth child element of a node, having a certain element type: //node/child[n]. Positions start at 1. Example: select the third li child of a ul element: //ul/li[3].
- Select the parent of a node: .. (two dots). Example: the parent of a div element that has the class 'aClass': //div[@class='aClass']/..
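The patterns above can be tried out with a short script. This is only a minimal sketch, assuming Python's standard xml.etree.ElementTree module, which implements a limited subset of XPath. The markup in the script is hypothetical and made up for the illustration. ElementTree paths are relative to the element being searched, so //h3 is written as .//h3.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, invented for this illustration only.
doc = ET.fromstring(
    "<body>"
    "<h3 class='aClass'>first</h3>"
    "<h3>second</h3>"
    "<ul><li>one</li><li>two</li><li>three</li></ul>"
    "<section><div class='aClass'>inner</div></section>"
    "</body>"
)

# Select by node name: every h3 below the root.
all_h3 = doc.findall(".//h3")

# Select nodes that carry a given attribute, whatever its value.
with_class = doc.findall(".//h3[@class]")

# Select nodes whose attribute has a specific value (note the straight quotes).
with_value = doc.findall(".//h3[@class='aClass']")

# Select the nth child of a given type; positions are 1-based.
third_li = doc.find(".//ul/li[3]")

# Select the parent of a node with '..'.
parent = doc.find(".//div[@class='aClass']/..")

print(len(all_h3), len(with_class), len(with_value))  # 2 1 1
print(third_li.text)                                   # three
print(parent.tag)                                      # section
```

The same expressions, with the leading dot removed, should behave the same way in a full XPath 1.0 engine.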
Let’s apply some of these selectors to a piece of HTML code:
<div class="firstClass">
  <li>
    <div class="secondClass andThirdClass">
      <h3> <a href="aRandomLink"> randomText</a> </h3>
      <div class="fourthClass">
        <div>
          <div class="fifthClass">
            <a class="someSpecialClass" href="#" id="firstId"> <span>Some text here</span> </a>
          </div>
          <span>Yet more text</span>
        </div>
      </div>
    </div>
  </li>
  <li>...same structure as the first li...</li>
  <li>...same structure as the previous two lis</li>
</div>
- the h3 element: //h3
- the a element whose class attribute is someSpecialClass: //a[@class='someSpecialClass']
- the span element that has 'Some text here' as label: //div[@class='fifthClass']/a/span
- the span element that has 'Yet more text' as label: //div[@class='fourthClass']/div/span
- the first li element of the first div: //div[@class='firstClass']/li[1]
- the second li element of the first div: //div[@class='firstClass']/li[2]
- all div elements whose class attributes contain the string 'Class': //div[contains(@class, 'Class')]
- all div elements whose class attribute contains both the 'second' string and the 'Third' string: //div[contains(@class, 'second') and contains(@class, 'Third')]
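These selections can be checked against the sample markup with a short script. This is a sketch under a few assumptions: it uses Python's standard xml.etree.ElementTree, it wraps the sample in a hypothetical body element so that the outer div can be reached with a descendant search, and it keeps the two elided li items as plain placeholders. ElementTree's XPath subset does not include contains(), so the last two selections are emulated with an ordinary Python filter; a full XPath 1.0 engine (lxml, for example) accepts the contains() expressions directly.

```python
import xml.etree.ElementTree as ET

# Sample markup from the article; <body> is a wrapper added for this sketch,
# and the two elided <li> items stay as "..." placeholders.
html = """
<body>
<div class="firstClass">
  <li>
    <div class="secondClass andThirdClass">
      <h3><a href="aRandomLink">randomText</a></h3>
      <div class="fourthClass">
        <div>
          <div class="fifthClass">
            <a class="someSpecialClass" href="#" id="firstId"><span>Some text here</span></a>
          </div>
          <span>Yet more text</span>
        </div>
      </div>
    </div>
  </li>
  <li>...</li>
  <li>...</li>
</div>
</body>
"""
root = ET.fromstring(html)

# '//x' in the article becomes './/x' here (paths are relative to root).
h3 = root.find(".//h3")
special_a = root.find(".//a[@class='someSpecialClass']")
first_span = root.find(".//div[@class='fifthClass']/a/span")
second_span = root.find(".//div[@class='fourthClass']/div/span")
first_li = root.find(".//div[@class='firstClass']/li[1]")
second_li = root.find(".//div[@class='firstClass']/li[2]")

# Emulation of //div[contains(@class, 'Class')] with a Python filter.
class_divs = [d for d in root.iter("div") if "Class" in d.get("class", "")]

# Emulation of contains(@class, 'second') and contains(@class, 'Third').
both_divs = [
    d for d in root.iter("div")
    if "second" in d.get("class", "") and "Third" in d.get("class", "")
]

print(special_a.get("id"))                 # firstId
print(first_span.text)                     # Some text here
print(second_span.text)                    # Yet more text
print(len(class_divs), len(both_divs))     # 4 1
```

The unnamed div inside fourthClass has no class attribute, which is why only four of the five div elements match the 'Class' filter.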