Interview #35: Write a Selenium script that checks for broken links on a webpage.

To check for broken links on a webpage using Selenium, you can follow a systematic approach:

  1. Retrieve all anchor (<a>) tags with href attributes.
  2. Extract the URLs from these anchor tags.
  3. Send an HTTP request to each URL using a library like HttpURLConnection in Java or requests in Python.
  4. Check the HTTP response status codes:
     • 200: OK (valid link).
     • 4xx/5xx: Broken link.
  5. Log the results, indicating whether each link is valid or broken.
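
Steps 4 and 5 hold the only decision logic in the whole check, so it can help to isolate them as plain functions before wiring in Selenium. The following is a minimal sketch, assuming the rule above (any status from 400 upward is broken, so redirects in the 3xx range count as valid). The class and method names are illustrative and not part of Selenium.

```java
// Hypothetical helper: the names are illustrative, not part of Selenium.
class LinkStatusRule {

    // Step 4: 4xx and 5xx responses mean a broken link.
    static boolean isBroken(int statusCode) {
        return statusCode >= 400;
    }

    // Step 5: build the log line for one checked link.
    static String describe(String url, int statusCode) {
        String label = isBroken(statusCode) ? "Broken Link: " : "Valid Link: ";
        return label + url + " - Response Code: " + statusCode;
    }

    public static void main(String[] args) {
        System.out.println(describe("https://example.com/page1", 200));
        System.out.println(describe("https://example.com/page2", 404));
    }
}
```

Keeping the rule in one place makes it easy to change later, for example if you decide to report redirects separately.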

Complete Selenium Script in Java

Here is a Selenium script in Java that identifies and logs broken links:

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;

public class BrokenLinksChecker {

    public static void main(String[] args) {
        // Set up WebDriver and launch the browser.
        // With Selenium 4.6+, Selenium Manager can locate the driver on its own,
        // so this property is only needed for older versions or a custom driver path.
        System.setProperty("webdriver.chrome.driver", "path/to/chromedriver");
        WebDriver driver = new ChromeDriver();
        try {
            // Navigate to the webpage to check
            driver.get("https://example.com");
            // Maximize the browser window
            driver.manage().window().maximize();
            // Get all anchor tags on the page
            List<WebElement> links = driver.findElements(By.tagName("a"));
            System.out.println("Total links found: " + links.size());
            // Iterate through each link and check its response
            for (WebElement link : links) {
                String url = link.getAttribute("href");
                if (url == null || url.isEmpty()) {
                    System.out.println("Invalid URL: " + url);
                } else if (!url.startsWith("http")) {
                    // mailto:, tel:, javascript: and similar links cannot be checked over HTTP;
                    // a mailto: URL would also fail the HttpURLConnection cast in checkLink
                    System.out.println("Skipped (not an HTTP link): " + url);
                } else {
                    checkLink(url);
                }
            }
        } finally {
            // Close the browser
            driver.quit();
        }
    }

    // Method to check the link status
    public static void checkLink(String url) {
        HttpURLConnection connection = null;
        try {
            // Open a connection to the URL
            connection = (HttpURLConnection) new URL(url).openConnection();
            connection.setRequestMethod("HEAD"); // Use HEAD to reduce response time
            connection.setConnectTimeout(5000);  // Give up after 5 seconds instead of hanging
            connection.setReadTimeout(5000);
            connection.connect();
            // Get the HTTP response code
            int responseCode = connection.getResponseCode();
            if (responseCode >= 400) {
                System.out.println("Broken Link: " + url + " - Response Code: " + responseCode);
            } else {
                System.out.println("Valid Link: " + url + " - Response Code: " + responseCode);
            }
        } catch (IOException e) {
            System.out.println("Error checking URL: " + url + " - Exception: " + e.getMessage());
        } finally {
            // Release the connection whether or not the check succeeded
            if (connection != null) {
                connection.disconnect();
            }
        }
    }
}

Explanation of the Script

  1. Setup and Initialization: WebDriver is initialized with the ChromeDriver, and the target webpage URL is loaded.
  2. Fetching All Links:
     • driver.findElements(By.tagName("a")) retrieves all anchor tags on the page.
     • The href attribute is extracted from each link.
  3. Checking the HTTP Response:
     • For each URL, an HTTP connection is established using HttpURLConnection.
     • The HEAD request method is used to check the HTTP status without downloading the full response body.
     • Links with response codes >= 400 are identified as broken.
  4. Error Handling: The script handles exceptions to ensure that issues like malformed URLs or connectivity problems do not crash the execution.
  5. Logging Results: Valid and broken links are logged with their respective response codes.
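
One caveat with HEAD: some servers do not support it and answer 405 (Method Not Allowed) or 501 (Not Implemented) even though the page loads fine in a browser. A common workaround is to retry such links with GET. The sketch below is one possible way to do that, not part of the script above. The Prober interface, the class name, and the 5-second timeouts are assumptions made for illustration, and httpProbe expects an http or https URL.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper: class, interface, and method names are illustrative.
class HeadWithGetFallback {

    // "Send this HTTP method to this URL and return the status code."
    interface Prober {
        int probe(String method, String url) throws IOException;
    }

    // Tries HEAD first; repeats the request with GET when the server reports
    // that HEAD is not allowed (405) or not implemented (501).
    static int statusOf(String url, Prober prober) throws IOException {
        int code = prober.probe("HEAD", url);
        if (code == HttpURLConnection.HTTP_BAD_METHOD
                || code == HttpURLConnection.HTTP_NOT_IMPLEMENTED) {
            code = prober.probe("GET", url);
        }
        return code;
    }

    // Real prober built on HttpURLConnection (timeouts are an assumption).
    static int httpProbe(String method, String url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        try {
            connection.setRequestMethod(method);
            connection.setConnectTimeout(5000);
            connection.setReadTimeout(5000);
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in server that rejects HEAD but answers GET, so no network is needed.
        Prober rejectsHead = (method, url) -> method.equals("HEAD") ? 405 : 200;
        System.out.println(statusOf("https://example.com", rejectsHead)); // prints 200
    }
}
```

For real requests, pass HeadWithGetFallback::httpProbe as the prober, for example statusOf(url, HeadWithGetFallback::httpProbe).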


Example Output

For a webpage with 5 links:

Total links found: 5
Valid Link: https://example.com/page1 - Response Code: 200
Broken Link: https://example.com/page2 - Response Code: 404
Valid Link: https://example.com/page3 - Response Code: 200
Broken Link: https://example.com/page4 - Response Code: 500
Invalid URL: null

Enhancements and Best Practices

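  1. Headless Browsing: Run Chrome in headless mode (for example, by passing the --headless=new argument through ChromeOptions) so the script runs without opening a browser window. This is usually faster and suits CI servers that have no display.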
  2. Parallel Execution: Use multi-threading or a parallel library (like ExecutorService in Java) to speed up link validation.
  3. Retry Mechanism: Implement retries for transient issues (e.g., network fluctuations).
  4. Exclusion List: Skip checking certain links like mailto links (mailto:xyz@example.com) or JavaScript actions (javascript:void(0)).
  5. Integration with Reporting Tools: Generate a detailed report of valid and broken links, for example an HTML report with ExtentReports or an Excel sheet with Apache POI.
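
Points 2 to 4 can be combined in a small helper. The following is a rough sketch rather than a finished implementation. All names are hypothetical, the HTTP check is passed in as a function so the example runs without a network, and the retry rule (retry on 5xx, or on a negative code standing for a network failure, up to 3 attempts) is an assumption you should adapt to your site.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToIntFunction;

// Hypothetical helper: all names here are illustrative.
class LinkCheckEnhancements {

    // Exclusion list: only http(s) links are sent an HTTP request,
    // so mailto:, javascript:, tel: and empty hrefs are skipped.
    static boolean isCheckable(String href) {
        if (href == null) return false;
        String lower = href.trim().toLowerCase();
        return lower.startsWith("http://") || lower.startsWith("https://");
    }

    // Retry: repeats the check while the result looks transient.
    // Assumption: the checker returns a negative number on a network failure.
    static int checkWithRetry(String url, ToIntFunction<String> checker, int maxAttempts) {
        int code = checker.applyAsInt(url);
        for (int attempt = 1; attempt < maxAttempts && (code >= 500 || code < 0); attempt++) {
            code = checker.applyAsInt(url);
        }
        return code;
    }

    // Parallel execution: checks each distinct, eligible URL on a fixed-size
    // thread pool and returns the status codes in first-seen order.
    static Map<String, Integer> checkAll(List<String> hrefs, ToIntFunction<String> checker,
                                         int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<Integer>> pending = new LinkedHashMap<>();
            for (String href : hrefs) {
                if (isCheckable(href) && !pending.containsKey(href)) {
                    Callable<Integer> task = () -> checkWithRetry(href, checker, 3);
                    pending.put(href, pool.submit(task));
                }
            }
            Map<String, Integer> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Integer>> entry : pending.entrySet()) {
                results.put(entry.getKey(), entry.getValue().get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in checker so the sketch runs offline; swap in a real HTTP check.
        ToIntFunction<String> fakeChecker = url -> url.endsWith("/page2") ? 404 : 200;
        List<String> hrefs = Arrays.asList("https://example.com/page1",
                "https://example.com/page2", "mailto:xyz@example.com");
        checkAll(hrefs, fakeChecker, 4)
                .forEach((url, code) -> System.out.println(url + " -> " + code));
    }
}
```

In a real run, the list of hrefs would come from the Selenium loop, and the checker would wrap an HTTP request such as the one in checkLink. Keep the thread count modest so the checker does not flood the site under test.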


Why is This Useful?

  • User Experience: Ensures that all links on a webpage are functional, avoiding broken links that may frustrate users.
  • SEO Optimization: Broken links lead search engine crawlers to dead ends and make a site look poorly maintained, which can hurt its search visibility.
  • Quality Assurance: Acts as part of regression testing to validate website content.