Document Type : Research Paper

Authors

1 Department of Information Technology Management, Central Tehran Branch, Islamic Azad University, Tehran, Iran.

2 Department of Industrial Management, Central Tehran Branch, Islamic Azad University, Tehran, Iran.

Abstract

The sustainability and resilience of web data extraction code is a major challenge in software engineering and data mining, particularly when target websites have dynamic structures. Selenium, one of the most widely used libraries for web scraping, is employed in many industrial and research projects; however, its high sensitivity to changes in web page elements often leads to errors, system interruptions, and frequent code modifications. This research presents a novel method that combines “dynamic selectors” with an AI-based decision-making layer to provide a functional, reliable, and flexible approach for strengthening Selenium-based code. Data were collected from the Digikala website at weekly intervals over a two-year period. The proposed method analyzes the behavior of page elements, automatically substitutes appropriate selectors, and applies multiple intelligent fallback strategies when a selector fails. This approach increases the stability and reliability of Selenium execution against website changes and can serve as an efficient model for web data collection systems operating in dynamic environments. Experimental results showed that, when the proposed method was used for automatic and intelligent selection of HTML elements, all data extraction operations completed without errors and without manual code modifications. Without the method, 67 extraction attempts required direct selector corrections by the developer. This difference indicates that the proposed model substantially reduces maintenance costs, development time, and the risk of extraction failure.
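
The fallback idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `find_with_fallback`, the `Selector` alias, and the example selectors in the usage note are hypothetical, and the AI-based decision-making layer is not modelled. The sketch only shows the simplest form of the strategy, trying an ordered list of candidate selectors and returning the first one that matches.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")
# Hypothetical alias: (locator strategy, locator value), e.g. ("css selector", "h1.title")
Selector = Tuple[str, str]


def find_with_fallback(
    find: Callable[[str, str], T],
    selectors: Sequence[Selector],
) -> Tuple[T, Selector]:
    """Try each candidate selector in order.

    Returns the first element found together with the selector that matched.
    Raises LookupError if every candidate fails.
    """
    errors = []
    for strategy, value in selectors:
        try:
            return find(strategy, value), (strategy, value)
        except Exception as exc:
            # With Selenium, a failed lookup typically raises NoSuchElementException.
            errors.append(f"{strategy}={value!r}: {exc}")
    raise LookupError("no selector matched: " + "; ".join(errors))
```

With Selenium, the `find` argument could be a driver's `find_element` method, for example `find_with_fallback(driver.find_element, [(By.CSS_SELECTOR, "h1.product-title"), (By.XPATH, "//h1")])`, where the two selectors are illustrative placeholders rather than real Digikala selectors. Returning the matching selector alongside the element lets the caller log which fallback was needed, which is one way a maintainer might notice that a page layout has changed.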

Keywords

Main Subjects