Get HTML-source as an HTML object with ability to work in it using DOM operations


I have a page, say, https://jq.profinance.ru/html/htmlquotes/site2.jsp, which is updated every second. My aim is to parse values using Selenium.


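from selenium import webdriver
from selenium.webdriver.common.by import By

url = "https://jq.profinance.ru/html/htmlquotes/site2.jsp"
driver = webdriver.Chrome()
driver.get(url)
mylist = []

my_tables = driver.find_elements(By.TAG_NAME, 'table')  # operation1: list of live table elements
for table in my_tables:
    for tr in table.find_elements(By.TAG_NAME, 'tr'):  # operation2: rows looked up later
        mylist.append(tr)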



The problem is that operation 1 gives my_tables references to the live elements in the browser, not a copy of their values. Because the page keeps updating during the lag between operations 1 and 2, I do not get correct data.

How can I copy the webpage HTML structure and then use Selenium commands to walk through the structure of my document?

I tried pickle, get_attribute("innerHTML") and .page_source, but they do not work properly, as they only give me a string copy of the page.


You can save the HTML sample as a string and parse it with either BeautifulSoup or lxml.html
– Andersson
6 hours ago
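A minimal sketch of that suggestion with lxml.html, assuming lxml is installed. The table_rows helper and the sample markup are hypothetical stand-ins, not the real quote page; the point is that once the HTML is a plain string, it no longer changes while you walk through it.

```python
import lxml.html


def table_rows(html):
    """Return each <tr> as a list of its cell texts, parsed from a static HTML string."""
    doc = lxml.html.document_fromstring(html)
    rows = []
    for tr in doc.xpath("//table//tr"):
        # td and th cells of this row, in document order
        rows.append([cell.text_content().strip() for cell in tr.xpath("./td|./th")])
    return rows


# Hypothetical sample markup, standing in for a saved copy of the page
sample = (
    "<table><tr><th>Pair</th><th>Bid</th></tr>"
    "<tr><td>EUR/USD</td><td>1.0850</td></tr></table>"
)
print(table_rows(sample))
```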







@Andersson, thanks, but that is very inconvenient in my case. Is there a way to use Selenium WebDriver on the copied page?
– Alex
5 hours ago





You need to handle static HTML source. Selenium is not well suited for this purpose
– Andersson
5 hours ago





@Andersson, is there any library with DOM syntax identical to Selenium's? I have a lot of code to rewrite if I use BS4 or lxml.
– Alex
5 hours ago
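One hedged option, not an existing library: a thin, hypothetical wrapper that gives a frozen lxml tree a few Selenium-like members (find_elements, text, tag_name, get_attribute), so loops written in Selenium 4's find_elements(By.TAG_NAME, ...) style need fewer changes. It assumes lxml is installed, supports only tag-name and XPath locators, and its .text only approximates Selenium's rendered text.

```python
import lxml.html

# Same string values Selenium uses for By.TAG_NAME and By.XPATH, so passing
# Selenium's By constants to find_elements below also works.
TAG_NAME = "tag name"
XPATH = "xpath"


class StaticElement:
    """Hypothetical wrapper exposing a small Selenium-like API over a frozen lxml tree."""

    def __init__(self, element):
        self._el = element

    @property
    def text(self):
        # Approximation: lxml joins all descendant text; Selenium returns rendered text
        return self._el.text_content().strip()

    @property
    def tag_name(self):
        return self._el.tag

    def get_attribute(self, name):
        return self._el.get(name)

    def find_elements(self, by, value):
        if by == TAG_NAME:
            nodes = self._el.xpath(".//" + value)  # descendants only, like Selenium
        elif by == XPATH:
            nodes = self._el.xpath(value)
        else:
            raise NotImplementedError(f"locator {by!r} is not supported in this sketch")
        # Drop non-element XPath results (e.g. strings from text())
        return [StaticElement(n) for n in nodes if isinstance(n, lxml.html.HtmlElement)]


def static_page(html):
    """Wrap a saved HTML string (e.g. from driver.page_source) as the search root."""
    return StaticElement(lxml.html.document_fromstring(html))
```

Usage would be page = static_page(driver.page_source), followed by the same nested find_elements loops as before; every call then reads the saved copy instead of the live, updating page.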



1 Answer



I don't think you can do exactly what you're trying to do with Selenium alone. Selenium "drives" a running web browser, and if the JavaScript in that browser is updating the contents of the page every second or so, you'll have these timing problems.



What you can do is use Selenium to drive the browser to get a snapshot of the page's HTML as a string (exactly as you describe in your last paragraph).



Then you can use a library like Beautiful Soup to parse the HTML string and extract the data that you need.
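A minimal sketch of that workflow, assuming beautifulsoup4 is installed. The Selenium part is shown only in comments because it needs a running browser; snapshot_rows is a hypothetical helper name. Since it parses the single string that page_source returned, all rows come from the same moment.

```python
from bs4 import BeautifulSoup


def snapshot_rows(html):
    """Parse one frozen copy of the page and return the cell texts of each table row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for table in soup.find_all("table"):
        for tr in table.find_all("tr"):
            rows.append([cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])])
    return rows


# With a live browser (not run here), the snapshot would come from Selenium:
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get("https://jq.profinance.ru/html/htmlquotes/site2.jsp")
#   html = driver.page_source  # string copy of the page at this moment
#   print(snapshot_rows(html))
```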





