Get HTML source as an HTML object that can be traversed with DOM operations

I have a page, say, https://jq.profinance.ru/html/htmlquotes/site2.jsp, which is updated every second. My aim is to parse values using Selenium.

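    from selenium import webdriver

    url = 'https://jq.profinance.ru/html/htmlquotes/site2.jsp'
    driver = webdriver.Chrome()
    driver.get(url)
    mylist = []
    my_tables = driver.find_elements_by_tag_name('table')  # operation 1
    for table in my_tables:
        for tr in table.find_elements_by_tag_name('tr'):  # operation 2
            mylist.append(tr)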

The problem is that Python assigns my_tables a reference to the objects returned by driver.find_elements_by_tag_name('table'), not a copy of their values. Because the page keeps changing during the lag between operations 1 and 2, I do not get correct data.

How can I copy the webpage HTML structure and then use Selenium commands to walk through the structure of my document?

I tried pickle, get_attribute("innerHTML"), and .page_source, but they do not do what I need: they only give me a copy of the HTML as a string, not an object I can walk through.

You can save the HTML as a string and use either BeautifulSoup or lxml.html to parse it.
– Andersson
6 hours ago

@Andersson, thanks, but that is very inconvenient in my case. Is there a way to run Selenium WebDriver on the copied page?
– Alex
5 hours ago

You need to work with static HTML source. Selenium is not well suited to this purpose.
– Andersson
5 hours ago

@Andersson, is there any library with a DOM syntax identical to Selenium's? I would have a lot of code to rewrite if I used BS4 or lxml.
– Alex
5 hours ago

1 Answer

I don't think you can do exactly what you're trying to do with Selenium alone. Selenium "drives" a running web browser, and if the JavaScript in that browser is updating the contents of the page every second or so, you'll have these timing problems.

What you can do is use Selenium to drive the browser to get a snapshot of the page's HTML as a string (exactly as you describe in your last paragraph).

Then you can use a library like Beautiful Soup to parse the HTML string and extract the data that you need.
