Get HTML-source as an HTML object with ability to work in it using DOM operations

I have a page, say, https://jq.profinance.ru/html/htmlquotes/site2.jsp, which is updated every second. My aim is to parse values using Selenium.
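from selenium import webdriver

url = 'https://jq.profinance.ru/html/htmlquotes/site2.jsp'
driver = webdriver.Chrome()
driver.get(url)
mylist = []
my_tables = driver.find_elements_by_tag_name('table')  # operation1
for table in my_tables:  # find_elements_* returns a list, so visit each table
    for tr in table.find_elements_by_tag_name('tr'):  # operation2
        mylist.append(tr)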
The problem is that Python assigns a reference to the object returned by driver.find_elements_by_tag_name('table') to my variable my_tables, not a copy of its value. Hence, I do not get correct data, as there is some lag between operations 1 and 2.
How can I copy the webpage HTML structure and then use Selenium commands to walk through the structure of my document?
I tried pickle, get_attribute("innerHTML"), and .page_source, but they do not work properly for me, as they only give me a string rather than an object I can query.
@Andersson, thanks, but that is very inconvenient in my case. Is there a way to do it with Selenium WebDriver on the copied page?
– Alex
5 hours ago
You need to handle static HTML source code. Selenium is not well suited for this purpose.
– Andersson
5 hours ago
@Andersson, is there any lib that has DOM syntax identical to Selenium? I would have a lot of code to rewrite if I used BS4 or lxml.
– Alex
5 hours ago
1 Answer
I don't think you can do exactly what you're trying to do with Selenium alone. Selenium "drives" a running web browser, and if the JavaScript in that browser is updating the contents of the page every second or so, you'll have these timing problems.
What you can do is use Selenium to drive the browser to get a snapshot of the page's HTML as a string (exactly as you describe in your last paragraph).
Then you can use a library like Beautiful Soup to parse the HTML string and extract the data that you need.
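As a rough sketch of the "snapshot, then parse" idea: the string is captured once, so nothing the browser does afterwards can change it while the rows are being walked. To keep the example free of third-party dependencies it uses the standard library's html.parser in place of Beautiful Soup. The names TableRowParser, rows_from_snapshot and sample_html are made up for illustration, and the sample markup and values are an assumption, not the real structure of the quotes page.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def _close_cell(self):
        # Store the current cell's text in the current row, if both exist.
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing closing tag
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()
            self.rows.append(self._row)
            self._row = None


def rows_from_snapshot(html):
    """Parse one frozen HTML string and return its table rows."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# With Selenium the snapshot would be taken once: html = driver.page_source
# A hypothetical sample string stands in for it here.
sample_html = """
<table>
  <tr><th>Symbol</th><th>Bid</th></tr>
  <tr><td>EUR/USD</td><td>1.1650</td></tr>
</table>
"""
rows = rows_from_snapshot(sample_html)
```

With Beautiful Soup installed, the same snapshot string could instead be passed to BeautifulSoup(html, "html.parser") and the rows collected with find_all("tr"). Either way, every row comes from the same instant of the page.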
You can save an HTML sample as a string and use either BeautifulSoup or lxml.html to parse it.
– Andersson
6 hours ago