Web Scraping Python Beautiful Soup




I'm trying to extract the titles from a web page, but the element that holds each title doesn't have a class. The following snippet is taken from the page source.


<a href="/f/oDhilr3O">Unatama Don</a>



The title actually does have a class, but as you can see I start at index 3 because the first 3 titles aren't what I want, and I don't want to hard-code that. On the website the title is also a link, hence the anchor above.


title_name = soup.find_all('div', class_='food-description-title')
title_list = []

# Skip the first 3 matches, which are not the titles I want
for i in range(3, len(title_name)):
    title = title_name[i].text
    title_list.append(title)



Unatama Don is the title I'm trying to get





Please make it a Minimal, Complete, and Verifiable example. It's also a good idea to check the How to Ask links.
– Prateek
Jul 20 at 21:16







<a> is a hyperlink element, not a div class. Your code fetches div elements, not the <a> elements you are expecting.
– Prateek
Jul 20 at 21:19







Possible duplicate of Python: BeautifulSoup extract text from anchor tag
– Prateek
Jul 20 at 21:25





Use selenium? stackoverflow.com/questions/33155454/… driver.find_element_by_xpath('//a[@href="/f/oDhilr3O"]');
– QHarr
Jul 20 at 23:21






2 Answers



Here's an example of searching for an anchor element with a specific URL in BS:


from bs4 import BeautifulSoup

document = '''
<a href="https://www.google.com">google</a>
<a href="/f/oDhilr3O">Unatama Don</a>
<a href="test">Don</a>
'''

soup = BeautifulSoup(document, "lxml")
url = "/f/oDhilr3O"

for x in soup.find_all("a", {"href": url}):
    print(x.text)



Output:


Unatama Don
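
If the exact URL isn't known in advance, one option is to match on a pattern instead. The following is only a sketch: it assumes that every wanted link shares the "/f/" prefix seen in the question (which the real page may not guarantee), and it uses the built-in "html.parser" so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Same sample markup as above; the "/f/" prefix rule is an assumption
# based on the single link shown in the question.
document = '''
<a href="https://www.google.com">google</a>
<a href="/f/oDhilr3O">Unatama Don</a>
<a href="test">Don</a>
'''

soup = BeautifulSoup(document, "html.parser")

# The CSS attribute selector a[href^="/f/"] matches anchors whose
# href starts with "/f/", so no list index is hard-coded.
titles = [a.get_text(strip=True) for a in soup.select('a[href^="/f/"]')]
print(titles)
```

This avoids the hard-coded index, but it only helps if the unwanted links really do use a different href pattern.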



The requests and bs4 modules are very helpful for tasks like this. Have you tried something like below?


import requests
from bs4 import BeautifulSoup

url = ('PASTE/YOUR/URL/HERE')
response = requests.get(url)
page = response.text
soup = BeautifulSoup(page, 'html.parser')
links = soup.find_all('a', href=True)

for each in links:
    print(each.text)



I think this gives the outcome you are looking for. If you would like the hyperlinks as well, add "print(each.get('href'))" inside the loop. Let us know how it goes.
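
As a rough sketch of collecting the text and the hyperlink together, the loop can build (text, href) pairs and skip anchors with no visible text. The inline markup stands in for response.text so the example runs offline; the helper name collect_links and the "Example Dish" and "/about" entries are made up for illustration.

```python
from bs4 import BeautifulSoup

# Stand-in for response.text; only the first anchor comes from the
# question, the other two are invented for illustration.
page = '''
<a href="/f/oDhilr3O">Unatama Don</a>
<a href="/f/example">Example Dish</a>
<a href="/about"></a>
'''


def collect_links(soup):
    """Return (text, href) pairs for anchors that have visible text."""
    pairs = []
    for a in soup.find_all("a", href=True):
        text = a.get_text(strip=True)
        if text:  # skip anchors with no visible text (e.g. icon links)
            pairs.append((text, a["href"]))
    return pairs


soup = BeautifulSoup(page, "html.parser")
links = collect_links(soup)

for text, href in links:
    print(text, href)
```

Keeping the pairs in a list, rather than only printing them, makes it easier to filter them afterwards.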





