Web Scraping Python Beautiful Soup

I'm trying to extract the titles from a URL, but the link element doesn't have a class. The following code is taken from the page source.

<a href="/f/oDhilr3O">Unatama Don</a>

The title actually does have a class, but as you can see I have used index 3 because the first 3 titles aren't what I want. However, I don't want to hard-code that. On the website the title is also a link, hence the link above.

    title_name = soup.find_all('div', class_='food-description-title')
    title_list = []
    # Skip the first 3 matches, which aren't the titles I want
    for i in range(3, len(title_name)):
        title = title_name[i].text
        title_list.append(title)

Unatama Don is the title I'm trying to get.

Please make it a Minimal, Complete, and Verifiable example. It's also a good idea to check the How to Ask links.
– Prateek
Jul 20 at 21:16

<a> is a hyperlink element, not a div class. Your code is fetching div elements, not the <a> elements you are expecting.
– Prateek
Jul 20 at 21:19

Possible duplicate of Python: BeautifulSoup extract text from anchor tag
– Prateek
Jul 20 at 21:25

Use selenium? stackoverflow.com/questions/33155454/… driver.find_element_by_xpath('//a[@href="/f/oDhilr3O"]');
– QHarr
Jul 20 at 23:21

2 Answers

Here's an example of searching for an anchor element with a specific URL in BS:

    from bs4 import BeautifulSoup

    document = '''
    <a href="https://www.google.com">google</a>
    <a href="/f/oDhilr3O">Unatama Don</a>
    <a href="test">Don</a>
    '''

    soup = BeautifulSoup(document, "lxml")
    url = "/f/oDhilr3O"
    for x in soup.find_all("a", {"href": url}):
        print(x.text)

Output:

Unatama Don
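
If the exact href isn't known ahead of time, matching on a pattern may avoid hard-coding it. Here is a minimal sketch, assuming the wanted links all start with "/f/" (a guess based on the single link shown in the question, not something the site confirms). It uses the built-in "html.parser", and the extra anchors in the sample markup are made up for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup: only the "/f/oDhilr3O" link comes from the question;
# the other anchors are invented for illustration.
document = '''
<a href="https://www.google.com">google</a>
<a href="/f/oDhilr3O">Unatama Don</a>
<a href="/about">About</a>
'''

soup = BeautifulSoup(document, "html.parser")

# CSS attribute selector: anchors whose href begins with "/f/".
# The "/f/" prefix is an assumption about the site's URL scheme.
titles = [a.get_text(strip=True) for a in soup.select('a[href^="/f/"]')]
print(titles)
```

If the real page uses a different URL scheme, the selector string would need to change accordingly.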

The requests and bs4 modules are very helpful for tasks like this. Have you tried something like below?

    import requests
    from bs4 import BeautifulSoup

    url = ('PASTE/YOUR/URL/HERE')
    response = requests.get(url)
    page = response.text
    soup = BeautifulSoup(page, 'html.parser')
    links = soup.find_all('a', href=True)
    for each in links:
        print(each.text)

I think this gives the outcome you are looking for. If you would like the hyperlinks as well, add another loop and put print(each.get('href')) inside it. Let us know how it goes.
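
As a rough sketch of collecting the link text and the href in a single pass, the snippet below parses an inline HTML string instead of a live page so it runs without network access. The markup is an assumption: it nests the anchor from the question inside a div with the "food-description-title" class, which the real page may or may not do.

```python
from bs4 import BeautifulSoup

# Stand-in for response.text. The nesting of <a> inside the div is assumed,
# and the second anchor (no href) is invented to show the href=True filter.
page = '''
<div class="food-description-title"><a href="/f/oDhilr3O">Unatama Don</a></div>
<a name="top">no href here</a>
'''

soup = BeautifulSoup(page, 'html.parser')

# href=True keeps only anchors that actually carry an href attribute.
pairs = [(a.get_text(strip=True), a['href'])
         for a in soup.find_all('a', href=True)]

for text, href in pairs:
    print(text, href)
```

With a real page, `page` would come from `requests.get(url).text` as in the answer above.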
