How to filter out specific strings from a string




Python beginner here. I'm stumped on part of this code for a bot I'm writing.



I am making a Reddit bot using PRAW to comb through posts and extract a specific set of characters (Steam CD keys).



I made a test post here: https://www.reddit.com/r/pythonforengineers/comments/91m4l0/testing_my_reddit_scraping_bot/



This should have all the formats of keys.



Currently, my bot is able to find the post using a regular expression. I have these variables:


steamKey15 = (r'\w\w\w\w\w.\w\w\w\w\w.\w\w\w\w\w')
steamKey25 = (r'\w\w\w\w\w.\w\w\w\w\w.\w\w\w\w\w.\w\w\w\w\w.\w\w\w\w\w.')
steamKey17 = (r'\w\w\w\w\w\w\w\w\w\w\w\w\w\w\w\s\w\w')



I am finding the text using this:


subreddit = reddit.subreddit('pythonforengineers')
for submission in subreddit.new(limit=20):
    if submission.id not in steamKeyPostID:
        if re.search(steamKey15, submission.selftext, re.IGNORECASE):
            searchLogic()
            saveSteamKey()



This is just to show that the things I should be using in a filter function are some combination of steamKey15/25/17 and submission.selftext.



Here is the part where I am confused. I can't find a function that does what I want. My goal is to remove all the text from submission.selftext (the body of the post) except the keys, which will eventually be saved in a .txt file.



Any advice on a good way to go about this? I've looked into re.sub and .translate but I don't understand how the parts fit together.



I am using Python 3.7 if it helps.




2 Answers



Can't you just use the result of the regex match?


m = re.search(steamKey15, submission.selftext, re.IGNORECASE)
if m:
    print(m.group(0))



Also note that a dot . means any character in a regexp. If you want to match only literal dots, you should use \. instead. You can probably write your regexp like this:




r'\w{5}[-.]\w{5}[-.]\w{5}'



This will match the key when separated by . or by -.
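To make the difference concrete, here is a minimal sketch comparing the unescaped dot with the [-.] character class. The sample text and the key-like strings in it are made up for illustration; they are not taken from the test post.

```python
import re

# Hypothetical sample text; both key-like strings are invented.
text = "Key: ABCDE-12345-FGHIJ and not a key: ABCDEx12345xFGHIJ"

loose = r'\w{5}.\w{5}.\w{5}'          # unescaped dot: any character works as separator
strict = r'\w{5}[-.]\w{5}[-.]\w{5}'   # only "-" or "." is accepted as separator

loose_hits = re.findall(loose, text)
strict_hits = re.findall(strict, text)

print(loose_hits)   # also picks up the string that uses "x" as a separator
print(strict_hits)  # only the properly separated key
```

Inside a character class the dot is already literal, so [-.] does not need a backslash.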





Another hint is to use re.findall instead of re.search, since some posts contain more than one Steam key. findall returns all matches, while search only returns the first one.
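As a rough illustration of that difference, assuming a post body with two keys in it (the body text below is invented, not copied from the test post):

```python
import re

pattern = r'\w{5}[-.]\w{5}[-.]\w{5}'

# Hypothetical post body containing two made-up keys.
body = "First: AAAAA-BBBBB-CCCCC\nSecond: DDDDD.EEEEE.FFFFF"

first = re.search(pattern, body)      # match object for the first hit, or None
all_keys = re.findall(pattern, body)  # list of every non-overlapping hit

if first:
    print(first.group(0))
print(all_keys)
```

Because the pattern has no capturing groups, findall returns the full matched strings, which is what you want to save.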





A couple of things first: . means any character in a regex. I think you know that, but just to be sure. Also, \w\w\w\w\w can be replaced with \w{5,5}, which matches between 5 and 5 (that is, exactly 5) word characters; \w{5} is equivalent. I would use re.findall.




import re
# txt is the text to search, e.g. submission.selftext from the question
steamKey15 = (r'(?:\w{5,5}.){2,2}\w{5,5}')
steamKey25 = (r'(?:\w{5,5}.){5,5}')
steamKey17 = (r'\w{15,15}\s\w\w')
finds_15 = re.findall(steamKey15, txt)
finds_25 = re.findall(steamKey25, txt)
finds_17 = re.findall(steamKey17, txt)
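One thing to watch with three separate searches is that the 15-character pattern also matches the first part of a 25-character key. A possible way around that, sketched below, is to combine the patterns into one alternation with the longest format first. The names extract_keys and save_keys, the sample body, and the choice of "-" or "." as the only separators are assumptions for illustration, not part of the original bot.

```python
import re

# Longest format first, so a 25-character key is not reported as a 15-character one.
KEY25 = r'\w{5}(?:[-.]\w{5}){4}'
KEY17 = r'\w{15}\s\w{2}'
KEY15 = r'\w{5}(?:[-.]\w{5}){2}'
KEY_RE = re.compile(r'\b(?:' + '|'.join([KEY25, KEY17, KEY15]) + r')\b')


def extract_keys(text):
    """Return every key-like string in text, dropping everything else."""
    return KEY_RE.findall(text)


def save_keys(keys, path):
    """Append one key per line to a text file (path is chosen by the caller)."""
    with open(path, 'a', encoding='utf-8') as fh:
        for key in keys:
            fh.write(key + '\n')


# Hypothetical post body with one made-up key in each format.
body = ("Giving away AAAAA-BBBBB-CCCCC and 11111-22222-33333-44444-55555, "
        "plus ABCDEFGHIJKLMNO PQ for fun.")
keys = extract_keys(body)
print(keys)
```

In the bot, extract_keys(submission.selftext) would take the place of the sample body. Note that the 17-character pattern is loose enough to match ordinary prose (any 15-letter word followed by a 2-letter word), so it may need tightening for real posts.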





