How to filter out specific strings from a string
Python beginner here. I'm stumped on part of this code for a bot I'm writing.
I am making a Reddit bot using PRAW to comb through posts and remove a specific set of characters (Steam CD keys).
I made a test post here: https://www.reddit.com/r/pythonforengineers/comments/91m4l0/testing_my_reddit_scraping_bot/
This should have all the formats of keys.
Currently, my bot is able to find the post using a regular expression. I have these variables:
steamKey15 = (r'\w\w\w\w\w.\w\w\w\w\w.\w\w\w\w\w')
steamKey25 = (r'\w\w\w\w\w.\w\w\w\w\w.\w\w\w\w\w.\w\w\w\w\w.\w\w\w\w\w.')
steamKey17 = (r'\w\w\w\w\w\w\w\w\w\w\w\w\w\w\w\s\w\w')
I am finding the text using this:
subreddit = reddit.subreddit('pythonforengineers')
for submission in subreddit.new(limit=20):
    if submission.id not in steamKeyPostID:
        if re.search(steamKey15, submission.selftext, re.IGNORECASE):
            searchLogic()
            saveSteamKey()
This is just to show that the things I should be using in a filter function are a combination of steamKey15/25/17 and submission.selftext.
So here is the part where I am confused. I can't find a function that does what I want. My goal is to remove all the text from submission.selftext (the body of the post) except the keys, which will eventually be saved in a .txt file.
Any advice on a good way to go about this? I've looked into re.sub and .translate, but I don't understand how the parts fit together.
I am using Python 3.7 if it helps.
2 Answers
Can't you just get the regexp results?
m = re.search(steamKey15, submission.selftext, re.IGNORECASE)
if m:
    print(m.group(0))
Also note that a dot . means any char in a regexp. If you want to match only dots, you should use \. instead. You can probably write your regexp like this:
r'\w{5}[-.]\w{5}[-.]\w{5}'
This will match the key when separated by . or by -.
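To see the difference the character class makes, here is a small sketch. The sample strings are made up for illustration and are not real keys.

```python
import re

# Unescaped dot: "." matches any character, so a space-separated phrase also matches.
loose = re.compile(r'\w{5}.\w{5}.\w{5}')
# Character class: only "-" or "." is accepted as the separator.
strict = re.compile(r'\w{5}[-.]\w{5}[-.]\w{5}')

samples = ['ABCDE-FGHIJ-KLMNO', 'ABCDE.FGHIJ.KLMNO', 'hello world there']

loose_hits = [s for s in samples if loose.fullmatch(s)]
strict_hits = [s for s in samples if strict.fullmatch(s)]

print(loose_hits)   # all three samples
print(strict_hits)  # only the first two
```

Inside square brackets the dot is literal, so it does not need a backslash there.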
Another hint is to use re.findall instead of re.search: some posts contain more than one Steam key in the same post! findall will return all matches, while search only returns the first one.
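A quick comparison of the two calls, as a sketch. The post body and both keys are invented for the example.

```python
import re

key_pattern = re.compile(r'\w{5}[-.]\w{5}[-.]\w{5}')

# Hypothetical post body containing two made-up keys.
body = 'First key: ABCDE-FGHIJ-KLMNO and second key: PQRST.UVWXY.Z1234 enjoy!'

first = key_pattern.search(body)      # a match object for the first hit, or None
all_keys = key_pattern.findall(body)  # a list of every matching substring

print(first.group(0))  # ABCDE-FGHIJ-KLMNO
print(all_keys)        # ['ABCDE-FGHIJ-KLMNO', 'PQRST.UVWXY.Z1234']
```

findall returns plain strings here because the pattern has no capturing groups.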
So, a couple of things first: . means any character in regex. I think you know that, but just to be sure. Also, \w\w\w\w\w can be replaced with \w{5,5} (or simply \w{5}), which matches exactly five word characters. I would use re.findall.
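import re
# Sample text for illustration; in the bot this would be submission.selftext
txt = 'AAAAA-BBBBB-CCCCC'
steamKey15 = (r'(?:\w{5,5}.){2,2}\w{5,5}')
# As written, steamKey25 also requires one character after the fifth group
steamKey25 = (r'(?:\w{5,5}.){5,5}')
steamKey17 = (r'\w{15,15}\s\w\w')
finds_15 = re.findall(steamKey15, txt)
finds_25 = re.findall(steamKey25, txt)
finds_17 = re.findall(steamKey17, txt)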
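Putting the suggestions together, one possible sketch for keeping only the keys and saving them to a text file is below. It assumes the separators are - or . and that the three formats are the ones from the question. The names extract_keys, save_keys and keys.txt are hypothetical, and the sample body is made up.

```python
import re
import tempfile
from pathlib import Path

# Longest format first, so a 25-character key is not reported as a 15-character one.
KEY_PATTERN = re.compile(
    r'\b\w{5}(?:[-.]\w{5}){4}\b'    # five groups of five
    r'|\b\w{5}(?:[-.]\w{5}){2}\b'   # three groups of five
    r'|\b\w{15}\s\w{2}\b'           # fifteen characters, whitespace, two characters
)

def extract_keys(text):
    """Return every key-shaped substring of text, in order of appearance."""
    return KEY_PATTERN.findall(text)

def save_keys(keys, path):
    """Append one key per line to the file at path."""
    with open(path, 'a', encoding='utf-8') as fh:
        for key in keys:
            fh.write(key + '\n')

# Made-up post body; in the bot this would be submission.selftext.
body = 'Giving away AAAAA-BBBBB-CCCCC and 11111-22222-33333-44444-55555, enjoy!'
keys = extract_keys(body)
print(keys)

# A temporary directory keeps the demo from leaving files behind.
with tempfile.TemporaryDirectory() as tmp:
    out_file = Path(tmp) / 'keys.txt'   # hypothetical file name
    save_keys(keys, out_file)
    saved = out_file.read_text(encoding='utf-8').splitlines()
```

Since \w already covers both upper and lower case, re.IGNORECASE is not needed for these patterns.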