Regex won't separate the last string
I made a regex that should separate a specific sequence of numbers from an HTML file, but it doesn't work for the last part. This is how the HTML file prints out:

```
0430\n
0500 20 40 53\n
0606 19 32 45 58\n
0711 22 33 44 55 \n
...
2000 20 40\n
2100 20 40\n
2200 20 40\n
2300 20 40\n
0000\n
\n
```

And this is my regex:

```python
import re

timeRegex = re.compile(r'''((\d\d)(\d\d)
(\n|(\s
(\d\d)
\s?
(\d\d)?
\s?
(\d\d)?
\s?
(\d\d)?
\s?
(\d\d)?
)\n)?
)''', re.VERBOSE | re.DOTALL)
```

When looking at the list, it works fine for the most part, until the last element, where it picks up the `0000`, so it looks like this: `'2300 20 40\n0000\n\n'`.

Please help out.
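A minimal reproduction, assuming the regex exactly as posted and using only the visible tail of the listing as input: `text` is a hypothetical stand-in for the real HTML output (the `...` part is left out), and `re.finditer` is an assumption about how the original code scanned the string.

```python
import re

# The pattern as posted in the question.
time_regex = re.compile(r'''((\d\d)(\d\d)
(\n|(\s
(\d\d)
\s?
(\d\d)?
\s?
(\d\d)?
\s?
(\d\d)?
\s?
(\d\d)?
)\n)?
)''', re.VERBOSE | re.DOTALL)

# Hypothetical input: the visible tail of the listing above.
text = "2000 20 40\n2100 20 40\n2200 20 40\n2300 20 40\n0000\n\n"

matches = [m.group(0) for m in time_regex.finditer(text)]
for m in matches:
    print(repr(m))  # repr() makes the newlines visible
# The last match is '2300 20 40\n0000\n\n', as described in the question.
```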
2 Answers
When it gets to this part of the input:

```
2300 20 40\n
0000\n
```

It matches as follows:
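
| Regex part | Matched text |
|---|---|
| `(\d\d)(\d\d)` | `2300` |
| `\s` | the space |
| `(\d\d)` | `20` |
| `\s?` | the space |
| `(\d\d)?` | `40` |
| `\s?` | the newline after `40` |
| `(\d\d)?` | `00` |
| `\s?` | nothing |
| `(\d\d)?` | `00` |
| `\s?` `(\d\d)?` | the newline after `0000`, then nothing |
| `\n` | the newline of the final blank line |
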
I suspect you didn't realize that `\s` matches any kind of whitespace, including newlines. If you want to match a space literally in a verbose regexp, write a space preceded by a backslash. So most of those `\s?` should be `\ ?`.
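Applying that suggestion to the question's pattern gives roughly the following sketch. Every optional `\s?` becomes `\ ?`, which in `re.VERBOSE` mode matches one literal space or nothing, so it can no longer swallow a newline. The input is again a hypothetical stand-in built from the visible tail of the question's listing; the leading `\s` and the flags are left as posted.

```python
import re

# The question's pattern with every optional \s? replaced by "\ ?"
# (an escaped space: in VERBOSE mode it matches a literal space, never a newline).
time_regex = re.compile(r'''((\d\d)(\d\d)
(\n|(\s
(\d\d)
\ ?
(\d\d)?
\ ?
(\d\d)?
\ ?
(\d\d)?
\ ?
(\d\d)?
)\n)?
)''', re.VERBOSE | re.DOTALL)

# Hypothetical input: the visible tail of the listing in the question.
text = "2000 20 40\n2100 20 40\n2200 20 40\n2300 20 40\n0000\n\n"

matches = [m.group(0) for m in time_regex.finditer(text)]
for m in matches:
    print(repr(m))
# '2300 20 40\n' and '0000\n' now come out as separate matches.
```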
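The reason is twofold:

- `\s` matches any whitespace, including a newline;
- `\s?` is optional, so it can also match nothing at all.
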
So what happens is that one of your `\s?`s eats the newline after the line `2300 20 40`, and the next `\s?` matches the empty string between the two halves of `0000`. You don't see the problem in other places because you have one `\s?(\d\d)?` fewer than it would take to cover two full lines; add one more to the regex and you will see the lines `2000 20 40\n` and `2100 20 40\n` imploded into a single match too.
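The "add one more" experiment can be sketched by generating the non-verbose form of the question's pattern with a configurable number of trailing `\s?(\d\d)?` groups. `build_pattern` is a hypothetical helper for this illustration only, and the input is again just the visible tail of the listing.

```python
import re

def build_pattern(n_optional):
    # Hypothetical helper: the question's regex without the VERBOSE whitespace,
    # followed by n_optional copies of the \s?(\d\d)? group.
    optional = r'\s?(\d\d)?' * n_optional
    return re.compile(r'((\d\d)(\d\d)(\n|(\s(\d\d)' + optional + r')\n)?)')

# Hypothetical input: the visible tail of the listing in the question.
text = "2000 20 40\n2100 20 40\n2200 20 40\n2300 20 40\n0000\n\n"

results = {}
for n in (4, 5):  # 4 = as posted, 5 = one more optional group
    results[n] = [m.group(0) for m in build_pattern(n).finditer(text)]
    print(n, results[n])
```

With five groups, `2000 20 40\n` and `2100 20 40\n` come back as one match, while with four only the `2300`/`0000` pair is merged.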
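I am not sure how you would like to parse this file, but judging from your code, it looks like you want to go through it line by line. If so, "explicit is better than implicit":

```python
import re

# Four-digit time, then every whitespace-separated two-digit field, captured together in group 2.
time_regex = re.compile(r'^(\d{4})((?:\s\d{2})*)\s*$')

with open(...) as inf:
    for line in inf:
        m = time_regex.match(line)
        if m:  # skips non-matching lines such as the final blank one
            hhmm, minutes = m.group(1), m.group(2).split()
```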
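One detail worth checking in the line-by-line approach: when a capturing group is itself repeated, as in `(\s\d{2})*`, Python's `re` keeps only the last repetition, so `group(2).split()` returns a single field. The sketch below compares that with a pattern that wraps the whole repetition in one group and also tolerates trailing whitespace. `io.StringIO` stands in for the real file, and the sample lines are a hypothetical subset of the question's listing.

```python
import io
import re

# Repeated capturing group: group 2 holds only the LAST " dd" repetition.
repeated = re.compile(r'^(\d{4})(\s\d{2})*$')
# Whole repetition inside one group, plus optional trailing whitespace.
wrapped = re.compile(r'^(\d{4})((?:\s\d{2})*)\s*$')

# Hypothetical stand-in for the real file.
sample = "0430\n0500 20 40 53\n2300 20 40\n0000\n\n"

results = {}
for name, pattern in (("repeated", repeated), ("wrapped", wrapped)):
    rows = []
    for line in io.StringIO(sample):
        m = pattern.match(line)
        if m:  # the final blank line never matches and is skipped
            # group(2) is None when the repeated group matched zero times
            rows.append((m.group(1), (m.group(2) or "").split()))
    results[name] = rows
    print(name, rows)
```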
Are you asking why `0000` is matched? Your `\s?` matches one or zero whitespace characters. – Wiktor Stribiżew, yesterday