Slightly complicated web text to variable parsing
I'm taking text from a website and parsing it into variables. However, the string I get when I pull the text is a bit complicated. It looks like this on the web:
Invoice #: 1267
Date: 4/16/2018 10:44:00 AM
PO #:
Reference:
Countermen: A/A
The issue I'm having is that all of this is one string. The string also changes dynamically, because some orders have text entered where others don't: some orders have every field filled, while others have almost no fields filled.
Invoice #:
1267
<br>
Date:
4/16/2018 10:44:00 AM
<br>
PO #:
<br>
Reference:
<br>
Countermen:
A/A
This is what is displayed when I inspect the web element.
I want to parse the information into individual strings and ints for a test, and I'm having difficulty dealing with the "dynamic" part of the string, since some values will be longer and some shorter.
Assumptions: each key is separated from its value by a colon (:), and each key/value pair is delimited by <br>.
Given your sample data:
using System;
using System.Collections.Specialized;

public class Program
{
    public static void Main()
    {
        var str = @"Invoice #:
1267
<br>
Date:
4/16/2018 10:44:00 AM
<br>
PO #:
<br>
Reference:
<br>
Countermen:
A/A";

        // Array containing the "raw string data", one "Key: value" chunk per element
        var raw = str.Split(new[] { "<br>" }, StringSplitOptions.RemoveEmptyEntries);

        // Just using a simple NVC; opt for something else based on your needs
        var kvp = new NameValueCollection();

        // Go through the raw array we created earlier and
        // add the key/value pairs to our NameValueCollection, kvp
        Array.ForEach(raw, s =>
        {
            // Because of the date/time, we restrict the split to the first colon
            var data = s.Split(new[] { ":" }, 2, StringSplitOptions.None);

            // Skip any chunk that has no colon instead of throwing IndexOutOfRangeException
            if (data.Length == 2)
            {
                kvp.Add(data[0].Trim(), data[1].Trim());
            }
        });

        /*
         * At this point, we have our "parsed" data in
         * key/value pairs, kvp, and can use it as needed
         */

        // We can loop through kvp and simply display it
        foreach (string k in kvp.Keys)
        {
            Console.WriteLine("{0} = {1}", k, kvp[k]);
        }

        // We can assign values to variables we create
        var invNum = kvp["Invoice #"];
    }
}
Output:
Invoice # = 1267
Date = 4/16/2018 10:44:00 AM
PO # =
Reference =
Countermen = A/A
Documentation for: NameValueCollection Class
Hth...
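If the scraped markup varies (for example <br/> or <BR> instead of <br>), or some orders omit a field entirely rather than leaving it blank, a slightly more defensive variant of the same "split on <br>, then on the first colon" idea may help. This is only a sketch under stated assumptions: FieldParser and GetOrEmpty are hypothetical names, the <br> variants are assumed rather than taken from the actual site, and it swaps NameValueCollection for a case-insensitive Dictionary<string, string>.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; not part of the answer above.
public static class FieldParser
{
    // Splits scraped text such as "Invoice #:\n1267\n<br>\nDate:..." into key/value pairs.
    // Assumption: pairs are delimited by <br>, <br/> or <br /> in any letter case,
    // and each key ends at the first colon.
    public static Dictionary<string, string> Parse(string raw)
    {
        // Case-insensitive keys, so "invoice #" and "Invoice #" find the same entry
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        foreach (var chunk in Regex.Split(raw, @"<br\s*/?>", RegexOptions.IgnoreCase))
        {
            // Split on the first colon only, so "10:44:00 AM" stays intact
            var parts = chunk.Split(new[] { ':' }, 2);
            if (parts.Length < 2)
            {
                continue; // no colon: not a "Key: value" chunk, so skip it
            }

            // A repeated key overwrites the earlier value instead of throwing
            fields[parts[0].Trim()] = parts[1].Trim();
        }
        return fields;
    }

    // Returns the value for a key, or an empty string when the field is missing entirely
    public static string GetOrEmpty(Dictionary<string, string> fields, string key)
    {
        return fields.TryGetValue(key, out var value) ? value : string.Empty;
    }
}

public class Program
{
    public static void Main()
    {
        // Sample text with mixed <br> spellings and no "Reference" field
        var raw = "Invoice #:\n1267\n<br>\nDate:\n4/16/2018 10:44:00 AM\n<br/>\nPO #:\n<BR>\nCountermen:\nA/A";
        var fields = FieldParser.Parse(raw);

        foreach (var pair in fields)
        {
            Console.WriteLine("{0} = {1}", pair.Key, pair.Value);
        }

        // "Reference" is absent from this sample, so this prints an empty value
        Console.WriteLine("Reference = {0}", FieldParser.GetOrEmpty(fields, "Reference"));
    }
}
```

Using TryGetValue (through GetOrEmpty) means a field that is missing on some orders does not throw KeyNotFoundException, which a plain Dictionary indexer would.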
@Xman - You can extend the above for your needs, for example:
var invNum = kvp["Invoice #"];
– EdSF
var invNum = kvp["Invoice #"];
Where exactly do I put that? I'm still new, so I don't really know what any of these functions and methods do. I need a bit of hand-holding for this situation.
– Xman
@Xman I've updated the answer, added comments to the code, and added a link to the Microsoft docs for NameValueCollection (which I arbitrarily chose to use; it's not the only data structure you can use, so use whatever fits your needs).
– EdSF
You can use a simple regex. \s* matches any whitespace, and (.*?) lazily matches whatever content sits between that whitespace. The $ at the end forces the last group to match all of the text after Countermen:, which is important because otherwise the lazy (.*?) would stop at an empty match:
using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string sb = "Invoice #:" +
                    "1267" +
                    "<br>" +
                    "Date:" +
                    "4/16/2018 10:44:00 AM" +
                    "<br>" +
                    "PO #:" +
                    "<br>" +
                    "Reference:" +
                    "<br>" +
                    "Countermen:" +
                    "A/A";

        // \s* skips optional whitespace; each (.*?) captures one field's value
        var matches = Regex.Match(sb,
            @"Invoice #:\s*(.*?)\s*<br>\s*Date:\s*(.*?)\s*<br>\s*PO #:\s*(.*?)\s*<br>\s*Reference:\s*(.*?)\s*<br>\s*Countermen:\s*(.*?)\s*$");

        if (!matches.Success)
        {
            throw new FormatException("Unable to parse");
        }

        var invoice = matches.Groups[1].Value;
        var date = matches.Groups[2].Value;
    }
}
Dotnetfiddle here: https://dotnetfiddle.net/VHF4uW
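The single pattern above requires all five labels to be present and in that exact order. If some orders might drop a label entirely, one possible variation is to search for each label on its own. The sketch below is a hypothetical helper (LabelExtractor.GetField is an illustrative name), and it assumes labels are unique in the text and values never contain <br>.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; a per-label variation on the single-pattern regex above.
public static class LabelExtractor
{
    // Returns the trimmed text after "<label>:" up to the next <br> (or the end
    // of the string). Returns null when the label does not appear at all.
    // Assumption: labels are unique and values never contain "<br>".
    public static string GetField(string text, string label)
    {
        // Regex.Escape makes characters such as '#' and spaces in the label literal
        var pattern = Regex.Escape(label) + @":\s*(.*?)\s*(?:<br\s*/?>|$)";
        var match = Regex.Match(text, pattern, RegexOptions.IgnoreCase);
        return match.Success ? match.Groups[1].Value : null;
    }
}

public class Program
{
    public static void Main()
    {
        // Sample text with no "Reference" label at all
        var text = "Invoice #:1267<br>Date:4/16/2018 10:44:00 AM<br>PO #:<br>Countermen:A/A";

        Console.WriteLine(LabelExtractor.GetField(text, "Invoice #"));         // 1267
        Console.WriteLine(LabelExtractor.GetField(text, "PO #") == "");        // True: present but blank
        Console.WriteLine(LabelExtractor.GetField(text, "Reference") == null); // True: label missing
    }
}
```

Because each label is searched independently, a blank field comes back as an empty string and a missing one as null, so the caller can tell the two cases apart. The trade-off is one regex scan per field instead of one scan overall.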
I was more thinking of having each individual component in its own variable, such as
var invNum = 1267; var month = 4; var day = 16; var year = 2018;
and so on...
– Xman
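Both answers produce strings, so getting typed variables like the ones asked for above takes one more conversion step. A minimal sketch, assuming the date always uses the "4/16/2018 10:44:00 AM" layout (US month/day/year, 12-hour clock); InvoiceInfo and InvoiceConverter are hypothetical names, and the input strings stand in for kvp["Invoice #"] / kvp["Date"] or the regex groups.

```csharp
using System;
using System.Globalization;

// Hypothetical holder for the typed values; nullable because some orders leave fields blank.
public sealed class InvoiceInfo
{
    public int? InvoiceNumber { get; set; }
    public DateTime? Date { get; set; }
}

public static class InvoiceConverter
{
    // Assumption: dates always look like "4/16/2018 10:44:00 AM"
    private const string DateFormat = "M/d/yyyy h:mm:ss tt";

    public static InvoiceInfo Parse(string invoiceText, string dateText)
    {
        var info = new InvoiceInfo();

        // TryParse returns false (instead of throwing) for blank or non-numeric text
        if (int.TryParse(invoiceText, NumberStyles.Integer, CultureInfo.InvariantCulture, out var number))
        {
            info.InvoiceNumber = number;
        }

        // TryParseExact only accepts text matching DateFormat exactly
        if (DateTime.TryParseExact(dateText, DateFormat, CultureInfo.InvariantCulture,
                DateTimeStyles.None, out var date))
        {
            info.Date = date;
        }
        return info;
    }
}

public class Program
{
    public static void Main()
    {
        // In practice these strings would come from kvp["Invoice #"] and kvp["Date"]
        var info = InvoiceConverter.Parse("1267", "4/16/2018 10:44:00 AM");

        if (info.InvoiceNumber.HasValue && info.Date.HasValue)
        {
            int invNum = info.InvoiceNumber.Value;
            int month = info.Date.Value.Month;
            int day = info.Date.Value.Day;
            int year = info.Date.Value.Year;
            Console.WriteLine("{0} {1} {2} {3}", invNum, month, day, year); // 1267 4 16 2018
        }
    }
}
```

Parsing the whole date into a DateTime first, rather than splitting the text on "/", keeps the month, day, and year consistent with each other and also gives access to the time of day.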