How to extract values in quote marks from HTML string?

c# html html-agility-pack javascript web-scraping

Question

I have the following snippet of code, retrieved from a web page:

<li class="player" data-id="168568" data-teamid="156" data-x="142.33" data-y="297.16040000000004" data-name="Corentin Tolisso" data-position="3">Corentin Tolisso<span class="shirt">24</span></li>

My goal is to extract "Corentin Tolisso", the shirt number "24" as well as the values of data-x and data-y.

So far I am able to get it to work with values that are within >...<, using HTML Agility Pack.

However I can't find a way to extract the numbers of data-x and data-y.

I have copied the HTML string into a new jsfiddle, which puts out exactly what my C# code is getting, the things between >...<.

How do I extract the values of data-x and data-y?

Note: Using String.IndexOf works, but it takes away flexibility, so it is only my fallback strategy.

Note 2: I looked here and here, both of which give me some idea, but I still have a hard time applying it to C#.

Popular Answer

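One way would be to use the regular expression (["'])(?:(?=(\\?))\2.)*?\1, which matches any single- or double-quoted string. It also copes with quotes inside the value, such as an escaped quote or a quote of the other kind.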

You can try it out at this link: https://regex101.com/r/cB0kB8/1
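As a rough sketch of how that pattern could be applied to the snippet from the question, the JavaScript below first collects every quoted value and then looks up a single attribute by name. The helper getAttr and the variable names are made up for illustration, and the helper assumes the attribute name contains no regex metacharacters; treat it as one possible approach rather than a definitive one.

```javascript
// The <li> snippet from the question, held as a plain string.
const html = '<li class="player" data-id="168568" data-teamid="156" ' +
  'data-x="142.33" data-y="297.16040000000004" ' +
  'data-name="Corentin Tolisso" data-position="3">' +
  'Corentin Tolisso<span class="shirt">24</span></li>';

// The pattern from this answer, with the global flag so that
// matchAll can walk over every quoted string in the input.
const quoted = /(["'])(?:(?=(\\?))\2.)*?\1/g;

// Each match includes its surrounding quotes; slice them off.
const quotedValues = [...html.matchAll(quoted)].map(m => m[0].slice(1, -1));

// Hypothetical helper: return the value of one named attribute,
// or null when the attribute is not present.
// Assumes `name` has no regex metacharacters (true for data-x, data-y).
function getAttr(source, name) {
  const pattern = new RegExp('\\s' + name + '\\s*=\\s*(["\'])(.*?)\\1');
  const match = source.match(pattern);
  return match ? match[2] : null;
}

// Attribute values are strings; convert them when numbers are needed.
const x = parseFloat(getAttr(html, 'data-x'));
const y = parseFloat(getAttr(html, 'data-y'));

console.log(quotedValues);
console.log(x, y);
```

Collecting all quoted values relies on attribute order, so the name-based lookup is usually the safer of the two. The same two patterns should carry over to C# through the Regex class, since .NET regular expressions also support lookaheads and backreferences. If the HTML is already loaded into HTML Agility Pack, reading the attribute from the node (for example with GetAttributeValue) is likely more robust than any regex.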

With jQuery this becomes very simple.

Also see this example, taken from "Getting value of HTML text input":

<form name="input" action="handle_email.php" method="post">
Email: <input type="text" name="email" />
<input type="submit" value="Newsletter" />
</form> 
<a id="regLink" href="http://mywebsite.com/register?user_email=">Register</a>

// When the email field changes, show the link's href with the typed value appended.
$('input[name="email"]').change(function () {
    alert($('#regLink').attr('href') + $('input[name="email"]').val());
});

Hope it helps you!



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why