How to extract values in quote marks from HTML string?

c# html html-agility-pack javascript web-scraping


I have the following snippet of code, retrieved from a web page:

<li class="player" data-id="168568" data-teamid="156" data-x="142.33" data-y="297.16040000000004" data-name="Corentin Tolisso" data-position="3">Corentin Tolisso<span class="shirt">24</span></li>

My goal is to extract "Corentin Tolisso", the shirt number "24" as well as the values of data-x and data-y.

So far I am able to get it to work with values that are within >...<, using HTML Agility Pack.

However, I can't find a way to extract the values of the data-x and data-y attributes.

I have copied the HTML string into a new jsfiddle, which outputs exactly what my C# code is getting: the text between >...<.

How do I extract the values of data-x and data-y?

Note: Using String.IndexOf works fine, but it takes away flexibility. This is my fallback strategy.

Note 2: I looked here and here, both of which give me some idea, but I still have a hard time applying it to C#.

7/2/2018 2:38:08 PM

Popular Answer

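One way would be to use the regular expression (["'])(?:(?=(\\?))\2.)*?\1, which matches a quoted string together with its quotes. It supports nested quotes as well: an escaped quote, or the other quote character, inside the value does not end the match.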
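To make this concrete, here is a minimal sketch in JavaScript (the question is tagged javascript and uses a jsfiddle) that runs the pattern over the markup from the question. The helper attributeValue is a hypothetical name I made up for illustration, not part of any library, and it assumes each attribute is written as name="value" with no spaces around the equals sign. Treat it as a sketch for simple, well-formed snippets like this one rather than a general HTML parser.

```javascript
// Sample markup copied from the question.
const html = '<li class="player" data-id="168568" data-teamid="156" data-x="142.33" data-y="297.16040000000004" data-name="Corentin Tolisso" data-position="3">Corentin Tolisso<span class="shirt">24</span></li>';

// The pattern from this answer: an opening quote, then any characters
// (a backslash keeps the following character from ending the match),
// lazily up to the matching closing quote.
const quoted = /(["'])(?:(?=(\\?))\2.)*?\1/g;

// Every quoted value in the string, with the quotes still attached.
const allQuoted = html.match(quoted);

// Hypothetical helper (an assumption, not a library function):
// returns the value of one named attribute, or null if it is absent.
// Assumes the attribute is written as name="value" or name='value'.
function attributeValue(source, name) {
  const pattern = new RegExp('\\s' + name + '=(["\'])(.*?)\\1');
  const match = pattern.exec(source);
  return match ? match[2] : null;
}

// The attribute values are strings, so convert them to numbers.
const x = parseFloat(attributeValue(html, 'data-x'));
const y = parseFloat(attributeValue(html, 'data-y'));

console.log(allQuoted.length, x, y);
```

The first pattern returns every quoted value in order, so you would still have to know which position holds data-x. Anchoring on the attribute name, as the helper does, avoids depending on attribute order.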

Give it a try at this link:

With jQuery this is very simple: once the element is selected, .attr("data-x") returns the attribute's value as a string.

Also check an example found here: Getting value of HTML text input

<form name="input" action="handle_email.php" method="post">
Email: <input type="text" name="email" />
<input type="submit" value="Newsletter" />
<a id="regLink" href="">Register</a>
</form>


Hope it helps you!

7/2/2018 2:52:38 PM

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow