Parsing HTML and XHTML in .NET
Bear™ says:
know of any good .NET xhtml parsers that are free?
Bear™ says:
something that will correct tags on the fly
Bear™ says:
like;
myString = HTMLParser.Parse ( mySource )
Tobelerone says:
SgmlReader - use it a lot and it is good.
http://www.gotdotnet.com/Community/UserSamples/Details.aspx?SampleGuid=B90FDDCE-E60D-43F8-A5C4-C3BD760564BC
Html Agility Pack - is good too
http://sharptoolbox.com/tools/html-agility-pack
Devcomponents HtmlDocument - very good (I use it too) but costs $99
http://www.devcomponents.com/htmldoc/download.html
Chilkat HtmlToXml - haven't used but buying soon!
http://www.chilkatsoft.com/HtmlToXmlDotNet.asp
Bear™ says:
wow
Bear™ says:
thanks!
Using components that convert HTML to XHTML/XML is a great way to go if you need to mine information from web documents.
The best part of these converters is that they take badly written HTML (with broken tags, etc.) and fix it up as best they can, so you get a well-formed XML document which you can work with.
This then lets you do lovely things such as:
//find all link tags on a web page
XmlNodeList linkNodes = xmlDoc.SelectNodes("//a[@href]");
//find all heading tags (h1 to h4) on a page
XmlNodeList headingNodes = xmlDoc.SelectNodes("//h1 | //h2 | //h3 | //h4");
Good eh!
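To see those two queries run end to end, here is a minimal sketch that uses only System.Xml. It assumes the HTML has already been through one of the converters above; the xhtml string is a made-up sample standing in for that converted output, and the class name XPathDemo is just for illustration.

```csharp
using System;
using System.Xml;

class XPathDemo
{
    static void Main()
    {
        // Hypothetical sample: stands in for the well-formed XML
        // that an HTML-to-XML converter would hand back.
        string xhtml =
            "<html><body>" +
            "<h1>Title</h1>" +
            "<p><a href=\"http://example.com/\">a link</a> " +
            "<a name=\"anchor\">no href here</a></p>" +
            "<h2>Subtitle</h2>" +
            "</body></html>";

        XmlDocument xmlDoc = new XmlDocument();
        xmlDoc.LoadXml(xhtml);

        // Find all <a> elements that carry an href attribute.
        XmlNodeList linkNodes = xmlDoc.SelectNodes("//a[@href]");
        foreach (XmlNode link in linkNodes)
        {
            Console.WriteLine(link.Attributes["href"].Value + " : " + link.InnerText);
        }

        // Find h1 to h4 headings; the XPath union operator (|)
        // merges the four node sets, returned in document order.
        XmlNodeList headingNodes = xmlDoc.SelectNodes("//h1 | //h2 | //h3 | //h4");
        foreach (XmlNode heading in headingNodes)
        {
            Console.WriteLine(heading.Name + ": " + heading.InnerText);
        }
    }
}
```

One thing to watch for: if the converter puts the elements in the XHTML namespace (xmlns="http://www.w3.org/1999/xhtml"), unprefixed names like //a will match nothing. In that case you would register a prefix with an XmlNamespaceManager and pass it as the second argument to SelectNodes.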