The blog of Tobin

Tobin's nerd blog on .NET, Software, Tech and Nice Shiny Gadgets.

Thursday, September 28, 2006

Jason Fried on Browsers

Jason Fried's comments at the MIT Emerging Technologies Conference make a fair point. The browser is a good platform as long as browser manufacturers don't keep making life hard for web app publishers.

From a user's perspective, it's desirable not to have to install separate applications for services such as Flickr and Basecamp.

I still think things will get a lot better as browsers improve and building richer GUIs gets easier for developers.

One thing I've been thinking about is that it would be handy to have a single sign-on when you open the browser. The browser would then pass the relevant usernames and passwords to any sites that require them. Having to remember a username and password for every site is still an obstacle to the simplicity of browser-based applications. A single mechanism that addressed the problem in one sweep would make life easier for users.

Microsoft Passport and "remember me" cookies are a step forward, but I think things could be improved if there were a standard that the browser inherently supported.
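Here's a rough sketch of that single sign-on idea, written as if it lived inside the browser: unlock once, then look up credentials by the host name of whichever site asks. The store, the Unlock and CredentialsFor names, the host names and the passwords are all made up for illustration. It isn't modelled on Passport or on any real browser feature, and a real version would need encryption and an agreed protocol with the sites.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of browser-level single sign-on: the user unlocks a
// credential store once, then the browser answers each site's login request
// by looking up that site's host name. Everything stays in memory here; a real
// browser would keep the store encrypted with a key derived from the master password.
var store = new Dictionary<string, (string User, string Password)>(StringComparer.OrdinalIgnoreCase);
string masterPassword = "example-master";   // hypothetical master password
bool unlocked = false;

// Called once, when the browser opens.
bool Unlock(string attempt)
{
    unlocked = attempt == masterPassword;
    return unlocked;
}

// Called whenever a site asks for a login; null means "ask the user instead".
(string User, string Password)? CredentialsFor(string host)
{
    if (!unlocked) return null;
    if (store.TryGetValue(host, out var saved)) return saved;
    return null;
}

// Hypothetical saved logins for two made-up sites.
store["photos.example.com"] = ("tobin", "photos-secret");
store["projects.example.com"] = ("tobin", "projects-secret");

Console.WriteLine(CredentialsFor("photos.example.com") == null);   // True: still locked
Unlock("example-master");                                          // the single sign-on step
Console.WriteLine(CredentialsFor("photos.example.com")?.User);      // tobin
```

The point of keying on the host is that the user never types anything per site after the one unlock; everything else (storage, encryption, how the site asks) is where a standard would have to come in.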

Friday, September 15, 2006

Parsing HTML and XHTML in .NET

Bear™ says:
know of any good .NET xhtml parsers that are free?

Bear™ says:
something that will correct tags on the fly

Bear™ says:
like;
myString = HTMLParser.Parse ( mySource )

Tobelerone says:
SgmlReader - use it a lot and it's good.
http://www.gotdotnet.com/Community/UserSamples/Details.aspx?SampleGuid=B90FDDCE-E60D-43F8-A5C4-C3BD760564BC

Html Agility Pack - is good too
http://sharptoolbox.com/tools/html-agility-pack

Devcomponents HtmlDocument - very good (I use it too) but costs $99
http://www.devcomponents.com/htmldoc/download.html

Chilkat HtmlToXml - haven't used but buying soon!
http://www.chilkatsoft.com/HtmlToXmlDotNet.asp

Bear™ says:
wow

Bear™ says:
thanks!
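Bear's HTMLParser.Parse(mySource) is a wish rather than a real class; the libraries above do the real work with proper HTML parsers. To give a feel for what "correcting tags on the fly" means, here's a deliberately crude sketch: a hypothetical BalanceTags helper (my own illustration, not the API of any library mentioned here) that self-closes void elements such as br, drops stray closing tags and closes anything left open, so the result will load into an XmlDocument.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml;

// Hypothetical helper, NOT part of SgmlReader, Html Agility Pack or any other
// library above. It only balances tags; it does not handle unquoted attributes,
// bare '&' characters, '>' inside attribute values, or HTML's nesting rules.
static string BalanceTags(string html)
{
    // HTML elements that never have a closing tag: emit them self-closed.
    var voidElements = new HashSet<string> { "br", "hr", "img", "input", "meta", "link" };
    var open = new List<string>();   // stack of element names still open
    var output = new StringBuilder();
    // Group 1: "/" for a closing tag, group 2: element name, group 3: attributes.
    var tagPattern = new Regex(@"<(/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>");
    int last = 0;

    foreach (Match m in tagPattern.Matches(html))
    {
        output.Append(html, last, m.Index - last);   // copy text between tags
        last = m.Index + m.Length;
        string name = m.Groups[2].Value.ToLowerInvariant();
        string attrs = m.Groups[3].Value.TrimEnd();

        if (m.Groups[1].Value != "/")
        {
            if (voidElements.Contains(name) || attrs.EndsWith("/"))
            {
                output.Append('<').Append(name).Append(attrs.TrimEnd('/')).Append(" />");
            }
            else
            {
                output.Append('<').Append(name).Append(attrs).Append('>');
                open.Add(name);
            }
        }
        else
        {
            int idx = open.LastIndexOf(name);
            if (idx < 0) continue;                    // stray closing tag: drop it
            // Close anything left open inside this element, innermost first.
            for (int i = open.Count - 1; i >= idx; i--)
                output.Append("</").Append(open[i]).Append('>');
            open.RemoveRange(idx, open.Count - idx);
        }
    }

    output.Append(html, last, html.Length - last);
    for (int i = open.Count - 1; i >= 0; i--)         // close whatever is still open
        output.Append("</").Append(open[i]).Append('>');
    return output.ToString();
}

string fixedUp = BalanceTags("<p>Hello<br><b>world</p>");
Console.WriteLine(fixedUp);   // <p>Hello<br /><b>world</b></p>

var xmlDoc = new XmlDocument();
xmlDoc.LoadXml(fixedUp);      // LoadXml would throw XmlException on the unbalanced original
```

A DTD-aware converter such as SgmlReader can use HTML's own content rules instead, so it can tell that a new p element ends the previous one, where this sketch simply nests them.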

Using components that convert HTML to XHTML/XML is a great way to go if you need to mine information from web documents.

The best part of these converters is that they take badly written HTML (with broken tags, etc.) and fix it up as best they can, so you get a well-formed XML document that you can work with.
This then lets you do lovely things such as:


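//xmlDoc is the XmlDocument loaded from the converter's output (needs: using System.Xml;)

//find all link tags on a web page
XmlNodeList linkNodes = xmlDoc.SelectNodes("//a[@href]");


//find all heading tags on a page (h1 to h6); "h1 or h2" would be a boolean,
//which SelectNodes rejects, so test each element name with self:: instead
XmlNodeList headingNodes = xmlDoc.SelectNodes("//*[self::h1 or self::h2 or self::h3 or self::h4 or self::h5 or self::h6]");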


Good, eh!
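To try those two queries without a converter to hand, here's a self-contained sketch. The XHTML string is made up and stands in for a converter's output. The second half shows one catch worth knowing: if the output declares the XHTML namespace (http://www.w3.org/1999/xhtml), plain //a finds nothing and you need an XmlNamespaceManager with a prefix. Whether a particular converter emits that namespace is something to check for the one you use.

```csharp
using System;
using System.Xml;

// Made-up stand-in for a converter's output: well-formed, no XHTML namespace.
string xhtml =
    "<html><body>" +
    "<h1>Gadgets</h1>" +
    "<p>See <a href=\"http://example.com/a\">this</a> and <a name=\"top\">that</a>.</p>" +
    "<h2>Phones</h2>" +
    "<p><a href=\"http://example.com/b\">more</a></p>" +
    "</body></html>";

var xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xhtml);

// [@href] keeps only anchors that carry an href (skips <a name="...">).
XmlNodeList linkNodes = xmlDoc.SelectNodes("//a[@href]");
foreach (XmlElement a in linkNodes)
    Console.WriteLine(a.GetAttribute("href"));

// One predicate over every element, so the result is a node-set in document order.
XmlNodeList headingNodes = xmlDoc.SelectNodes(
    "//*[self::h1 or self::h2 or self::h3 or self::h4 or self::h5 or self::h6]");
foreach (XmlNode h in headingNodes)
    Console.WriteLine(h.Name + ": " + h.InnerText);

// If the output declares the XHTML namespace, unprefixed names such as //a
// match nothing; bind a prefix (any name, "x" here) to the namespace URI instead.
var nsDoc = new XmlDocument();
nsDoc.LoadXml("<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>" +
              "<a href=\"http://example.com/c\">c</a></body></html>");
var ns = new XmlNamespaceManager(nsDoc.NameTable);
ns.AddNamespace("x", "http://www.w3.org/1999/xhtml");
XmlNodeList plainLinks = nsDoc.SelectNodes("//a[@href]");
XmlNodeList prefixedLinks = nsDoc.SelectNodes("//x:a[@href]", ns);
Console.WriteLine(plainLinks.Count + " vs " + prefixedLinks.Count);   // 0 vs 1
```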