Open Directory Project

Clean HTML - Dmoz

The purpose of this tool is simply to output the URLs and titles found in an HTML fragment or at a URL. The Dmoz URL cleaning engine (used by "Add a page of links to unreviewed") sometimes fails to recognize the URLs inside a web page and returns only one or two of them instead of all. To see the difference, test the URL http://grandeminas.globo.com/unainet/index_jornais.htm with both tools (the official Dmoz one and this one). Clean HTML still needs some improvements, but it works.
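
The extraction described above can be sketched roughly as follows. This is a Python illustration using the standard library's `html.parser`, not the tool's actual PHP source; the names `LinkCollector` and `extract_links` are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (url, title) pairs from <a href=...>...</a> blocks."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (url, title) pairs
        self._href = None    # href of the anchor currently open, if any
        self._text = []      # text pieces seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # anchors without an href (e.g. <a name=...>) are skipped
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so the title fits on one line.
            title = " ".join("".join(self._text).split())
            self.links.append((self._href, title))
            self._href = None


def extract_links(html):
    """Return every (url, title) pair found in an HTML fragment."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Calling `extract_links` on a fragment returns all anchors it contains, in document order, including anchors whose `href` value is not quoted.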

Usage

Enter the HTML fragment or the URL. If you choose the output type "Urls and Titles", you will get an HTML fragment that can be parsed by the official Dmoz multilinks tool.

Fill in the form below to get the parsed URLs.

URL:
Enter the URL of a page to get its links.

HTML Fragment:
If you do not fill in the URL field, enter an HTML fragment containing some <a href...> ... </a> blocks.

Type of output:
Urls and Titles (eg: <a href=http://mysite.com>My Site Title</a>) Default.
Only Urls (eg: http://mysite.com)
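
The two output types could be produced along these lines. This is a minimal Python sketch, not the tool's PHP code. The function name `format_links` and its `with_titles` flag are hypothetical, and the input is assumed to be a list of (url, title) pairs. Skipping entries with an empty URL or title is one reading of the changelog entry about null values. Unlike the example above, the sketch quotes the `href` value.

```python
from html import escape


def format_links(links, with_titles=True):
    """Render (url, title) pairs as anchor tags or as bare URLs, one per line."""
    lines = []
    for url, title in links:
        if not url:
            continue  # nothing to print without a URL
        if with_titles:
            if not title:
                continue  # assumption: entries without a title are dropped
            lines.append('<a href="%s">%s</a>' % (escape(url, quote=True), escape(title)))
        else:
            lines.append(url)
    return "\n".join(lines)
```

With `with_titles=True` the result is an HTML fragment of anchors; with `with_titles=False` it is a plain list of URLs.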

Author and Sourcecode

This tool was written in PHP and is released under the LGPL license. You can read the source code of clearhtml with highlights and without highlights. You can also download it here (click on save as).

The author is Roberto Berto. You can reach him as darkelder at dmoz or through his homepage.

To do list

  • Work with relative URLs (<a href="home.html"> ...).
  • Add meta and URL title improvements.
  • Automatically submit the results to Dmoz multilink.cgi (as a list like the multilink confirm page: description, url, mark sites to visit, ODP note).
  • Add a redirect and broken link checker, like Dmoz multilink.cgi.
  • Only external links (all sites have internal links that aren't relevant) -- optional.
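
Two of the items above, relative URLs and the optional external-only filter, might be approached as in the following Python sketch. It is not part of the tool, which is written in PHP. The function `resolve_links` and its parameters are hypothetical, and "external" is assumed to mean "on a different host than the page being parsed".

```python
from urllib.parse import urljoin, urlparse


def resolve_links(base_url, links, external_only=False):
    """Make each link absolute against base_url; optionally drop same-host links."""
    base_host = urlparse(base_url).netloc.lower()
    resolved = []
    for url, title in links:
        absolute = urljoin(base_url, url)  # leaves already-absolute URLs unchanged
        host = urlparse(absolute).netloc.lower()
        if external_only and host == base_host:
            continue  # internal link, not relevant for submission
        resolved.append((absolute, title))
    return resolved
```

For a page at `http://mysite.com/dir/index.html`, the relative link `home.html` resolves to `http://mysite.com/dir/home.html` and is dropped when `external_only=True`.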

Changelog

  • 2002-11-04 - Added url count and code cleaning.
  • 2002-11-04 - If title or url is null, do not print it.
  • 2002-11-04 - Changed preg_match to preg_match_all. One bug killed.
  • 2002-11-04 - First release 1.0
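
The switch from preg_match to preg_match_all matters because the former stops at the first match. A rough Python analogue is `re.search` versus `re.findall`. The pattern below is deliberately simple and is only an assumption for illustration, not the regular expression the tool uses; `first_link` and `all_links` are hypothetical names.

```python
import re

# Illustrative pattern only: captures the href value and the anchor text.
ANCHOR = re.compile(
    r'<a\s[^>]*href\s*=\s*["\']?([^"\'\s>]+)[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def first_link(html):
    """Like preg_match: return only the first (url, title) pair, or None."""
    match = ANCHOR.search(html)
    return match.groups() if match else None


def all_links(html):
    """Like preg_match_all: return every (url, title) pair."""
    return ANCHOR.findall(html)
```

The URL count mentioned in the changelog would then simply be `len(all_links(html))`.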


This site is kindly hosted by TeHospedo, which offers Linux and Windows website hosting.