weblib.html

weblib.html.decode_entities(html)

Convert all HTML entities into their unicode representations.

This function processes the following entities:
  • &XXX; (named entities)
  • &#XXX; (numeric entities)

Example:

>>> print(html.decode_entities('&rarr;ABC R&copy;'))
→ABC R©
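
For comparison, the behaviour described above can be approximated with the Python 3 standard library. This is only a sketch: `decode_entities_sketch` is a hypothetical name, and `html.unescape` is a stand-in rather than weblib's own implementation, so malformed or unknown entities may be handled differently.

```python
from html import unescape


def decode_entities_sketch(text):
    """Hypothetical stand-in: decode named (&XXX;) and numeric (&#XXX;) entities."""
    # html.unescape handles both named and numeric character references.
    return unescape(text)


sample = '&rarr;ABC R&copy; &#169;'
decoded = decode_entities_sketch(sample)
print(decoded)
```

Both the named form `&copy;` and the numeric form `&#169;` decode to the same character here.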

weblib.html.escape(html)

Returns the given HTML with ampersands, quotes and angle brackets encoded.
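
A rough illustration of this kind of escaping, using the Python 3 standard library instead of weblib itself. The name `escape_sketch` is hypothetical, and the exact replacement used for quote characters is an assumption that may differ from weblib's output.

```python
from html import escape as std_escape


def escape_sketch(text):
    """Hypothetical stand-in: encode ampersands, quotes and angle brackets."""
    # quote=True also encodes double and single quotes, not just &, < and >.
    return std_escape(text, quote=True)


escaped = escape_sketch('<a href="x">Tom & Jerry</a>')
print(escaped)
```

Escaping the ampersand matters most, since an unescaped `&` in the input could otherwise be read as the start of an entity.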

weblib.html.find_base_url(html)

Find the URL given in the <base> tag.
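
One way such a lookup could work is sketched below with the standard-library `html.parser` module. The names `find_base_url_sketch` and `_BaseTagFinder` are hypothetical, and returning `None` when no `<base>` tag is present is an assumption; weblib's actual implementation and return value may differ.

```python
from html.parser import HTMLParser


class _BaseTagFinder(HTMLParser):
    """Hypothetical helper: remember the href of the first <base> tag seen."""

    def __init__(self):
        super().__init__()
        self.base_url = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        if tag == 'base' and self.base_url is None:
            href = dict(attrs).get('href')
            if href:
                self.base_url = href


def find_base_url_sketch(html):
    """Return the href of the first <base> tag, or None (assumed behaviour)."""
    finder = _BaseTagFinder()
    finder.feed(html)
    return finder.base_url


page = '<html><head><base href="http://example.com/dir/"></head><body></body></html>'
base_url = find_base_url_sketch(page)
print(base_url)
```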

weblib.html.find_refresh_url(html)

Find the redirect URL in the http-equiv refresh meta tag.
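
A refresh meta tag typically looks like `<meta http-equiv="refresh" content="5; url=http://example.com/next">`. The sketch below shows one possible way to pull the URL out of it using only the standard library. The names `find_refresh_url_sketch` and `_RefreshFinder` are hypothetical, and returning `None` when no URL is found is an assumption, not documented weblib behaviour.

```python
import re
from html.parser import HTMLParser


class _RefreshFinder(HTMLParser):
    """Hypothetical helper: remember the URL of the first refresh meta tag."""

    def __init__(self):
        super().__init__()
        self.refresh_url = None

    def handle_starttag(self, tag, attrs):
        if tag != 'meta' or self.refresh_url is not None:
            return
        attr_map = dict(attrs)
        # Attribute values may be None when written without "=value".
        if (attr_map.get('http-equiv') or '').lower() != 'refresh':
            return
        # The content value has the form "<seconds>; url=<target>".
        match = re.search(r'url\s*=\s*(.+)', attr_map.get('content') or '', re.I)
        if match:
            self.refresh_url = match.group(1).strip().strip('\'"')


def find_refresh_url_sketch(html):
    """Return the redirect URL of a refresh meta tag, or None (assumed)."""
    finder = _RefreshFinder()
    finder.feed(html)
    return finder.refresh_url


doc = '<head><meta http-equiv="refresh" content="5; url=http://example.com/next"></head>'
refresh_url = find_refresh_url_sketch(doc)
print(refresh_url)
```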