HTML::SimpleLinkExtor - Extract links from HTML

SYNOPSIS

    use HTML::SimpleLinkExtor;

    my $extor = HTML::SimpleLinkExtor->new();

    $extor->parse_file($other_file);   # get more links

    $extor->clear_links;               # reset the link list

    # extract the body background

    ... = $extor->schemes( 'http' );

DESCRIPTION

This is a simple HTML link extractor designed for the person who does not want to deal with the intricacies of HTML::Parser or the de-referencing needed to get links out of HTML::LinkExtor.

You can extract all the links or some of the links (based on the HTML tag name or attribute name). If a BASE tag is found, all of the relative URLs will be resolved according to that reference.

This module is simply a subclass around HTML::LinkExtor, so it can only parse what that module can handle. Invalid HTML or XHTML may cause problems.

If you parse multiple files, the link list grows and contains the aggregate list of links for all of the files parsed. If you want to reset the link list between files, use the clear_links method.

Class Methods

$extor = HTML::SimpleLinkExtor->new()

    Create the link extractor object.

$extor = HTML::SimpleLinkExtor->new($base)

    Create the link extractor object and resolve the relative URLs according to the supplied base URL. The supplied base URL overrides any other base URL found in the HTML.

$extor = HTML::SimpleLinkExtor->new('')

    Create the link extractor object and do not resolve relative links.

Returns the internal user agent, an LWP::UserAgent object.

HTML::SimpleLinkExtor->add_tags( TAG )

    HTML::SimpleLinkExtor keeps an internal list of HTML tags (such as 'a' and 'img') that have URLs as values. If you run into another tag that this module doesn't handle, please send it to me and I'll add it. Until then you can add that tag to the internal list.

HTML::SimpleLinkExtor->add_attributes( ATTR )

    HTML::SimpleLinkExtor keeps an internal list of HTML tag attributes (such as 'href' and 'src') that have URLs as values. If you run into another attribute that this module doesn't handle, please send it to me and I'll add it. Until then you can add that attribute to the internal list.

can()

    A smarter can that can tell which attributes are also methods.

HTML::SimpleLinkExtor->remove_tags( TAG )

    Take tags out of the internal list that HTML::SimpleLinkExtor uses to extract URLs.

HTML::SimpleLinkExtor->remove_attributes( ATTR )

    Take attributes out of the internal list that HTML::SimpleLinkExtor uses to extract URLs. This affects the entire class, including previously created objects.

Returns a list of the attributes HTML::SimpleLinkExtor pays attention to.

Returns a list of the tags HTML::SimpleLinkExtor pays attention to.

Object methods

$extor->parse_file( $filename )

    Parse the file and extract its links.

$extor->parse_url( $url )

    Fetch the URL and parse its content for links.

$extor->clear_links

    Reset the link list. This way, you can use the same parser for another file.

The remaining methods return the extracted links, selected by tag or attribute:

    Return a list of the links from all the SRC attributes of the IMG tags.
    Return a list of the links from all the SRC attributes of the FRAME tags.
    Return a list of the links from all the SRC attributes of the IFRAME tags.
    Return the combined list from frame and iframe.
    Return a list of the links from all the SRC attributes of any tag.
    Return a list of the links from all the HREF attributes of the A tags.
    Return a list of the links from all the HREF attributes of the AREA tags.
    Return a list of the links from all the HREF attributes of the BASE tags.
    Return a list of the links from all the HREF attributes of any tag.
    Return the link from the BODY tag's BACKGROUND attribute.
    Return the link from the SCRIPT tag's SRC attribute.

$extor->schemes( SCHEME, )

    Return the links that use the given scheme, such as 'http'. These must be absolute URLs (which might include those converted to absolute URLs by specifying a base).
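
As a rough sketch of the workflow this documentation describes (create an extractor with a base URL, parse a file, filter the links by scheme, then reset the link list), something like the following should work. The sample HTML, the temporary file, the example.com URLs, and the variable names are all assumptions made up for illustration; only new, parse_file, schemes, and clear_links come from the text above.

```perl
use strict;
use warnings;

use File::Temp qw(tempfile);
use HTML::SimpleLinkExtor;

# Write a small, made-up HTML document to a temporary file so that
# parse_file has something to read.
my ( $fh, $filename ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
print {$fh} <<'HTML';
<html>
<body>
<a href="/about.html">About</a>
<a href="mailto:someone@example.com">Mail</a>
<img src="http://images.example.com/logo.png">
</body>
</html>
HTML
close $fh;

# Supplying a base URL asks the extractor to resolve relative links.
my $extor = HTML::SimpleLinkExtor->new('http://www.example.com/');
$extor->parse_file($filename);

# Keep only the links whose scheme is http; the mailto link is skipped.
my @http_links = $extor->schemes('http');
print "$_\n" for @http_links;

# Reset the link list before reusing the same parser for another file.
$extor->clear_links;
```

Passing '' instead of a base URL to new should leave the relative link unresolved, in which case it would not be expected to show up in the schemes('http') result.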