Would you like to retrieve and parse the contents of a remote web page with ASP, maybe extract and index all the links? Maybe you're planning to build your own search engine and be the next big Google competitor. Well, this function will show you how to start building that with ASP.
You will need one or two items. To retrieve the pages, you'll be using MSXML 4.0. If you use an older version, you may run into a problem with responseText, where all special, foreign, or accented characters are replaced with '?' question marks. This is due to the encoding, and MSXML 4.0 solves that.
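If you still see question marks, one common workaround is to skip responseText and decode the raw bytes yourself. The sketch below is only an illustration: bytesToText is a hypothetical helper name, and it assumes you already know (or can guess) the charset the remote page uses.

```vbscript
'=== hypothetical helper: decode raw response bytes with an explicit charset
Function bytesToText(arrBytes, strCharset)
    Const adTypeBinary = 1
    Const adTypeText = 2
    Dim objStream
    Set objStream = CreateObject("ADODB.Stream")
    objStream.Type = adTypeBinary      ' write the bytes in binary mode
    objStream.Open
    objStream.Write arrBytes
    objStream.Position = 0             ' rewind before switching to text mode
    objStream.Type = adTypeText
    objStream.Charset = strCharset     ' e.g. "utf-8" or "iso-8859-1"
    bytesToText = objStream.ReadText
    objStream.Close
    Set objStream = Nothing
End Function
```

With a ServerXMLHTTP object named objXML that has already completed a request, you would call it as strBody = bytesToText(objXML.responseBody, "utf-8"), assuming the page really is UTF-8.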
If you are behind a proxy server and you use ServerXMLHTTP code, you will get the error "Access Denied" or "The server name or address cannot be resolved". You need proxycfg. Run it from the command line like this: "proxycfg -u", and it will copy your proxy settings from IE.
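If you can't run proxycfg on the server, ServerXMLHTTP also has a setProxy method that sets the proxy for a single object. This is a rough sketch rather than a tested recipe: the proxy host and port are placeholders, and the target URL is only an example.

```vbscript
'=== sketch: set the proxy in code instead of relying on proxycfg
Const SXH_PROXY_SET_PROXY = 2   ' "use the proxy I specify"

Dim objXML
Set objXML = CreateObject("MSXML2.ServerXMLHTTP.4.0")
' placeholder proxy address; substitute your own host and port
objXML.setProxy SXH_PROXY_SET_PROXY, "proxy.example.local:8080", ""
objXML.Open "GET", "http://www.example.com/", False
objXML.Send
Response.Write objXML.status    ' 200 means the request went through
Set objXML = Nothing
```

The proxycfg route has the advantage of fixing it once for every script on the machine, so use this only when you need a per-request setting.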
So here's the function to get the remote page:
'=== grab a web page, return as string
function getPage(strURL)
    dim strBody, objXML
    set objXML = CreateObject("MSXML2.ServerXMLHTTP.4.0")
    objXML.Open "GET", strURL, False   '=== False = synchronous, wait for the response
    '=== optional headers, uncomment as needed
    'objXML.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" '=== falsify the agent
    'objXML.setRequestHeader "Content-Type", "text/html; charset=ISO-8859-1"
    'objXML.setRequestHeader "Content-Type", "text/html; charset=UTF-8"
    objXML.Send
    strBody = objXML.responseText
    set objXML = nothing
    getPage = strBody
end function
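The function above trusts that the request succeeds. If you're going to crawl pages you don't control, you may want something a little more defensive. Here's a minimal sketch: getPageChecked is a hypothetical variant, and the timeout values and URL are only illustrative.

```vbscript
'=== hypothetical variant of getPage: returns "" unless the server answers 200
Function getPageChecked(strURL)
    Dim objXML
    getPageChecked = ""
    Set objXML = CreateObject("MSXML2.ServerXMLHTTP.4.0")
    ' resolve, connect, send, receive timeouts in milliseconds
    objXML.setTimeouts 5000, 5000, 10000, 30000
    On Error Resume Next               ' a dead host raises a runtime error
    objXML.Open "GET", strURL, False
    objXML.Send
    If Err.Number = 0 Then
        If objXML.status = 200 Then getPageChecked = objXML.responseText
    End If
    On Error GoTo 0
    Set objXML = Nothing
End Function

Dim strHTML
strHTML = getPageChecked("http://www.example.com/")
If Len(strHTML) = 0 Then
    Response.Write "Could not fetch the page."
Else
    Response.Write Len(strHTML) & " characters retrieved."
End If
```

Returning an empty string keeps the calling code simple, at the cost of not telling you why a fetch failed.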
Do what you want with the contents of the page. If you want to build it into a spider, extract all the links into an array with either a regular expression or by splitting at '
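For the regular expression route, something like the following could work. Treat it as a sketch: extractLinks is a hypothetical helper, and the pattern only catches quoted href values inside anchor tags, so unquoted or script-generated links will be missed.

```vbscript
'=== hypothetical helper: return an array of href values found in the HTML
Function extractLinks(strHTML)
    Dim objRegEx, objMatches, objMatch, arrLinks, i
    Set objRegEx = New RegExp
    ' capture the quoted value of href inside an <a ...> tag
    objRegEx.Pattern = "<a\s[^>]*href\s*=\s*[""']([^""']+)[""']"
    objRegEx.IgnoreCase = True
    objRegEx.Global = True             ' find every match, not just the first
    Set objMatches = objRegEx.Execute(strHTML)
    If objMatches.Count = 0 Then
        extractLinks = Array()         ' no links: return an empty array
        Exit Function
    End If
    ReDim arrLinks(objMatches.Count - 1)
    i = 0
    For Each objMatch In objMatches
        arrLinks(i) = objMatch.SubMatches(0)
        i = i + 1
    Next
    extractLinks = arrLinks
End Function

Dim arrFound, strLink
arrFound = extractLinks(getPage("http://www.example.com/"))
For Each strLink In arrFound
    Response.Write Server.HTMLEncode(strLink) & "<br>"
Next
```

A real spider would also need to resolve relative URLs against the page address and keep track of which pages it has already visited.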
©1973 No real copyright. Site by Justin Cook | RSS