Directory Spider and data collector

$30-100 USD

Cancelled
Posted almost 19 years ago

Paid on delivery
I need a very small script that does the following.

SPIDER-PART

- Go to <http://www.g**[login to view URL]> (G**gle Directory).
- Spider **all** pages of the directory.
- Collect each listed URL and save it to a database.

For every URL collected, also save the following data:

1. Anchor text
2. Category it is listed in
3. Width of the green bar displayed beside the URL (the width of the green image "[login to view URL]")

Once again, the script should do this for *all* directory entries. Since the whole spidering and data collection should finish within one or at most two days, the script has to be very, very fast.

ADMIN-PART

I'll need some password-protected HTML pages where I can do the following:

- Start and stop the script.
- Watch a "live monitor" that shows which pages the script is crawling at the moment, which pages have already been crawled, and any errors that occur. (This can be done in Java, Flash, or any other dynamic language that can display the live status.)
- Sort all data from the database by the following criteria:
  1. Pixel width of the green bar
  2. Category (alphabetically ordered)
- Search for pages that fit a certain category and/or have a certain pixel width of the green bar.

After accepting and paying for your work, I may use, edit, and resell the script free of further charge.

As far as I know, C++ should be the fastest language for this. I'll need an absolutely fast spider. None of the PHP spiders I know can spider the Google Directory within a day.

## Deliverables

1) Complete and fully functional working program(s) in executable form, as well as complete source code of all work done.

2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):

a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment: deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.

b) For all others, including desktop software or software the Buyer intends to distribute: a software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.

3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the Buyer on the site per the coder's Seller Legal Agreement.)

## Platform

It will run on my dedicated Linux server. Administration of the script must be possible through a web interface, though.
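
The per-entry data collection and the category/bar-width search described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the requested deliverable: it uses Python and SQLite for brevity (the posting itself suggests C++), and the page markup is hypothetical (an `<img width="N">` for the green bar followed by the entry's `<a href>` link). The names `EntryParser`, `parse_entries`, `store`, `search`, the table layout, and the `bar.gif` file name are all invented for this example. Fetching pages, crawling at speed, and the password-protected admin pages are not covered.

```python
import sqlite3
from html.parser import HTMLParser


class EntryParser(HTMLParser):
    """Collects (url, anchor_text, bar_width) tuples from one listing page.

    Assumed markup (hypothetical, not taken from the real directory):
    each entry is an <img width="N"> bar followed by an <a href="..."> link.
    """

    def __init__(self):
        super().__init__()
        self.entries = []
        self._bar_width = None  # width of the most recently seen bar image
        self._href = None       # href of the link currently being read
        self._text = []         # text fragments inside the current link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        width = attrs.get("width") or ""
        if tag == "img" and width.isdigit():
            self._bar_width = int(width)
        elif tag == "a" and attrs.get("href") and self._bar_width is not None:
            self._href = attrs["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            anchor_text = "".join(self._text).strip()
            self.entries.append((self._href, anchor_text, self._bar_width))
            self._href = None
            self._bar_width = None


def parse_entries(html):
    """Return all (url, anchor_text, bar_width) tuples found in html."""
    parser = EntryParser()
    parser.feed(html)
    parser.close()
    return parser.entries


def store(conn, category, entries):
    """Save the entries of one category page into the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        "url TEXT, anchor_text TEXT, category TEXT, bar_width INTEGER)"
    )
    conn.executemany(
        "INSERT INTO entries VALUES (?, ?, ?, ?)",
        [(url, text, category, width) for url, text, width in entries],
    )
    conn.commit()


def search(conn, category=None, bar_width=None):
    """Filter by category and/or bar width; sort by width, then category."""
    sql = "SELECT url, anchor_text, category, bar_width FROM entries WHERE 1=1"
    params = []
    if category is not None:
        sql += " AND category = ?"
        params.append(category)
    if bar_width is not None:
        sql += " AND bar_width = ?"
        params.append(bar_width)
    sql += " ORDER BY bar_width DESC, category ASC"
    return conn.execute(sql, params).fetchall()


# Hypothetical sample page used only to exercise the sketch.
SAMPLE_PAGE = """
<img src="bar.gif" width="40"> <a href="http://example.com/">Example Site</a>
<img src="bar.gif" width="25"> <a href="http://example.org/">Another Site</a>
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(conn, "Computers/Internet", parse_entries(SAMPLE_PAGE))
    for row in search(conn, bar_width=40):
        print(row)
```

Keeping parsing, storage, and querying in separate functions means the same `search` call could back both the sort page and the search page of the admin interface, while the parser could be swapped out once the real page markup is known.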
Project ID: 3784004

About the project

Remote project
Active 19 yrs ago

About the client

Mainz, Germany
5.0
34
Member since Jun 17, 2008
