I already have a PHP crawler which runs from my PC using WAMP. It stores the information in a MySQL database and largely does the job it is supposed to.
What I want to do is:
- Ensure that the crawler obeys robots.txt when I want it to (it used to do this, but somewhere the code broke and I currently have to comment out that line)
- Make sure it crawls every URL. It used to list the URLs one by one in the CMD window, but for some reason the output now looks messy
- Queue up sites so it can crawl more than one site at a time
- Let me specify URLs from different sites to be crawled, and crawl only those URLs. For example, I might want 10 URLs crawled from site 1 and 5 URLs crawled from site 2
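To give an idea of the robots.txt behaviour I am after, something along these lines would do (a rough sketch, not my actual code; function names are just illustrative):

```php
<?php
// Sketch: collect the Disallow rules that apply to "User-agent: *"
// from a robots.txt body, then test each URL path against them
// before crawling. Only simple prefix matching is shown here.
function getDisallowedPaths(string $robotsTxt): array {
    $disallowed = [];
    $appliesToUs = false;
    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line)); // strip comments
        if ($line === '') continue;
        if (preg_match('/^User-agent:\s*(.+)$/i', $line, $m)) {
            // A new User-agent line starts a new rule group
            $appliesToUs = ($m[1] === '*');
        } elseif ($appliesToUs && preg_match('/^Disallow:\s*(.*)$/i', $line, $m)) {
            if ($m[1] !== '') $disallowed[] = $m[1]; // empty Disallow means allow all
        }
    }
    return $disallowed;
}

function isAllowed(string $path, array $disallowed): bool {
    foreach ($disallowed as $prefix) {
        if (strpos($path, $prefix) === 0) return false; // path starts with a disallowed prefix
    }
    return true;
}
```

The crawler would fetch robots.txt once per site, call `getDisallowedPaths()`, and skip any queued URL for which `isAllowed()` returns false.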
If you are a PHP master, please get in touch.