We're looking for an experienced Java developer with a background in Apache Nutch and in writing Hadoop jobs.
We want to redevelop an existing crawler on top of Nutch so that it can process more large-scale websites than we can currently handle.
Please answer the following questions in order to be considered:
1) What is a Nutch segment?
2) Explain how you develop, test, and launch Hadoop jobs.
3) Explain the purpose of the linkdb in Nutch.
4) What's your favourite fruit?