Domain List Sources: (project name: DomainListSourcesImporter)
Given a set of files (e.g. a directory, multiple directories, file patterns, or remote authenticated URLs), extract lists of unique domain names from them and save these lists to databases and text files.
The user should be able to add new sets of files easily and repeatedly.
Implementation language: Java (bash/Python can be used as glue languages, since the deployment platform will be Unix (CentOS)).
A blank skeleton Java project for Eclipse will be provided, with basic libraries that may help development. You may also choose to start from scratch on your own.
Arguments should be specified as command-line options (opts):
• A file specifying input file sources:
An input source can be any of the following: a set of directories, files, file patterns, an HTTP URL, an FTP URL, etc. Files may be plain text or compressed (e.g. zip, gz). Each line of a source file contains a domain name in the first column. Domain names may be in any letter case. Input may contain duplicate domain names.
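The input rules above (first column is the domain, any letter case, duplicates allowed, optional gz compression) could be handled roughly as in the sketch below. This is only an illustration under stated assumptions, not the required design: the class name DomainExtractor and the methods open and extractDomains are hypothetical, the column separator is assumed to be whitespace or a comma (the spec does not say), and files are assumed to be UTF-8. Only local plain-text and .gz files are covered; zip archives, HTTP/FTP sources and authentication are left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: the name and structure are illustrative only.
public class DomainExtractor {

    // Opens a local file as text; a ".gz" suffix is taken to mean gzip compression.
    // Assumption: files are UTF-8 encoded.
    public static BufferedReader open(Path file) throws IOException {
        InputStream in = Files.newInputStream(file);
        if (file.getFileName().toString().endsWith(".gz")) {
            in = new GZIPInputStream(in);
        }
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    // Collects the first column of every non-blank line, lower-cased,
    // into a sorted set so that duplicates collapse to one entry.
    // Assumption: columns are separated by whitespace or commas.
    public static Set<String> extractDomains(BufferedReader reader) throws IOException {
        Set<String> domains = new TreeSet<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String firstColumn = trimmed.split("[\\s,]+", 2)[0];
            if (!firstColumn.isEmpty()) {
                domains.add(firstColumn.toLowerCase(Locale.ROOT));
            }
        }
        return domains;
    }

    public static void main(String[] args) throws IOException {
        // In-memory sample standing in for a source file.
        String sample = "Example.com 1\nexample.com\tfoo\n\nTest.ORG,x\nEXAMPLE.COM\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            System.out.println(extractDomains(reader)); // prints [example.com, test.org]
        }
    }
}
```

Lower-casing before insertion is what makes "Example.com" and "EXAMPLE.COM" count as the same domain. A TreeSet is used here only so the output order is deterministic; a HashSet would serve equally well if ordering does not matter.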