The project includes research regarding the countries through which Tor circuits pass.
This is a small sample job that is part of that project. Python is not actually needed for this simple join, but it is what we are using in the larger project.
The first aim is to capture all relays (identified by relay IP address) through which a client's (many) Tor circuits pass during a Tor Browser session, and then determine which countries the relays are in by joining each IP address with a GeoIPLite IP-to-ISO 3166 country code lookup table.
To start simply, I already have a file with tens of thousands of Tor circuit relays captured by IP address. I need to join it with a file I have from GeoIPLite that maps IP addresses to country codes, so that each relay IP address gets its country code. I would also like to join the circuit IP address list with Tor's own relay descriptor file, which needs to be parsed a bit to find the IP address for each node. I want to store the result in a MySQL database. It can be delivered as a delimited file which I can import into my MySQL db.
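To make the scope of the join concrete, here is a minimal Python sketch. It rests on assumptions that are not confirmed above: that the relay file holds one IPv4 address per line, and that the GeoIPLite country file is a CSV whose rows are start IP, end IP, start as integer, end as integer, country code, country name. The function names and the tab-delimited output layout are hypothetical and only for illustration.

```python
import bisect
import csv
import ipaddress


def load_ranges(rows):
    """Build a sorted list of (start_int, end_int, country_code).

    Assumed row layout: start_ip, end_ip, start_int, end_int, code, name.
    """
    ranges = [(int(row[2]), int(row[3]), row[4]) for row in rows]
    ranges.sort()
    return ranges


def lookup(ranges, starts, ip):
    """Return the country code for an IPv4 string, or "" if no range matches."""
    n = int(ipaddress.IPv4Address(ip))
    # Find the last range whose start is <= n, then check its end.
    i = bisect.bisect_right(starts, n) - 1
    if i >= 0 and ranges[i][0] <= n <= ranges[i][1]:
        return ranges[i][2]
    return ""


def join_relays(relay_path, geo_path, out_path):
    """Write one tab-delimited line per relay IP: ip, country_code."""
    with open(geo_path, newline="") as geo:
        ranges = load_ranges(csv.reader(geo))
    starts = [r[0] for r in ranges]
    with open(relay_path) as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        for line in src:
            ip = line.strip()
            if ip:  # skip blank lines
                writer.writerow([ip, lookup(ranges, starts, ip)])
```

The binary search keeps each lookup fast even with tens of thousands of relay addresses, and a tab-delimited file like this can be loaded into MySQL with its standard import tools.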
That is a simple place to start: a simple lookup and return. Can you estimate this? Maybe we can try this first, thanks.
The GeoIPLite file is at: http://dev.maxmind.com/geoip/legacy/geolite/
The Tor path simulator to create a circuit output file can be found here:
Or I can send you an output file; however, it is too large to send through here.
The Tor descriptor file can be found by modifying your torrc file (under Tor control panel, Settings, Advanced, Edit torrc) to:
Log notice stdout
Log info file /Users/sarah_cortes/sbresearch.git/DataTorResearch/circuit-captures/AutoCircuitCaptures/raw/SarahLocalTorLog3.txt
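For the descriptor parsing step mentioned earlier, a rough Python sketch follows. It assumes the descriptor text follows Tor's server descriptor layout, in which each relay entry begins with a line of the form `router <nickname> <address> <ORPort> ...`; whether the file produced by the settings above has exactly that layout would need to be checked against a real sample. The function name `relay_addresses` is hypothetical.

```python
import ipaddress


def relay_addresses(lines):
    """Map relay IPv4 address -> nickname from descriptor 'router' lines.

    Assumed line layout: router <nickname> <address> <ORPort> ...
    Lines that do not match, or whose address is not valid IPv4, are skipped.
    """
    relays = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "router":
            try:
                ip = str(ipaddress.IPv4Address(parts[2]))
            except ValueError:
                continue  # not a usable IPv4 address
            relays[ip] = parts[1]
    return relays
```

The resulting dictionary can be joined against the circuit IP address list in the same way as the country lookup, since both are keyed by IP address.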
Please let me know if you would like to estimate this sample job, thanks.