This is the first in a series of jobs, so it can lead to a lot more work. In my opinion it's the fun part of data science and analysis: connecting disparate data sources and looking for interesting correlations and possible causal links. Some of this *could* be done in Excel, but we'd prefer Python or R. Ideally, you are familiar with many public data sets, including US Census, NIH, IRS, WISQARS, Zillow, NAR, GSS, and many more; we won't always be able to tell you where to find the data.
The work you do here, if satisfactory, will pave the way to further analysis. This initial job consists of two tasks.
1. Is the rate at which taxes are prepared correlated with the foreclosure rate in a given zip code?
a) We will take property default rates at the zip level from Zillow (per 10,000 units).
b) We will take IRS-reported data on tax preparations (it's buried among lots of data here):
- How does the foreclosure rate correlate with tax preparation?
- Has that relationship changed over time?
- Can you isolate the effect of tax preparation as an independent variable?
- Produce charts, the r value for the correlation, and any other insights.
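To make the expected deliverable concrete, here is a minimal sketch of the join-and-correlate step, using only the Python standard library. The input shape (dictionaries keyed by zip and year), the function names, and the sample values are all assumptions for illustration; none of them come from the Zillow or IRS files, and in practice pandas or R would do the same job.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def r_by_year(foreclosures, prep_rates):
    """Join two {(zip, year): value} dicts on shared keys and compute r per year.

    Keys present in only one data set are dropped (an inner join).
    Years with fewer than two matched zips are skipped.
    """
    paired = defaultdict(lambda: ([], []))
    for key in foreclosures.keys() & prep_rates.keys():
        _, year = key
        paired[year][0].append(foreclosures[key])
        paired[year][1].append(prep_rates[key])
    return {
        year: pearson_r(xs, ys)
        for year, (xs, ys) in sorted(paired.items())
        if len(xs) >= 2
    }


# Made-up sample values, only to show the call shape.
sample_foreclosures = {("10001", 2013): 12.0, ("10002", 2013): 20.0, ("10003", 2013): 31.0}
sample_prep_rates = {("10001", 2013): 0.55, ("10002", 2013): 0.60, ("10003", 2013): 0.70}
print(r_by_year(sample_foreclosures, sample_prep_rates))
```

Computing r separately for each year is one simple way to see whether the relationship has changed over time. It does not isolate tax preparation from confounders such as income; that would need a regression with control variables.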
2. How does gun ownership affect foreclosure rates and home prices?
(Note: this is a broader question.)
a) property data: price per square foot
b) gun ownership data (this will require creativity)
https://www.atf.gov/firearms/listing-federal-firearms-licensees-ffls-2013 (change last 4 digits in url to get from 2013 to 2015)
Use WISQARS, NCHS, GSS, Census, etc.
Does gun ownership impact property prices? Are they correlated in any way?
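One possible starting point, offered as a sketch rather than a required method: treat the density of federal firearms licensees as a rough proxy for gun ownership. The code below counts licensees per zip and normalizes by population. The proxy itself, the function name, the input shapes, and the sample values are assumptions for illustration; the real ATF listing will need its own parsing, and licensee density measures dealers, not owners.

```python
from collections import Counter


def ffl_per_10k(ffl_zips, population_by_zip):
    """Licensees per 10,000 residents for each zip in population_by_zip.

    ffl_zips: iterable of zip strings, one per licensee record (hypothetical shape).
    population_by_zip: {zip: resident count}, e.g. taken from Census data.
    """
    # Keep the first five characters (drops a ZIP+4 suffix) and restore
    # leading zeros that spreadsheets often strip from short zips.
    counts = Counter(z.strip()[:5].zfill(5) for z in ffl_zips)
    return {
        z: 10_000 * counts.get(z, 0) / pop
        for z, pop in population_by_zip.items()
        if pop > 0  # skip zips with no residents to avoid dividing by zero
    }


# Made-up sample values, only to show the call shape.
sample_ffl_zips = ["10001", "10001-1234", "2139", "02139"]
sample_population = {"10001": 20_000, "02139": 40_000, "60601": 10_000}
print(ffl_per_10k(sample_ffl_zips, sample_population))
```

The resulting per-zip rate can then be joined to price-per-square-foot and foreclosure data by zip and year and correlated like any other variable. Survey sources such as GSS would be a useful cross-check on whether the proxy tracks actual ownership.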