Update: I have received 10+ replies already. You may still apply, but it may take some time for me to reply.
The goal is to find predictors of future fertility in the National Longitudinal Survey of Youth 1979 (NLSY79), which has a webpage at https://www.nlsinfo.org/content/cohorts/nlsy79. As explained there, the NLSY79 Cohort is a longitudinal project that follows the lives of a sample of American youth born between 1957 and 1964. The cohort originally included 12,686 respondents aged 14-22 when first interviewed in 1979; after two subsamples were dropped, 9,964 respondents remain in the eligible samples.
Data is available online at: https://www.nlsinfo.org/investigator/pages/search.jsp?s=NLSY79 (we will restrict ourselves to “NLSY79 variables commonly used in research”).
I will now describe the statistical work that I want done. Let me know if you think there is a better way to do it. I hope it can be done in SPSS, or perhaps Stata, since these packages are widely used, which would allow others to verify the results if needed.
• The variable NUMBER OF BIOLOGICAL CHILDREN REPORTED in the latest round (2012) should be compared, for each individual, with variables recorded for the same individual in the earliest round (or the earliest round in which the variable is available), in order to detect those with a statistically significant association.
• I think we should exclude variables that were introduced later than the first 5 years. Also, we should study females only, since they should be at least 48 years old in the 2012 round and have minimal probability of having more biological children.
• Also, it sounds reasonable to apply the same selection criteria as did the study published as “Fertility of women in the NLSY79” (http://www.bls.gov/opub/mlr/2016/article/fertility-of-women-in-the-nlsy79.htm): “The sample for this article is restricted to respondents who are female, have participated in an NLSY79 interview at age 46 (48 for us, since they used the 2010 round) or older, and have reported a valid year of birth for all biological children, a valid year for marital changes, and highest degree completed in round 9 (1988) (or a later round) of data collection. To classify respondents by educational attainment, we use their most recent report of highest degree completed... The data are weighted with the use of custom weights that make the sample representative of the population from which the NLSY79 was drawn.”
• Many qualitative characteristics may be converted to binary ones (1 if present, 0 if absent) such as: 1 if Hispanic, 0 if non-Hispanic.
• The next step is to select the variables most strongly associated with lifetime fertility, perhaps between 8 and 15 of them. The selection will also involve some qualitative analysis where you may participate as well.
• The selected variables will be used for a regression function to predict the number of children achieved by 2012.
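To make the steps above more concrete, here is a rough sketch of the screening stage in Python. It is only an illustration: the real work would be done in SPSS or Stata, and every field name, code, and value below is made up and does not match actual NLSY79 variable names. It restricts the sample to women aged 48 or older in 2012, recodes one qualitative trait as 0/1, and ranks candidate predictors by the absolute value of a weighted correlation with the number of children. It does not test for statistical significance, which would still need to be done separately.

```python
import math

# Toy records with hypothetical field names and invented values.
respondents = [
    {"sex": "F", "age_2012": 50, "hispanic": "yes", "siblings_1979": 5,
     "expected_children_1979": 3, "weight": 1.2, "children_2012": 3},
    {"sex": "F", "age_2012": 49, "hispanic": "no", "siblings_1979": 2,
     "expected_children_1979": 2, "weight": 0.8, "children_2012": 2},
    {"sex": "F", "age_2012": 53, "hispanic": "no", "siblings_1979": 1,
     "expected_children_1979": 2, "weight": 1.0, "children_2012": 0},
    {"sex": "F", "age_2012": 51, "hispanic": "yes", "siblings_1979": 4,
     "expected_children_1979": 4, "weight": 1.1, "children_2012": 4},
    {"sex": "F", "age_2012": 47, "hispanic": "no", "siblings_1979": 3,
     "expected_children_1979": 1, "weight": 0.9, "children_2012": 1},
    {"sex": "M", "age_2012": 52, "hispanic": "no", "siblings_1979": 2,
     "expected_children_1979": 2, "weight": 1.0, "children_2012": 2},
]


def eligible(row):
    """Keep women aged 48+ in 2012 with a reported number of children."""
    return (row["sex"] == "F" and row["age_2012"] >= 48
            and row["children_2012"] is not None)


def to_binary(value, present="yes"):
    """Recode a qualitative trait as 1 if present, 0 if absent."""
    return 1 if value == present else 0


def weighted_corr(x, y, w):
    """Weighted Pearson correlation; returns 0.0 if either variable is constant."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y)) / total
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x)) / total
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y)) / total
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_predictors(rows, predictors, outcome, weight, top_k):
    """Rank predictors by absolute weighted correlation with the outcome."""
    y = [r[outcome] for r in rows]
    w = [r[weight] for r in rows]
    scores = {p: weighted_corr([r[p] for r in rows], y, w) for p in predictors}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]


# Apply the sample restriction, then recode the qualitative trait as 0/1.
sample = [dict(r, hispanic=to_binary(r["hispanic"])) for r in respondents if eligible(r)]

ranked = rank_predictors(
    sample,
    predictors=["hispanic", "siblings_1979", "expected_children_1979"],
    outcome="children_2012",
    weight="weight",
    top_k=2,
)
for name, r in ranked:
    print(f"{name}: weighted r = {r:.2f}")
```

The variables kept by such a ranking would then go into the regression. Because the outcome is a count, an ordinary linear regression may not be the best fit, so I am open to suggestions on the model form.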
As a possible guide, an Indian study did something similar, but used concurrent factors instead of past ones. It also applied additional measures, which seem needlessly complicated to me. I would prefer a straightforward function (but feel free to disagree): http://www.jds-online.com/file_download/394/JDS-1130.pdf
Throughout this process, it is very important that you keep a log of exactly what steps you took, preferably so that I can repeat them and arrive at the same results, despite having almost no experience in SPSS or Stata. This is important not only for the publication (where I can mention you as co-author if you want), but also because a new round of data will come out in January 2017, and I want to be able to run the same analysis on that data.