We need an experienced computer engineer, data scientist or statistical analyst to help build a natural language model from a set of written text and corresponding psychometric data.
We have a data set of individuals who have each completed a) a psychological profile and b) prompted essays. We would like to build a natural language dictionary / semantic text engine that shows which types of words are used by those who score high or low on particular measures in the psychological profiles.
We have no specific experience building a model such as this, but we think the process could be based on the one used to develop the engine of the World Well-Being Project (http://www.wwbp.org/). They describe their process here: https://prezi.com/um0ajnhkhq9j/wwbp-how-we-do-it/ and here: http://static1.squarespace.com/static/53d29678e4b04e06965e9423/t/55f3185ce4b0dccea22ba0d6/1441994844670/Johannes+Eichstaedt+-+Measuring+Psychology+through+Social+Media+-+v6+sharable.pdf
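To make the goal concrete, here is a minimal sketch of the kind of word-level analysis we have in mind. It is only an illustration under our own assumptions, not the World Well-Being Project's actual method: it correlates each word's relative frequency in an essay with the author's score on one psychometric measure. The function names (`tokenize`, `pearson`, `word_score_correlations`), the `min_docs` threshold and the sample essays and scores are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def word_score_correlations(essays, scores, min_docs=2):
    """Map each word to the correlation between its relative frequency and the score.

    Words appearing in fewer than `min_docs` essays are skipped (assumed threshold).
    """
    counts = [Counter(tokenize(essay)) for essay in essays]
    totals = [sum(c.values()) or 1 for c in counts]
    # Number of essays in which each word appears at least once.
    doc_freq = Counter(word for c in counts for word in c)
    result = {}
    for word, df in doc_freq.items():
        if df < min_docs:
            continue
        freqs = [c[word] / total for c, total in zip(counts, totals)]
        result[word] = pearson(freqs, scores)
    return result


# Hypothetical data: four short essays and one made-up score per author.
essays = [
    "I love my friends and family",
    "I hate work and hate mondays",
    "love love family time",
    "work is boring and I hate it",
]
scores = [0.9, 0.2, 0.8, 0.1]

correlations = word_score_correlations(essays, scores)
# Print the words from most positively to most negatively associated with the score.
for word, r in sorted(correlations.items(), key=lambda item: item[1], reverse=True):
    print(f"{word}: {r:+.2f}")
```

A real dictionary would need far more records, a correction for testing many words at once, and probably phrases and topics as well as single words; we expect applicants to propose their own approach.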
Deliverables:
- A complete natural language dictionary for each psychometric measure of interest
- A complete report highlighting findings and word correlations
Please do not waste our time with your favourite statistics formula, or the name of your employer or degree. All we want to know initially, in order to select people for the interview stage, are your answers to the following questions:
- how you would go about building this model
- the minimum number of records you would need to build it
- how you would present the results
- any further recommendations