We need a cluster analysis of survey responses to determine which responses tend to occur together and which rarely or never do. (Full dataset attached.)
The results will be used to better understand our customers and to give the marketing team insights for creating marketing materials. For example, if the data shows that women over age 40 do X and Y but never do Z, we need to know this.
1. We will need to exclude responses from users who answered only the demographic questions (incomplete submissions).
2. Some questions are open-ended, so their answers will need to be categorized with natural language processing before clustering. Manual categorization is possible, but I would prefer an automated machine learning method.
3. This work will almost certainly require a tool such as R or Python.
4. Preference shall be given to candidates with the lowest bid.
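
To illustrate the kind of workflow described in items 1 to 3, here is a minimal Python sketch, not a required approach. It assumes pandas and scikit-learn, that unanswered questions are stored as missing values, and it uses TF-IDF with k-means as one possible way to categorize the open-ended answers. The column names (gender, age_group, does_x, does_y, does_z, comments) and the cluster counts are hypothetical placeholders, not taken from the attached dataset.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical column names; the real survey's columns will differ.
DEMOGRAPHIC_COLS = ["gender", "age_group"]
CLOSED_COLS = ["does_x", "does_y", "does_z"]
OPEN_COL = "comments"


def drop_incomplete(df):
    """Drop respondents who answered no non-demographic question (item 1)."""
    answer_cols = CLOSED_COLS + [OPEN_COL]
    # Assumes unanswered questions are stored as missing values (NaN/None).
    return df.dropna(subset=answer_cols, how="all").reset_index(drop=True)


def categorize_text(texts, n_topics, seed=0):
    """Group free-text answers into n_topics categories (item 2).

    TF-IDF plus k-means is only one possible technique, chosen for brevity.
    """
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts.fillna(""))
    model = KMeans(n_clusters=n_topics, n_init=10, random_state=seed)
    return model.fit_predict(tfidf)


def cluster_respondents(df, n_topics=2, n_segments=2, seed=0):
    """Return completed responses with added 'topic' and 'segment' columns."""
    out = drop_incomplete(df)
    out["topic"] = categorize_text(out[OPEN_COL], n_topics, seed)
    # One-hot encode every answer so k-means can compare respondents.
    categorical = out[DEMOGRAPHIC_COLS + CLOSED_COLS + ["topic"]].astype(str)
    features = pd.get_dummies(categorical, dtype=float)
    model = KMeans(n_clusters=n_segments, n_init=10, random_state=seed)
    out["segment"] = model.fit_predict(features)
    return out
```

Once respondents have a segment label, a cross-tabulation such as pd.crosstab(result["segment"], result["does_z"]) would show which answers occur together within each segment. The number of topics and segments would need to be chosen from the real data rather than fixed in advance.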