I have an educational data set of how students performed on various learning tests.
One group of students played one version of a physics learning game.
The other group played a different version.
The experiment was run at three schools.
I have attached the data set here.
Students were measured based upon their:
-Pre-test, post-test, and gain scores
-Engagement survey results
-Number of trials within the game itself
-Actions used within the game itself
-Trial times on incorrect trials within the game
-Trial times on a mini-game within the main game (differed across two conditions)
-A few other specific metrics
I would like the following completed in Python using scikit-learn:
Exploratory statistics (scatterplots, histograms, etc.)
Training and testing of the dataset: GridSearchCV
Classifiers: Logistic Regression, Multinomial Naive Bayes, Decision Tree, Random Forest, K-Nearest Neighbor, Support Vector Classifier.
Model evaluation metrics: accuracy, precision, recall, F1-score, mean squared error.
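A minimal sketch of the training and evaluation step might look like the following. It runs on synthetic stand-in data, not the attached data set: the column names (pretest, engagement, n_trials, n_actions) and the binary target (post-test at or above the median) are assumptions made only for illustration. Features are rescaled to [0, 1] because Multinomial Naive Bayes requires non-negative inputs.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, mean_squared_error,
                             precision_score, recall_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; every column name here is hypothetical.
rng = np.random.default_rng(0)
n = 120
X = pd.DataFrame({
    "pretest": rng.integers(0, 21, n),
    "engagement": rng.uniform(1, 5, n),
    "n_trials": rng.integers(5, 40, n),
    "n_actions": rng.integers(20, 200, n),
})
# Assumed binary target: 1 if the (simulated) post-test is at or above the median.
posttest = X["pretest"] + rng.normal(0, 3, n)
y = (posttest >= posttest.median()).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# One small, illustrative parameter grid per classifier.
models = {
    "logreg": (LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1, 10]}),
    "mnb": (MultinomialNB(), {"clf__alpha": [0.1, 1.0]}),
    "tree": (DecisionTreeClassifier(random_state=0), {"clf__max_depth": [2, 4, None]}),
    "forest": (RandomForestClassifier(random_state=0), {"clf__n_estimators": [50, 100]}),
    "knn": (KNeighborsClassifier(), {"clf__n_neighbors": [3, 5, 7]}),
    "svc": (SVC(), {"clf__C": [0.1, 1, 10]}),
}

results = {}
for name, (clf, grid) in models.items():
    # Scaling lives inside the pipeline so it is refit on each CV training fold.
    pipe = Pipeline([("scale", MinMaxScaler()), ("clf", clf)])
    search = GridSearchCV(pipe, grid, cv=5, scoring="f1")
    search.fit(X_train, y_train)
    pred = search.predict(X_test)
    results[name] = {
        "best_params": search.best_params_,
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
        # For 0/1 labels, MSE is just the misclassification rate (1 - accuracy).
        "mse": mean_squared_error(y_test, pred),
    }

print(pd.DataFrame(results).T)
```

With a binary target, mean squared error carries no information beyond accuracy, so it is probably only worth reporting if a continuous outcome such as gain score is modeled instead.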
Clustering (k-means, KNN) of students' performance on the final test based on:
-Pretest scores (high vs. low)
-Spatial ability (high vs. low)
-Attentional ability (high vs. low)
-Perhaps game performance metrics (actions used, trials, time spent)
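The clustering idea above could be sketched roughly as follows. This uses synthetic stand-in data with hypothetical column names (pretest, spatial, attention, posttest), and it assumes "high vs. low" means a median split, which is only one possible definition. Only k-means is shown, since k-nearest neighbors is a supervised classifier rather than a clustering method.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; every column name here is hypothetical.
rng = np.random.default_rng(1)
n = 90
students = pd.DataFrame({
    "pretest": rng.normal(10, 3, n),
    "spatial": rng.normal(50, 10, n),
    "attention": rng.normal(30, 5, n),
    "posttest": rng.normal(12, 3, n),
})

features = ["pretest", "spatial", "attention"]

# Assumed definition of "high vs. low": above vs. at-or-below the median.
for col in features:
    students[col + "_high"] = students[col] > students[col].median()

# Standardize so no single measure dominates the Euclidean distances.
X = StandardScaler().fit_transform(students[features])

# Two clusters chosen only to mirror the high/low framing; k should be tuned.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
students["cluster"] = km.fit_predict(X)

# Compare final-test performance across the discovered clusters.
summary = students.groupby("cluster")["posttest"].agg(["count", "mean"])
print(summary)

# Check how the clusters line up with the median-split pretest groups.
print(pd.crosstab(students["cluster"], students["pretest_high"]))
```

Game performance metrics (actions used, trials, time spent) could be appended to the feature list in the same way once their column names are known.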
This should not be more than a day's worth of work, possibly less.
I realize some of these analyses may not make sense; we can discuss them together to refine the strategy.
I would like the output and code for all these analyses.