I'm looking for someone to help me with statistical analysis. I have problems like the following:
1. When to inspect?
I have 10k documents per month streaming to about 200 office staff, in offices scattered around the world, for data entry. Trained staff at HQ inspect the data entry performed by the office staff, detecting errors and correcting the fields in which they find them.
The HQ staff currently detect errors on ~15% of documents; error rates on individual fields range from nearly zero to ~6% of documents. Users show learning (we detect fewer errors from users who have entered more documents), and the learning continues over their first 2000 or so documents (which is where I start running out of data).
Required: I need to decide when a document can skip secondary inspection. I need to decide when users (HQ or practice users) don't understand something and need training (their error rate on a field seems high for the difficulty of data entry on that field). Changes to the user interface sometimes produce a step change in error rates; I need to decide whether I helped or hurt, and I need future error prediction accuracy to recover quickly.
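For the "did the interface change help or hurt" decision, one simple Bayesian framing is to put a Beta prior on the per-document error rate before and after the change and estimate the posterior probability that the rate went down. The sketch below does this by Monte Carlo using only the standard library (`random.betavariate`). It is an illustration under loud assumptions, not a proposed final method: the uniform Beta(1, 1) priors and the function name `prob_change_helped` are hypothetical choices, and it treats each window's error rate as constant, ignoring user learning and changes in the user mix. Because users keep improving, a plain before/after comparison will partly credit learning to the interface change.

```python
import random


def prob_change_helped(errors_before, docs_before, errors_after, docs_after,
                       prior_a=1.0, prior_b=1.0, draws=100_000, seed=0):
    """Monte Carlo estimate of P(error rate after change < error rate before).

    Assumption (hypothetical): each window has a single constant per-document
    error rate with an independent Beta(prior_a, prior_b) prior, so each
    posterior is Beta(prior_a + errors, prior_b + docs - errors).
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    helped = 0
    for _ in range(draws):
        # Draw one plausible error rate for each window from its posterior.
        p_before = rng.betavariate(prior_a + errors_before,
                                   prior_b + docs_before - errors_before)
        p_after = rng.betavariate(prior_a + errors_after,
                                  prior_b + docs_after - errors_after)
        if p_after < p_before:
            helped += 1
    return helped / draws


if __name__ == "__main__":
    # Made-up counts for illustration only, not real data.
    print(prob_change_helped(errors_before=60, docs_before=400,
                             errors_after=40, docs_after=400))
```

A result near 1 suggests the change helped, near 0 that it hurt, and near 0.5 that the data can't yet tell; a hierarchical model with per-user effects and a learning curve would be the natural next step.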
2. What works?
We have a number of businesses that sell stuff, and we often change how that's done and how we promote (promotions, press mentions (that I can work to get), changes in price, changes in product, changes in business websites, training for our sales people, etc.). I'd like to learn more than I currently do from the things we change, so that I can focus our efforts where they work best. There is a huge amount of noise in this data.
Proposals should include answers to the following two questions:
Question A: In my first example job above, across 200 users the average error rate in their first 10 documents was 12% (that is, of the set of 2000 documents made up of the first 10 documents entered by each of 200 users, 12% contained at least one error). With so few documents from each user (only 10), there is only a small indication that the error rate on the 10th document is lower than the error rate on the first document (learning might be occurring, but isn't large across 10 documents). A new user has entered 9 documents without any errors. What is the probability that they will error on their next document?
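One standard way to pool the other users' data here is an empirical-Bayes beta-binomial model: treat each user's per-document error rate as drawn from a Beta(a, b) population distribution whose mean is the pooled 12%, then update that prior with the new user's 9 clean documents. The predictive probability of an error on the next document is then (a + errors) / (a + b + documents seen). The sketch below assumes learning over the first 10 documents can be ignored, and that the spread between users (the concentration kappa = a + b) is estimated by method of moments from each user's count of erroneous documents out of their first 10. The 12% mean alone does not determine kappa, so the answer hinges on that between-user spread. Function names are hypothetical.

```python
import math
from statistics import mean, pvariance


def estimate_concentration(error_counts, n_docs=10):
    """Method-of-moments estimate of kappa = a + b for a Beta(a, b) prior.

    error_counts: for each user, how many of their first n_docs documents
    contained at least one error. Returns math.inf when the counts show no
    spread beyond what a single shared error rate would produce.
    """
    m = mean(error_counts) / n_docs        # pooled per-document error rate
    if m <= 0 or m >= 1:
        return math.inf                    # no information about the spread
    binom_var = n_docs * m * (1 - m)       # variance if all users shared rate m
    # Beta-binomial: Var = binom_var * (1 + (n_docs - 1) * rho), rho = 1/(kappa + 1)
    rho = (pvariance(error_counts) / binom_var - 1) / (n_docs - 1)
    if rho <= 0:
        return math.inf
    return 1 / rho - 1


def predictive_error_prob(prior_mean, kappa, n_seen, n_errors):
    """P(error on next document) given the user's own history.

    Prior Beta(a, b) with a = prior_mean * kappa, b = (1 - prior_mean) * kappa;
    the posterior mean is (a + n_errors) / (kappa + n_seen).
    """
    if math.isinf(kappa):
        return prior_mean                  # users identical: own history is irrelevant
    a = prior_mean * kappa
    return (a + n_errors) / (kappa + n_seen)


if __name__ == "__main__":
    # Sensitivity of the Question A answer (9 clean documents) to kappa.
    # Prints 0.012, 0.0632, 0.1101 and 0.12 for the four kappa values.
    for kappa in (1, 10, 100, math.inf):
        print(kappa, round(predictive_error_prob(0.12, kappa, 9, 0), 4))
```

Small kappa (users differ a lot) lets 9 clean documents pull the estimate far below 12%; large kappa (users are similar) keeps it near 12%.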
Question B: What question should I ask you and the other applicants to work out who will be good at doing this work? What question will effectively separate those who understand how to answer questions like this from those who don't understand the relevant statistical tools?
If you're not using Bayesian tools, we're not going to agree. You shouldn't bother applying.
If you think the right approach to Question A ignores the data for other users and concentrates solely on the new user's 9 document history, we're not going to agree. You shouldn't bother applying.
Keywords: Bayes Theorem, hypothesis testing, confidence interval.
Skills: statistics, research