Experimental and Data Analysis!
There were a surprisingly high number of no-shows, but overall I think teams did quite well and I was impressed. Special congratulations to Beckendorff A for winning by almost 50 points!
You can find all score distributions on the "Stats" document in the test folder:
https://drive.google.com/drive/folders/ ... _d-OQ-o8T1
Mean: 317.95
Median: 325.5
Std. Dev: 131.536
Max: 604
Min: 48
Q1: 225.5
Q3: 418.5
Max Possible Score: 690
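For anyone curious how summary statistics like these are computed, here is a minimal sketch using Python's standard library. The scores below are made up for illustration; they are not the actual results.

```python
# Hypothetical example: these scores are invented, not the real data.
# It just shows how summary statistics like those above can be computed.
import statistics

scores = [48, 225, 226, 325, 326, 418, 419, 604]  # made-up scores

mean = statistics.mean(scores)            # arithmetic mean
median = statistics.median(scores)        # middle value (mean of middle two here)
stdev = statistics.stdev(scores)          # sample standard deviation
q1, _, q3 = statistics.quantiles(scores, n=4)  # quartile cut points

print(f"Mean: {mean}, Median: {median}, Std. Dev: {stdev:.3f}")
print(f"Min: {min(scores)}, Max: {max(scores)}, Q1: {q1}, Q3: {q3}")
```

Note that `statistics.quantiles` (Python 3.8+) defaults to the "exclusive" method, so Q1/Q3 can differ slightly from other conventions.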
Overall, I'm pretty satisfied with the distribution, which was surprisingly almost normal.
Common Errors:
I feel the need to detail sketches of solutions to 3 of the most difficult problems on the test.
First, the "base rate fallacy." Very few teams gave a full explanation of what's going on. The key idea is that when the condition is rare, the false positives outnumber the true positives even with an accurate test: if you actually compute the counts, the 95% accuracy sounds reassuring, but given a positive test result you are still more likely to be clean than not. Hence, the fallacy.
Second, the bird-pecking problem. In my opinion, this was by far the hardest problem on the test. I believe only one or two teams got within what I presumed to be a rounding error. The problem is based on two related problems that are more well-known:
1. At a nursery, 2006 babies sit in a circle. Suddenly each baby pokes the baby immediately to either its left or its right, with equal probability. What is the expected number of unpoked babies? (2006 HMMT Guts Round/3)
2. In a barn, 100 chicks sit peacefully in a circle. Suddenly, each chick randomly pecks the chick immediately to its left or right. What is the expected number of un-pecked chicks? (2017 MATHCOUNTS Nationals Countdown Round/Final Problem)
The key here is to use linearity of expectation, which tells us that E[X + Y] = E[X] + E[Y] for any random variables X and Y (independent or not). So, one can compute the probability that each bird (on the corners, edges, etc.) is un-pecked and add those probabilities.
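The circular version (problem 2's setup) can be checked with a short sketch: by linearity of expectation, a chick in a circle is un-pecked exactly when its left neighbor pecks left and its right neighbor pecks right, so each contributes probability 1/4 and the answer for n chicks is n/4. The trial count below is arbitrary.

```python
import random

# By linearity of expectation, E[# un-pecked] = sum over chicks of
# P(that chick is un-pecked). In a circle, a chick is un-pecked exactly
# when its left neighbor pecks left AND its right neighbor pecks right,
# so P = (1/2)(1/2) = 1/4 and the answer for n chicks is n/4.
def expected_unpecked(n):
    return n / 4

# Monte Carlo check of the same quantity.
def simulate_unpecked(n, trials, seed=0):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # pecks[i] = -1 if chick i pecks left, +1 if it pecks right
        pecks = [rng.choice((-1, 1)) for _ in range(n)]
        for i in range(n):
            left, right = (i - 1) % n, (i + 1) % n
            # chick i is un-pecked if neither neighbor pecks toward it
            if pecks[left] != 1 and pecks[right] != -1:
                total += 1
    return total / trials

print(expected_unpecked(100))         # 25.0 (MATHCOUNTS answer)
print(expected_unpecked(2006))        # 501.5 (HMMT answer)
print(simulate_unpecked(100, 20000))  # close to 25
```

The actual test problem had corners and edges, so the per-bird probabilities differ by position, but the add-them-up strategy is identical.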
Third, the final CORVID-19 problem. A team pointed out to me that vaccines don't really work the way I had portrayed them in the test, so the final version should say "treatment." Anyway, onward to the problem. I think only one or two teams got full points on this question, though many got partial credit.
For full points, I hoped teams would realize that the improvements in the recovery rates were very small, and in fact, if one combines the data across the two species (recall that CORVID-19 is supposed to affect all birds!), the trend reverses: the treatment actually lowers the recovery rate. This is an example of Simpson's Paradox. I also awarded full points if (as I realized after initially writing the key) a team noted that, given the sample sizes, the improvement falls within the margin of error and is hence negligible. In hindsight, a statement saying that the treatment is meant for all birds might have been helpful.
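Both accepted arguments can be illustrated with a sketch. The species names and counts below are invented (not the test's data); they are chosen so the treatment looks better within each species yet worse once the species are combined, which is the heart of Simpson's Paradox. A rough margin-of-error helper for a proportion is included as well.

```python
import math

# Invented illustration of Simpson's Paradox (not the actual test data):
# lopsided group sizes make the combined trend reverse the per-species trend.
groups = {
    # species: (treated_recovered, treated_total, untreated_recovered, untreated_total)
    "crows":    (95, 100, 900, 1000),   # mostly untreated, high recovery
    "sparrows": (400, 1000, 35, 100),   # mostly treated, low recovery
}

def rate(recovered, total):
    return recovered / total

# Rough 95% margin of error for a proportion p from a sample of size n,
# via the normal approximation: z * sqrt(p * (1 - p) / n), z ~ 1.96.
def margin_of_error(p, n, z=1.96):
    return z * math.sqrt(p * (1 - p) / n)

tr_rec = tr_tot = un_rec = un_tot = 0
for species, (tr, tn, ur, un) in groups.items():
    print(f"{species}: treated {rate(tr, tn):.2f} vs untreated {rate(ur, un):.2f}")
    tr_rec += tr; tr_tot += tn
    un_rec += ur; un_tot += un

# Combined across species, the direction reverses:
print(f"overall: treated {rate(tr_rec, tr_tot):.2f} "
      f"vs untreated {rate(un_rec, un_tot):.2f}")
```

With these made-up numbers, the treated group recovers more often within each species (0.95 vs 0.90, and 0.40 vs 0.35), but less often overall, because the treated pool is dominated by the low-recovery species.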
Otherwise, the only issues I saw were a lack of clarity or detail in the explanations and the prelab, which is understandable for a long-ish test.
Overall:
Thank you everyone for taking my test! I was pleasantly surprised both at the quality of responses teams gave as well as the distribution of the scores. When the mean and median are that similar, I'm pretty happy.
Good luck on the rest of your season everyone! - Klebb