I am late to this party, but in 2017 sociologists Mads Meier Jæger and Stine Møllegaard published a study that uses a monozygotic twin design to estimate the effects of cultural capital, a concept in education research capturing familiarity with the dominant culture. Other sociologists [1,2] have made convincing claims that cultural capital matters for academic success, but I am not so sure about this study.

The study’s design is clever. The authors used administrative data to get a list of all twin births in Denmark from 1985 to 2000. In 2013, they surveyed the mothers and asked them, for each of their children (both twins and up to two non-twin siblings), 12 questions about the kind of cultural capital the child received at age 12. They linked these cultural capital reports to the children’s academic success at the end of compulsory schooling (grade 9 in Denmark). They estimated the “within-twin-pair” effects of cultural capital and reported some astonishing results: the standardized effect of cultural capital on an end-of-compulsory-schooling exam is .301, and a one-standard-deviation increase in cultural capital is associated with a 12.5 percentage point higher chance of enrolling in upper-secondary schooling. They also found sizable but nonsignificant effects for GPA and Danish exams.

My suspicion, however, is that the paper says little about the effects of cultural capital on academic success and more about the effects of academic success on mothers’ recall of the cultural capital their twin children received.

To their credit, the authors are upfront about potential issues with their measurement of cultural capital. They report intra-class correlations (ICCs) for the mothers’ cultural capital reports; for their omnibus cultural capital scale the ICC is .972. This means that the correlation between the two twins’ cultural capital reports is .972. In other words, there are precious few differences between twins’ cultural capital reports. The large effect sizes the authors see are driven by minute differences between twins.
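To get a feel for how little within-pair variation an ICC of .972 leaves, here is a back-of-the-envelope sketch. It assumes the ICC can be read as the share of total variance that lies between twin pairs, that the scale is standardized, and that both twins have the same variance. The function names are mine, not the authors', and this is not a reconstruction of their calculation.

```python
import math

def within_pair_share(icc):
    """Share of total variance left within twin pairs (assumes ICC = between-pair share)."""
    return 1.0 - icc

def twin_difference_sd(icc):
    """SD of the difference between two twins' scores, in units of the scale's overall SD.

    Under the assumptions above, Var(twin1 - twin2) = 2 * (1 - ICC) * total variance.
    """
    return math.sqrt(2.0 * (1.0 - icc))

icc = 0.972  # ICC reported for the omnibus cultural capital scale
print(round(within_pair_share(icc), 3))   # 0.028
print(round(twin_difference_sd(icc), 3))  # 0.237
```

Under these assumptions, less than 3 percent of the variance in the scale is within pairs, and a typical difference between twins is about a quarter of a standard deviation. That sliver of variation is all the within-twin-pair estimates have to work with.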

I am just trying to imagine what would produce a situation where a parent, raising identical twins, says that one twin had more cultural capital than the other twin (e.g. they took one twin to a museum more times than the other twin, or one twin had many more books than the other twin, or they talked with one twin about social issues much more often than with the other twin).

I can think of three scenarios.

Scenario A: These mothers are filling out an online survey; we don’t know how many questions the survey asks in total, but they have to answer at least 12 cultural capital questions for each twin (as well as for up to two non-twin children). I would guess the survey is kind of boring and that it asks a lot of mothers, who must accurately estimate the extent of each of these forms of cultural capital for at least two kids. Most parents are probably not going to think too hard about each question. To the extent that they think about distinguishing the cultural capital each kid gets, they are probably going to fall back on easily retrievable information, like how the kid is doing presently or how the kid fared in school overall, and that is going to influence their reports of cultural capital.

Now, Jæger and Møllegaard anticipated this objection, and argued that this “recall bias” should mean that parents are much less consistent about reporting cultural capital for differently-aged kids than for equally-aged kids like twins. Fortunately, they did ask parents about their non-twin children, and they show that moms are roughly as consistent in reporting on cultural capital for differently-aged kids as for their twins. I do not find this compelling, and it seems to me they are taking the “recall” in “recall bias” too literally. I believe the scenario I laid out above is going to be very similar for a mother reporting on her 25-year-old twin children as for a mother reporting on her 15-year-old twins.

Scenario B: An alternative scenario is that within-twin differences in cultural capital were caused by some kind of health mishap or trauma (e.g. if a kid becomes disabled they will not be making many trips to museums; if a kid gets bullied at school they may not talk much even with their parents). In that case, the effects attributed to cultural capital in this study are not the effects of cultural capital but rather of trauma, and the study is failing to account for important confounds.

Scenario C: Within-twin differences in cultural capital are random.  This could especially be the case if the survey is asking mothers to estimate each child’s cultural capital on different screens as opposed to a grid-like format (Jæger  and Møllegaard are not clear on this point).  The authors acknowledge this possibility as well and say “Random measurement error leads to attenuation bias, i.e. downwardly biased estimates of the effect of cultural capital on educational success…we get statistically significant estimates of the effect of individual cultural capital on educational success even in the presence of attenuation bias.”

This falls prey to the “What does not kill my statistical significance makes it stronger” fallacy, a term coined by statistician Andrew Gelman the same year this study came out. In statistics, “power” means the probability that a statistical test detects a real effect. One thing that weakens power is a small sample size. Another is random measurement error. The fallacy is the tempting notion that if you detect an effect using an underpowered design, that effect must be real. Jæger and Møllegaard are essentially saying that they have an underpowered design but they still found an effect, so it must be REALLY real.

It is true that if you have an underpowered study, you will be less likely to detect an effect that exists in the population. HOWEVER, if you have an underpowered study and you still detect an effect, as Gelman shows, the chances that your estimate has the wrong sign increase, and if your estimate has the right sign it will inevitably be overestimated.
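Here is a minimal sketch of that logic, in the spirit of Gelman's "Type S" (wrong sign) and "Type M" (exaggerated magnitude) errors. It assumes the estimate is normally distributed around a positive true effect with a known standard error. The function name and the numbers plugged in are hypothetical illustrations, not values from the Jæger and Møllegaard paper.

```python
from statistics import NormalDist

def retrodesign(true_effect, se, alpha=0.05):
    """Power, wrong-sign rate, and exaggeration ratio for a two-sided z-test.

    Assumes estimate ~ Normal(true_effect, se) and true_effect > 0.
    """
    std = NormalDist()
    z = std.inv_cdf(1 - alpha / 2)      # critical value, about 1.96 for alpha = 0.05
    lam = true_effect / se              # true effect in standard-error units

    p_pos = 1 - std.cdf(z - lam)        # P(significant and positive)
    p_neg = std.cdf(-z - lam)           # P(significant and negative)
    power = p_pos + p_neg

    # Share of significant results that point in the wrong direction
    type_s = p_neg / power

    # E[|estimate| * 1(significant)] in standard-error units (truncated-normal means)
    mean_abs = lam * (p_pos - p_neg) + std.pdf(z - lam) + std.pdf(-z - lam)

    # Average |significant estimate| relative to the true effect
    exaggeration = mean_abs / power / lam
    return power, type_s, exaggeration

# Hypothetical underpowered design: true effect equal to one standard error
power, type_s, exaggeration = retrodesign(true_effect=0.1, se=0.1)
print(f"power={power:.2f}, wrong-sign share={type_s:.3f}, exaggeration={exaggeration:.1f}x")
```

In this made-up design, power comes out at roughly 17 percent, and the estimates that do clear the significance bar overstate the true effect by a factor of about 2.5 on average. With a true effect five standard errors from zero, the same function gives power near 1 and an exaggeration ratio near 1, which is the contrast the fallacy ignores.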

My guess is that the Jæger and Møllegaard estimates of the effect of cultural capital are the result of a mix of Scenarios A and C, especially if mothers were answering questions about each kid on a different screen. If the questions were presented in a grid that made it easy for mothers to compare their answers across kids, Scenario C seems less plausible to me. For some reason, Scenario B seems unlikely to me, about as unlikely as cultural capital having the sizable causal effects that Jæger and Møllegaard present.