Wednesday, February 21, 2018

Confidence Intervals for the Effect Size

This started with my reaction to one of Ann K. Emery's posts, How to Visualize Statistically Significant P-Values with Squares (a makeover of a table summarizing the results of a survey).

I like the way the overview table looks, nice and clean - and it definitely passes the Squint Test.

I asked Ann: why not show confidence intervals for the effect size? This would be more useful for understanding the results, and less misleading than the significant/non-significant dichotomy. She replied with two challenges, which I happily took up: (1) to illustrate my idea, and (2) to translate the terms for a non-technical audience.

Here is the viz showing some (random) effect sizes and the corresponding confidence intervals; it is incomplete, missing a legend and other details necessary for reading it correctly.

I couldn't find a way to show the same information in tabular format, as in Ann's example. Overall, I have to admit that Ann's solution is more elegant than mine.

Here comes the more difficult part: explaining the concepts to an (unspecified) non-technical audience. Assuming a context similar to the one described in Ann's post, I chose to imagine addressing a non-technical but educated audience, with some exposure to statistical concepts, who are familiar and comfortable with the idea of sampling-related uncertainty - like the senior staff reading Ann's report.

At some point in the recent past, these people have been exposed to one of my presentations on the concepts of sampling and sampling-related uncertainty: because we do not investigate the whole population, but a subset of it, our image of what happens in the population is somewhat blurry. To illustrate the concepts, the presentation involves showing a random selection of the pixels from an image; the number of pixels increases until the audience can identify the original image. They in fact see two different images, one pretty simple (a couple of crisp geometrical shapes, with distinct colors), the other more complex (a famous painting: blurrier shapes, colors fading into one another, more nuances, more transition areas), to convey the importance of population variability.
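The pixel-sampling idea can be sketched in a few lines of Python (using a synthetic two-tone image rather than the actual slides, which are my own made-up stand-in here): even a small random sample of pixels gives a rough estimate of how much of the image the shape covers, and the estimate sharpens as the sample grows.

```python
import random

# Synthetic 100x100 "image": 1 inside a centered 40x40 square, 0 elsewhere.
# The square covers 40*40 / (100*100) = 16% of the image.
pixels = [1 if 30 <= r < 70 and 30 <= c < 70 else 0
          for r in range(100) for c in range(100)]

random.seed(42)

def estimate_shape_fraction(n_sampled):
    """Estimate the shape's area fraction from a random sample of pixels."""
    sample = random.sample(pixels, n_sampled)
    return sum(sample) / n_sampled

# More pixels -> a sharper, less blurry picture of the true fraction (0.16).
for n in (50, 500, 5000):
    print(n, round(estimate_shape_fraction(n), 3))
```

The painting version of the demo would behave the same way, only the audience needs more pixels before the blur resolves - which is exactly the point about population variability.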

As a result of this presentation, they know that sample-based surveys are a valid method for drawing conclusions about a population from limited data, and that the observed values are surrounded by uncertainty (the image blur). They also know that blur does not prevent us from drawing ballpark conclusions: even if we cannot see the shape borders in the pixel samples, we have a pretty good idea where they are in the original image (we can put some limits on where the border can reasonably be, which gives us the confidence interval).

By now, it should be sufficient to introduce (or, more likely, remind them of) the concept of effect size, i.e., the change in the indicator of interest generated by the 'treatment' (T). Since this is a sample-based measurement, it is uncertain; but, because we have selected the sample using the proper methodology, we can determine the confidence interval.

So I should get no blank stares when I mention that the chart depicts the confidence interval of the effect size: given the impact of (T) observed in our sample, we can say, with 95% confidence, that, were the entire population subjected to (T), the observed change would fall anywhere between [lo] and [HI].
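For the technically inclined, here is one way such an interval could be computed (with made-up numbers, not Ann's survey data): the effect size as a difference in group means, and the usual normal-approximation 95% interval around it.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical scores for two independent groups, control vs. treatment (T).
# Purely illustrative numbers.
control   = [62, 58, 71, 65, 60, 68, 63, 59, 66, 64]
treatment = [70, 66, 75, 72, 68, 74, 69, 71, 73, 67]

def mean_diff_ci(a, b, z=1.96):
    """Difference in means with a 95% CI (normal approximation)."""
    diff = mean(b) - mean(a)
    se = sqrt(stdev(a)**2 / len(a) + stdev(b)**2 / len(b))
    return diff - z * se, diff, diff + z * se

lo, diff, hi = mean_diff_ci(control, treatment)
print(f"effect size: {diff:.1f}, 95% CI [{lo:.1f}, {hi:.1f}]")
```

Here the whole interval sits above zero, so the chart would show a clearly "significant" effect; shrink the difference or the sample and the interval starts to straddle zero.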

I would also clarify that a negative lower limit (assuming the expected effect is positive) does not mean that applying (T) will actually decrease the original value. It simply means that, in the population, (T) may well have zero impact. I revised the chart to make it less confusing, by eliminating meaningless values of the effect (negative ones, in this hypothetical case).

A confidence interval that includes zero is said to indicate a non-significant impact; I would remind them that, in this context, we are talking about statistical significance, and that a statistically significant value is not necessarily an important one - the audience has to determine whether a statistically significant value is also substantive and relevant from their perspective.

I get approving nods; they get it. Phew!

Update: Thanks to Daniel Lakens (who did the work) and Dana Linnell Wanzer (who pointed the way), I can now calculate effect sizes in Excel. 😇
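Lakens' spreadsheet computes standardized effect sizes; for readers who prefer code to Excel, a rough Python equivalent of one common measure (Cohen's d for two independent groups, using the pooled standard deviation) might look like this - illustrative data, not taken from any of the sources above:

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled SD."""
    n1, n2 = len(a), len(b)
    pooled_sd = sqrt(((n1 - 1) * stdev(a)**2 + (n2 - 1) * stdev(b)**2)
                     / (n1 + n2 - 2))
    return (mean(b) - mean(a)) / pooled_sd

# Illustrative data only.
group_a = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3]
group_b = [5.9, 5.5, 6.2, 5.8, 6.0, 5.7]
print(round(cohens_d(group_a, group_b), 2))
```

Being standardized, d is unitless, which makes it handy when the audience needs to compare effects measured on different scales.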

Suggested readings for the audience
