Sperm Stats: What’s in a Number?

Author: Alex Borsa

One of the major lessons of science is to never take a number at face value. Although their seemingly self-evident nature is partly what makes numbers so powerful, it is important to remember that a lot goes into making a number. Anytime a number is reported, there have already been countless decisions about what to measure, how to measure it, what statistical techniques to apply, and what to include, or exclude, when presenting results to an audience. One of the hallmarks of good scientific practice is to make well-informed and hypothesis-driven decisions every step of the way, and to be transparent about the process so that interlocutors can meaningfully evaluate the work that’s been done. But without advanced training, and often even with advanced training, it can be hard to differentiate robust and informative findings from statistical sleight of hand.

In this blog post, I’m going to walk through how sperm statistics are made by analyzing the 2017 paper by Levine et al., wherein the authors argue that sperm counts around the globe are dropping, particularly in “Western” countries but not in “Other” countries. I highlight some key methodological choices that Levine et al. and others made while trying to gain insight into the state of global sperm counts. Understanding these choices is important for evaluating the strength and validity of claims about global sperm count decline, as well as for unpacking how these claims interact with other discourses on gender, race, and nationhood.

Methods and Geospatial Categories

To be clear, the authors explicitly listed their methods in their publication, included two supplemental appendices, and after some light vetting agreed to share their data with us. This is good practice, and we appreciate their willingness to engage. But when we are dealing with sensitive topics, particularly those enmeshed in larger political conversations about the status of men, masculinity, and race relations—all regularly invoked by rising alt-right ethnonationalist movements—it’s important to be particularly discerning about the declared findings and transparent about how they came to be.

The Levine et al. study is what’s called a meta-analysis. A meta-analysis is a type of statistical study that combines the results of many other, already-published studies. That is, Levine et al. did not go out and collect new sperm count data from men around the globe; rather, they took sub-samples from pre-existing studies that met their inclusion criteria, pooled data from all of those studies, and then ran their analysis on the total dataset. Their analysis followed MOOSE guidelines (“Meta-analysis Of Observational Studies in Epidemiology,” one of the gold standards for this type of research). They did not include studies measuring sperm count in men who had a preexisting health condition or exposure known to affect sperm count, such as genital abnormalities or smoking. The authors ended up with 185 studies conducted in countries around the world, spanning the years 1973-2011 and measuring the sperm count of about 43,000 men.
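To make the idea of pooling concrete, here is a minimal sketch in Python: each study contributes a summary estimate (here, a mean sperm concentration) and a sample size, and the pooled figure weights larger studies more heavily. The study names and numbers are hypothetical, and Levine et al.’s actual models are considerably more elaborate than this.

```python
# Minimal illustration of pooling study-level estimates (hypothetical numbers,
# not data from Levine et al.). Each study reports a mean sperm concentration
# (million/mL) and a sample size; the pooled estimate weights larger studies
# more heavily.
studies = [
    {"name": "Study A", "mean_conc": 62.0, "n": 250},
    {"name": "Study B", "mean_conc": 55.5, "n": 40},
    {"name": "Study C", "mean_conc": 71.2, "n": 1200},
]

total_n = sum(s["n"] for s in studies)
pooled_mean = sum(s["mean_conc"] * s["n"] for s in studies) / total_n
print(f"Sample-size-weighted pooled mean: {pooled_mean:.1f} million/mL")
```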

The ways in which they chose to aggregate their data...rely on mostly cultural distinctions that feel intuitive rather than on sound biological or epidemiological reasoning.

To visually represent the sample, the GenderSci Lab enlisted the services of the Harvard Center for Geographic Analysis and constructed a map (Fig. 1). As can be seen, the study involves an impressive array of countries—but not nearly every country or population on the globe, and many with a meager sample size. 

In their statistical analyses, Levine et al. first ran two regressions over the years 1973-2011: one measuring sperm concentration (number of sperm per milliliter of semen), the other measuring total sperm count (total number of sperm per ejaculation). They found statistically significant declines of approximately 0.70 million sperm per milliliter per year (concentration) and 2.23 million sperm per year (total sperm count) between 1973 and 2011. The authors then ran statistical models that accounted for geographic region and for whether men were “fertile” (having already had a child) or “unselected” (not necessarily having a child already).
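As a rough illustration of what such a regression involves, the sketch below fits a sample-size-weighted linear trend of study-level mean sperm concentration on year of collection, using the statsmodels library. The data points are invented for illustration only; Levine et al.’s published models also adjust for additional covariates (for example, age and abstinence time).

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical study-level data: year of sample collection, mean sperm
# concentration (million/mL), and number of men per study.
years = np.array([1975, 1982, 1990, 1996, 2003, 2010])
mean_conc = np.array([72.0, 68.5, 61.0, 58.2, 52.4, 47.9])
n_men = np.array([120, 300, 85, 640, 210, 450])

# Weighted least squares: larger studies get more weight in the trend line.
X = sm.add_constant(years)
model = sm.WLS(mean_conc, X, weights=n_men).fit()

print(f"Estimated slope: {model.params[1]:.2f} million/mL per year")
print(f"p-value for the slope: {model.pvalues[1]:.3f}")
```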

Figure 1. Map of the countries and sample sizes represented in the Levine et al. meta-analysis.

As reported in their supplementary materials, Levine et al.’s initial categories were:

  1. Europe/Australia 

    1. Fertile

    2. Unselected

  2. North America

    1. Fertile

    2. Unselected

  3. “Other”

    1. Fertile 

    2. Unselected

That is, they created geographic units of analysis for Europe (with Australia tacked on), North America, and every other country in the world (“Other”), and looked for trends in fertile and unselected men’s sperm counts in each of these buckets. Initially, they found statistically significant declines in sperm count for 3 of these 6 categories: Europe/Australia (fertile and unselected) and North America (unselected). “Other” fertile and unselected men did not show significant declines, nor did fertile North American men.
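In model terms, estimating a separate trend for each of these buckets amounts to interacting the year variable with the grouping categories. The sketch below shows one way to express that structure with statsmodels’ formula interface, using a hypothetical study-level table and, for brevity, only the region dimension; it is meant to make the structure of the analysis concrete, not to reproduce Levine et al.’s exact specification.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical study-level table. Levine et al. additionally split each region
# by fertile vs. unselected status; that dimension is omitted here for brevity.
df = pd.DataFrame({
    "year":      [1975, 1990, 2005, 1978, 1992, 2008, 1980, 1995, 2009],
    "mean_conc": [70.1, 61.3, 52.8, 68.4, 63.0, 55.1, 66.2, 65.8, 64.9],
    "n":         [200, 150, 400, 120, 310, 260, 90, 75, 180],
    "region":    ["Europe/Australia"] * 3 + ["North America"] * 3 + ["Other"] * 3,
})

# Interacting year with region gives each region its own estimated slope.
model = smf.wls("mean_conc ~ year * C(region)", data=df,
                weights=df["n"]).fit()
print(model.params)   # per-region intercepts and year slopes
```

Each region then gets its own intercept and year slope, which is what licenses statements like “significant decline in Europe/Australia but not in ‘Other.’”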

The authors then, however, changed their categories slightly for the final paper: they consolidated Europe/Australia and North America into one category called “Western” and kept “Other” the same. In the final model, both unselected and fertile “Western” men showed statistically significant negative effects. In other words, sperm count declines among fertile North American men, which were not previously significant (p=0.29), gained manufactured significance (p=0.033) by being roped in with the already-significant European/Australian data in the final model. Now, the model shows that all Western fertile and unselected men have significant declines, while “Other” men do not.
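To see how consolidating groups can flip a result from non-significant to significant, consider a toy fixed-effect (inverse-variance) combination of two slope estimates: one precise and clearly negative, standing in for Europe/Australia, and one imprecise and not significant on its own, standing in for fertile North America. The numbers below are invented and are not Levine et al.’s estimates.

```python
import math
from scipy import stats

# Hypothetical slope estimates (million sperm/mL per year) and standard errors;
# these are NOT Levine et al.'s numbers, just a toy illustration.
slope_eu, se_eu = -0.80, 0.20   # precise and clearly negative on its own
slope_na, se_na = -0.35, 0.30   # imprecise: not significant on its own

def z_and_p(slope, se):
    """Two-sided z-test of the slope against zero."""
    z = slope / se
    return z, 2 * stats.norm.sf(abs(z))

# Fixed-effect (inverse-variance) pooling of the two estimates.
w_eu, w_na = 1 / se_eu**2, 1 / se_na**2
slope_pooled = (w_eu * slope_eu + w_na * slope_na) / (w_eu + w_na)
se_pooled = math.sqrt(1 / (w_eu + w_na))

results = {
    "Europe/Australia alone": z_and_p(slope_eu, se_eu),
    "North America alone":    z_and_p(slope_na, se_na),
    "Pooled 'Western'":       z_and_p(slope_pooled, se_pooled),
}
for label, (z, p) in results.items():
    print(f"{label}: z = {z:.2f}, p = {p:.3g}")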

It is justifiable to explore multiple aggregations of data alongside hypothesis-driven inquiries. However, reframing a statistically insignificant decline in sperm count among fertile North American men as significant by shifting aggregations implies a level of certainty the data do not support, naturalizes a West/“Other” distinction that we find faulty, and may influence future research programs.


Impacts and Conclusion

Levine et al.’s attempt to gain insight into a matter that has caused decades of debate in the reproductive sciences is admirable. But the ways in which they chose to aggregate their data—Europe/Australia at first and then “Western,” with everything else labelled as “Other”—rely on mostly cultural distinctions that feel intuitive rather than on sound biological or epidemiological reasoning. While “the West” means something geopolitically, it is hardly a discrete and internally valid bioecological grouping. This is even more true for “Other.” Even a cursory glance at Figure 1 shows that there are enormous swaths of the globe unaccounted for, and that there are many other possible ways one might conceive of grouping, or not grouping, the countries for which data exist.

In a similar vein, the levels of abstraction in the meta-analysis involve extrapolating, in some cases, from extremely small samples to speak for extraordinarily large portions of the human population. (For example, all of Chile is represented by only 24 semen samples.) Studies that were conducted decades ago on a couple dozen men are being operationalized to represent the sperm counts not only of their respective countries, but also of the geopolitical categories of the West and “Other.” Thinking just within the context of a single country, there is no reason to believe that a small sample of men in one part of a country, say an urban center, is going to be representative of the sperm quality of men in rural regions hundreds of miles away, particularly when the authors did not include any control for rurality in the study.
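To give a sense of how little precision a couple dozen samples buy, the back-of-the-envelope calculation below computes a 95% confidence interval for a mean sperm concentration estimated from 24 men, using an assumed, purely illustrative, between-man standard deviation.

```python
from scipy import stats

# Hypothetical numbers, purely for illustration: a mean sperm concentration
# estimated from 24 men, with an assumed between-man standard deviation.
n = 24
mean_conc = 60.0   # million/mL (assumed)
sd = 40.0          # million/mL (assumed, not taken from Levine et al.)

se = sd / n**0.5                          # standard error of the mean
t_crit = stats.t.ppf(0.975, df=n - 1)     # two-sided 95% critical value
half_width = t_crit * se

print(f"95% CI: {mean_conc - half_width:.1f} to {mean_conc + half_width:.1f} million/mL")
```

Under these assumptions, the interval spans more than 30 million sperm per milliliter, a reminder of how little precision a couple dozen samples can provide for characterizing an entire country, let alone a “region.”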

The levels of abstraction in the meta-analysis involve extrapolating, in some cases, from extremely small samples to speak for extraordinarily large portions of the human population.

Rather than accounting for socioecological context in a biologically meaningful way, we fear that by using an inappropriate yet culturally convenient schema—West vs. “Other”—the authors have stoked the flames of alt-right discourse, which frets about the decline of men and masculinity in a feminist age and a lost fertility race between whites and non-whites. This is most easily seen in the statistical sleight of hand wherein “fertile North Americans” went from showing no significant decline to a significant one, merely by being thrown into the “Western” bin alongside Europe and Australia.

The philosopher of science Bruno Latour once said, “Give me a laboratory and I will raise the world.” While Latour was primarily referring to lab bench science and not (just) analyses run in a statistical program, the ethos of the sentiment remains the same: the constructs that scientists generate on a spreadsheet become objectified as concrete, material truths, and circulate well beyond the context of the study itself. By operationalizing biologically unsubstantiated categorizations like Western and “Other,” we fear that Levine et al. uncritically allowed their data to interface with racialized anxieties about the decline of the West and white masculinity. 

In the case of this study, we believe that the numbers reported by Levine et al. are the product of culturally intuitive but scientifically inappropriate assumptions about groupings of humans across spaces and environments. By looking behind the curtains, one can identify these assumptions and subject them to critical evaluation. Especially when scientific findings begin to circulate in public discourse—when they are taken for granted as true—it is particularly important to ask: what, really, is in a number? 


Recommended Citation

Borsa, A. “Sperm Stats: What’s in a Number?” GenderSci Blog. 2021 May 4, genderscilab.org/blog/sperm-stats-whats-in-a-number

Statement of Intellectual Labor:

This blog post was primarily authored by Alex Borsa. The statistical analyses it is based on were jointly conducted by Alex and Joseph Bruch. Editing was done in an iterative process, with major contributions from multiple lab members, including Kelsey Ichikawa, Sarah Richardson, and Marion Boulicault.