Strong exploration (II)

Hypotheses are not predictions, predictions are not prognostications, and evidence is evidence

Oct 23, 2023

“The Death of Socrates,” by Jacques-Louis David.

Hypotheses are not predictions

You keep using that word. I do not think it means what you think it means.

While Batesian mimics have a cosmopolitan distribution in ecological papers, their core habitat is the final paragraph of the introduction section. By this point in the liturgy, one has already established the pecuniary importance of one's research topic, demonstrated a mastery of Google Scholar, and introduced a local study system that represents the whole world.1 Now the time has come to state hypotheses.

This calls for a certain decorum. Any artfulness of prose that might until now have graced the text is replaced with the stiffness of a subpoena. The whole legitimacy of the paper is at stake, so it is important to perform this ceremony only after the analysis is complete and one knows which "significant" relationships deserve to be promoted to hypotheses. Finally, having taken all due precautions:

"We hypothesize that Y will decrease in response to X."

You may recognize the account above as a thin satire of hypothesizing-after-results-are-known (HARKing)2 accompanied, as it typically is, by superficial narrative framing. But this is a secondary concern and, as we will see later, not necessarily a concern at all. The main problem is that the statement "Y will decrease with X" — regardless of where it originates in the chronology of a study — is simply not a hypothesis.3

In the context of scientific inference, a hypothesis is a causal account, an explanation for an observed pattern.4 It is not what you expect to find, it is why you expect to find it. Claims of the form “Y will decrease with X” are not hypotheses but predictions, and to dismiss this distinction as mere pedantry is to forfeit the logical core of science: hypotheses make predictions, and it is by their predictions that hypotheses are judged. When this dialectic collapses into the conflation of its opposing terms, science degenerates into a gibbering filibuster of its own process.

Predictions are not prognostications

Detached from the unstated hypothesis to which it belongs, an orphaned prediction appears a mere prognostication, a forecast concerning empirical outcomes. Among the consequences of this confusion is the tortuous notion that formal (and, one desperately hopes, correct) prognostication is the sine qua non of scientific validity. At best, this preoccupation with fortune-telling is a source of anxiety during the planning of a study. More often, though, studies commence prematurely under the lash of funding cycles and field work logistics, and serious consideration of outcomes begins only after the outcomes are known. At this point, of course, postgnostication is the best one can do, and whether one attempts to salvage a “hypothesis-driven” narrative is merely a question of ethical sensibilities.5

Mercifully, all this hand-wringing misses the point. The predictions by which hypotheses are judged are not prognostications; rather, scientific predictions are logically — and, therefore, atemporally — implicit in hypotheses, regardless of when (or even whether) they are stated.6 In other words, predictions are logical properties of hypotheses, not independent entities that come into existence upon utterance. Given the premise that all men are mortal and the hypothesis that Socrates is a man, the eventual death of Socrates is just as logically implicit today as it was before prognostication on the matter was rendered moot by a cup of hemlock.7

Evidence is evidence

It is by means of the predictions logically implicit in a hypothesis that observations become evidence. Because predictions are logical, not chronological, there is no purely epistemic difference between observations that give rise to a new hypothesis and observations that test an existing one; both are convertible to the atemporal “stuff” of evidence when related to a hypothesis via the logic of its predictions.8 When we formulate a hypothesis to explain a set of observations, we are saying that the observations are implicit in the proposed hypothesis, i.e. that the causal logic of the hypothesis “predicts” (in the atemporal sense) what we have already observed. When, with a hypothesis in hand, we aim to gather further evidence concerning its validity, we again use the causal logic of the hypothesis to identify implicit observations (i.e. predictions) and subject them to empirical scrutiny.

This indifference of causal logic to the chronological workflow of research means that the deceptive construal of post-hoc hypotheses as preconceived — which is actually the only sense in which HARKing is intrinsically objectionable9 — is not only unethical but wholly unnecessary.10

This has implications that are at once counter-intuitive, liberating, and dangerous.

1. My point here is not to be cynical but to show what sort of things fill the void when you don’t know what your paper is about. There is really very little middle ground when it comes to introduction sections: an introduction is either a well-curated synthesis of relevant background information related to a local study system by a clear articulation of the inferential logic, or it is a smokescreen. Having written both kinds of introductions, I say this as much as a judgment of my own work as of anyone else’s.

2. Kerr NL (1998) HARKing: hypothesizing after the results are known. Personality and Social Psychology Review, 2, 196–217.

3. I want to thank Dan Herms for drilling this into me and my fellow students in his outstanding course “The Nature and Practice of Science,” which he taught during his professorship at Ohio State University.

4. Can hypotheses be ontological (i.e. claims about what something is, or indeed that something is) as well as etiological (claims about causal processes)? This is an interesting question, but I will dodge it for now by suggesting that, when stated in relation to data, any ontological claim can be reformulated as a causal claim. For example, consider the claim that a given fossil belongs to a certain taxon. In relation to data, such as the morphological characters of the fossil, the taxonomic claim about the fossil's identity becomes a causal claim about the morphological characters, i.e. that a given character state exists because the fossil belongs to the proposed taxon. Since the defining feature of science is the dialectic between claims and evidence, every ontological claim is operationalized as a causal claim.

5. Vancouver JB (2018) In Defense of HARKing. Industrial and Organizational Psychology, 11, 73–80.

6. Indeed, while the term “prediction” has a venerable history in the philosophy of science, it is inherently misleading. At an etymologically literal level, it carries exactly the sense of prognostication that I have disavowed. It would be much more informative to speak of hypotheses and their implications, since this would stress the logical rather than chronological relation between the terms. For the sake of continuity, though, I will continue to use the language of prediction in the atemporal sense described above.

7. This reference is drawn from the classical example of a syllogism: “All men are mortal; Socrates is a man; therefore, Socrates is mortal.” The truth of this conclusion was demonstrated when, in 399 BC, Socrates was executed by a fatal dose of hemlock (Conium maculatum) extract, as depicted in David’s painting at the top of this post.

8. Syrjänen P (2022) The epistemic role of prediction in science. Helsingin Yliopisto.

9. Hollenbeck JR, Wright PM (2017) Harking, sharking, and tharking: Making the case for post hoc analysis of scientific data. Journal of Management, 43, 5–18.

10. Vancouver JB (2018) In Defense of HARKing. Industrial and Organizational Psychology, 11, 73–80.

2 Comments

Luke Hearon

Delightful and precise. #DataDrivenFilibuster

I have so many thoughts and such little willpower to formalize them, but the two points that stick in my craw are:

1. Ontological hypotheses. I have no formal counterargument yet, but the purist in me (that is, me) still wants these to be seen as hypotheses sensu lato: hypotheses of the second degree or by analogy. The handling of an ontological hypothesis is just appreciably, intuitively different, and such hypotheses fit clumsily where causal hypotheses settle naturally; but I'm still rendering the fat of your point here.

2. HARK, something equal this way comes. I have yet to read your citations on this point. I agree that all else being equal the atemporality of logic demands a pre and post hoc hypothesis are equally supported given identical sets of evidence. However, all else is not equal. Sketching out a napkin paper causal structure: time of hypothesizing is a descendant of desperation of the researcher to torture something—anything—out of a dataset. Credibility of the hypothesis is a descendant of data torture^1. There is no causal connection between time of hypothesis and evidentiary support of the hypothesis, but there is a valuable predictive correlation between the two. In an interesting way, the water flows upstream here; I think we're misled by a first approximation causal understanding where a good old cause-blind correlation would have us on the right track. To give a second order approximation:

(hypothesis timing) ← (researcher despair) → (researcher amenability to torture) → (hypothesis credibility)

p(hypothesis true | timing) != p(hypothesis true)

even while

p(hypothesis true | timing) != p(hypothesis true | do(timing))
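
The napkin structure above can be checked with a small simulation. What follows is only a sketch: every probability in it (how often researchers despair, how despair shifts timing and torture, how torture lowers the chance that a hypothesis is true) is an invented number chosen for illustration, and the function name `simulate` and its `force_timing` argument are hypothetical. Under these assumed numbers, the exact values are p(true | post hoc) = 0.304, p(true | a priori) = 0.496, and p(true) = 0.40, and setting the timing by fiat leaves the 0.40 untouched.

```python
import random

def simulate(n=200_000, seed=0, force_timing=None):
    """Estimate p(hypothesis true | timing) under the assumed causal graph.

    force_timing=None   : timing arises naturally from researcher despair.
    force_timing="post" : do(timing = post hoc), cutting the despair -> timing edge.
    force_timing="pre"  : do(timing = a priori).
    All probabilities below are illustrative assumptions, not estimates.
    """
    rng = random.Random(seed)
    counts = {"pre": [0, 0], "post": [0, 0]}  # [hypotheses, true hypotheses]
    for _ in range(n):
        despair = rng.random() < 0.5
        if force_timing is None:
            # despair -> timing: despairing researchers hypothesize late
            post_hoc = rng.random() < (0.8 if despair else 0.2)
        else:
            # intervention: timing is set regardless of despair
            post_hoc = force_timing == "post"
        # despair -> amenability to data torture
        torture = rng.random() < (0.9 if despair else 0.1)
        # torture -> credibility: tortured hypotheses are true less often
        is_true = rng.random() < (0.2 if torture else 0.6)
        key = "post" if post_hoc else "pre"
        counts[key][0] += 1
        counts[key][1] += is_true
    return {k: (t / m if m else None) for k, (m, t) in counts.items()}

if __name__ == "__main__":
    print("observed:", simulate())
    print("do(post):", simulate(force_timing="post"))
    print("do(pre): ", simulate(force_timing="pre"))
```

In this toy world, timing is a useful predictor of credibility when it is merely observed, yet it has no effect when it is imposed, which is the sense in which conditioning on timing and intervening on timing come apart.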

^1 maybe I have just wrapped the crux of the issue in so much gauze. How can I claim that the further the rack stretches, the less trustworthy the hypothesis? I feel it to be true, I would put a handsome wager on it, but I cannot prove it. I suppose my understanding is that the complexity of our world is derived from the structure of its causal network, while the functions along any edge of the network are really quite simple; hence, data captured at the appropriate nodes should not require torture to reveal relevant patterns while data captured at the inappropriate nodes should require much transformation before revealing the true pattern, but the arc of incorrect transformations quickly outgrows the slim angle of correct transformations as the diameter of that circle grows.



Science at Dusk
