He . . . proved . . .WHAT?!
When Firefox’s Pocket feature showed me the headline, my eyebrows must have risen an inch. Huh?!
So let’s get to it.
“Well, extrasensory perception, also called ESP, is when you can perceive things that are not immediately available in space or time,” Bem said. “So, for example, when you can perceive something on the other side of the world, or in a different room, or something that hasn’t happened yet.”
It occurred to Wu that the flyer might have been a trick. What if she and the other women were themselves the subjects of Bem’s experiment? What if he were testing whether they’d go along with total nonsense?
“I know this sounds kind of out there,” Wu remembers Bem saying, “but there is evidence for ESP, and I really believe it. But I don’t need you to believe it. In fact, it’s better if you don’t. It’s better if I can say, ‘Even my staff don’t believe in this.’ ”
As Bem went on, Wu began to feel more at ease. He seemed genuine and kind, and he wasn’t trying to convert her to his way of thinking. OK, so maybe there’s going to be a you-got-punked moment at the end of this, she thought, but at least this guy will pay me.
Uh . . .
I’m already reminded of all those popular ghost-hunting shows I’ve developed a thorough hatred for. Well, except for BuzzFeed’s Unsolved series. Because Shane and Ryan are hilarious.
Anyway, I already don’t like the way this is being presented.
1.) Specifically, the use of the word believe. I believe, you don’t have to believe . . . IT DOES NOT MATTER!
There is either fact or speculation. Belief should have NOTHING to do with the process. Because you don’t need belief when you have facts and data.
2.) When studying ANY area of the paranormal, it is never wise to use people knowledgeable of either the phenomenon you are studying OR the test you are running. Even if you and all the participants THINK they are rational enough not to be fooled by their imaginations, there is no way to affirm this. To bring the Artificial Intelligence conversation into this realm, our brains are the ULTIMATE black box. Given our imagination, combined with our inability to understand the basics of what drives the majority of our decision making, humans knowledgeable of the experiment are contaminated from the get-go.
If there are to be accurate and trusted measures of paranormal phenomena, they will have to be run in conjunction with an alternative distraction (such as a contest!). Reality TV is littered with potential ideas.
In truth, Bem had no formal funding for his semisecret research program. For nearly a decade, he’d been paying undergraduates like Wu out of his own pocket, to help him demonstrate that we all possess some degree of precognition—a subtle sense of what will happen in the future. He rarely came into the lab himself, so he’d leave his lab assistants an envelope stuffed with bills. They dispensed $5 from the kitty to each subject they ran through the experiment.
For the rest of that semester and into the one that followed, Wu and the other women tested hundreds of their fellow undergrads. Most of the subjects did as they were told, got their money, and departed happily. A few students—all of them white guys, Wu remembers—would hang around to ask about the research and to probe for flaws in its design. Wu still didn’t believe in ESP, but she found herself defending the experiments to these mansplaining guinea pigs. The methodology was sound, she told them—as sound as that of any other psychology experiment.
That is undoubtedly an eyebrow-raiser of a paragraph. We don’t hear about the experiment, but we DO hear a very pointed allegation, one that hardly seems worth noting if the techniques utilized are genuinely sound.
There will always be idiots. We all know this. Why give them airtime they clearly don’t deserve?
In the spring of 2010, not long after Wu signed on, Bem decided he’d done enough to prove his claim. In May, he wrote up the results of his 10-year study and sent them off to one of his field’s most discerning peer-reviewed publications, the Journal of Personality and Social Psychology. (JPSP turns away some 85 percent of all submissions, making its acceptance rate comparable to that of the Cornell admissions office.) This was the same journal where Bem had published one of the first papers of his career, way back in 1965. Now he would return to JPSP with the most amazing research he’d ever done—that anyone had ever done, perhaps. It would be the capstone to what had already been a historic 50-year career.
Having served for a time as an associate editor of JPSP, Bem knew his methods would be up to snuff. With about 100 subjects in each experiment, his sample sizes were large. He’d used only the most conventional statistical analyses. He’d double- and triple-checked to make sure there were no glitches in the randomization of his stimuli.
Even with all that extra care, Bem would not have dared to send in such a controversial finding had he not been able to replicate the results in his lab, and replicate them again, and then replicate them five more times. His finished paper lists nine separate ministudies of ESP. Eight of those returned the same effect.
Bem’s 10-year investigation, his nine experiments, his thousand subjects—all of it would have to be taken seriously. He’d shown, with more rigor than anyone ever had before, that it might be possible to see into the future. Bem knew his research would not convince the die-hard skeptics. But he also knew it couldn’t be ignored.
Enough with the drama. Get to the bloody point.
When the study went public, about six months later, some of Bem’s colleagues guessed it was a hoax. Other scholars, those who believed in ESP—theirs is a small but fervent field of study—saw his paper as validation of their work and a chance for mainstream credibility.
But for most observers, at least the mainstream ones, the paper posed a very difficult dilemma. It was both methodologically sound and logically insane. Daryl Bem had seemed to prove that time can flow in two directions—that ESP is real. If you bought into those results, you’d be admitting that much of what you understood about the universe was wrong. If you rejected them, you’d be admitting something almost as momentous: that the standard methods of psychology cannot be trusted, and that much of what gets published in the field—and thus, much of what we think we understand about the mind—could be total bunk.
If one had to choose a single moment that set off the “replication crisis” in psychology—an event that nudged the discipline into its present and anarchic state, where even textbook findings have been cast in doubt—this might be it: the publication, in early 2011, of Daryl Bem’s experiments on second sight.
The replication crisis as it’s understood today may yet prove to be a passing worry or else a mild problem calling for a soft corrective. It might also grow and spread in years to come, flaring from the social sciences into other disciplines, burning trails of cinder through medicine, neuroscience, and chemistry. It’s hard to see into the future. But here’s one thing we can say about the past: The final research project of Bem’s career landed like an ember in the underbrush and set his field ablaze.
I’m glad his fans are as humble as the man himself.
I am now about to wade into waters WAY more in-depth than I am accustomed to, so please bear with me. Though I lack any kind of degree, even I can’t help but criticize this hypothesis. Daryl Bem’s research notwithstanding, the replication crisis is a consequence of attempting to further solidify the credibility of our knowledge base. There can only be a crisis if the theories were shaky to start with (and thus should have been reviewed in the first place!). And aside from this:
Many of the papers that were retested contained multiple experiments. Only one experiment from each paper was tested. So these failed replications don’t necessarily mean the theory behind the original findings is totally bunk.
If you check out the Vox article, there are many reasons why it can be hard to replicate past experiments, not all of them involving malice.
Past studies didn’t break down because of some guy doing a self-driven analysis on a paranormal subject. They broke down because of scientists doing what scientists should be doing, and because science, the world, and human psychology, in particular, are deeply complex issues. When variables can be a moving target (such as the individual and collective human mind as time moves on and new technologies create adaptations), this is to be expected.
Scientists may not necessarily have been wrong. They just might not have grasped the entire picture yet, which is of particular interest if the endpoint itself keeps changing alongside our collective evolution.
Daryl Bem has always had a knack for not fitting in. When he was still in kindergarten—a gentle Jewish kid from Denver who didn’t care for sports—he was bullied so viciously that his family was forced to move to a different neighborhood. At the age of 7, he grew interested in magic shows, and by the time he was a teenager, he’d become infatuated with mentalism. Bem would perform tricks of mind-reading and clairvoyance for friends and classmates and make it seem as though he were telepathic.
As a student, Bem was both mercurial and brash. He started graduate school in physics at MIT, then quickly changed his mind, transferring to the University of Michigan to study as a social psychologist. While at Michigan, still in his early 20s and not yet in possession of his Ph.D., Bem took aim at the leading figure in his field, Leon Festinger. For his dissertation, Bem proposed a different explanation—one based on the old and out-of-fashion writings of behaviorist B.F. Skinner—for the data that undergirded Festinger’s theory of cognitive dissonance.
This would be Bem’s method throughout his career: He’d jab at established ways of thinking, rumble with important scholars, and champion some antique or half-forgotten body of research he felt had been ignored. Starting in the 1970s, he quarreled with famed personality psychologist Walter Mischel by proffering a theory of personality that dated to the 1930s. Later, Bem would argue against the biological theory of sexual orientation, favoring a developmental hypothesis that derived from “theoretical and empirical building blocks … already scattered about in the literature.”
An unorthodox man and thinker. There is nothing wrong with that.
So long as you can back it all up.
As a young professor at Carnegie Mellon University, Bem liked to close out each semester by performing as a mentalist. After putting on his show, he’d tell his students that he didn’t really have ESP. In class, he also stressed how easily people can be fooled into believing they’ve witnessed paranormal phenomena.
Around that time, Bem met Robert McConnell, a biophysicist at the University of Pittsburgh and an evangelist for ESP research. McConnell, the founding president of the Parapsychological Association, told Bem the evidence for ESP was in fact quite strong. He invited Bem to join him for a meeting with Ted Serios, a man who could supposedly project his thoughts onto Polaroid film. The magic was supposed to work best when Serios was inebriated. (The psychic called his booze “film juice.”) Bem spent some time with the drunken mind-photographer, but no pictures were produced. He was not impressed.
I should hope not.
Also, are we done with the biography yet?
In his skepticism about ESP, Bem for once was not alone. The 1970s marked a golden age for demystifying paranormal claims. James Randi, like Bem a trained stage magician, had made his name as a professional debunker by exposing the likes of Uri Geller. Randi subsequently took aim at researchers who studied ESP in the lab, sending a pair of stage performers into a well-funded parapsychology lab at Washington University in 1979. The fake psychics convinced the lab their abilities were real, and Randi did not reveal the hoax until 1983.
As debunkers rose to prominence, the field of psychical research wallowed in its own early version of the replication crisis. The laboratory evidence for ESP had begun to shrivel under careful scrutiny and sometimes seemed to disappear entirely when others tried to reproduce the same experiments. In October 1983, the Parapsychology Foundation held a conference in San Antonio, Texas, to address the field’s “repeatability problem.” What could be done to make ESP research more reliable, researchers asked, and more resilient to fraud?
A raft of reforms were proposed and implemented. Experimenters were advised to be wary of the classic test for “statistical significance,” for example, since it could often be misleading. They should avail themselves of larger groups of subjects, so they’d have sufficient power to detect a real effect. They should also attempt to replicate their work, ideally in adversarial collaborations with skeptics of the paranormal, and they should analyze the data from lots of different studies all at once, including those that had never gotten published. In short, the field of parapsychology decided to adopt the principles of solid scientific practice that had long been ignored by their mainstream academic peers.
Uh . . . good?
Props for getting into a position that you should have adopted in the first place?
As part of this bid to be taken seriously by the scientific establishment, a noted ESP researcher named Chuck Honorton asked Bem to visit his lab in Princeton, New Jersey. He thought he’d found strong evidence in favor of telepathy, and he wanted Bem to tell him why he might be wrong.
Bem didn’t have an answer. In 1983, the scientist and stage performer made a careful audit of the Honorton experiments. To his surprise, they appeared to be airtight. By then, Bem had already started to reconsider his doubts about the field, but this was something different. Daryl Bem had found his faith in ESP.
FINALLY, something non-biographical.
Daryl Bem may not have had an answer with regard to the Ganzfeld experiments, but a man named Ray Hyman certainly did.
First off, an explanation:
However, in ganzfeld, the idea is to instead provide homogenous stimuli. The subject, called the “receiver”, sits comfortably in a recliner, wearing headphones playing gentle white noise. The room is bathed in red light and the receiver wears translucent cups over the eyes, so all they see is a uniform, featureless red. They are relaxed and cozy. That’s the physical setting of the experiment. Two other people are involved: an experimenter and a “sender”. The sender, in an isolated room where they cannot be seen or heard by the receiver, concentrates for 30 minutes on a “target”, which is some object or video clip or something. Throughout the 30 minutes, the receiver is supposed to verbally recite what they see or imagine. The experimenter, who is also supposed to be isolated from both the sender and the receiver, records what the receiver says, and usually keeps notes about what they describe.
At the end of the 30 minutes, the receiver is shown the actual target upon which the sender was focusing, presented alongside with three other control objects. The receiver guesses which of the four most closely resembles their impressions during the ganzfeld session. Pure chance predicts a 25% hit rate. But ganzfeld experiments became famous within the parapsychology community because experimenters consistently found a significantly higher hit rate; closer to 35%.
And now Ray Hyman’s critique of the technique (from the same source as the previous):
The criticisms that Hyman found were inadequate randomization; sensory leakage (meaning that in some cases, the receivers could actually hear what was going on in the sender’s room next door; in others, it was possible for things like the sender’s fingerprints to be visible on the target object for the receiver to see); and inappropriate statistical analysis.
Mainly, Hyman felt that Honorton’s work suffered from a type of statistical complication called multiple testing. In a nutshell, multiple testing is when you take more and more variables into account between two groups; sooner or later you’re going to find more and more differences between them. These variables included the different ways that researchers had categorized the senders and receivers, cross referencing them to the results. They found that subjects were more likely to have positive results if they had been educated in a creative field; if they already had a strong belief in psychic powers; if they were extroverted; and if the experiment was conducted in a warm and welcoming atmosphere. Hyman believed that the positive results reported by Honorton were due, at least in part, to multiple testing effects that inappropriately considered these types of variables.
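Hyman’s multiple-testing point is easy to demonstrate with a toy simulation (this is my own sketch, not anything from the studies in question): generate data with no real effect at all, slice it along enough subgroup variables, and “significant” differences pop out on their own.

```python
import math
import random

random.seed(42)

def two_prop_z(h1, h2, n):
    """Two-proportion z-statistic (normal approximation), equal group sizes."""
    p1, p2 = h1 / n, h2 / n
    pooled = (h1 + h2) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    return (p1 - p2) / se if se else 0.0

n = 100      # subjects per subgroup
splits = 20  # subgroup comparisons examined (extroverts vs. not, believers vs. not, ...)
spurious = 0
for _ in range(splits):
    # Every subject guesses at pure chance (25% ganzfeld hit rate); no true difference.
    h1 = sum(random.random() < 0.25 for _ in range(n))
    h2 = sum(random.random() < 0.25 for _ in range(n))
    if abs(two_prop_z(h1, h2, n)) > 1.96:  # nominal p < .05
        spurious += 1
print(f"'significant' differences found in pure noise: {spurious} of {splits}")
```

At a 5 percent false-positive rate per test, examining 20 subgroup splits of pure noise yields about one spurious “finding” on average, and any such finding can then be dressed up as a real result.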
He also mentions another phenomenon that I hadn’t previously heard of.
Hyman also found that the “file drawer effect” came into play, which is when studies are abandoned when they end up not showing any interesting results. Thus, the body of published work was inappropriately skewed to include those results which showed a positive result, which is going to happen sometimes simply due to random variances. Hyman figured that, working backwards and accounting for the degrees to which various weaknesses were present in each of the studies, the actual size of the effect was zero.
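The file drawer effect can be simulated the same way (again, my own toy numbers, not Honorton’s data): if only the lucky runs escape the drawer, the published record shows an “effect” that was never there.

```python
import math
import random

random.seed(7)

CHANCE = 0.25  # true ganzfeld hit rate under the null: no ESP
SESSIONS = 50  # sessions per study

def run_study():
    """Simulate one null study; return its observed hit rate and one-sided z."""
    hits = sum(random.random() < CHANCE for _ in range(SESSIONS))
    rate = hits / SESSIONS
    z = (rate - CHANCE) / math.sqrt(CHANCE * (1 - CHANCE) / SESSIONS)
    return rate, z

all_rates, published = [], []
for _ in range(2000):
    rate, z = run_study()
    all_rates.append(rate)
    if z > 1.645:  # only "positive, p < .05" studies leave the file drawer
        published.append(rate)

print(f"true mean hit rate across all studies: {sum(all_rates) / len(all_rates):.3f}")
print(f"mean hit rate in 'published' studies:  {sum(published) / len(published):.3f}")
```

The true hit rate sits right at chance, but the “published” subset averages well above it, which is exactly the kind of skew Hyman was working backwards through.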
While some of this is over my head, I more or less understand what he is getting at. To put it one way, the results of this test can likely be influenced if proper attention isn’t given to various traits of participants’ personalities (dare I say, our black-box brain strikes again?). While manipulation does not necessarily have to be intentional, the fact that it can happen should serve as a caution to anyone viewing the results.
I may be nitpicky. But I don’t like this.
By then, Bem had already started to reconsider his doubts about the field, but this was something different. Daryl Bem had found his faith in ESP.
The word faith is like the word believe. I tend to recoil when I see it used anywhere near the realm of scientific inquiry. Again, maybe a tad nit-picky. However, words matter. As does keeping a 50-foot moat between fact and bullshit.
Not long after she was hired, Jade Wu found herself staring at a bunch of retro pornography: naked men with poofy mullets and naked girls with feathered hair. “I’m gay, so I don’t know what’s sexy for heterosexuals,” Bem had said, in asking for her thoughts. Wu didn’t want to say out loud that the professor’s porno pictures weren’t hot, so she lied: Yeah, sure, they’re erotic.
These would be the stimuli for the first of Bem’s experiments on ESP (or at least the first one to be reported in his published paper). Research subjects—all of them Cornell undergraduates—saw an image of a pair of curtains on a computer monitor. They were then prompted to guess which of the curtains concealed a hidden image. The trick was that the correct answer would be randomly determined only after the student made her choice. If she managed to perform better than chance, it would be evidence that she’d intuited the future.
Bem had a reason for selecting porn: He figured that if people did have ESP, then it would have to be an adaptive trait—a sixth sense that developed over millions of years of evolution. If our sixth sense really had such ancient origins, he guessed it would likely be attuned to our most ancient needs and drives. In keeping with this theory, he set up the experiment so that a subset of the hidden images would be arousing to the students. Would the premonition of a pornographic image encourage them to look behind the correct curtain?
This demonstration sounds a lot like starting from a preconceived conclusion, then working backwards to fill in the gaps. Not unlike the Ghost Hunters, who enter a home and only attempt to contact the person from the stories, not just anyone who happens to be floating around.
Alternatively, what if pornography acts as a sort of psychic repellent for a student? Could that be measured?
The data seemed to bear out Bem’s hypothesis. In the trials where he’d used erotic pictures, students selected their location 53 percent of the time. That marked a small but significant improvement over random guessing.
If you have a choice of two curtains, 53 percent sounds mighty close to the halfway mark of just picking the correct curtain by luck. Maybe I am looking at this incorrectly. I am not a psychologist or scientist!
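As it turns out, whether 53 percent is meaningfully different from coin-flipping depends entirely on how many guesses were made. A quick exact-binomial check in Python (the trial counts here are illustrative, not Bem’s actual numbers):

```python
import math

def tail_prob(n, k):
    """P(X >= k) for X ~ Binomial(n, 1/2), computed exactly with integers."""
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2**n

# 53% correct over 100 two-curtain guesses: easily explained by luck.
print(f"{tail_prob(100, 53):.3f}")        # roughly 0.3 -- nothing special
# 53% correct over 10,000 guesses: luck is no longer a plausible explanation.
print(f"{tail_prob(10_000, 5_300):.1e}")  # roughly 1e-9
```

So 53 percent on its own means nothing; 53 percent across thousands of trials is another matter, which is presumably why the statistics looked respectable on paper.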
For another experiment, Bem designed a simple test of verbal memory. Students were given several minutes to examine a set of words, then were allotted extra time to practice typing out a subset of those words. When they were asked to list as many of the words as possible, they did much better on the ones they’d seen a second time. That much was straightforward: Practice can improve your recall. But when it was time to run the study, Bem flipped the tasks around. Now the students had to list the words just before the extra practice phase instead of after it. Still, he found signs of an effect: Students were better at remembering the words they would type out later. It seemed as though the practice session had benefits that extended backward through time.
Similar experiments, with the sequence of the tasks and stimuli reversed, showed students could have their emotions primed by words they hadn’t seen, that they would recoil from scary pictures that hadn’t yet appeared, and that they would get habituated to unpleasant imagery to which they would later be exposed. Almost every study worked as Bem expected. When he looked at all his findings together, he concluded that the chances of this being a statistical artifact—that is to say, the product of dumb luck—were infinitesimal.
This did not surprise him. By the time he’d begun this research, around the turn of the millennium, he already believed ESP was real. He’d delved into the published work on telepathy and clairvoyance and concluded that Robert McConnell was right: The evidence in favor of such phenomena, known to connoisseurs as “psi” processes, was compelling.
Indeed, a belief in ESP fit into Bem’s way of thinking—it tempted his contrarianism. As with his attacks on cognitive dissonance and personality theory, Bem could draw his arguments from a well-developed research literature—this one dating to the 1930s—which had been, he thought, unfairly rejected and ignored.
Or . . . proven false and cast aside?
Together with Chuck Honorton, the paranormal researcher in Princeton, Bem set out to summarize this research for his mainstream colleagues in psychology. In the early 1990s, they put together a review of all the work on ESP that had been done using Honorton’s approach and sent it to Bem’s associate Robert Sternberg, then the editor of Psychological Bulletin. “We believe that the replication rates and effect sizes achieved … are now sufficient to warrant bringing this body of data to the attention of the wider psychological community,” he and Honorton wrote in a paper titled “Does Psi Exist?” Sternberg made the article the lead of the January 1994 issue.
By 2001, Bem had mostly set aside his mainstream work and turned to writing commentaries and book reviews on psi phenomena. He’d also quietly embarked upon a major scientific quest, to find what he called “the holy grail” of parapsychology research: a fully reproducible experiment on ESP that any lab could replicate. His most important tool, as a scientist and rhetorician, would be simplicity. He’d work with well-established protocols, using nothing more than basic tests of verbal memory, priming, and habituation. He’d show that his studies weren’t underpowered, that his procedures weren’t overcomplicated, and that his statistics weren’t convoluted. He’d make his methods bland and unremarkable.
In 2003, 2004, 2005, and 2008, Bem presented pilot data to the annual meeting of the Parapsychological Association. Finally, in 2010, after about a decade’s worth of calibration and refinement, he figured he’d done enough. A thousand subjects, nine experiments, eight significant results. This would be his solid, mainstream proof of ESP—a set of tasks that could be transferred to any other lab.
On May 12, 2010, he sent a manuscript to the Journal of Personality and Social Psychology. He called it “Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect.”
Alright, this has gone on WAY too long. Let’s get to the meat and potatoes already.
Honestly . . . easier said than done. To start off, I can direct our attention to an article on Medium titled Discovery Without Theory: Thoughts on Daryl Bem And ‘Broken Science’. This article was written by Kate Nussenbaum back in 2017, and as you may have guessed, it directly references the Slate article I am working with.
A fascinating Slate article made its way around science Twitter last week. The article describes psychologist Daryl Bem’s quest to demonstrate the existence of extrasensory perception (ESP) through a 10-year series of experiments in his lab at Cornell. The kicker? Using conventional techniques in experimental design and statistics, Bem successfully demonstrated that participants in his research could predict the future. He ultimately published his work in one of his field’s top publications, the Journal of Personality and Social Psychology.
Bem’s tale is equally interesting and disturbing because he’s no Brian Wansink (though that story is also equal parts interesting and disturbing). In other words, the crazy effects he describes in his paper are seemingly not a result of fraud — he didn’t fabricate his data, he did not purposefully mislead readers with his description of his methodology, and he encouraged others to attempt to replicate his findings. The Slate article portrays him as an open and honest scientist.
And that’s why his story is so scary. It essentially means that using the widely accepted statistical approaches in psychology, one can find evidence for basically anything. In part, that’s because one widely used practice (even though at this point everyone knows it’s unacceptable) is to write papers as though the single analysis that reveals the expected was the single analysis that was planned out from the beginning of the study, and to ignore the other statistical tests that were conducted but that didn’t “work.”
I can’t help but think of something that occurred to me earlier in this piece:
This demonstration sounds a lot like starting from a preconceived conclusion, then working backwards to fill in the gaps. Not unlike the Ghost Hunters, who enter a home and only attempt to contact the person from the stories, not just anyone who happens to be floating around.
I will now skip ahead to get straight to a part that I really find interesting.
Instead, I’m interested in another aspect of Bem’s work that, as far as I can tell, has been mostly ignored in the coverage of his crazy findings: He had no theory. He had no theory. The entire foundation for 10 years of his work was his own fanciful whims.
The Slate article overlooks this, and describes Bem’s “hypotheses” as well as the earlier, theoretical blocks on which he built his work. But the theoretical background cited in the article is simply an earlier series of experiments that may have demonstrated the existence of ESP. Again, as far as I can tell, that research also failed to include any explanation for how or why such effects emerged.
In fact, in his description of Bem’s work, Engber consistently misuses the term “hypothesis,” in places where he should have used the term “prediction.” In science, a hypothesis involves an explanation.
At the end of Bem’s paper, Bem does spew some stuff about quantum theory in physics and electromagnetic signal transmission, but he openly acknowledges that those ideas do not amount to any sort of mechanistic explanation for how ESP might be operating. In fact, he justifies his entire 10-year investigation with the following sentence in his paper’s introduction:
“Historically, the discovery and scientific exploration of most phenomena have preceded explanatory theories by decades or even centuries.”
To me, this sentence demonstrates his work’s most egregious flaw. Bem seems to conflate the idea of serendipitous scientific discovery with actively searching for an effect without any sort of mechanistic hypothesis. Alexander Fleming observed that a certain mold had anti-bacterial properties; he then spent the next decade exploring its biochemical effects. He didn’t wake up one day and say, “Antibiotics must be real!” without any sort of testable explanation for why that would be the case.
It’s hard to articulate why, but I think there is a fundamental difference between discovering something that’s hard to explain, and actively searching for an effect without a reason. Psychological science is beautiful and powerful when it builds on itself to deepen our understanding of how the mind works. Everyone in the world can observe weird stuff that their brains seemingly do, but the role of scientists is to elucidate not just what, but why.
I agree. Particularly with the final sentence.
When it comes to scientific literacy, the media is the last place one should look for reliable information. Not necessarily because journalists and commentators don’t know what they are talking about, or because they bring some form of bias to the reporting (though both certainly happen). It’s more that non-scientists, in general, tend not to comprehend the complexities behind many scientific findings. One need look no further than the media’s history of covering widely publicized studies on almost any subject.
If the people behind the science aren’t interested in ensuring its quality and accountability, who else is going to do it?
In today’s media environment, that is a scary question. But that is not the focus of this piece.
Bem’s work does raise interesting questions: Is it valuable to publish work that just illuminates effects? What constitutes a theory that’s worth testing? How much evidence should there be for a probable effect before its existence is investigated? How do we make the distinction between serendipitous discovery and active searching for nonsensical findings? What about when ideas aren’t as obviously nonsensical as ESP? Is there a difference between mechanistic hypotheses that are then tested vs. strange effects that are searched for and later justified with mechanistic explanations? When does something qualify as a satisfactory mechanistic explanation? How do we avoid conflating “mechanisms” with “neural data”? If his findings did show the existence of a real effect, should we care about that effect if it does not build on any prior work or offer any sort of insight into how the brain / mind works? If we insist that all studies are justified by logical theories before they are conducted, is there a risk that we will prevent people from conducting work that may challenge conventional wisdom, particularly since frequentist statistics can really only confirm hypotheses?
I don’t know the answers to any of these questions. But I do think it’s important that when we think about what Bem’s work reveals about the ways in which science is broken, we don’t just focus on the pitfalls of conventional statistical techniques. His story also raises questions about the complex relationships between theory and discovery that are critical to think about as we attempt to construct a better science moving forward.
While Kate acknowledges the importance of getting this right from the standpoint of ensuring good-quality scientific inquiry, it also brings to mind a new consideration: how the scientific establishment as a whole has reacted to researchers in any field of study who continue to gain fame and notoriety off long-since-debunked works of junk science. From my viewpoint on the outside, it SEEMS like most within academia are either ignorant of, or pay little attention to, the spread of trash science and academics in the realm of social media. In particular, the increasing growth of crowdfunded pseudo-academic movements such as the Intellectual Dark Web. I’m tempted to call it the Intellectual Cash Cow, but I already have.
Ideas are big money. Or more accurately, trending ideas are big money.
Either way, while the IDW bastardizes many more areas of expertise than those related to science or sociology, you get the point. In an environment where both consumer demand and financial reward can be found in ANY academic pursuit (legitimate or not!), the legitimate figures need to come out ahead of the frauds. In the world in which we currently live, not doing so only continues to erode public trust in both science and academia as a whole.
But that is another tangent. Time to get back to the original inquiry. Did Dr. Bem REALLY prove ESP?
According to at least one follow-up study . . . not really.
This paper reports three independent attempts to replicate the retroactive facilitation of recall effect. All three experiments employed almost exactly the same procedure and software as the original experiment. In addition, they used the same number of participants as the original study and thus had sufficient statistical power to detect an effect (our three experiments combined had 99.92% power to detect the same effect size).
While Bem found a substantial effect, our results failed to provide any evidence for retroactive facilitation of recall. Although we opted to follow Bem’s preferred strategy of using one-tailed tests, we acknowledge that there are arguments against this approach, and it might be objected that had we opted for the generally more accepted approach of using two-tailed tests, we would indeed have had one statistically significant finding to report, i.e., the finding that the high SS participants in Replication 2 recalled fewer of the practice words than the control words. We feel that it is safe to dismiss this finding as almost certainly spurious given the relatively large number of statistical tests carried out and the fact that the difference is in the opposite direction to that predicted by Bem. Furthermore, no such trend was discernible in the other experiment that collected SS scores.
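As a sanity check on that “99.92% power” figure, the power of a one-sample, one-tailed t-test can be computed from the noncentral t distribution. This is only an illustrative sketch: the inputs (an effect size of d = 0.42 and a combined n = 150, i.e., three replications of 50 participants) are my assumptions about roughly what the authors used, not numbers taken from their analysis.

```python
import numpy as np
from scipy import stats

def one_tailed_power(d, n, alpha=0.05):
    """Power of a one-sample, one-tailed t-test for effect size d and n subjects."""
    df = n - 1
    nc = d * np.sqrt(n)                       # noncentrality parameter if the effect is real
    t_crit = stats.t.ppf(1 - alpha, df)       # critical value under the null hypothesis
    return 1 - stats.nct.cdf(t_crit, df, nc)  # P(test statistic exceeds the critical value)

# Assumed values: d = 0.42 and n = 150 (three replications of 50 participants each).
print(f"power ≈ {one_tailed_power(0.42, 150):.4f}")
```

With these assumed inputs the computed power lands in the same ballpark as the quoted 99.92%; the exact figure depends on the precise effect size and test the authors used. The takeaway is the same either way: if Bem’s effect were real, three replications of this size would almost certainly have detected it.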
First off, one and two-tailed tests. Because I have no idea.
When you conduct a test of statistical significance, whether it is from a correlation, an ANOVA, a regression or some other kind of test, you are given a p-value somewhere in the output. If your test statistic is symmetrically distributed, you can select one of three alternative hypotheses. Two of these correspond to one-tailed tests and one corresponds to a two-tailed test. However, the p-value presented is (almost always) for a two-tailed test. But how do you choose which test? Is the p-value appropriate for your test? And, if it is not, how can you calculate the correct p-value for your test given the p-value in your output?
What is a two-tailed test?
First let’s start with the meaning of a two-tailed test. If you are using a significance level of 0.05, a two-tailed test allots half of your alpha to testing the statistical significance in one direction and half of your alpha to testing statistical significance in the other direction. This means that .025 is in each tail of the distribution of your test statistic. When using a two-tailed test, regardless of the direction of the relationship you hypothesize, you are testing for the possibility of the relationship in both directions. For example, we may wish to compare the mean of a sample to a given value x using a t-test. Our null hypothesis is that the mean is equal to x. A two-tailed test will test both if the mean is significantly greater than x and if the mean is significantly less than x. The mean is considered significantly different from x if the test statistic is in the top 2.5% or bottom 2.5% of its probability distribution, resulting in a p-value less than 0.05.
What is a one-tailed test?
Next, let’s discuss the meaning of a one-tailed test. If you are using a significance level of .05, a one-tailed test allots all of your alpha to testing the statistical significance in the one direction of interest. This means that .05 is in one tail of the distribution of your test statistic. When using a one-tailed test, you are testing for the possibility of the relationship in one direction and completely disregarding the possibility of a relationship in the other direction. Let’s return to our example comparing the mean of a sample to a given value x using a t-test. Our null hypothesis is that the mean is equal to x. A one-tailed test will test either if the mean is significantly greater than x or if the mean is significantly less than x, but not both. Then, depending on the chosen tail, the mean is significantly greater than or less than x if the test statistic is in the top 5% of its probability distribution or bottom 5% of its probability distribution, resulting in a p-value less than 0.05. The one-tailed test provides more power to detect an effect in one direction by not testing the effect in the other direction. A discussion of when this is an appropriate option follows.
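The relationship between the two is easy to see in code: for the same data, the one-tailed p-value is exactly half the two-tailed p-value whenever the observed effect points in the predicted direction. A minimal sketch in Python (the sample data here are made up purely for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = 50                                          # hypothesized population mean
sample = rng.normal(loc=53, scale=8, size=40)   # fake data, centred a bit above x

# Two-tailed: "is the mean different from x, in either direction?"
t_stat, p_two = stats.ttest_1samp(sample, popmean=x, alternative="two-sided")

# One-tailed: "is the mean specifically greater than x?"
_, p_greater = stats.ttest_1samp(sample, popmean=x, alternative="greater")

print(f"t = {t_stat:.2f}, two-tailed p = {p_two:.4f}, one-tailed p = {p_greater:.4f}")
```

Because the one-tailed test concentrates all of the alpha in a single tail, a borderline result (say, two-tailed p = 0.08) can slip under 0.05 once you commit to one direction — which is exactly why the choice of tails has to be justified before you look at the data, not after.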
Okay, I think I grasp the nuances. A one-tailed test puts all of your alpha on one direction of interest, whereas a two-tailed test splits your alpha in half and tests along two differing directions of interest.
Or as my layman’s brain seems to be telling me, Daryl Bem only wants researchers to focus on the positive results, not the negative. Just one tail, not two.
One interpretation of these findings centres on the possibility that Bem’s original effect was due to the types of statistical and methodological artifacts outlined by several critics. Similar arguments apply to the alleged correlation between participants’ performance on the test of precognition and their scores on the Stimulus Seeking Scale. This scale was far from the only variable recorded during Bem’s studies. In fact, several other variables are recorded by the experimental program but are not mentioned by Bem, including participant age, their test anxiety level, and how often they have used meditation or self-hypnosis. The experimenter is also asked to record how enthusiastic each participant appears, and how ‘friendly’ they are towards the experimenter. It is unclear whether the relationship between participants’ scores on the tests of precognitive ability and such variables was examined.
Alternatively, it may be the case that the effect is genuine, but problematic to replicate. Replication issues have long dogged parapsychology, with proposed explanations focusing on experimental artifacts, fraud, or variation in psi ability on the part of both participants and experimenters. It has also been suggested that psi is elusive, and does not lend itself to laboratory study in the same manner as other psychological effects.
However, as noted above, Bem explicitly stated that Experiment 9 should be among the easiest of his studies to replicate, and all three Principal Investigators went to considerable lengths to ensure that their attempted replications matched his original study. Experimenter involvement was kept to a minimum by the use of the same computer programs used in the original experiment, and any potential experimenter effects in two of the studies were minimised by having student assistants conduct them.
The only noteworthy difference between Bem’s experiment and our replication attempts is that we conducted our experiments after his had received substantial media attention. Thus, the possibility arises that, since some of our participants might have heard of Bem’s study, they may have known what to expect in the procedure. This could have influenced their performance, perhaps leading them to explicitly attempt to memorize the stimulus words (we are grateful to an anonymous reviewer for bringing this potential limitation to our attention). However, while the participants knew the experiment concerned ESP, they were not informed that it was a replication attempt of a specific study until after they completed the procedure. In addition, the computer’s random selection of words after the memory test meant that foreknowledge of the procedure should not have influenced the results in any particular direction.
Our failure to find similar results even after three close replication attempts, along with the methodological and statistical issues discussed above and at least one other published report of a failed replication attempt, leads us to favour the ‘experimental artifacts’ explanation for Bem’s original result.
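That “relatively large number of statistical tests” point is worth making concrete. If every individual test has a 5% false-positive rate, the chance of at least one spurious “significant” result climbs fast as more variables get checked. A quick back-of-the-envelope in Python:

```python
# Familywise false-positive probability for k independent tests at alpha = 0.05.
alpha = 0.05
for k in (1, 5, 10, 20):
    p_any = 1 - (1 - alpha) ** k
    print(f"{k:2d} tests -> P(at least one false positive) = {p_any:.0%}")
# -> 5%, 23%, 40%, 64%
```

So with around ten innocuous extra variables on record (age, test anxiety, meditation habits, and so on), the odds of at least one of them “correlating” with precognition scores by pure chance are roughly 40% — which is exactly why unreported analyses matter.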
To put it all into perspective for the rest of us:
1. an experimental finding that is not a reflection of the true state of the phenomenon of interest but rather is the consequence of a flawed design or analytic error. For example, characteristics of the researcher (e.g., expectations, personality) or the participant (e.g., awareness of the researcher’s intent, concern over being evaluated) are common sources of artifacts.
I could have saved myself hours’ worth of annoyance and frustration (including trying to access scientific papers behind paywalls. WHY?!) by simply grabbing the final sentence from the abstract above and calling it a day. But I had to see this through.
In conclusion, we are once again forced to admit that the answers to such things as PSI are still elusive. Like many other realms of the paranormal, our species may never know the truth. I am perfectly fine with accepting that.
* * *
Since I have your attention, I saw another interesting example of the media getting ahead of real scientific fact just yesterday. On The Daily (Social Distancing) Show, of all places. In a Snapchat segment of the show (I don’t watch The Daily Show normally), Trevor Noah talked about the recent discovery of a parallel universe to our own, wherein time flowed backwards. Maybe flowed is the wrong word (time is not a liquid), so time runs backward?
Either way, it doesn’t really matter because it’s not really true.
Parallel Universe Discovered? No, NASA Hasn’t Found a Universe Where Time Runs Backwards
Is there a parallel universe where time runs backwards? That is what many news reports seem to be suggesting, attributing the “finding” to NASA scientists. While this has certainly made a whole lot of people excited, in reality it is far from the truth. In fact, scientists have just found evidence of high-energy particles that defy our current understanding of physics, and a parallel universe has been suggested only as one of the possible theories to explain them, without any solid evidence in its favour.
It all started after a report by New Scientist about an experiment by astrophysicists came out. The Antarctic Impulsive Transient Antenna (ANITA) is a telescope that comprises radio antennas attached to a giant balloon that hovered over Antarctica at a very high altitude of around 37 km. It is run by a multi-university consortium led by Peter Gorham of the University of Hawaii-Manoa. ANITA was sent so high so that it could detect matter like the high-energy particles called “neutrinos” from space, according to CNET. The telescope can spot these neutrinos coming from space and hitting the ice sheet in Antarctica. ANITA detected these particles, but instead of coming from space, the neutrinos were found to be coming from the Earth’s surface without any source. These detections happened in 2016, then again in 2018, but there was no credible explanation.
No clarity on the anomaly
“After four years there has been no satisfactory explanation of the anomalous events seen by ANITA so this is very frustrating, especially to those involved,” CNET said quoting Ron Ekers, an honorary fellow at Australia’s National Science Agency. The recent reports claiming that there is evidence of a parallel universe appear to be based on ANITA findings that are at least a couple of years old.
Another neutrino observatory in Antarctica called IceCube that is run by the University of Wisconsin–Madison conducted an investigation on the ANITA findings and it published a paper in The Astrophysical Journal. The researchers said in January that “other explanations for the anomalous signals – possibly involving exotic physics – need to be considered” because the standard model of physics cannot explain these events.
“‘Exotic physics’ would be where this theory of a parallel universe fits into the conversation. It’s just one of several theories outside of our current understanding of physics that has been floated as a potential cause for this,” reported WUSA9.
What are the possibilities?
IceCube researchers also said that main hypotheses on the strange detections include an astrophysical explanation (like an intense neutrino source), a systematic error (like not accounting for something in the detector), or physics beyond the Standard Model.
“Our analysis ruled out the only remaining Standard Model astrophysical explanation of the anomalous ANITA events. So now, if these events are real and not just due to oddities in the detector, then they could be pointing to physics beyond the Standard Model,” said Alex Pizzuto, one of the leads on the paper published in The Astrophysical Journal.
This means that there are two possibilities—one of which could just be an error. Errors are a part of scientific experiments when researchers try hard to find something new.
Going by what the scientists have actually said, it’s clear that these are exciting times for the astrophysicists trying to find an explanation and future experiments with more “exposure and sensitivity” will be required to get a clear understanding of the anomaly. However, people wishing for a parallel universe will have to wait because the evidence is lacking and the scientists are not ready to call it a discovery.
Are these findings noteworthy and exciting?
For the science-lovers and the astrophysicists of the world . . . yes.
Has humanity discovered a parallel universe wherein time runs in reverse? No.
One thing that comes to mind when pondering this situation that is WAY beyond my scope of expertise . . . how well do these neutrinos (which the equipment over Antarctica picks up, as per the story) travel through the Earth itself? Is there such a device also deployed in the Arctic to monitor neutrinos coming from that side of space as well?
Either way, this is just another example of why one should be careful when considering media reported science discoveries.