Monday, March 10, 2014

Gorilla video

A poster at Skeptiko recently brought up an article from Dean Radin talking about the disconnect between his perception of the strength of the evidence for psi, and how that evidence seems to be ignored by scientists in general.  I wrote a response, not realizing that the article was 5 years old, mostly because the misuse of the gorilla video is a pet peeve of mine.  (By the way, this particular book is invaluable for releasing pet-peeve-tension.) :-)

https://realitysandwich.com/7283/what_gorilla/

Radin:
"Imagine you're watching a basketball game. Your favorite team is wearing white and the other team is in black. In the midst of the action, someone in a dark gorilla suit calmly walks to the center of the court, waves to the crowd, then walks off the court. Do you think you would notice this peculiar event? Most people might say yes. Most people would be wrong."
...
"Because of these blind spots, some common aspects of human experience literally cannot be seen by those who've spent decades embedded within the Western scientific worldview. That worldview, like any set of cultural beliefs inculcated from childhood, acts like the blinders they put on skittish horses to keep them calm. Between the blinders we see with exceptional clarity, but seeing beyond the blinders is not only exceedingly difficult, after a while it's easy to forget that your vision is restricted.
An important class of human experience that these blinders exclude is psychic phenomena, those commonly reported spooky experiences, such as telepathy and clairvoyance, that suggest we are deeply interconnected in ways that transcend the ordinary senses and our everyday notions of space and time."
My response:
"A bit of a nitpick (because the gorilla video experiment has been over-used and abused)...what you describe isn't related to the inattentional blindness demonstrated in the gorilla video.  Merely having a preference for one or the other team, while watching a game, does not lead to missing the the gorilla.  Most people notice the gorilla under those conditions.  You have to give people a different task which fully occupies their attention, if you want them to fail to notice the gorilla.  And even then,  half the people will still notice the gorilla.

What you are describing, with respect to the perception of psi among scientists in general, is the effect of how our prejudices tend to influence our judgement.  In this case, fans of the white team will see the gorilla.  But whether they see it as disruptive vs. helpful to their team may depend upon its colour.

I agree that discussions which take place in the media tend to misrepresent what is happening at the scientific level.  I propose that the way to draw other scientists into taking the research seriously is to follow the path of evidence-based medicine, with respect to practices which reduce the risk of bias and the production of false-positive results.  As it is, research performed under conditions where problems in design, implementation and/or analysis can grossly inflate the number of false-positive studies is easily dismissed as reflecting bias, rather than a true effect."

Linda

Thursday, March 6, 2014

Outcomes, Part 2

Hand in hand with flexibility in outcomes is "selective outcome reporting".  Selective outcome reporting refers to failing to report outcomes which should be of interest.  It isn't just a matter of having a variety of ways in which the outcome could be measured and only reporting the results of some.  Sometimes authors fail to report the only outcome of interest, or fail to report its details (such as the result of a significance test or the size of the effect).

We had one example in the previous blog post.  Robertson and Roy did not report on the result of a comparison between the recipient group and the control group in any of the experimental conditions.  In particular, they did not report on the double-blind condition, which was the purported reason for doing the experiment in the first place (according to Robertson and Roy 2001).  Instead we were given the results of a complicated set of analyses which broke up the experimental groups and recombined the subjects into new groups.  It is reasonable to offer exploratory analyses after the fact, but not at the expense of failing to report on the main outcome.  

Another recent example comes from Dean Radin.  In his study of blessed tea, he measured mood in those consuming blessed tea and in those who received tea which was identical except that it had not been blessed, under blind conditions.  Yet he makes no mention of the main outcome proposed for the study - was there any difference in mood (in this case, improvement in mood from baseline) between those drinking blessed tea and those drinking tea which hadn't been blessed?  When asked for those results, it turned out that there wasn't a significant difference between the control and the intervention group.  Yet he presents the study as though it was positive, and as far as I can tell, proponents accept it as a positive study.  This is accomplished by selectively reporting a result from a type of analysis which is not valid and reliable (post-hoc sub-group analysis) and substituting it for the actual outcome.
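To see why a post-hoc sub-group finding is such a weak substitute for the planned comparison, here is a small simulation - my own sketch with made-up numbers, not Radin's data or sub-groups.  Both arms are drawn from the same distribution, so any "significant" difference is a false positive by construction:

```python
# Hypothetical illustration: with no true effect, testing several
# post-hoc sub-groups greatly inflates the chance of finding at least
# one "significant" difference somewhere.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_trials, n_per_arm, n_subgroups = 2000, 120, 6
overall_hits = subgroup_hits = 0

for _ in range(n_trials):
    # Mood-change scores drawn from the same distribution in both arms,
    # i.e. the "blessing" has no effect at all.
    treated = rng.normal(0.0, 1.0, n_per_arm)
    control = rng.normal(0.0, 1.0, n_per_arm)
    if ttest_ind(treated, control).pvalue < 0.05:
        overall_hits += 1
    # Split each arm into arbitrary post-hoc sub-groups and test each.
    if any(ttest_ind(t, c).pvalue < 0.05
           for t, c in zip(np.array_split(treated, n_subgroups),
                           np.array_split(control, n_subgroups))):
        subgroup_hits += 1

print(f"planned overall test: {overall_hits / n_trials:.1%} false positives")
print(f"best of 6 sub-groups: {subgroup_hits / n_trials:.1%} false positives")
```

The planned comparison comes in at the advertised 5% or so, while "some sub-group was significant" comes up roughly a quarter of the time - and that's before the sub-groups themselves are chosen after looking at the data.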

It is to be expected that the author of a study will be most interested in presenting the study in a positive light, and in a way which confirms what they hoped it would confirm.  Even in a field where there is a culture of publishing results, regardless of whether they are positive or negative (parapsychology), it's still preferable to be the researcher who publishes positive results.  But the more useful and interesting approach is to look at whether the results are likely to be true-positives, rather than false-positives.

Linda

Robertson, T. J. and Roy, A. E. (2004) Results of the application of the Robertson-Roy Protocol to a series of experiments with mediums and participants. JSPR 68.1
Roy, A. E. and Robertson, T. J. (2001) A double-blind procedure for assessing the relevance of a medium’s statements to a recipient. JSPR 65.3

http://deanradin.com/evidence/Shiah2013.pdf


Tuesday, March 4, 2014

Outcomes, Part 1

My (very) informal survey of why other scientists and academics don't generally pay much attention to psi research reveals that they mostly assume that the purported effects are due to some sort of publication bias.  That is, the collection of studies presented as though they demonstrate an anomalous effect represents a selected sample of all the research performed - studies with positive results are presented for our attention while those with negative results quietly fade away.  There is an element of truth to this assumption, but it's more complicated than what we normally think of as publication bias.

A 'positive' study usually refers to a study which demonstrates a "statistically significant" result.  A statistically significant result can represent a true-positive (there is a real effect present and it is responsible for the positive test of statistical significance) or it can represent a false-positive (there is no real effect, and the significant result arose by chance or bias).  John Ioannidis famously argued that most (or all) of the positive results within a field may be false-positives.
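To make that argument concrete, his 2005 paper gives a formula for the chance that a significant finding is a true positive (the positive predictive value, or PPV), as a function of the prior odds R that a tested effect is real.  A minimal sketch, where the values of R, alpha, and power below are illustrative assumptions on my part:

```python
# PPV of a significant result, following Ioannidis (2005):
# PPV = (1 - beta) * R / (R - beta * R + alpha),
# where R is the prior odds that the tested effect is real.
def ppv(R, alpha=0.05, power=0.8):
    beta = 1 - power
    return power * R / (R - beta * R + alpha)

for R in (1.0, 0.1, 0.01):
    print(f"prior odds R = {R:>4}: PPV = {ppv(R):.0%}")
```

With even odds that the effect is real, a significant result is probably a true positive (about 94%); with long odds against (R = 0.01), most significant results are false positives (about 14% PPV) - and that is before any bias or analytic flexibility is added.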

There are a number of ways to boost the number of false-positive studies.  One of the easier ways to do so is to use flexibility in outcomes.  The false-positive rate, which is meant to be 5% or less (the usual level chosen for significance testing is p<0.05, which represents the alpha or Type 1 ("false-positive") error rate), can easily rise to 50% or higher once you violate the assumptions which underlie significance testing by introducing multiple ways to choose an outcome measure.
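A back-of-the-envelope calculation shows how quickly this inflation happens.  If the candidate outcome measures were roughly independent (an idealization on my part - correlated measures inflate the rate less), the chance that at least one of k candidates comes up significant by chance alone is 1 - (1 - 0.05)^k:

```python
# Chance of at least one spuriously significant result when any of k
# (roughly independent) candidate outcomes may end up being reported.
alpha = 0.05
for k in (1, 3, 6, 12, 20):
    print(f"{k:>2} candidate outcomes: {1 - (1 - alpha) ** k:.0%} false-positive rate")
```

At a dozen candidate outcomes - a number which will turn out to be relevant below - the rate is already around 46%.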

One way to address this issue is to use only valid outcome measures, and preferably the same one for each type of study.  Any time you perform an experiment or study, you have a particular outcome in mind which you are interested in observing.  Sometimes this is called the "dependent variable" (as opposed to the "independent variables", which are the characteristics which you think may alter the outcome).  Depending upon the circumstances, we may have multiple ways we could measure that outcome - some of which are valid and reliable, and some of which are not.  A valid measure is one which actually captures the outcome of interest.  If you want to know how tall someone is, measuring their length with a ruler would be a valid measure, while measuring their weight would not be.  Sometimes it is obvious whether or not a measure is valid, but often it is not.  For example, how would you measure how "big" someone is?  A reliable measure is one which gives the same result no matter whether someone else does the measuring (inter-rater agreement) or whether the same person does the measuring at different times (intra-rater agreement).
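As a rough sketch of how inter-rater agreement can be quantified, Cohen's kappa compares the agreement two raters actually show to the agreement expected by chance from their individual rating habits.  The ratings below are made up for illustration; this is the standard textbook definition, not a procedure from any of the studies discussed here:

```python
# Cohen's kappa for two raters scoring the same items:
# 1.0 = perfect agreement, 0.0 = no better than chance.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

a = ["true", "true", "false", "true", "false", "true", "false", "true"]
b = ["true", "false", "false", "true", "false", "true", "true", "true"]
print(f"kappa = {cohens_kappa(a, b):.2f}")  # 0.47: middling agreement
```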

Establishing a valid outcome measure is often difficult.  The example I am going to use is mediumship research, as this topic came up recently on the Skeptiko forum (http://www.skeptiko.com/forum/threads/why-skeptics-are-wrong-podcast.568/page-4#post-15416).

Mediums receive visual, auditory, and other sensations (e.g. scent or emotion) which they interpret as coming from a connection to a deceased person ("discarnate").  A mediumship reading usually involves three components - identifying the discarnate for a recipient (the living person who receives the reading), verifying that the source of information is the discarnate (usually by offering information which is regarded as accurate and specific to the recipient), and conveying messages from the discarnate to the recipient.  Research generally focuses on the verification component, since it is this aspect which speaks to the idea of psi and survival of consciousness.  So what would be a valid way to measure "specific and accurate information has been received from a discarnate"?  If we look at how the question has been answered in the most rigorous of the mediumship studies (Robertson and Roy, Beischel and Schwartz, Kelly and Arcangel), we find 22 different answers in just 3 studies.

Robertson and Roy measured the dependent variables by breaking down each reading into individual statements and then recording:
- the number of statements accepted as true by each participant
- the number of participants who accepted a given statement
- the total number of statements

Beischel and Schwartz recorded:
- the placement of each general statement into 1 of 5 accuracy categories
- the placement of each of 4 Life Questions into 1 of 5 accuracy categories
- the placement of each of the Reverse Questions into 1 of 5 accuracy categories
- the placement of each general statement into 1 of 4 emotional categories
- the placement of each of 4 Life Questions into 1 of 4 emotional categories
- the placement of each of the Reverse Questions into 1 of 4 emotional categories
- a written explanation of each general statement placed into 1 of 2 accuracy categories
- a written explanation of each of 4 Life Questions placed into 1 of 2 accuracy categories
- a written explanation of each of the Reverse Questions placed into 1 of 2 accuracy categories
- a global numerical score for each reading on a 7-point scale
- the choice of 1 of 2 readings
- a rating of that choice on a 5-point scale

Kelly and Arcangel recorded:
Study 1
- the accuracy of each statement on a 5-point scale
- the significance of each statement on a 5-point scale
- the choice of 1 of 4 readings
Study 2
- the accuracy of each reading on a 10-point scale
- the rank of each reading within a group of 6, based on the scores (there were ties, including ties for first place)
- the choice of 1 of 6 readings
- written comments on each of the readings

What is striking about this list is not just the sheer number of different outcomes, but that among the three studies, no two are the same.  Even the way in which the accuracy of individual statements is measured is different in each study.  These outcome measures cannot all be valid (they don't come to the same answers).  So then it becomes important to ask whether any are valid, and if so, which ones? The list also highlights that concerns about a grossly inflated false-positive rate are legitimate.

It was suggested that all that was needed to perform more rigorous mediumship research was to repeat the Beischel study with a larger sample size.  However, with 12 different outcome measures in play, this is almost guaranteed to lead to a false-positive result (per the calculation above, a roughly 46% chance even in the absence of any effect) until it is determined which one of those measures is most valid.

Linda


http://www.plosmedicine.org/article/info%3Adoi%2F10.1371%2Fjournal.pmed.0020124

http://pss.sagepub.com/content/22/11/1359.full

http://deanradin.com/evidence/Beischel2007.pdf
http://deanradin.com/evidence/Kelly2011.pdf

Robertson, T. J. and Roy, A. E. (2004) Results of the application of the Robertson-Roy Protocol to a series of experiments with mediums and participants. JSPR 68.1

Sunday, March 2, 2014

Why "Naturalism is Useful"?

One of the biggest concerns I run across in these internet discussions is the proposal that the practice of science operates under various metaphysical assumptions or axioms - materialism, reductionism, causality, repeatability, philosophical naturalism, etc.  It is then argued that the body of knowledge reflects these underlying assumptions, rather than some sort of independent truth about the nature of reality.  My perspective is different.

I doubt most scientists and science types give much thought to philosophy or to adopting these assumptions.  I'm not a materialist or a reductionist or a philosophical naturalist.  At best, the assumption which underlies the practice of science is "(methodological) naturalism is useful".  Materialism, physicalism, reductionism, holism, causality, consistency, etc. are merely the result of our observations, not the cause.

"Methodological naturalism" means that knowledge is built from reference to events and experiences.  "Useful" means that this knowledge is progressive, it distinguishes between ideas which are true or false, it allows us to make predictions, it tightly constrains the possibilities, it generates novel information and observations.  Most claims which are regarded as supernatural or paranormal are actually ordinary naturalism claims, as they are generated by reference to experiences or events.

From this perspective, there isn't anything which makes an experience or event "supernatural" or "paranormal" beforehand.  Nor are we prevented from including paranormal claims under the practice of science.  The title of this blog is just an indication that discussions about "reductionist, materialist science" aren't relevant or of interest.

Linda