Monday, April 7, 2014

Shifting goalposts?

I commented on another blog about a study which had some similarities to the Sheldrake staring experiments.

http://thinkingdeeper.wordpress.com/2014/03/09/sheldrake-vs-ubc-the-same-experiment/

http://hct.ece.ubc.ca/publications/pdf/gauchou-rensink-cac2012.pdf

http://www.sheldrake.org/files/pdfs/papers/sensoryclues.pdf

While I was reading the UBC paper, I was aware that I felt less critical about the paper than I would be if it had been a parapsychology paper.  Considering Dean Radin's criticisms from my previous blog post, and my criticisms of Radin's presentation of the blessed tea study, is it fair for me to be any less critical of the UBC paper (or alternatively, more critical of parapsychology papers)?  After all, like Sheldrake's and Radin's papers, there were multiple ways offered to analyze the results, the findings were post hoc, and novel outcome measures were offered.

Or were they?

An important design choice in the UBC paper is highlighted by contrasting it with Radin's paper.  Radin gave two groups of people tea which had been blessed or not, and measured change in mood and the subject's belief that they were in the intervention group.  The authors of the UBC study asked people general knowledge questions explicitly and implicitly (through the use of a Ouija board), and measured accuracy and the subject's belief that they were guessing at the answers.  In both cases, the significant finding was an interaction between the intervention and the belief condition.  Amongst those who believed they received the blessed tea, those who actually received the blessed tea had more improvement than those who did not.  Amongst those who believed they were guessing, those who were asked general knowledge questions implicitly (via the Ouija board) performed more accurately than when asked those questions explicitly.

Why are Radin's findings likely false, while the UBC study findings may be true?  The biggest difference is that Radin's findings are post hoc, while those in the UBC study were pre-planned.  Post-hoc testing violates the assumptions which underlie statistical significance testing, which reduces the validity of the results.
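To make that concrete, here is a rough simulation sketch (in Python; illustrative only, not taken from either paper) of what happens when the comparison is chosen after looking at the data: two groups with no true effect are split into several post-hoc subgroups, and the study counts as "positive" if any subgroup comparison reaches p<0.05.  The group sizes and number of subgroups are arbitrary choices for illustration.

import numpy as np
from scipy import stats
rng = np.random.default_rng(0)
n_sims, n_per_group, n_subgroups = 5000, 100, 4
positive = 0
for _ in range(n_sims):
    # No true effect: both arms are drawn from the same distribution.
    treatment = rng.normal(size=n_per_group)
    control = rng.normal(size=n_per_group)
    # After the data are in, subjects are sorted into arbitrary subgroups
    # (e.g. "believers" vs. "non-believers").
    sub_t = rng.integers(n_subgroups, size=n_per_group)
    sub_c = rng.integers(n_subgroups, size=n_per_group)
    pvals = []
    for g in range(n_subgroups):
        a, b = treatment[sub_t == g], control[sub_c == g]
        if len(a) > 1 and len(b) > 1:
            pvals.append(stats.ttest_ind(a, b).pvalue)
    # Reporting whichever subgroup comparison "worked" counts as positive.
    if pvals and min(pvals) < 0.05:
        positive += 1
print(positive / n_sims)  # well above the nominal 0.05 (roughly 1 - 0.95**4)

Nothing about the p<0.05 threshold protects against this, because the threshold assumes the single reported comparison was fixed before the data were seen.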

How can we tell whether a finding is pre-planned vs. post hoc?  It is not sufficient for the researcher to state a comparison was pre-planned.  And merely choosing to measure a number of different variables does not qualify as pre-planning.  So we can look at other factors, such as experimental manipulation, descriptions of the planning, and the analysis of the results.

The UBC group deliberately manipulated the belief condition by selecting questions which the subject identified as guesses.  They were identified as "guesses" independently of the accuracy of the answer and independently of their use in the "Ouija" board condition.  This experimental manipulation must be pre-planned.  There was no equivalent in Radin's study.  To be equivalent, Radin would also need to manipulate the belief condition (in this case, by manipulating what information was given to the subjects).  Unlike in the UBC study, "belief" was a dependent variable in Radin's study, so it wouldn't have been possible to form groups on the basis of "belief" prior to the drinking of the tea.

Another way to tell whether a comparison was pre-planned is to look at which comparisons were used in the sample size calculations (if reported).  In the UBC study, there are no sample size calculations reported.  In Radin's study, he reports that the sample size was assumed to be adequate based on his intentional chocolate study.  In that study, mood level (not change in mood) on each day was compared between conditions and "belief" was not a reported variable.  Had "belief" been a pre-planned condition in the tea study, it should have been accounted for, in some way, in the sample size assessments.
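For those unfamiliar with what a sample size calculation involves, here is a rough sketch of the standard normal-approximation formula for a two-group comparison of means (the effect size, alpha, and power below are placeholder values, not figures from either study).

from scipy.stats import norm
alpha, power, effect_size = 0.05, 0.80, 0.5  # placeholder values, not from either study
z_alpha = norm.ppf(1 - alpha / 2)            # two-sided test
z_beta = norm.ppf(power)
# Approximate subjects needed per group for a two-sample comparison of means.
n_per_group = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2
print(round(n_per_group))  # about 63 per group for these inputs

The point is that every term in the calculation refers to the specific comparison the researcher expects to report; a comparison involving a "belief" subgroup would need its own calculation, since splitting the sample shrinks each cell.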

Finally, a quick way to check whether a comparison was pre-planned is to look at whether all the subjects are included in the analysis and whether the reasons for any exclusions are independent of the outcome.  In the UBC study, the analysis included 21/27 of the subjects who participated in the study.  Exclusions were based on a lack of success (i.e. movement of the planchette without conscious interference) in the use of the Ouija board and were unrelated to the outcome.  Radin included 40% of the subjects in his analysis, excluding more than half of the participants.  Thirty-two out of 221 were dropped for reasons unrelated to the outcome.  The remainder (101/221) were dropped for reasons which were strongly related to the outcome.  It would be very unlikely that a researcher would pre-plan a comparison which would so dramatically violate the assumptions of significance testing.
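A small simulation sketch (again illustrative Python, not either study's data) shows why outcome-related exclusions are so damaging: even when there is no true effect, dropping subjects from one arm on the basis of their outcome manufactures a "significant" difference.  The exclusion rule below is deliberately crude to make the mechanism visible.

import numpy as np
from scipy import stats
rng = np.random.default_rng(1)
n_sims, n = 5000, 100
positive = 0
for _ in range(n_sims):
    treatment = rng.normal(size=n)  # no true effect in either arm
    control = rng.normal(size=n)
    # Exclude treatment-arm subjects whose outcome falls in the bottom 30% --
    # an exclusion rule that is strongly related to the outcome itself.
    kept = treatment[treatment > np.quantile(treatment, 0.30)]
    if stats.ttest_ind(kept, control).pvalue < 0.05:
        positive += 1
print(positive / n_sims)  # the nominal 5% error rate is obliterated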

To be fair, there is a good chance that the UBC study results are also false.  The sample size was small and it was somewhat exploratory, even if it was well-designed in comparison to Radin's study.  It will be interesting to see whether the findings hold up under attempted replications.

Linda

Monday, March 10, 2014

Gorilla video

A poster at Skeptiko recently brought up an article from Dean Radin talking about the disconnect between his perception of the strength of the evidence for psi, and how that evidence seems to be ignored by scientists in general.  I wrote a response, not realizing that the article was 5 years old, mostly because the misuse of the gorilla video is a pet peeve of mine.  (By the way, this particular book is invaluable for releasing pet-peeve tension.) :-)

https://realitysandwich.com/7283/what_gorilla/

Radin:
"Imagine you're watching a basketball game. Your favorite team is wearing white and the other team is in black. In the midst of the action, someone in a dark gorilla suit calmly walks to the center of the court, waves to the crowd, then walks off the court. Do you think you would notice this peculiar event? Most people might say yes. Most people would be wrong."
...
"Because of these blind spots, some common aspects of human experience literally cannot be seen by those who've spent decades embedded within the Western scientific worldview. That worldview, like any set of cultural beliefs inculcated from childhood, acts like the blinders they put on skittish horses to keep them calm. Between the blinders we see with exceptional clarity, but seeing beyond the blinders is not only exceedingly difficult, after a while it's easy to forget that your vision is restricted.
An important class of human experience that these blinders exclude is psychic phenomena, those commonly reported spooky experiences, such as telepathy and clairvoyance, that suggest we are deeply interconnected in ways that transcend the ordinary senses and our everyday notions of space and time."
My response:
"A bit of a nitpick (because the gorilla video experiment has been over-used and abused)...what you describe isn't related to the inattentional blindness demonstrated in the gorilla video.  Merely having a preference for one or the other team, while watching a game, does not lead to missing the the gorilla.  Most people notice the gorilla under those conditions.  You have to give people a different task which fully occupies their attention, if you want them to fail to notice the gorilla.  And even then,  half the people will still notice the gorilla.

What you are describing, with respect to the perception of psi among scientists in general, is the effect of how our prejudices tend to influence our judgement.  In this case, fans of the white team will see the gorilla.  But whether they see it as disruptive vs. helpful to their team may depend upon its colour.

I agree that discussions which take place in the media tend to misrepresent what is happening at the scientific level.  I propose that the way to draw other scientists into taking the research seriously is to follow the path of evidence-based medicine, with respect to practices which reduce the risk of bias and the production of false-positive results.  As it is, research performed under conditions where problems in design, implementation, and/or analysis can grossly inflate the number of false-positive studies is easily dismissed as reflecting bias, rather than a true effect."

Linda

Thursday, March 6, 2014

Outcomes, Part 2

Hand in hand with flexibility in outcomes is "selective outcome reporting".  Selective outcome reporting refers to failing to report outcomes which should be of interest.  It isn't just a matter of having a variety of ways in which the outcome could be measured, and only reporting the results of some.  Sometimes authors fail to report the only outcome of interest, or fail to report the details (such as the results of a significance test or the size of the effect).

We had one example in the previous blog post.  Robertson and Roy did not report on the result of a comparison between the recipient group and the control group in any of the experimental conditions.  In particular, they did not report on the double-blind condition, which was the purported reason for doing the experiment in the first place (according to Robertson and Roy 2001).  Instead we were given the results of a complicated set of analyses which broke up the experimental groups and recombined the subjects into new groups.  It is reasonable to offer exploratory analyses after the fact, but not at the expense of failing to report on the main outcome.  

Another recent example comes from Dean Radin.  In his study of blessed tea, he measured mood, under blind conditions, in those consuming blessed tea and in those who received tea which was identical except that it had not been blessed.  Yet he makes no mention of the main outcome proposed for the study - was there any difference in mood (in this case, improvement in mood from baseline) between those drinking blessed tea and those drinking tea which hadn't been blessed?  When those results were requested, it turned out that there wasn't a significant difference between the control and the intervention groups.  Yet he presents the study as though it was positive.  And as far as I can tell, it is accepted as a positive study by proponents.  This is accomplished by selectively reporting a result which is not a valid and reliable type of outcome (a post-hoc sub-group analysis) and substituting it for the actual outcome.

It is to be expected that the author of a study will be most interested in presenting the study in a positive light, and in a way which confirms what they hoped it would confirm.  Even in a field where there is a culture of publishing results, regardless of whether they are positive or negative (parapsychology), it's still preferable to be the researcher who publishes positive results.  But the more useful and interesting approach is to look at whether the results are likely to be true-positives, rather than false-positives.

Linda

Robertson, T. J. and Roy, A. E. (2004) Results of the application of the Robertson-Roy Protocol to a series of experiments with mediums and participants. JSPR 68.1
Roy, A. E. and Robertson, T. J. (2001) A double-blind procedure for assessing the relevance of a medium’s statements to a recipient. JSPR 65.3

http://deanradin.com/evidence/Shiah2013.pdf


Tuesday, March 4, 2014

Outcomes, Part 1

My (very) informal survey of why other scientists and academics don't generally pay much attention to psi research reveals that they mostly assume that the purported effects are due to some sort of publication bias.  That is, the collection of studies presented as though they demonstrate an anomalous effect represents a selected sample of all the research performed - studies with positive results are presented for our attention while those with negative results quietly fade away.  There is an element of truth to this assumption, but it's more complicated than what we normally think of as publication bias.

A 'positive' study usually refers to a study which demonstrates a "statistically significant" result.  A statistically significant result can represent a true-positive (there is a real effect present and it is responsible for the positive test of statistical significance) or it can represent a false-positive (there is no effect present, or the positive result was not produced by any real effect).  John Ioannidis famously argued that most (or all) of the positive results within a field may be false-positives.

There are a number of ways to boost the number of false-positive studies.  One of the easier ways to do so is to use flexibility in outcomes.  The false-positive rate, which is meant to be 5% or less (the usual level chosen for significance testing is p<0.05, which represents the alpha or Type 1 ("false-positive") error rate), can easily rise to 50% or higher once you violate the assumptions which underlie significance testing by introducing multiple ways to choose an outcome measure.
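Here is a rough simulation sketch of that arithmetic (Python; illustrative only): two groups with no true effect are compared on a dozen candidate outcome measures, and the study counts as "positive" if any one of them reaches p<0.05.  With independent measures, that works out to roughly 1 - 0.95^12, or about 46%.

import numpy as np
from scipy import stats
rng = np.random.default_rng(2)
n_sims, n, n_outcomes = 5000, 30, 12  # a dozen candidate outcome measures
positive = 0
for _ in range(n_sims):
    # No true effect on any measure; measures are independent in this sketch.
    group_a = rng.normal(size=(n, n_outcomes))
    group_b = rng.normal(size=(n, n_outcomes))
    pvals = stats.ttest_ind(group_a, group_b, axis=0).pvalue
    if (pvals < 0.05).any():
        positive += 1
print(positive / n_sims)  # roughly 1 - 0.95**12, i.e. about 0.46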

One way to address this issue is to use only valid outcome measures, and preferably the same one for each type of study.  Any time you perform an experiment or study, you have a particular outcome in mind which you are interested in observing.  Sometimes this is called the "dependent variable" (as opposed to the "independent variables", which are the characteristics which you think may alter the outcome).  Depending upon the circumstances, we may have multiple ways we could measure that outcome - some of which are valid and reliable, and some which are not.  A valid measure is one which actually captures the outcome of interest.  If you want to know how tall someone is, measuring their length with a ruler would be a valid measure, while measuring their weight would not be.  Sometimes it is obvious whether or not a measure is valid, but often, it is not.  For example, how would you measure how "big" someone is?  A reliable measure is one which gives the same result no matter whether someone else does the measuring (inter-rater agreement) or whether the same person does the measuring at different times (intra-rater agreement).
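As an aside on the reliability side of this, a chance-corrected agreement statistic such as Cohen's kappa is one common way to quantify inter-rater agreement.  The sketch below uses made-up ratings and scikit-learn's implementation, purely as an illustration.

from sklearn.metrics import cohen_kappa_score
# Hypothetical example: two raters independently score the same ten
# statements as accurate (1) or inaccurate (0).
rater_1 = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
rater_2 = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
# Kappa of 1 means perfect agreement; 0 means no better than chance.
print(cohen_kappa_score(rater_1, rater_2))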

Establishing a valid outcome measure is often difficult.  The example I am going to use is mediumship research, as this topic came up recently on the Skeptiko forum (http://www.skeptiko.com/forum/threads/why-skeptics-are-wrong-podcast.568/page-4#post-15416).

Mediums receive visual, auditory, and other sensations (e.g. scent or emotion) which they interpret as coming from a connection to a deceased person ("discarnate").  A mediumship reading usually involves three components - identifying the discarnate for a recipient (the living person who receives the reading), verifying that the source of information is the discarnate (usually by offering information which is regarded as accurate and specific to the recipient), and conveying messages from the discarnate to the recipient.  Research generally focuses on the verification component, since it is this aspect which speaks to the idea of psi and survival of consciousness.  So what would be a valid way to measure "specific and accurate information has been received from a discarnate"?  If we look at how the question has been answered in the most rigorous of the mediumship studies (Robertson and Roy, Beischel and Schwartz, Kelly and Arcangel), we find that we have 21 different answers in just 3 studies.

Robertson and Roy measured the dependent variables by breaking down each reading into individual statements and then recording:
- the number of statements accepted as true by each participant
- the number of participants who accepted a given statement
- the total number of statements

Beischel and Schwartz recorded:
- the placement of each general statement into 1 of 5 accuracy categories
- the placement of each of 4 Life Questions into 1 of 5 accuracy categories
- the placement of each of the Reverse Questions into 1 of 5 accuracy categories
- the placement of each general statement into 1 of 4 emotional categories
- the placement of each of 4 Life Questions into 1 of 4 emotional categories
- the placement of each of the Reverse Questions into 1 of 4 emotional categories
- a written explanation of each general statement placed into 1 of 2 accuracy categories
- a written explanation of each of 4 Life Questions placed into 1 of 2 accuracy categories
- a written explanation of each of the Reverse Questions placed into 1 of 2 accuracy categories
- a global numerical score for each reading on a 7-point scale
- the choice of 1 of 2 readings
- a rating of that choice on a 5-point scale

Kelly and Arcangel recorded:
Study 1
- the accuracy of each statement on a 5-point scale
- the significance of each statement on a 5-point scale
- the choice of 1 of 4 readings
Study 2
- the accuracy of each reading on a 10-point scale
- the rank of each reading within a group of 6, based on the scores (there were ties, including ties for first place)
- the choice of 1 of 6 readings
- written comments on each of the readings

What is striking about this list is not just the sheer number of different outcomes, but that among the three studies, no two are the same.  Even the way in which the accuracy of individual statements is measured is different in each study.  These outcome measures cannot all be valid (they don't come to the same answers).  So then it becomes important to ask whether any are valid, and if so, which ones? The list also highlights that concerns about a grossly inflated false-positive rate are legitimate.

It was suggested that all which was needed to perform more rigorous mediumship research was to repeat the Beischel study with a larger sample size.  However, this is almost guaranteed to lead to a false positive result until it is determined which one of the 12 different outcome measures is most valid.

Linda


http://www.plosmedicine.org/article/info%3Adoi%2F10.1371%2Fjournal.pmed.0020124

http://pss.sagepub.com/content/22/11/1359.full

http://deanradin.com/evidence/Beischel2007.pdf
http://deanradin.com/evidence/Kelly2011.pdf

Robertson, T. J. and Roy, A. E. (2004) Results of the application of the Robertson-Roy Protocol to a series of experiments with mediums and participants. JSPR 68.1

Sunday, March 2, 2014

Why "Naturalism is Useful"?

One of the biggest concerns I run across in these internet discussions is the proposal that the practice of science operates under various metaphysical assumptions or axioms - materialism, reductionism, causality, repeatability, philosophical naturalism, etc.  It is then argued that the body of knowledge reflects these underlying assumptions, rather than some sort of independent truth about the nature of reality.  My perspective is different.

I doubt most scientists and science types give much thought to philosophy or to adopting these assumptions.  I'm not a materialist or a reductionist or a philosophical naturalist.  At best, the assumption which underlies the practice of science is "(methodological) naturalism is useful".  Materialism, physicalism, reductionism, holism, causality, consistency, etc. are merely the result of our observations, not the cause.

"Methodological naturalism" means that knowledge is built from reference to events and experiences.  "Useful" means that this knowledge is progressive, it distinguishes between ideas which are true or false, it allows us to make predictions, it tightly constrains the possibilities, it generates novel information and observations.  Most claims which are regarded as supernatural or paranormal are actually ordinary naturalism claims, as they are generated by reference to experiences or events.

From this perspective, there isn't anything which makes an experience or event "supernatural" or "paranormal" beforehand.  Nor are we prevented from including paranormal claims under the practice of science.  The title of this blog is just an indication that discussions about "reductionist, materialist science" aren't relevant or of interest.

Linda

Friday, February 28, 2014

Low-hanging Fruit

I have long been interested in ideas which hang around the edges of science.  If you are looking for discussion of these ideas, you fairly quickly come across two camps which tend to dominate the debates - the Skeptics and the Believers.  I hung around for several years on one of the more active skeptical sites, the James Randi Educational Foundation (JREF) forum.  My username there was "fls".  A few years ago I moved over to one of the more active believer sites, the Skeptiko forum, where I kept the same username.

Both kinds of sites are dominated by attacks on the low-hanging fruit, those ideas and behaviours which are fairly easily derided as silly - psychics like Sylvia Browne who seem to be phoning it in, critics of psi who proudly declare they haven't read the paper they are criticizing.  And both kinds of sites seem devoted to polarizing the debate.  Psychics are all deluded or frauds.  Skeptics are close-minded sheep.  And on it goes.  Alex Tsakiris, the founder of Skeptiko, coined the phrase "Stuck on Stupid" to refer to anyone who doesn't share his beliefs.  James Randi coined "Woo-Woos" to refer to those who believe in the paranormal.

Many people are certainly profiting (some in terms of cash, many in terms of credit) from making this all about a culture war.  But what gets lost in all this is that there are some damned interesting aspects to discuss in terms of evidence, the way in which science is practised, and why it matters.  Along the way, I have found a few people interested in engaging with these topics - thoughtful, funny, knowledgeable people who are a pleasure to talk to.  But these discussions seem to get marginalized and overrun by those who need to turn this into an Us vs. Them culture war.  I've given up (at least for the moment) looking for a place where proponents and non-proponents engage in reasonable discussion.

I've started this blog in order to talk about evidence and speculation in the presence of evidence rather than its absence, moving beyond the low-hanging fruit.  My perspective comes from many years working with and teaching evidence-based practices in medicine, and I apply the same approach when looking at those ideas and phenomena which fall under "paranormal" as I do to those ideas and phenomena which fall under "health".  I am hoping others will join me here, although I have an unfortunate tendency to enjoy talking to myself. :-)

Linda