A Research Study into the Interpretation of EVP

Part I

by Mark Leary, Ph.D.
Published in the Winter 2013 ATransC NewsJournal
Read Part 2 and Part 3


Anyone who has listened to even a few EVP recordings knows how difficult they are to interpret. Listeners often disagree, sometimes strongly, regarding what a particular EVP seems to say, which raises questions about the validity of each person’s interpretation. Yet, the usefulness of EVP depends on the degree to which investigators can trust one another’s interpretations of the EVP that they record. Although a great deal has been written about the possible mechanisms that produce EVP and the types of equipment that are most effective in recording them, EVP enthusiasts have devoted far less attention to problems associated with interpreting the sounds that are recorded.

After observing repeated disagreements among investigators (and rarely feeling that the interpretations of EVP on paranormal television shows match what I hear), I undertook a study to examine how serious the problem really is. The study that I conducted had two main goals: to document the degree to which investigators agree or disagree on their interpretations of EVP and to create a means of identifying which interpretation of a particular EVP is most likely to be “correct.”

The Study

To obtain a set of EVP for analysis, I contacted a number of paranormal investigators who had conducted systematic investigations at the Ferry Plantation House in Virginia Beach, Virginia. I received over 250 EVP, from which I chose 94 that were among the clearest in terms of having obvious vocal characteristics. These recordings came from eleven investigators who recorded them across seven different investigations. In general, investigators seemed to submit what they viewed as particularly good EVP, all of them recorded without a background noise source.

I then recruited 24 individuals (10 men, 14 women) with paranormal investigation experience to listen to and interpret the 94 audio clips. The raters ranged in age from 29 to 62, with an average age of 46. All but two of them currently belonged to active paranormal investigation groups.

The raters were sent a CD with the audio clips, along with a form for interpreting the EVP and a background information questionnaire. Raters listened to each EVP as many times as needed, wrote down each word that they heard (putting an asterisk for any words they could not understand), indicated any emotion that they detected in the voice, and rated their confidence that their interpretation of the EVP was correct. The background questionnaire asked about raters’ age and sex, their interests and beliefs in the paranormal, and included a brief measure of basic personality dimensions (such as extraversion, emotional stability and agreeableness).

Determining Agreement

Although we can never know for sure what an EVP “really” says, my analysis of raters’ interpretations was based on the assumption that a particular interpretation of an EVP that is made independently by several people is more likely to be “correct” than an interpretation that is made by only a few individuals. For example, if seven out of ten people who listen to an EVP hear exactly the same words, two other individuals hear a different set of words and the remaining person hears something else entirely, the interpretation on which the seven people agreed would be more likely to reflect the actual sounds than the other individuals’ idiosyncratic interpretations.

Thus, to begin, I determined a “consensus interpretation” for each EVP by counting the number of times that raters reported hearing various words. For example, whatever first word was heard by the most raters became the first word of the consensus interpretation. Whatever second word was heard by the most raters became the second word of the consensus interpretation, and so on. In this way, I came up with the most common (or consensus) interpretation for each EVP.
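The counting procedure described above can be sketched in Python. This is only an illustration, not the code used in the study: the tokenizing rules, the treatment of asterisks, the tie-breaking, the choice of consensus length, and the sample ratings are all assumptions made for the example.

```python
from collections import Counter
import string

# Characters trimmed from the ends of each word (assumption: punctuation is ignored).
_TRIM = string.punctuation + "“”‘’"


def tokenize(interpretation: str) -> list[str]:
    """Lowercase the text and split it into words; '*' marks an unintelligible word."""
    words = []
    for raw in interpretation.lower().split():
        word = raw.strip(_TRIM)
        if word:
            words.append(word)
        elif "*" in raw:
            words.append("*")
    return words


def consensus_interpretation(interpretations: list[str]) -> list[str]:
    """Build a consensus by taking the most frequently reported word at each position.

    Assumptions (not stated in the article): the consensus has as many words as
    the most common interpretation length, asterisks do not count as votes, and
    ties go to whichever word was encountered first.
    """
    if not interpretations:
        raise ValueError("at least one interpretation is required")
    tokenized = [tokenize(text) for text in interpretations]
    length = Counter(len(words) for words in tokenized).most_common(1)[0][0]
    consensus = []
    for pos in range(length):
        votes = Counter(
            words[pos] for words in tokenized
            if pos < len(words) and words[pos] != "*"
        )
        # If nobody reported an intelligible word here, leave the slot blank.
        consensus.append(votes.most_common(1)[0][0] if votes else "*")
    return consensus


# Hypothetical ratings from five listeners for a single EVP.
sample_ratings = [
    "What's going on?",
    "What's going on",
    "What's going wrong?",
    "Watch * on",
    "What's going on?",
]
sample_consensus = consensus_interpretation(sample_ratings)
print(" ".join(sample_consensus))
```

Note that a position-by-position vote can produce a phrase that no single rater actually wrote down, which is one reason to check afterward how many raters gave the consensus interpretation in full.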

With the consensus interpretation in hand, I then calculated the percentage of raters who agreed with the consensus interpretation. This number could range from 0% (no two raters reported hearing the same thing) to 100% (all raters agreed with the consensus interpretation) and is an index of the degree to which raters independently agreed in their interpretations of each EVP.

Of the 94 EVP, the one with the highest agreement (“What’s going on?”) drew the consensus interpretation from 83% of the raters. However, the overall agreement for the entire set of EVP was much lower. Across all 94 EVP, average agreement with the consensus interpretation was only 21%. In other words, only about 1 out of 5 raters gave an interpretation that agreed with the most common (and, presumably, most “accurate”) interpretation.

When analyzed at the level of particular words rather than the entire EVP, average agreement was 35%. Raters agreed with the most common interpretation of each specific word on about 1 out of every 3 words on average.
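To make the two agreement figures concrete, here is a small Python sketch that computes both for one EVP. It assumes the interpretations have already been split into lowercase words and that a rater "agrees" only on an exact match; the function names and the sample data are hypothetical, and the study's actual scoring rules may have differed.

```python
def whole_evp_agreement(consensus: list[str], ratings: list[list[str]]) -> float:
    """Percent of raters whose entire interpretation matches the consensus."""
    if not ratings:
        return 0.0
    matches = sum(1 for words in ratings if words == consensus)
    return 100.0 * matches / len(ratings)


def word_level_agreement(consensus: list[str], ratings: list[list[str]]) -> float:
    """Average, over consensus words, of the percent of raters reporting that word
    in the same position (assumption: words are compared position by position)."""
    if not ratings or not consensus:
        return 0.0
    per_word = []
    for pos, word in enumerate(consensus):
        hits = sum(1 for words in ratings if pos < len(words) and words[pos] == word)
        per_word.append(100.0 * hits / len(ratings))
    return sum(per_word) / len(per_word)


# Hypothetical data: one consensus and five raters' tokenized interpretations.
consensus = ["what's", "going", "on"]
ratings = [
    ["what's", "going", "on"],
    ["what's", "going", "on"],
    ["what's", "going", "wrong"],
    ["watch", "*", "on"],
    ["what's", "going", "on"],
]
print(whole_evp_agreement(consensus, ratings))   # 60.0
print(word_level_agreement(consensus, ratings))  # 80.0
```

As in the study, word-level agreement comes out higher than whole-EVP agreement, because a rater who misses a single word still gets credit for the words that match.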

Some of the EVP had 0% agreement, and the various interpretations sometimes differed wildly. For example, one EVP that had no agreement on any words across raters was interpreted as saying, among other things: “Deep inside there’s a pickup;” “Keep those hidden Mr. Gel;” “He comes out here;” “Go outside and just lean on it;” “Get it tight, got to stretch it;” “Don’t try to persuade them;” “Get us out Mr. Kant;” and “I need the guns out if this is what you’ll do.” These various interpretations do not even contain similar phonemes.

Incidentally, the percentage of agreement with the consensual interpretation can be used as a way of assessing the clarity of an EVP. Historically, investigators have classified EVP as Class A, B, or C depending on how easily listeners can hear a message. But calculating the percentage of people who independently agree with the most common interpretation is a more precise and unambiguous indicator of the quality and clarity of an EVP than classifying it into one of three categories. Every EVP would have a score from 0 (no consensus; this EVP cannot be interpreted) to 100 (complete consensus; this EVP is so clear that everyone hears exactly the same thing).

Emotional Content

Raters indicated whether they detected any emotion in the voice. The majority of the EVP (63.5%) had no discernible emotional tone. However, raters indicated that some EVP expressed sadness (9.7%), anger or irritability (8.2%), urgency (7.7%), or happiness (6.3%).

Setting aside the fact that most of the EVP had no emotional tone, when an emotion was detected, on average only 12.7% of the raters agreed that a particular emotion, such as anger or sadness, was present. Thus, raters showed even less agreement in detecting emotion than in interpreting the content of the EVP.

Interestingly, raters’ tendency to hear emotions in the EVP was related to their own personalities. For example, raters who scored higher on the measure of extraversion reported “happiness” in the voices more frequently, raters who scored higher on the measure of agreeableness reported hearing both more “happiness” and more “anger,” and those who scored higher on emotional stability heard more “happiness” expressed. Raters’ interpretations of emotional tone sometimes reflected their own personalities as much as the actual features of the EVP.

Rater Confidence

For each EVP, raters indicated how confident they were that their interpretation was correct on a 4-point scale (where 1 = not at all, 2 = a little, 3 = moderately, and 4 = very confident). Across all EVP, raters’ confidence averaged between “a little” and “moderately” confident (average confidence was 2.5 on the 4-point scale). To see if raters who were more confident of their interpretations were more likely to hear what other raters heard (the consensus interpretation), I correlated raters’ confidence judgments with the number of their interpretations that agreed with the group’s consensus interpretation. The correlation was rather weak, indicating that being confident that one’s interpretation is correct does not usually mean that other people will hear the same thing.
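For readers who want to run the same kind of check on their own data, the sketch below computes a Pearson correlation between each rater's average confidence and the number of that rater's interpretations that matched the consensus. The article does not say which correlation coefficient was used or how confidence was aggregated, so both choices are assumptions, and the numbers are invented purely for illustration.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Invented per-rater data: mean confidence (1-4 scale) and the number of
# interpretations that agreed with the consensus.
mean_confidence = [2.1, 3.4, 2.8, 1.9, 3.0, 2.5]
agreements = [20, 19, 24, 18, 17, 22]

r = pearson_r(mean_confidence, agreements)
print(f"r = {r:.2f}")
```

A value near 0 would mean that confident raters are no more likely than hesitant ones to match the consensus; values near 1 or -1 would indicate a strong relationship.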

Differences Among Raters

I calculated an index of personal agreement that tells us how good each rater was at hearing the most common interpretation. Individual raters agreed with the group consensus between 17% and 35% of the time, with an average of 22%. That is, the “best” rater agreed with the group consensus interpretations on 35% of the EVP, and the “worst” rater agreed on 17% of the EVP. When analyzed at the level of the word rather than the entire EVP, individual raters’ agreement with the group consensus varied from 31% to 51% of the words, with an average of 38%. So, if we play the average EVP to a large group of people, the average person will agree with the consensus interpretation of the entire EVP 22% of the time but agree with 38% of the words.

I analyzed whether any of the characteristics of the raters mattered in their agreement with the consensus interpretation. Although we might expect experience with EVP to be related to interpretation ability, the degree to which raters agreed with the consensus interpretation was not related to the number of EVP that they had personally recorded, their years of involvement in paranormal investigations, the number or content of paranormal television shows they watched, basic personality dimensions, their age, or the nature of their beliefs in the paranormal. The only variable that was significantly related to agreement with the consensus interpretation was gender. Women’s interpretations agreed with the consensus interpretation 4 percentage points more often than men’s interpretations (24% vs. 20%). I’m not sure what to make of this finding.

Most raters’ interpretations were meaningful phrases, but some gave phonetic interpretations even if they did not make semantic sense. For example, on one EVP for which there was no consensus, some raters gave meaningful interpretations (such as “Hey we sung in the chorus” or “That is so great, Cory”), whereas other raters wrote down what they heard even though it didn’t make sense (such as “Hack me some green course” and “Hey peace and grin Coreys”). Investigators should consider whether imposing meaning on an EVP may lead them to “hear” words that help the phrase make sense but that might be incorrect.

The raters also differed in their willingness to leave blanks. Raters were told to use an asterisk when they couldn’t interpret a particular word. Some raters used asterisks regularly, but others did not use them at all. Given that we can assume that no rater was perfectly confident of every word, those who interpreted words they didn’t understand probably made more misleading interpretations than those who admitted that they didn’t understand certain words.

Conclusions and Recommendations

The results of this study suggest that investigators should be less confident in their interpretations of EVP than they typically are. On average, the most common interpretation of each EVP was shared by only 22% of other people. And, of course, all interpretations other than the most common, consensual one had even less agreement. In fact, most of the raters’ interpretations were not given by any other listener! Furthermore, raters were not particularly good at judging the correctness of their interpretations. Thus, having the sense that “I’m sure this is what it says” does not indicate that other people will agree with one’s interpretation (or that it is actually correct).

These results lead me to offer four recommendations for the responsible interpretation of EVP:

  1. In light of the fact that any particular investigator’s interpretation of an EVP is not likely to be shared by other people and that people’s interpretations are biased by what they expect to hear, investigators should never interpret an EVP for other people without playing it for them several times and soliciting their independent interpretations.
  2. If the interpretation of a specific EVP is particularly important (such as when it is being interpreted for grieving family members), investigators should use a scaled-down version of the procedure used in this study. Have at least 10 people independently listen to the EVP and determine the consensus interpretation, if any. Then report an interpretation of the EVP to others only if a majority of listeners agrees on that interpretation. In some cases, it may be helpful to report more than one potential interpretation, along with the percentage of people who agreed with each one. Providing listeners with such data is a more honest and responsible way to share EVP than to offer a particular interpretation that might, in fact, be idiosyncratic.
  3. Investigators should be willing to refrain from interpreting ambiguous EVP. Providing a questionable interpretation as if it were certain is misleading, if not sometimes dishonest. Just because an EVP cannot be interpreted does not mean it is not a useful piece of evidence, so investigators should not interpret EVP that are unclear.
  4. Paranormal investigation groups and EVP practitioners should have formal guidelines for the interpretation of EVP that minimize the likelihood that they will offer interpretations of EVP—whether to other group members, clients, or outsiders—that are expressed with greater confidence than the objective evidence warrants. Investigators should exercise greater care in sharing their interpretations of EVP, and procedures should be in place to ensure that clients, other investigators and the public are not inadvertently misled regarding interpretations of an EVP.
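The scaled-down procedure in the second recommendation can be sketched in a few lines of Python. This is a hypothetical helper, not a tool from the study: it assumes each listener writes down one interpretation independently, that interpretations are compared after lowercasing and trimming end punctuation, and that "a majority" means more than half of the listeners.

```python
from collections import Counter


def summarize_interpretations(interpretations, min_listeners=10, top_n=3):
    """Return (majority_interpretation_or_None, ranked list of (text, percent)).

    min_listeners and top_n are illustrative defaults; the article suggests
    at least 10 independent listeners.
    """
    if len(interpretations) < min_listeners:
        raise ValueError(f"need at least {min_listeners} independent listeners")
    # Normalize so that trivial differences in case or punctuation do not split votes.
    normalized = [" ".join(text.lower().split()).strip(".?!") for text in interpretations]
    total = len(normalized)
    ranked = [
        (text, 100.0 * count / total)
        for text, count in Counter(normalized).most_common(top_n)
    ]
    top_text, top_percent = ranked[0]
    majority = top_text if top_percent > 50.0 else None
    return majority, ranked


# Invented responses from ten listeners.
responses = ["Come here"] * 6 + ["go home"] * 3 + ["*"]
majority, ranked = summarize_interpretations(responses)
print(majority)
for text, percent in ranked:
    print(f"{percent:.0f}%  {text}")
```

When no interpretation exceeds 50%, the helper returns None for the majority, and the ranked list can still be shared so that listeners see each candidate interpretation alongside the percentage of people who gave it.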

With a Ph.D. in social psychology, Dr. Leary is a research psychologist who studies topics related to self-awareness, motivation, and emotion. He has conducted research on topics such as reactions to social rejection, the effects of excessive self-attention, people’s concerns with their social images, and the relationship between personality and behavior. He is on the editorial boards of several scientific journals in social psychology and recently released a psychology course on DVD entitled “Understanding the Mysteries of Human Behavior.”

Editor’s Note

For an additional study of how people hear EVP, please refer to the article EVP Online Listening Trials in the ATransC online Journal.

“Radio-sweep” is a generic name for EVP thought to be formed using sound produced by sweeping a radio dial. In principle, it produces a form of EVP referred to as “opportunistic EVP.” Please review Locating EVP Formation and Detecting False Positives and Radio-Sweep: A Case Study. Also see the article on page 9: “A Two-Year Investigation of the Allegedly Anomalous Electronic Voices or EVP.”
