EVPmaker with Allophones: Where are We Now?

by Tom Butler
Previously published in the Fall 2011 ATransC NewsJournal


Following a number of recent demonstrations by multiple practitioners, ATransC commissioned a study to determine the suitability of EVPmaker with allophones for real-time, two-way communication. After three years, a “failure to replicate” style report was published. This article discusses procedural concerns with the study and the lessons learned, which may guide future studies.


Stefan Bion developed a computer program named EVPmaker which uses a random process to select and combine segments of a sound file to produce a new output file. EVP are thought to be produced by the manipulation of the random process. To make the program more controllable for research, Stefan recently provided a sound file containing seventy-two allophones generated with the SpeakJet™ chip-set developed for robotics.

Allophones are small segments of speech which, when combined, can produce “spoken” words. The output from EVPmaker is a steady stream of allophones which, when intentionally selected by the communicating entity, produce EVP messages.

In 2008, Margaret Downey demonstrated real-time conversations using EVPmaker with allophones. An example here. Other practitioners reported similarly meaningful communications using the same technology, giving reason to think the time was right to closely examine real-time communication.

Thanks to a $10,000 donation to the Sarah Estep Research Fund from a member and a second donation from Becky Estep in memory of her mother and founder of the Association, Sarah Estep, ATransC contracted with Windbridge Research Institute to conduct a study. The assumption was that a report from impartial researchers would be more credible than if ATransC members conducted the study. The research question agreed to by ATransC was:

Can the EVPmaker software using the SpeakJet allophones data set produce real-time answers to questions that are posed by an operator under controlled conditions that eliminate conventional explanations for the results?

The project began in June 2008 and the resulting report was published in the Summer 2011 Journal of Scientific Exploration. (Article is here) However, the final report to ATransC was delivered in October 2009, and the following comments from that report made it evident that the study was being presented as another “failure to replicate” article:

Taking all of these analyses into account, this study did not find evidence that the EVPmaker software using the SpeakJet allophones data set can produce real-time answers to questions posed by an operator under controlled conditions that eliminate conventional explanations for the results.


The data in this study tend to suggest that the interpretation of EVPmaker conversations is a subjective process, the content of which is meaningful primarily (and perhaps solely) to the operator.

Examining the Windbridge Study

The study took just over three years from start to published report and cost ATransC about $12,000 including overhead. The ATransC objective was to have independent researchers evaluate the technology and help determine the best protocol for replicating the quality of existing examples. The study consisted of four phases: literature search, data collection, data analysis and final report. A single practitioner was used to produce ten sessions containing EVP with transcripts indicating what was heard. Data analysis consisted of allophone frequency analysis, listening panel, message grading as used for mediumship studies and speech recognition software.

Data Collection

The practitioner was able to conduct the EVP sessions at home, using a computer configured to provide much the same controls as could be applied in a laboratory. Only one practitioner was used. The practitioner could conduct as many sessions as needed and was tasked with selecting and submitting the ten sessions judged best. Besides the recorded sessions and the EVPmaker data file indicating the sequence of allophones, the practitioner also provided a written transcript of what was heard as EVP in each session. As agreed to by ATransC, there were no constraints on the questions the practitioner asked the etheric communicators to evoke an EVP.

The study produced examples which the listening panel agreed on, but the one with the most agreement was discarded as a statistical “outlier” with the comment:

One of the 10 samples—Session 6 (“I’m here for you”)—fell just under the “hit” threshold with a mean of 2.99 (± 0.12). However, it was determined that this value is a statistical outlier* and its removal from the data set should be considered. If the scores given to Session 6 are removed from the analysis, the resulting updated mean for the remaining nine samples falls from 1.15 (± 0.05) to 0.86 (± 0.05). This shows that the perceptions of the listening panel received an average score less than what was deemed a “slight match” to the operator’s perception.

* Convention dictates that values three times the interquartile range above or below the mean be considered outliers.

It is important to note that Class A EVP are, by definition, “outliers.”
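The effect of the outlier rule quoted above can be illustrated with a minimal sketch. It applies the footnote's convention as worded (more than three times the interquartile range from the mean). The function name and the session values are hypothetical; only the 2.99 figure for Session 6 comes from the report, so the printed means are not the report's numbers.

```python
import statistics

def iqr_outliers(values, k=3.0):
    """Return values lying more than k * IQR away from the mean."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    center = statistics.mean(values)
    return [v for v in values if abs(v - center) > k * iqr]

# Hypothetical per-session mean scores; only 2.99 is taken from the report.
session_means = [0.8, 0.8, 0.8, 0.9, 0.9, 2.99, 0.8, 0.9, 1.0, 0.9]

flagged = iqr_outliers(session_means)
kept = [v for v in session_means if v not in flagged]

print("flagged as outliers:", flagged)
print("mean with all sessions:", round(statistics.mean(session_means), 2))
print("mean after removal:", round(statistics.mean(kept), 2))
```

With data shaped like this, the one sample the panel agreed on most is exactly the one the rule removes, and the remaining mean drops accordingly.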

Lessons Learned

  1. Open-ended questions make it very difficult to use the “reasonableness” criterion.
  2. Based on an ATransC advisor’s comments, it is essential to use more than one practitioner.
  3. The data-collection methodology used by Windbridge is an excellent approach to establishing research controls for unattended EVP sessions.

Data Analysis

Frequency Analysis

The frequency of occurrence of allophones in the control sessions was compared with the practitioner sessions because (from the final report):

It was hypothesized that if communication involving English words was present in the Active Sessions, certain allophones might be present more or less often than in the Control Sessions

Not knowing what might come of it, we concurred that this was an interesting test. However, we cautioned several times that the words in EVP produced by EVPmaker are often formed in novel ways. As shown below, the researchers also noted this in the Speech Recognition part of the study. If words in the sessions are heard by people even though they may only be phonetically similar to the spoken word, it is unlikely that a change in distribution of allophones between control and practitioner sessions would be detectable.

A second factor is that there may be only a few intended words and many naturally occurring words in a session. For the very many allophones generated in a session (1,675 for a three-minute session), would a Class A utterance even show up in such an analysis?
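The scale of the problem can be sketched with a small simulation. It assumes, purely for illustration, that an uninfluenced session is a uniform random draw from the seventy-two allophones, and that an "influenced" session differs only by one nine-allophone phrase (the length of the “I’m here for you” example). All names are hypothetical, and this is not the analysis the researchers ran.

```python
import random
from collections import Counter

NUM_ALLOPHONES = 72     # size of the SpeakJet allophone set
SESSION_LENGTH = 1675   # allophones in a three-minute session
PHRASE_LENGTH = 9       # allophones in the "I'm here for you" example

rng = random.Random(0)

# Assumption: a session with no influence is a uniform random draw.
baseline = [rng.randrange(NUM_ALLOPHONES) for _ in range(SESSION_LENGTH)]

# Hypothetical influenced session: same stream with one phrase overwritten.
phrase = [rng.randrange(NUM_ALLOPHONES) for _ in range(PHRASE_LENGTH)]
start = rng.randrange(SESSION_LENGTH - PHRASE_LENGTH)
influenced = baseline[:start] + phrase + baseline[start + PHRASE_LENGTH:]

before, after = Counter(baseline), Counter(influenced)
largest_shift = max(abs(after[a] - before[a]) for a in range(NUM_ALLOPHONES))

# Standard deviation of one allophone's count under pure chance (binomial).
p = 1 / NUM_ALLOPHONES
chance_sd = (SESSION_LENGTH * p * (1 - p)) ** 0.5

print(f"largest count shift caused by the phrase: {largest_shift}")
print(f"chance spread per allophone count: {chance_sd:.1f}")
```

Under these assumptions each allophone appears about 23 times per session with a chance spread of nearly five, while a single short phrase can move any one count by at most nine and usually by far fewer. A clear utterance could therefore sit well inside the noise of a frequency comparison.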

Lessons Learned

Without more study of this technique, it is very difficult to know if the right assumptions have been made by the researchers. From our assessment, it appears to be unreasonable to say that frequency analysis is a realistic technique for detecting the presence of anomalous influence on the selection of allophones.

Listening Panel

An online listening panel was selected and presented with ten sound clips from control sessions and ten from the practitioner sessions. An important point in this test was that the examples used from the practitioner sessions were ten of those EVP reported as being heard in real time.

One of the questions asked was whether or not the listener heard words in the samples. An average of 73% answered “Yes” for the practitioner sessions and 63% answered “Yes” for the control sessions. Roughly half heard words in each of the twenty examples they were asked to judge.

The grading system the researchers used has potential for future research, especially the way they graded what listeners reported hearing. However, one-word responses were counted, including such words as “I,” “yes” and “for.” EVPmaker output includes numerous naturally occurring sounds resembling common one-syllable words. This is apparently the case with the control sessions, resulting in both groups having a similar number of reported words.

Lessons Learned

Witness panels do work, but one protocol does not fit all forms of EVP. Word-like sounds naturally occur in EVPmaker output, making it necessary to use grading rules which will ignore one-syllable words. EVP is considered communication, and a second consideration is the reasonableness of a response. For instance, a stand-alone word like “oracle” should be ignored unless the practitioner has specifically asked questions for which it is appropriate. One cannot say the word is present if a listening panel does not agree, but since short words are sometimes spontaneously formed by EVPmaker, care must be taken not to include them in the analysis. A methodology would need to be established for determining which is the case.

Judging Content of Reported EVP

As they do for messages in mediumship, the researchers scored the reported EVP against what the practitioner asked or said and reported that:

Of the 124 responses, roughly one-third (31%, 38) received a score of 0 [No fit]. Similarly, another third (34%, 42) received a score of 3 [Obvious fit]. The remaining third of the responses (35%) received median scores of 1 [Fit with minimal interpretation] (20) or 2 [Fit with more than minimal interpretation] (24). The overall mean was 1.56 ± 0.11, a score at the middle of the scoring range, and the higher end of the 95% confidence interval fell below 1.8.

Based on the distribution of these scores, it was concluded that responses perceived by the operator did not consistently contain information that logically matched her questions.

Of course, there remains the fact that about a third of the responses (34%) were scored as an obvious fit with what the practitioner asked. The conclusions arrived at by the researchers raise the question, “How can a 34% agreement be discarded when one is speaking of something that is not supposed to exist?”
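The figures quoted from the report can be checked with a few lines of arithmetic. The sketch below rebuilds the 124 scores from the quoted counts; it assumes the “± 0.11” is the standard error of the mean and that the 95% interval uses the usual 1.96 multiplier, neither of which the quoted passage states.

```python
import statistics

# Score counts quoted from the report: score -> number of responses.
counts = {0: 38, 1: 20, 2: 24, 3: 42}

# Expand the counts into one list of individual scores.
scores = [score for score, n in counts.items() for _ in range(n)]
total = len(scores)

mean = statistics.mean(scores)
# Assumption: the reported "± 0.11" is the standard error of the mean.
std_error = statistics.stdev(scores) / total ** 0.5
# Assumption: a normal-approximation 95% interval (1.96 standard errors).
upper_95 = mean + 1.96 * std_error

for score, n in counts.items():
    print(f"score {score}: {n} responses ({n / total:.0%})")
print(f"mean = {mean:.2f} ± {std_error:.2f}, upper 95% bound = {upper_95:.2f}")
```

Under those assumptions the counts reproduce the reported mean, its margin and the upper bound below 1.8. The mean sits mid-scale because the responses split between the two extremes, not because most of them were middling.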

Lessons learned: Content judging appears to be a good way to assign a numerical value to the objectivity of a reported utterance. That is essentially what analysis of results from a listening panel is supposed to provide. The rules of “convincingly objective,” however, should be based on reasonable consensus.

Speech Recognition Program

The researchers “trained” a speech recognition program to understand phrases spoken with the SpeakJet allophones. They then attempted to use that program to find the reported EVP phrases. From the report:

It is evident from this comparison that these 10 phrases that the operator heard during the real-time EVPmaker Active Sessions were not present in the EVPmaker output at those times in the sessions. However, similar vowel sounds were often found in the output. For example, when the operator heard the phrase “you are here,” the allophones being “spoken” by EVPmaker actually “said” something like “ooch k hoe are teer.” Similarly, when the operator heard “I’m here for you,” EVPmaker was “saying” “I oo we’re kk door you.”

Here is the example which was discarded as an outlier.

Reported phrase: I’m here for you.
Allophones from EVPmaker: \OHIY \UW \WW \IYRR \KO \EK \DO \OWRR \IYUW
Associated phonetic sounds: (“I oo we’re kk door yoo”)

The computer program was trained to find words in allophones “properly” arranged to form those words. It is difficult to “hear” what this sounds like by reading the phonetic sounds above, but they were heard by the practitioner and many of the listening panel as “I’m here for you.” This is an example of how allophones might be arranged to approximate the intended words: words that would be understood by a human but not found by the program.

Speech recognition programs have been tried for EVP many times, but to our knowledge, with no meaningful success. We made this clear to the researchers, but they insisted they could make it work. Trying to keep an open mind, we agreed. In fact, they did not make it work and we believe this part of the analysis should have been discarded as a bad idea.

Lessons learned: At this time, speech recognition is not a realistic tool for EVP formed with EVPmaker. It may be useful for transform EVP since forensic voice analysis has been successfully used to compare “living” and discarnate voices.


The Journal of Scientific Exploration is a peer-reviewed publication which has published two other “failure to replicate EVP”-type articles. Based on this and our attempts to communicate with the society, we do not count it as a friend of EVP/ITC. We have no visibility as to who the “peers” were, and our assumption is that they are peers in science but not peers in ITC. In truth, being amongst the very few organizations friendly to the concepts of survival and transcommunication, we expected to have to publish the final report in the ATransC NewsJournal.

“The idea that you don’t show anybody, including your colleagues, results until they are peer-reviewed is something new in science. And it’s brought about because of media attention. I don’t think that’s good.”

Richard A. Muller, in an interview by Michael D. Lemonick on global warming, “I Stick to the Science,” Scientific American, June 2011. (Available on docside.com.)

This is the first point we need to make. Peer review is not vetting. It is academics agreeing that the paper is academically sound, while vetting by subject matter specialists would have pointed out that many of the assumptions and procedures were inappropriate for the subject.

The basic scientific method is observation leading to hypothesis which predicts outcomes that can be tested to further refine the hypothesis. This is important and appropriate to the study of transcommunication. However, many of the tools of mainstream science are not appropriate for this study. Most glaring is the statistical discard of an example because it was understood much more often than the others.

The listening panel and judging content procedures are essentially the same. As is clear in judging content, they are subjective considerations of objective phenomena. Because they are subjective, it is necessary to constrain the results to plausible communication. This was done in judging content, but counting one- and two-syllable words as “Yes” for presence of words only serves to provide fodder for statistical analysis. In fact, the presence of EVP was noted, making the conclusion that EVP were not present unfounded. From the report:

Thus, consensus among participants during the listening panel did not rule out pareidolia (finding patterns in sound that are not there) as a possible explanation for the perceived presence of ITC in the Active Sessions.


Based on the distribution of these scores, it was concluded that responses perceived by the operator did not consistently contain information that logically matched the questions.

The researchers had been advised that previous attempts to use speech recognition have failed. Most EVP are formed in novel ways, which is especially true of EVPmaker. In fact, this is the common problem shared by the frequency analysis of allophones and the speech recognition attempt made by the researchers. Both were interesting ideas which, once it was seen that they did not detect phenomena known to be present, should have been abandoned. The report should have read more like “We tried this but it did not work” than “We did this and it showed that phenomena were not present.”

Lessons Learned

Here is the research question used in the published report:

Can the presence of ITC be objectively detected in real-time ITC sessions recorded by an experienced EVPmaker operator in which the operator claims successful contact with an external entity has occurred?

There was a shift in emphasis from the original question (at the beginning of this article), which highlights the breakdown in communication between Windbridge and ATransC. It is ATransC policy to promote open, candid collaboration and to make research results available to the average person. That was one of our requirements. ATransC is a nonprofit organization and funding this study had the potential of attracting more donations to enable future studies. Instead, the researchers refused to allow us to discuss the study until the final report was published, three years later.

The unavoidable conclusion is that research about techniques and human factors, such as protocols for listening panels, should be conducted by subject-matter specialists, and that work should be vetted by subject-matter specialists. Attempting to force-fit methodologies of mainstream science has not added to the understanding of these phenomena, except to show what does not work. There is a class division between academically trained but uninformed scientists and informed but generally poorly trained subject-matter specialists, which impairs collaboration. This makes it necessary to conduct this work with the resources of the paranormal community.

Conclusion about EVPmaker

Despite the conclusions arrived at by Windbridge that EVP thought to be produced by EVPmaker are probably imaginary, there remain important examples of EVP from that technology which are very objective. So what is reasonable guidance for members? There can be no doubt: EVPmaker should not be recommended to people who are new to EVP. The difficult-to-follow output too easily leads people to find meaning where none was intended.

An example recorded in another study, “Her radio,” illustrates the complexities faced by researchers. Close examination of “radio” shows that it is actually a transform EVP—one formed by morphing noise to produce a clear expression. So in fact, that EVP is not a demonstration of EVPmaker’s capability. It could have been recorded with an ordinary audio recorder using background noise.

The ATransC recommendation will be that EVPmaker should be considered a specialty tool to be used by people already accustomed to recording EVP using a recorder with possible background noise (transform EVP). EVP from EVPmaker should be examined to determine whether or not it is actually transform EVP.

You can access the report on Windbridge’s website: windbridge.org/papers/BoccuzziBeischel2011JSE25ITC.pdf
