August 18, 1991


by Tim Brooks, VP Research, USA Network

 [Note: When I wrote this in 1991 I was being polite by not giving the names of the shows used as examples; they are added here in brackets – TB, 2007]

Question: “If the networks do so much testing, why do so many programs fail?”


1. “Because producers don’t listen to the research.”

2. “Because programmers keep changing the time periods.”

3. “Because even good shows don’t stay on long enough for the audience to find them.”

4. “Actually, there aren’t as many flops as you think.”

Arguably, the correct answer is number four.  Think about it.  Television programming is one of the most intensively researched of all creative fields, and its “success rate”‑‑around 25% most seasons‑‑is far better than that of movies, books or records.  The difference is that its failures are far more visible.  You will probably never hear of the typical book or record that goes straight to the cut‑out bin, and many unsuccessful movies never even make it into general release.  But who could have missed the demise of “Chicken Soup”?

The purpose of the present article is to look at some of the elements of the arcane art of program testing.  Predicting the success of a creative work is one of the most difficult challenges research can face, yet properly used testing can enhance both the commercial viability and the artistic integrity of a program.  It’s all in how it is used.

Rule number one, perhaps surprisingly, is to look at the pieces, not at the whole.  In a successful program many different elements work together seamlessly‑‑lead and supporting characters, story, setting, pace, production values and the always desirable “hook” or unique element that makes this show seem fresh and different.  A good program test will ask about all these things, and the analysis of the results will take them all into account.  Often a client wants a single number, thumbs up or down, the bottom line.  This may lead to undue emphasis on a single summary question such as “How likely would you be to watch this show?”, or “How would you rate this program‑‑excellent, good, fair or poor?”  A single question may not be a very good predictor of what will happen when the viewer actually has to make the viewing decision; too many other factors will come into play.  Probably the fairest answer to the first question is “it depends.”  The second question has little to do with actual viewing.

If the elements that make up a program (characters, story, “hook,” etc.) are strong, however, the program will probably do well no matter what the competitive situation.  This is especially true for continuing series.  In my experience, probably the most important predictor of the strength of a continuing series is the appeal of the characters.  For one‑time events such as movies and specials the story or hook takes on more importance (after all, you only need to get viewers curious enough to tune in once).  In either case, though, all elements are important.  If that understanding client still wants a bottom line, the researcher should try to place the program on a verbal scale (e.g., Strong‑Moderate‑Weak) or rank it among other shows that have been tested.  One widely used analysis system uses a composite numerical score based on scores for each of the key elements.  Thus strength in one area can to some degree make up for weakness in another.
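The compensatory logic of such a composite system can be sketched as a simple weighted average.  The element names, weights and scores below are purely illustrative; they are not those of any actual testing system.

```python
# Hypothetical sketch of a compensatory composite score: each key
# element is rated on a 1-10 scale and weighted, so strength in one
# area can to some degree offset weakness in another. All names and
# numbers here are invented for illustration.

ELEMENT_WEIGHTS = {
    "characters": 0.35,         # weighted heaviest for continuing series
    "story": 0.25,
    "hook": 0.20,
    "setting": 0.10,
    "production_values": 0.10,
}

def composite_score(element_scores):
    """Weighted average of per-element scores (each on a 1-10 scale)."""
    return sum(ELEMENT_WEIGHTS[name] * score
               for name, score in element_scores.items())

# A pilot with a weak story but very strong characters can still
# land a respectable overall score:
pilot = {"characters": 9, "story": 5, "hook": 7,
         "setting": 6, "production_values": 7}
print(round(composite_score(pilot), 2))
```

The design choice worth noting is that the model is compensatory by construction: no single weak element can sink the score on its own, which mirrors the point that all elements matter but none is decisive alone.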

Looking at the elements has another significant value: diagnostics.  Understanding how certain elements contribute to the appeal of the program and others do not may allow the producer to make changes which strengthen the program.  Sometimes this is simply a matter of removing minor irritants.  If it is too late to modify the program, or if the changes would undermine the creative intent, at least we know which elements to emphasize and which to avoid in promotion.

Related to the look‑at‑the‑elements approach is rule number two: ask LOTS of questions.  Adjectival scales, agreement or disagreement with statements about the program, forced choices and competitives (would you watch this program or a specific named alternative?) are all good approaches.  You’ll seldom have trouble keeping the respondent’s attention; people love to give their opinions about television shows.  With many different points of information about the program, the researcher has a much improved chance of discovering the dynamics of its appeal. The only potential drawback of this approach is that spending so much time on one property (perhaps ten minutes or more in a phone interview) makes it impractical to test a lot of programs at once.  But which would you rather have, good information on one or two programs or bad information on a dozen?

Aha, you say, all this is fine but you still haven’t told me how to tell a good score from a bad one.  No, and I won’t.  As in any type of research it is necessary to develop norms, and they will depend on the specific questions and rating scales used.  Of course you have to use the same key questions and scales in each test.  However, once you do it won’t take long to determine what levels indicate strength or weakness.
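In practice, developing norms means accumulating the distribution of past scores on the same questions and scales, then seeing where a new test falls within it.  A minimal sketch of that idea follows; the percentile cutoffs and historical scores are invented for illustration only.

```python
# Hypothetical sketch of norming: classify a new test score against
# the distribution of scores from previous tests that used the same
# questions and scales. All numbers are invented for illustration.

def build_norms(past_scores, weak_pct=25, strong_pct=75):
    """Return (weak_cutoff, strong_cutoff) as nearest-rank percentiles."""
    s = sorted(past_scores)
    def pct(p):
        k = max(0, min(len(s) - 1, round(p / 100 * (len(s) - 1))))
        return s[k]
    return pct(weak_pct), pct(strong_pct)

def classify(score, norms):
    """Place a score on the verbal Strong-Moderate-Weak scale."""
    weak_cutoff, strong_cutoff = norms
    if score >= strong_cutoff:
        return "Strong"
    if score <= weak_cutoff:
        return "Weak"
    return "Moderate"

history = [5.2, 6.1, 6.8, 7.0, 7.4, 5.9, 6.5, 8.1, 4.8, 6.9]
norms = build_norms(history)
print(classify(7.8, norms), classify(6.4, norms), classify(5.0, norms))
```

With only ten historical tests the cutoffs are rough; the point of the sketch is simply that “strong” and “weak” are defined relative to the researcher’s own accumulated results, not by any absolute number.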

Rule number three is to test the property in a form as close to the final product as possible.  For a TV series, this means a full pilot if feasible.  Respondents react to what’s in front of them, nothing more and nothing less, and the closer that is to what they will actually be basing their viewing decision on, the more predictive the test will be.  Words are a poor substitute for pictures, but often we have neither the time nor the considerable amount of money required to expose respondents to a finished pilot.  A concept, a brief description of the program which is read or shown to respondents, is a widely used and acceptable substitute.  Don’t make it too brief, however.  The more you tell them about the property the more informed their reaction will be.  Typically I prefer to use two to four paragraphs to describe a series or movie.  Also, don’t make it “hyped” or overtly promotional; you want the respondent to react to the program, not your clever sales pitch.

Very brief descriptions, such as “log lines,” have little value in my experience, even for movies.  Also, perhaps surprisingly, demonstration tapes and other abbreviated versions of pilots have not proven very effective thus far.  They’re a little like promotional spots; respondents know they’re not the real thing.  High‑tech techniques such as having groups of viewers push buttons (or turn dials) to express their feelings during an actual viewing are fun, and can shed some light on strong and weak moments during the show.  They are best used with follow‑up questions about why the respondent felt that way at that time.  They are not a substitute for a full‑fledged pilot or concept test, however.  The respondent needs to think about and verbalize his or her reasons for liking or not liking a program if we are to understand what is driving the show’s appeal, or lack of it.  The only kind of button the typical viewer is likely to push back home is the ever‑present remote control, or “zapper,” consigning our best programming efforts to oblivion with no explanation at all.

Presenting the results of a program test to the client, especially to someone involved in the creation of the property, can be the biggest challenge of all.  How do you tell a mother her beloved child is, well, er, um, (clear throat), how do I say this‑‑ugly?  Most shows fail, but every producer knows that his is a winner.  Show him a focus group and he’ll find something to prove it, even if respondents were throwing the cole slaw at the screen (“did you see how involved they were!”).

The first thing to remind the client, and yourself, is that program testing is just one input in the creative process.  No one should ever program “by the numbers,” but a smart producer works from information, not ignorance, about audience reactions.  If the test results suggest a modification that fits into‑‑or even enhances‑‑the creator’s vision, and satisfies the viewer too, all the better.  As an NBC marketing executive once put it, “testing is like turning on the lights in a room.  It doesn’t tell me what to do next, but it shows me where the furniture is.”

If at all possible, test results should not be used as a report card or a weapon for use by warring factions in the creative process.  In fact, clients should be urged to disseminate the results as narrowly as possible, only to those creative people who can directly act on them and/or to the executives who may have to decide whether the project proceeds.  All parties should be thoroughly briefed on the limitations of the testing, and the fact that their wise judgment should of course be the last word!  Egos being what they are, some producers feel that all testing is worthless compared to their brilliant insights; the lights are always on brightly in their room, and they just know what will work (when it doesn’t, of course that is somebody else’s fault).  Stories are legion in the business about programs that flopped in testing and went on to become big hits, and vice versa.  Most of these stories are either substantially distorted or are downright false.  For example there is the very successful game show producer [Merv Griffin] who delights in telling interviewers that his most famous series, one of the biggest hits of the 1980s [Wheel of Fortune], tested poorly in pilot.  What he doesn’t tell them, or perhaps has forgotten, is that originally two game shows were in fact tested.  The one with the now‑famous name did indeed test poorly, but it never aired in that form.  The show that went on the air under that name was actually an amalgam of the strongest elements of both shows.

The producer of one of the biggest hit situation comedies of the mid‑1980s [Family Ties] has told reporters that early in the show’s history he observed how the studio audience was reacting favorably to one of the supporting characters [Michael J. Fox].  He began to emphasize that character in future episodes, whereupon the young actor exploded into a major star.  He doesn’t always mention that program tests from early in the show’s run pointed out a similar reaction among test audiences.  Perhaps the producer was doing a little reading as well as backstage observing?

One of the most innovative police shows of the 1980s [Hill Street Blues], which was both critically acclaimed and extremely influential in its day, is said to have tested poorly in pilot.  That story is also only half true.  The test audience did react negatively on the bottom line question “would you watch?”, as they often do to programs that are radically different from what they’re used to.  (And indeed this program took time to find an audience once it went on the air.)  But they also liked some of the characters very much, and significant changes were made in the show before it premiered to address both the character and story concerns of the test audience.  Most observers today would say that those changes helped make the show a hit.  Did that “poor” test really not make a valuable contribution to the success of the series?

In a sense, program testing works against itself in terms of documenting its validity.  If a test is weak but identifies elements that could be strengthened, and those elements are strengthened and the show becomes a hit, was the test a failure?  If a strong test contains results that suggest that the audience is reacting to superficial elements, or socially loaded ones, and the show flops, was that test a failure or was it a warning signal?  A good analyst is always on the lookout for such clues.  Lore in the business is that pilots with cute dogs and kids always do well, but we know that and we look for whether that alone has driven the bottom‑line score.  “Premise pilots,” those that set up the premise for a series with characters and dramatic situations that won’t recur in regular episodes, are another problem.

Stripping away the lore and anecdotes, does properly done program testing actually predict success?  Not in every case, but most of the time, yes.  A study conducted at one network indicated that strong‑testing series pilots were more than twice as likely to succeed as those in the moderate range, which in turn were more often successful than those with weak tests.  Likewise, an analysis of movie concept test results indicated a positive correlation between those tests and on‑air performance.  (This of course reflects that network’s testing procedures, not all program testing.)
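The kind of validation analysis described above amounts to correlating past test scores with later on‑air performance.  A sketch of the computation follows; the scores and ratings are fabricated solely to show the mechanics, and bear no relation to any network’s actual data.

```python
# Sketch of a validation analysis: correlate historical concept-test
# scores with later on-air performance. The data below is fabricated
# purely to illustrate the computation.
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

test_scores = [5.1, 6.0, 6.4, 7.2, 7.9, 8.3]   # past concept-test composites
ratings     = [4.0, 5.5, 5.1, 6.8, 7.0, 8.1]   # subsequent on-air ratings

print(round(pearson_r(test_scores, ratings), 2))
```

A positive coefficient on a reasonable sample is what “positive correlation between those tests and on‑air performance” means in practice, though with samples this small any single season’s result should be read cautiously.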

Program testing certainly has its detractors in the creative community, but the problem, I believe, stems more from occasional misuse of the data than from the data itself.  A good deal of experience is needed to test pilots and concepts properly.  Executed and used correctly, a good program test is the producer/programmer’s ally, not his enemy.


This page was last modified on November 4th, 2011.
© 2011 Tim Brooks. All rights reserved.