JSLHR, Volume 40, 867–876, August 1997
Children Recovered From
Stuttering Without Formal
Treatment: Perceptual Assessment
of Speech Normalcy
Patrick Finn
University of New Mexico
Albuquerque
Roger J. Ingham
University of California, Santa
Barbara
Nicoline Ambrose
Ehud Yairi
University of Illinois, Champaign-Urbana
Current evidence suggests that young children who recover from stuttering are
essentially stutter-free. However, there is no evidence to indicate if their speech is
perceptually indistinguishable from normally fluent peers or whether they retain
perceptually unusual speech. One important example of recovery from stuttering
is children who have recovered without receiving formal treatment. An investigation was conducted to determine if the speech of these children is perceptually
different from the speech of children who have never stuttered. Speakers consisted
of 10 preschool and early school-age children documented as recovered from
stuttering without benefit of formal treatment. In a series of studies they were
compared with 10 children who had never stuttered. Three groups of judges—
sophisticated, unsophisticated, and experienced—were separately asked, using
videotaped speech samples of the children, to decide which samples were from
children who used to stutter. Results revealed that the children who recovered from
stuttering were perceptually indistinguishable from the normal controls. The same
result was obtained regardless of whether the samples were presented in paired-stimulus or single-stimulus mode. Two of the groups of judges were also instructed
to rate the speech naturalness of the speech samples. The speakers were not
distinguished on this measure either. Methodological issues and the implications
of the findings are discussed.
KEY WORDS: spontaneous recovery, speech naturalness, speech fluency
It is generally believed that the nature of developmental stuttering is
different for children who stutter relative to adults who stutter
(Bloodstein, 1995; Van Riper, 1982). Children who stutter are usually
described as having an impairment that is more amenable to change
than adults who stutter. The suspected reason for this difference between the two age groups is the chronicity of their impairment. Relative
to onset, children who stutter have experienced the impairment for a
shorter time than adults who stutter. An important result of this difference is that children are believed more likely to attain a complete recovery from the disorder (Bloodstein, 1995). Moreover, this recovery may
occur in some cases without the assistance of formal treatment. In contrast, adults are less likely to attain the same degree of recovery and
they are more likely to retain residual characteristics of the disorder,
even if they have received formal treatment (Wingate, 1976). This means
that the recovered speech of children is likely to be more comparable to
the age-appropriate, normally fluent speech of their peers than is the
©1997, American Speech-Language-Hearing Association
1092-4388/97/4004-0867
Journal of Speech, Language, and Hearing Research
recovered speech of adults when compared to their normal peers.
Various clinical and anecdotal accounts of unassisted
and assisted recovery from early childhood stuttering
have suggested that young children can attain essentially stutter-free speech (e.g., Bloodstein, 1995; Onslow,
Andrews, & Lincoln, 1994; Yairi & Ambrose, 1992). Notwithstanding these positive accounts, to the best of our
knowledge there is no laboratory evidence that young
children who recover from stuttering would be judged
as normally fluent. The fact that they may be stutter-free does not mean they are necessarily normally fluent
and natural sounding (Finn & Ingham, 1989). Specifically, there are no perceptual data which show that children who recover from stuttering are indistinguishable
from normal speakers. This is a concern because there
is considerable perceptual and anecdotal evidence that
adults who recover from stuttering often have presented
with speech that was distinguishable from that of normal speakers (e.g., Ingham & Packman, 1978; Runyan
& Adams, 1978, 1979; Wingate, 1976). Even in cases
where the speech of successfully treated adults was not
perceptually different from normal speakers, their
speech naturalness was still judged as sounding significantly more unnatural than that of normal speakers
(Ingham, Gow, & Costello, 1985).
The factors responsible for recovery from stuttering in children are still not understood. There have been
reports suggesting that children who stutter are highly
responsive to a wide variety of ameliorative stimuli ranging from shadowed and rhythmic speech to response-contingent stimulation (Ingham, 1984). An important,
and possibly different, example is the phenomenon of
unassisted recovery. Several studies have documented
a substantial number of children who recovered from
stuttering without receiving formal treatment (Andrews
& Harris, 1964; Glasner & Rosenthal, 1957; Johnson &
Associates, 1959; Yairi & Ambrose, 1992; Yairi, Ambrose,
& Niermann, 1993). Systematic investigation of these
children could be useful for establishing whether their
recovery has in fact resulted in speech that is indistinguishable from normally fluent speakers.
One commonly used method for determining differences between children or adults who stutter and their
matched normal peers is a perceptual judgment task.
Typically, such a judgment task will require judges to
distinguish perceptually between speech samples obtained from two types of speakers. Two factors must be
considered when constructing this task. First, the judges’
level of sophistication and experience is important because this could affect their ability to perceptually distinguish between types of speakers. In turn, this could
affect the meaningfulness of the findings. For example,
perceived differences between speakers might be viewed
as less consequential if the differences were so small
and subtle that they could be detected only by highly
sophisticated judges (Runyan & Adams, 1979). Second,
the type of perceptual task—discrimination or identification—must be considered. The discrimination task
(paired-stimulus paradigm) requires observers to differentiate speakers when presented with pairs of
samples, one from each type of speaker. The identification task (single-stimulus paradigm) requires judges
to identify the type of speaker when presented with
individual samples from both types of speakers. Past
research contrasting these two tasks with samples from
children or adults who stutter has had mixed results:
Colcord and Gregory (1987) and Runyan, Hames, and
Prosek (1982) reported no differences, whereas
Wendahl and Cole (1961) and Young (1964) obtained
significant differences for task effect. So far, these two
tasks have not been examined with children believed
to have recovered from stuttering.
Therefore, the purpose of this investigation was fourfold: First, to determine if the speech of children who
have recovered from stuttering without formal treatment
was perceptually different from the speech of children
who had never stuttered. Second, to determine if the
speech of these children was judged perceptually different depending on the sophistication and experience of
the judges. Third, to determine if the speech of these
children was judged perceptually different depending
on the type of perceptual task—discrimination or identification. Finally, to determine if their speech naturalness was judged to be different.
Method
Speakers
Two groups of preschool through early school-age
children provided the speech samples for all studies.
They were originally participants in a longitudinal study
investigating the speech characteristics of early childhood stuttering at the University of Illinois. For that
study, they were videotaped speaking with an adult,
usually a parent, while sitting at a table playing with a
standard set of toys (see Yairi & Ambrose, 1992). These
samples were obtained on repeated occasions over several years.
Children Recovered From Stuttering (RS)
To participate in the present study, speakers met
the following selection criteria described by Yairi,
Ambrose, Paden, and Throneburg (1996). First, they
were initially judged as children with developmental
stuttering. Stuttering criteria included (a) the parent(s)
judged the child as stuttering, (b) two speech-language
Finn et al.: Assessment of Children Recovered From Stuttering
pathologists independently judged the child as stuttering, (c) the child’s stuttering was rated greater than mild
in severity, and (d) the child’s speech contained a minimum of three stuttering-like disfluencies (e.g., part-word
and single-syllable word repetitions, sound prolongations, and silent blocks) per 100 syllables. Second, they
recovered from stuttering without exposure to formal
speech treatment. At the conclusion of the initial evaluation, parents were given basic information about stuttering, told that their child might or might not spontaneously recover, and advised about various beliefs
regarding early stuttering—including the view that talking slower to the child might be beneficial in promoting
fluency. The option of formal treatment was offered, but
for various reasons parents chose not to seek treatment.
Nonetheless, all parents continued to participate in the
longitudinal study, which required them to have their
child videotaped at least every 6 months. Third, they
were later judged as recovered from stuttering. Recovery criteria (Yairi et al., 1996) included (a) the parent(s)
judged the child as no longer stuttering and rated the
child’s speech as normally disfluent, (b) a speech-language pathologist also judged the child as no longer stuttering and rated the child’s speech as normally disfluent,
and (c) stuttering-like disfluencies were 2.99 or fewer
per 100 syllables. Using these three sets of criteria,
speakers were classified as children who recovered from
stuttering (RS) without formal treatment.
As can be seen in Table 1, RS speakers consisted of
7 males and 3 females. At the initial evaluation, two
speech-language pathologists rated each speaker’s stuttering severity on an 8-point scale (where 0 = normal
disfluency, 1 = very mild stuttering, and 7 = very severe
stuttering). The severity ratings ranged from 2.26 to
5.87, with a mean of 3.63. Average age at onset of stuttering was 30 months, average age at recovery was 53
months, average duration between onset and recovery
was 22.6 months, and average age at time of videotaped
speech sample was 59 months (range = 39 to 80 months).
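The group summaries in this paragraph can be checked against the per-speaker values listed in Table 1. The following is a minimal sketch of that check; the variable names are illustrative, and the values are transcribed from the table rather than taken from the original data files.

```python
from statistics import mean

# Per-speaker values transcribed from Table 1 (all ages in months).
onset = [26, 42, 32, 45, 26, 24, 26, 28, 30, 26]
recovery = [63, 68, 59, 58, 49, 70, 34, 48, 43, 39]
duration = [37, 26, 27, 13, 23, 46, 8, 20, 13, 13]
sample_age = [63, 80, 59, 69, 55, 70, 52, 61, 43, 39]
severity = [4.87, 2.55, 2.26, 2.35, 3.19, 4.86, 2.53, 3.77, 5.87, 4.05]

# Each listed duration should equal age at recovery minus age at onset.
durations_consistent = all(
    r - o == d for o, r, d in zip(onset, recovery, duration)
)

summary = {
    "mean_onset": mean(onset),
    "mean_recovery": mean(recovery),
    "mean_duration": mean(duration),
    "mean_sample_age": mean(sample_age),
    "mean_severity": mean(severity),
    "sample_age_range": (min(sample_age), max(sample_age)),
    "severity_range": (min(severity), max(severity)),
}

for name, value in summary.items():
    print(name, value)
```

The unrounded means differ slightly from the whole-month figures given in the text, which appear to be rounded.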
Children Who Never Stuttered (NS)
Ten children who did not have a history of stuttering and were judged by their parents and a speech-language pathologist as having age-appropriate speech and
language skills were matched with the RS for sex and
age within 2 months. At the time of the videotaping, NS
average age was 59 months (range = 39 to 79 months).
Perceptual Discrimination Task (Paired
Stimulus)
Using a discrimination (paired-stimulus) task, sophisticated judges were asked to determine if the speech of
children who recovered from stuttering without formal
treatment was distinguishable from the speech of normally fluent children.
Sophisticated Judges
The sophisticated judges consisted of 12 graduate
students in speech-language pathology (11 females, 1
male; mean age = 34.1 years; range = 24–51 years). The
judges were classified as sophisticated because they had
successfully completed a graduate course on stuttering.
Speaker Stimulus Videotape
For the stimulus videotape, full-face speech samples
of the child were obtained from the child-adult videotaped dialogues. RS speakers were paired with their
respective NS speakers. Speech samples were selected
from the videotaped dyads that involved the longest continuous segments during which the child was speaking
and the adult listener was offering the fewest responses.
Each sample pair was matched for number of syllables
spoken (mean = 66.2 syllables; range across sample pairs
= 38–85 syllables). Average sample duration was 1 min.
Sample order within and across matched pairs was randomized. A second stimulus tape with randomized
sample order was prepared for reliability purposes.
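The two levels of randomization described above (order within each matched pair and order of the pairs on the tape) can be sketched as follows. This is only an illustration: the function name, the speaker labels, and the use of seeded pseudo-random shuffling are assumptions, since the text does not say how the random orders were generated.

```python
import random

def build_paired_tape(pairs, seed=None):
    """Return a presentation order for matched (RS, NS) sample pairs.

    Hypothetical helper: shuffles which speaker comes first within
    each pair, then shuffles the order of the pairs themselves.
    """
    rng = random.Random(seed)
    tape = []
    for rs_sample, ns_sample in pairs:
        pair = [rs_sample, ns_sample]
        rng.shuffle(pair)   # randomize order within the matched pair
        tape.append(tuple(pair))
    rng.shuffle(tape)       # randomize order across pairs
    return tape

# Ten matched pairs with placeholder labels (not the study's identifiers).
pairs = [(f"RS{i}", f"NS{i}") for i in range(1, 11)]
tape = build_paired_tape(pairs, seed=1)
reliability_tape = build_paired_tape(pairs, seed=2)  # second random order
```

Using a separate seed for the reliability tape gives an independent ordering of the same samples while keeping both orders reproducible.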
Procedure
The sophisticated judges performed two experimental tasks: a discrimination task and a speech descriptor
task. Both tasks were performed after observing each
pair of samples of an RS and NS speaker. For reliability
Table 1. RS speakers: age at onset and recovery; duration between onset and recovery; and age at which videotaped speech sample was obtained (all ages are in months).

Speaker   Sex   Average stuttering   Age at   Age at     Months between       Age at speech
                severity (a)         onset    recovery   onset and recovery   sample
1         m     4.87                 26       63         37                   63
2         m     2.55                 42       68         26                   80
3         m     2.26                 32       59         27                   59
4         m     2.35                 45       58         13                   69
5         f     3.19                 26       49         23                   55
6         m     4.86                 24       70         46                   70
7         m     2.53                 26       34         8                    52
8         f     3.77                 28       48         20                   61
9         f     5.87                 30       43         13                   43
10        m     4.05                 26       39         13                   39
Mean            3.63                 30       53         22.6                 59

(a) Speakers were rated on an 8-point stuttering severity scale where 0 = normal disfluency, 1 = very mild stuttering, and 7 = very severe stuttering.
purposes, the judges repeated the tasks on the same
samples 8 weeks later.
Perceptual discrimination task. For the discrimination task, the judges were told that one child from each
sample pair used to have a stuttering problem and that
the other child never had a stuttering problem. After
observing each pair of speakers, judges were instructed
to decide which child used to have a stuttering problem.
They were not told that the speakers were believed to
have recovered from stuttering without the assistance
of treatment.
Speech descriptor task. After deciding which one of
the pair of speakers used to stutter, the judges were instructed to write down a brief description of the speech
characteristics or communicative behaviors of that child
which helped them make their choice. Judges were given
as much time as necessary to write their statements.
For both tasks, judges were instructed to make their
decisions independently of the other judges.
Perceptual Identification Task (Single
Stimulus)
Using an identification task, two groups of independent judges—unsophisticated and experienced—were
asked to determine if the speech samples were obtained
from children who recovered from stuttering or were
from normal speakers. They were also asked to rate the
speech naturalness of the speech samples.
Unsophisticated Judges
The unsophisticated judges consisted of 26 graduate and undergraduate students (21 females, 5 males;
mean age = 33.0 years; range = 22–46 years) majoring
in speech-language pathology. These judges were classified as unsophisticated because they had not completed a graduate course on stuttering or observed a
client who stuttered.
Experienced Judges
The experienced judges consisted of 14 speech-language pathologists (12 females, 2 males; mean age =
42.1 years; range = 34–57 years). The main criterion for
participation was that the judges during the last 5 years
of clinical experience had worked primarily with preschool or early school-age children, but not necessarily
with children who stuttered. The average years of experience was 11.1 (range = 5.5–17 years). Therefore, these
judges were classified as experienced.
Speaker Stimulus Videotape
For the stimulus videotape, speech samples were
obtained from the speakers who participated in the
discrimination task. Again, the speech samples were
those with the longest continuous segments of child
speech with the fewest responses from the adult listener
in the video dyad. However, all samples were matched
for number of syllables spoken (63 ± 2 syllables). Average sample duration was 1 min. Samples were randomly
ordered and separated by a 5-s pause. Two stimulus
tapes that included the same samples, but arranged in
different random orders, were prepared for separate experimental tasks to be described below. Two additional
stimulus tapes, also with samples arranged in random
order, were prepared for reliability purposes.
Procedure
The unsophisticated and experienced judges independently performed the same two experimental tasks:
an identification task and a speech-naturalness rating
task. These tasks were performed separately with a short
rest period between them. Identical speech samples were
presented for each task except the sample order was
randomized. Judges were not informed that identical
samples would be observed in both tasks. The order of
the tasks was counterbalanced across judges. For reliability purposes, the judges repeated the tasks on the
same samples, one week later.
Perceptual identification task. Judges were told they
would view speech samples of children who used to stutter and children who had never stuttered. For each
speech sample, they were instructed to independently
decide if the speech sample was obtained from a child
who used to stutter or a child who had never stuttered.
They were not told that the speakers were believed to
have recovered from stuttering without the assistance
of treatment.
Speech naturalness rating task. Judges were told
they would view speech samples of children who used to
stutter and children who had never stuttered. They were
instructed to rate the speech naturalness of each child’s
speech using a 9-point scale where 1 represented highly
natural sounding and 9 represented highly unnatural
sounding. Rating instructions were identical to those
described by Martin, Haroldson, and Triden (1984).
Results
Perceptual Discrimination Task:
Sophisticated Judges
The frequencies of the sophisticated judges’ correct
and incorrect responses by speaker type were tallied.
The percentage of total correct responses for RS speakers was 45.0%. A chi-square analysis revealed that this
was not significantly different (χ² = 2.40; df = 1; p = .12)
from responses expected to occur by chance (50%).
Reliability of the correct responses was determined by
comparing each judge’s first and second ratings for the
same speaker across the two rating occasions. Results
revealed a mean intrarater agreement level of 60.0%.
The percent of correct responses had decreased from
45.0% to 38.3% across the two occasions.
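The chi-square value reported above can be reproduced under one plausible reading of "correct and incorrect responses by speaker type". The sketch below assumes 12 judges × 10 pairs = 120 judgments, of which 45.0% (54) selected the RS speaker, and arranges these as a 2 × 2 table of speaker type by selected/not selected. Both the counts and the table layout are inferences, not details given in the text.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Returns the statistic and its p value for df = 1.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For df = 1, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

n_judgments = 12 * 10                       # assumed: 12 judges x 10 pairs
rs_selected = round(0.45 * n_judgments)     # 54 correct selections (inferred)
ns_selected = n_judgments - rs_selected     # 66 incorrect selections

# Rows: RS speakers, NS speakers. Columns: selected, not selected.
table = [[rs_selected, ns_selected],
         [ns_selected, rs_selected]]

chi2, p_value = chi_square_2x2(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2f}")
```

Under these assumptions every expected cell count is 60, which corresponds to the 50% chance level named in the text.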
Speech Descriptor Task: Sophisticated
Judges
Speech characteristics described by sophisticated
judges as the basis for their selection of a child as used
to stutter were examined for consistency with stuttering. The descriptors across both occasions were classified on the basis of four categories: (a) characteristic of
stuttering (e.g., behaviors that typify the problem of
stuttering), (b) characteristic of stuttering treatment
outcome (e.g., behaviors that might typify residuals
from having received treatment for stuttering), (c) characteristic of communicative disorders other than stuttering, and (d) other (e.g., not consistent with any categories).
A total of 265 statements were categorized by the
first author. Results showed that (a) 59.7% of the judges’
statements were consistent with stuttering (e.g., “repetition of /p/ phoneme on two different words”), (b) 24.5%
were consistent with treatment outcome (e.g., “very difficult to tell—perhaps the revisions on ‘p’ words were
leftovers from treatment”), (c) 7.5% were consistent with
other communicative disorders (e.g., “short MLU, didn’t
seem to want to speak as much, a lot of single word
answers”), and (d) 8.3% were unclassifiable (e.g., “hard
to pick one”). A graduate student who had completed a
stuttering course but had not participated in any other
part of this study independently categorized the judges’
statements. Comparison with the first author’s judgments revealed 78.7% interrater agreement.
Perceptual Identification Task:
Unsophisticated Judges
The frequency of never stuttered judgments by the
unsophisticated judges was determined for each speaker
(see Appendix for individual data). Mean percent of
never stuttered judgments for RS speakers was 72.3%
(SD = 17.1, range = 42.3–88.5%) and for NS was 72.7%
(SD = 15.2, range = 46.2–96.2%). A t test revealed that
the difference between means was nonsignificant, t(18)
= –.05, p = .96. Mean percent of used to stutter judgments was 27.7% for RS speakers and 27.3% for NS
speakers.
Reliability was determined by comparing each
judge’s first and second ratings across occasions for the
same speaker. Mean intrarater agreement was 76.2%
(range = 60–95%).
Perceptual Identification Task:
Experienced Judges
The frequency of never stuttered judgments by the
experienced judges was determined for each speaker (see
Appendix for individual data). Mean percent of never
stuttered judgments was 68.6% (SD = 25.0, range = 35.7–
100%) for RS speakers and 72.9% for NS speakers (SD
= 15.7, range = 50.0–100%). A t test revealed the difference between means was nonsignificant, t(18) = –.46, p
= .65. The mean percent of used to stutter judgments
was 31.4% for RS speakers and 27.1% for NS speakers.
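The t values reported for the identification task can be approximated from the group means and standard deviations alone. The sketch below assumes an independent-samples t test with pooled variance and 10 speakers per group, which is consistent with the reported 18 degrees of freedom but is not stated explicitly in the text. Because the inputs are rounded summary figures, the results are approximate.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic with pooled variance.

    Returns (t, degrees of freedom).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df

# Percent "never stuttered" judgments: RS versus NS speakers.
t_unsoph, df_unsoph = pooled_t(72.3, 17.1, 10, 72.7, 15.2, 10)
t_exp, df_exp = pooled_t(68.6, 25.0, 10, 72.9, 15.7, 10)

print(f"unsophisticated judges: t({df_unsoph}) = {t_unsoph:.3f}")
print(f"experienced judges:     t({df_exp}) = {t_exp:.3f}")
```

With equal group sizes the pooled and unpooled (Welch) statistics coincide, so only the degrees of freedom depend on the pooled-variance assumption.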
Intrarater agreement was determined by comparing each experienced judge’s first and second ratings
across occasions for the same speaker. Mean intrarater
agreement was 79.6% (range = 65–95%).
Speech Naturalness Rating Task:
Unsophisticated Judges
Average speech naturalness ratings by the unsophisticated judges were estimated for each speaker (see Appendix for individual data). The average speech naturalness rating for RS speakers was 4.24 (SD = 1.22, range
= 2.65–6.88) and for NS was 3.82 (SD = 1.05, range =
2.08–5.19). The difference between means was nonsignificant, t(18) = .83, p = .42.
To determine intrarater agreement, each judge’s
first and second ratings for the same speaker across
occasions were compared. An acceptable level of agreement was defined as ratings that were identical or differed by no more than ±1 rating score (Martin et al.,
1984). Using this criterion, mean intrarater agreement
was 64.0% (see Table 2). For interrater agreement, each
judge’s rating of a speaker was compared with the ratings of the same speaker by the other judges. An acceptable level of agreement was defined as ratings that
were identical or differed by no more than ±1 rating
score (Martin et al., 1984). Using this criterion, mean
interrater agreement was only 40.6% (see Table 2). To
determine if unreliable judges were influencing this
outcome, a reanalysis of interrater agreement was performed on the basis of judges who demonstrated at least
80% intrarater agreement (n = 7). The findings indicated that interrater agreement at the ±1 level actually decreased to 38.3%.
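The ±1 agreement criterion described above can be sketched as follows. The function names and the small rating sets are hypothetical. The pairwise treatment of interrater agreement is an assumption, although it matches the comparison totals shown in Tables 2 and 3 (3,250 for 26 judges and 910 for 14 judges, each over 10 speakers).

```python
import math
from itertools import combinations

def intrarater_agreement(first, second, tolerance=1):
    """Percent of (judge, speaker) ratings that differ by <= tolerance
    between the first and second rating occasions."""
    hits = sum(abs(first[key] - second[key]) <= tolerance for key in first)
    return 100.0 * hits / len(first)

def interrater_agreement(ratings_by_speaker, tolerance=1):
    """Percent of judge pairs whose ratings of the same speaker
    differ by <= tolerance, pooled over speakers."""
    hits = 0
    total = 0
    for ratings in ratings_by_speaker.values():
        for a, b in combinations(ratings, 2):
            total += 1
            hits += abs(a - b) <= tolerance
    return 100.0 * hits / total

# Hypothetical ratings on the 9-point naturalness scale.
first = {("judge_1", "speaker_1"): 3, ("judge_1", "speaker_2"): 5}
second = {("judge_1", "speaker_1"): 4, ("judge_1", "speaker_2"): 8}
ratings_by_speaker = {"speaker_1": [3, 4, 6], "speaker_2": [2, 2, 5]}

intra = intrarater_agreement(first, second)
inter = interrater_agreement(ratings_by_speaker)

# Number of pairwise comparisons per speaker group for each judge panel.
comparisons_26_judges = math.comb(26, 2) * 10
comparisons_14_judges = math.comb(14, 2) * 10
```

Raising `tolerance` step by step yields cumulative agreement counts of the kind tabulated for ±0 through ±8.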
Speech Naturalness Rating Task:
Experienced Judges
Average speech naturalness ratings by the experienced judges were estimated for each speaker (see Appendix for individual data). The average speech naturalness rating for RS speakers was 3.71 (SD = 1.29, range
= 2.57–6.93) and for NS was 3.24 (SD = 0.97, range =
Table 2. Unsophisticated judges: cumulative number and percentage (in parentheses) of intrarater and interrater agreements for speech-naturalness rating scores.

Speaker type          ±0           ±1.0         ±2.0         ±3.0         ±4.0         ±5.0         ±6.0         ±7.0         ±8.0
Intrarater agreement
  RS                  80 (30.8)    161 (61.9)   198 (76.2)   226 (86.9)   243 (93.5)   252 (96.9)   257 (98.8)   259 (99.6)   260 (100)
  NS                  86 (33.1)    172 (66.2)   210 (80.8)   230 (88.5)   245 (94.2)   256 (98.5)   259 (99.6)   260 (100)
Interrater agreement
  RS                  446 (13.7)   1254 (38.6)  1929 (59.4)  2411 (74.2)  2764 (85.0)  3030 (93.2)  3172 (97.6)  3241 (99.7)  3250 (100)
  NS                  543 (16.7)   1388 (42.7)  2021 (62.2)  2448 (75.3)  2803 (86.2)  3056 (94.0)  3175 (97.7)  3229 (99.4)  3250 (100)
Table 3. Experienced judges: cumulative number and percentage (in parentheses) of intrarater and interrater agreements for speech-naturalness rating scores.

Speaker type          ±0           ±1.0         ±2.0         ±3.0         ±4.0         ±5.0         ±6.0         ±7.0         ±8.0
Intrarater agreement
  RS                  46 (32.9)    92 (65.7)    112 (80.0)   129 (92.1)   136 (97.1)   140 (100)
  NS                  49 (35.0)    90 (64.3)    116 (82.9)   131 (93.6)   135 (96.4)   139 (99.3)   140 (100)
Interrater agreement
  RS                  143 (15.7)   406 (44.6)   576 (63.3)   709 (77.9)   803 (88.2)   868 (95.4)   892 (98.0)   910 (100)
  NS                  180 (19.8)   414 (45.4)   578 (63.5)   670 (73.6)   784 (86.2)   855 (93.9)   881 (96.8)   904 (99.3)   910 (100)
1.64–4.71). The difference between means was nonsignificant, t(18) = .91, p = .37.
Intrarater agreement, based on ratings that were
identical or differed by no more than ±1 rating score,
was 65.0% (see Table 3). Interrater agreement, based
on ratings that were identical or differed by no more
than ±1 rating score, was only 45.1% (see Table 3). It
was not possible to examine the influence of unreliable
judges because an insufficient number of judges (n = 3)
demonstrated 80% intrarater agreement.
Correlations Between Experienced and
Unsophisticated Judges
A correlational analysis between the perceptual
judgments of never stuttered from experienced and unsophisticated judges revealed a significant positive correlation, r = .76, p < .001. There was also a significant
positive correlation between the two groups of judges’
speech naturalness ratings, r = .79, p < .001.
Discussion
The main purpose of this investigation was to determine if the speech of children who had recovered from
stuttering without formal treatment was perceptually
different from the speech of children who had never stuttered. Results showed that the children who recovered
from stuttering were not distinguished from their
nonstuttering peers. The same result was obtained regardless of the type of perceptual task (discrimination
or identification) or the judges’ level of sophistication
and experience.
The results of the discrimination task revealed that
sophisticated judges were unable to discriminate between paired speech samples from children who used to
stutter and children who had never stuttered. Two factors need to be considered when interpreting this finding. First, the reliability of this finding is problematic
because the judges demonstrated unsatisfactory levels
of intrarater agreement in making their judgments. At
the same time, this concern may not be critical because
the judges actually made fewer correct responses (i.e.,
correctly identifying a speaker who used to stutter)
across the two judgment occasions. This may mean that
with repeated observations of the same samples judges
became less convinced that they were samples of speakers
who used to stutter and were more likely speakers who
had never stuttered. Second, although the judges were
unable to discriminate between the two types of speakers,
there is evidence that their judgments were at least guided
by task-appropriate criteria. When asked to describe the
basis for selecting a speaker as used to stutter,1 judges typically described speech characteristics that were consistent
with stuttered speech. Judges also described selecting some
speakers on the basis of speech behaviors considered characteristic of treated recovered speech. Though they were
not told that speakers had recovered without formal treatment, some judges apparently inferred that if speakers
used to stutter then treatment was the responsible agent.
Therefore, the judges’ inability to discriminate between
speaker types did not appear to be the result of employing
invalid judgment criteria.
A potential drawback of the discrimination task is
that it might have imposed an artificial comparison between speakers that was not valid for either that speaker
or for a particular judge. Some idiosyncratic feature of
the NS speaker, for example, might have distracted
judges from relevant speech features of the RS speaker.
In contrast, an identification task in which each speaker
is presented individually would allow judges to evaluate each speaker on his or her own terms.
The findings of the identification task systematically
replicated the findings from the discrimination task. Both
unsophisticated and experienced judges were unable to
distinguish between the speech of children who had recovered and those who had never stuttered. Furthermore, the
trustworthiness of these findings was bolstered by the
relatively acceptable levels of intrarater agreement for
this task in comparison with the discrimination task.
The speech naturalness ratings provided additional
evidence that there was no perceptual difference between the two groups of speakers. Comparison between
the speech naturalness of the two groups of children
revealed nonsignificant differences. This finding was
the same regardless of whether the judges were unsophisticated or experienced. Both groups of judges also
rated each speaker with comparable levels of naturalness. However, these promising findings must be interpreted cautiously because the reliability of these
ratings was unsatisfactory,2 regardless of the judges’
background. The reasons for this low agreement warrant
some discussion.
This is the first study that has attempted to use this
rating scale with speech samples from young children with
a relatively large number of samples and judges. The only
other study to employ this scale with children who stutter was also unable to demonstrate high agreement between two clinical judges (see Onslow, Costa, & Rue, 1990).
However, this unsatisfactory agreement may not be a specific limitation of the speech naturalness scale. Rafaat,
Rvachew, and Russell (1995) reported equally unsatisfactory interjudge agreement when experienced clinicians
were rating the severity of phonological impairment in
young children. They suggested that the greater range of
“normal” that exists for young children combined with
variability in clinicians’ knowledge of young children’s
speech may be the main factors contributing to low rater
agreement. Future research is necessary to determine if
these factors also affect the speech naturalness scale, especially if it is going to be used to assess the speech of
young children who stutter. At minimum, future researchers using the speech naturalness scale may have to instruct judges to make their ratings relative to an appropriate model of children’s speech.
Findings from the present study also add to an
emerging view concerning strategies for evaluating
whether persons who stutter achieve normally fluent,
natural-sounding speech as a result of treatment or after recovering from stuttering. A perceptual discrimination task similar to the one used in this study was
introduced some years ago by Ingham and Packman
(1978) and Runyan and Adams (1978, 1979) in order to
determine whether adults who stutter had achieved perceptually normal speech. Subsequently, several studies
suggested that Martin et al.’s (1984) 9-point speech naturalness rating scale might be a more practical and sensitive method for this purpose (Ingham et al., 1985;
Ingham & Onslow, 1985; Runyan, Bell, & Prosek, 1990).
That recommendation seemed justified because it was
employed with satisfactory levels of rater agreement
(e.g., Martin et al., 1984). More recent studies have
shown that those agreement levels are often unpredictable. Different studies have found that not all judges
achieve the high levels of rater agreement reported in
earlier studies, especially when individual rather than
group judgments are required (Finn & Ingham, 1994;
Martin & Haroldson, 1992; Metz, Schiavetti, & Sacco,
1990; Onslow, Adams, & Ingham, 1992). The present
study confirms this trend and shows that it also occurs
when ratings are made of children’s speech (also see
Onslow et al., 1990). These findings add to the argument
that perhaps further research on this scale should focus
on the development of standards for rating levels of naturalness, in much the same way as rating models have been
developed for voice (Gerratt, Kreiman, Antonanzas-Barroso, & Berke, 1993). In the meantime, the paired-stimulus
paradigm described in this study and recommended by others (e.g., Adams, 1984) may offer clinicians
and researchers a less problematic method of deducing
whether children have achieved normal-sounding speech.
1 Note that judges were instructed to select the speaker who used to
stutter. It may be worth considering whether the results would have been
different had judges instead been instructed to select the speaker who had
never stuttered.
2 For rating the speech naturalness of audiovisual speech samples of
adults who stutter and do not stutter, Martin and Haroldson (1992)
reported an average level of 84% for intrarater agreement (combined for
stutterers and nonstutterers at ±1.0) and 80% for interrater agreement
(combined for stutterers and nonstutterers at ±1.0).
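One way to reason about a paired-stimulus outcome is to compare a judge's number of correct identifications with what guessing would produce. The following is a minimal sketch, assuming independent pairs and a chance rate of 0.5 when the two speakers in a pair are perceptually indistinguishable. The function name and the counts are hypothetical, and the sketch is not presented as the analysis used in this study.

```python
from math import comb


def upper_tail_probability(n_pairs, n_correct, p_chance=0.5):
    """Exact binomial probability of n_correct or more correct picks by guessing."""
    return sum(
        comb(n_pairs, k) * p_chance**k * (1 - p_chance) ** (n_pairs - k)
        for k in range(n_correct, n_pairs + 1)
    )


if __name__ == "__main__":
    # Hypothetical judge: picks the recovered speaker in 8 of 10 pairs.
    print(upper_tail_probability(10, 8))
    # Hypothetical judge at chance: 5 of 10 pairs.
    print(upper_tail_probability(10, 5))
```

A small probability would suggest that the judge can discriminate the two groups. A probability near 0.5 or above is what indistinguishable speakers would be expected to produce.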
Two factors might account for the apparently normal-sounding speech of the children recovered from stuttering. First, these children’s recoveries occurred without exposure to formal treatment. Therefore, their
recovery did not necessarily involve changes in their
speech behavior. In comparison, treated adults’ non-normal-sounding, fluent speech is usually attributed to the
effects of changes in their speech behavior that are due
to treatment procedures (e.g., prolonged speech). Second,
perceptual studies have found that the fluent segments
of speech from children who still stutter are undifferentiated from those of normally fluent peers (Colcord & Gregory, 1987;
Krikorian & Runyan, 1983). This suggests that the child’s
recovered speech pattern retains the normally fluent dimensions that were already present.
The present findings provide the first objective evidence that children who recover from stuttering without
exposure to formal treatment are likely to present with
normal-sounding speech. It is unknown whether this finding
would also extend to children who recover from stuttering because of formal treatment. Nonetheless, the outcome of this study suggests that normal fluency would be
a reasonable treatment goal. Furthermore, the mechanism responsible for recovery in this study is unknown.
One possible factor is that the brief parent counseling that
occurred during the initial assessment contributed to the children’s
recovery. However, there is no credible evidence to support the view that such a limited informative session would
result in an ameliorative effect. There is also no way of
verifying that the parents actually followed any clinical
suggestions offered during their counseling session. Future research should examine the kinds of parent behaviors that are beneficial to the child who stutters.
In summary, a series of systematic replications
has demonstrated that children who recover from
stuttering are perceptually indistinguishable from
children who have never stuttered. Obviously, these
results should be considered preliminary.3 Because of
the controversy surrounding spontaneous recovery in
early childhood and its important theoretical and
clinical implications, such recovery should be carefully assessed
and analyzed. Future research with a larger number
of speakers and speech samples taken at several developmental stages is warranted.
3 The small sample size in this study reduced the power of the statistical
test to find differences where differences may in fact exist (Young, 1994).
Therefore, it is possible that the nonsignificant statistical differences are
Type II errors.
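The power concern raised in Footnote 3 can be illustrated by simulation. The following is a minimal sketch, assuming two independent groups of normally distributed scores with equal variance, compared by a two-sided pooled-variance t test at alpha = .05. The text does not specify the test used, so the test, the effect sizes, and the function name are all assumptions. The default critical value 2.101 is the tabled t for 18 degrees of freedom (10 speakers per group), and a different group size requires a different critical value.

```python
import random
import statistics


def simulated_power(effect_size, n_per_group=10, t_crit=2.101,
                    n_sims=5000, seed=0):
    """Estimate power as the share of simulated studies with |t| > t_crit.

    effect_size is the true group difference in standard-deviation units.
    t_crit must match the degrees of freedom (2 * n_per_group - 2).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        group_b = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        # Pooled variance for equal group sizes is the average of the two.
        pooled_var = (statistics.variance(group_a) + statistics.variance(group_b)) / 2
        std_error = (2 * pooled_var / n_per_group) ** 0.5
        t_value = (statistics.mean(group_b) - statistics.mean(group_a)) / std_error
        if abs(t_value) > t_crit:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    # Hypothetical medium-sized true difference with 10 speakers per group.
    print(simulated_power(0.5))
    # Same difference with 64 per group (critical t for 126 df is about 1.979).
    print(simulated_power(0.5, n_per_group=64, t_crit=1.979, n_sims=2000))
```

Under these assumptions, a medium-sized true difference is detected in only a minority of simulated studies with 10 speakers per group. That is well below the conventional 0.80 target, which is the sense in which nonsignificant results may be Type II errors.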
Acknowledgments
Portions of this paper were presented by the first author
at the Annual Convention of the American Speech-Language-Hearing Association, New Orleans, LA, 1994. Preparation of this manuscript was supported in part by Grant
#RO1 DC-00060 awarded to R. J. Ingham by the National
Institutes of Health. This research was also supported in
part by Grant #RO1 DC-00459 from the National Institutes
of Health, National Institute on Deafness and Other
Communication Disorders (PI: E. Yairi).
References
Adams, M. R. (1984). The young stutterer: Diagnosis,
treatment, and assessment of progress. In W. H. Perkins
(Ed.), Stuttering disorders (pp. 41–55). New York: Thieme-Stratton.
Andrews, G., & Harris, M. (1964). The syndrome of
stuttering. London: Heinemann.
Bloodstein, O. (1995). A handbook on stuttering (5th ed.).
San Diego, CA: Singular Publishing.
Colcord, R., & Gregory, H. (1987). Perceptual analyses of
stuttering and nonstuttering children’s fluent speech
production. Journal of Fluency Disorders, 12, 185–196.
Cordes, A. K., Ingham, R. J., Frank, P., & Ingham, J. C.
(1992). Time interval analysis of interjudge and intrajudge
agreement for stuttering event judgments. Journal of
Speech and Hearing Research, 35, 483–494.
Finn, P., & Ingham, R. J. (1989). The selection of “fluent”
samples in research on stuttering: Conceptual and
methodological considerations. Journal of Speech and
Hearing Research, 32, 401–418.
Finn, P., & Ingham, R. J. (1994). Stutterers’ self-ratings of
how natural speech sounds and feels. Journal of Speech
and Hearing Research, 37, 326–340.
Gerratt, B. R., Kreiman, J., Antonanzas-Barroso, N., &
Berke, G. S. (1993). Comparing internal and external
standards in voice quality judgments. Journal of Speech
and Hearing Research, 36, 14–20.
Glasner, P., & Rosenthal, D. (1957). Parental diagnosis of
stuttering in young children. Journal of Speech and
Hearing Disorders, 22, 288–295.
Ingham, R. J. (1984). Stuttering and behavior therapy:
Current status and experimental foundations. San Diego,
CA: College-Hill.
Ingham, R. J., Gow, M., & Costello, J. (1985). Stuttering
and speech naturalness: Some additional data. Journal of
Speech and Hearing Disorders, 50, 217–219.
Ingham, R. J., & Onslow, M. (1985). Measurement and
modification of speech naturalness during stuttering
therapy. Journal of Speech and Hearing Disorders, 50,
261–281.
Ingham, R. J., & Packman, A. (1978). Perceptual assessment
of normalcy of speech following therapy. Journal of
Speech and Hearing Research, 21, 63–73.
Johnson, W., & Associates. (1959). The onset of stuttering.
Minneapolis: University of Minnesota Press.
Krikorian, C., & Runyan, C. (1983). A perceptual comparison:
Stuttering and nonstuttering children’s nonstuttered
speech. Journal of Fluency Disorders, 8, 283–290.
Martin, R. R., & Haroldson, S. K. (1992). Stuttering and
speech naturalness—Audio and audiovisual judgments.
Journal of Speech and Hearing Research, 35, 521–528.
Martin, R. R., Haroldson, S. K., & Triden, K. (1984).
Stuttering and speech naturalness. Journal of Speech and
Hearing Disorders, 49, 53–58.
Metz, D. E., Schiavetti, N., & Sacco, P. R. (1990). Acoustic
and psychophysical dimensions of the perceived speech
naturalness of stutterers and posttreatment stutterers.
Journal of Speech and Hearing Disorders, 55, 516–525.
Onslow, M., Adams, R., & Ingham, R. J. (1992). Reliability
of speech naturalness ratings of stuttered speech
during treatment. Journal of Speech and Hearing
Research, 35, 994–1001.
Onslow, M., Andrews, C., & Lincoln, M. (1994). A control/
experimental trial of an operant treatment for early
stuttering. Journal of Speech and Hearing Research, 37,
1244–1259.
Onslow, M., Costa, L., & Rue, S. (1990). Direct early
intervention with stuttering: Some preliminary data.
Journal of Speech and Hearing Disorders, 55, 405–416.
Rafaat, S. K., Rvachew, S., & Russell, R. S. C. (1995).
Reliability of clinician judgments of severity of phonological
impairment. American Journal of Speech-Language
Pathology, 4, 39–45.
Runyan, C., & Adams, M. R. (1978). Perceptual study of
“successfully therapeutized” stutterers. Journal of Fluency
Disorders, 3, 25–39.
Runyan, C., & Adams, M. R. (1979). Unsophisticated
judges’ perceptual evaluations of the speech of successfully
treated stutterers. Journal of Fluency Disorders, 4, 29–38.
Runyan, C. M., Bell, J. N., & Prosek, R. A. (1990). Speech
naturalness ratings of treated stutterers. Journal of
Speech and Hearing Disorders, 55, 434–438.
Runyan, C. M., Hames, P. E., & Prosek, R. A. (1982). A
perceptual comparison between paired stimulus and
single stimulus methods of the fluent utterances of
stutterers. Journal of Fluency Disorders, 7, 71–77.
Van Riper, C. (1982). The nature of stuttering (2nd ed.).
Englewood Cliffs, NJ: Prentice-Hall.
Wendahl, R. W., & Cole, J. (1961). Identification of
stuttering during relatively fluent speech. Journal of
Speech and Hearing Research, 4, 281–286.
Wingate, M. E. (1976). Stuttering: Theory and treatment.
New York: Irvington.
Yairi, E., & Ambrose, N. (1992). A longitudinal study of
stuttering in children: A preliminary report. Journal of
Speech and Hearing Research, 35, 755–760.
Yairi, E., Ambrose, N., & Niermann, R. (1993). The early
months of stuttering: A developmental study. Journal of
Speech and Hearing Research, 36, 521–528.
Yairi, E., Ambrose, N., Paden, E., & Throneburg, R.
(1996). Predictive factors of persistence and recovery:
Pathways of childhood stuttering. Journal of Communication
Disorders, 29, 51–77.
Young, M. A. (1964). Identification of stutterers from
recorded samples of their “fluent” speech. Journal of
Speech and Hearing Research, 7, 302–303.
Young, M. A. (1994). Evaluating differences between
stuttering and nonstuttering speakers: The group
difference design. Journal of Speech and Hearing Research, 37, 522–534.
Received August 28, 1996
Accepted January 23, 1997
Contact author: Patrick Finn, PhD, Department of Speech
and Hearing Sciences, 901 Vassar NE, University of New
Mexico, Albuquerque, NM 87131. Email:
[email protected]
Appendix. Individual data for RS and NS speakers based on the scores of unsophisticated and
experienced judges.
Speaker types: NS, RS
Percent = percent of never-stuttered judgments; Naturalness = mean speech-naturalness rating.

            Unsophisticated judges      Experienced judges
Speaker     Percent    Naturalness      Percent    Naturalness
1            80.77        2.92           92.86        3.43
2            80.77        4.42           78.57        2.57
3            88.46        2.65           78.57        2.64
4            61.54        6.88           35.71        6.93
5            50.00        5.12           42.86        4.79
6            61.54        3.84           35.71        3.50
7            42.31        3.96           50.00        3.43
8            80.77        5.08           92.86        3.50
9            88.46        3.77           78.57        3.07
10           88.46        3.77          100.00        3.21
Mean         72.30        4.24           68.60        3.71

            Unsophisticated judges      Experienced judges
Speaker     Percent    Naturalness      Percent    Naturalness
1            96.15        2.12           92.86        1.64
2            88.46        3.77           64.29        4.14
3            76.92        2.08           71.43        2.50
4            73.08        5.19           64.29        4.71
5            61.54        4.54           71.43        3.79
6            76.92        4.46           85.71        2.14
7            84.62        3.77          100.00        2.71
8            57.69        4.73           50.00        4.00
9            46.15        3.27           57.14        3.29
10           65.38        4.23           71.43        3.50
Mean         72.70        3.82           72.90        3.24
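The reported column means can be checked against the individual values. The following is a minimal sketch that recomputes each mean from the ten speaker values in the Appendix and compares it with the printed mean. The two panels are keyed by their order of appearance, and the sketch does not assert which panel is RS and which is NS. The names and the 0.05 tolerance are illustrative assumptions chosen to allow for rounding in the printed means.

```python
from statistics import mean

# Values transcribed from the Appendix: (ten speaker values, printed mean).
APPENDIX = {
    "panel_1": {
        "unsoph_percent": ([80.77, 80.77, 88.46, 61.54, 50.00,
                            61.54, 42.31, 80.77, 88.46, 88.46], 72.30),
        "unsoph_naturalness": ([2.92, 4.42, 2.65, 6.88, 5.12,
                                3.84, 3.96, 5.08, 3.77, 3.77], 4.24),
        "exp_percent": ([92.86, 78.57, 78.57, 35.71, 42.86,
                         35.71, 50.00, 92.86, 78.57, 100.00], 68.60),
        "exp_naturalness": ([3.43, 2.57, 2.64, 6.93, 4.79,
                             3.50, 3.43, 3.50, 3.07, 3.21], 3.71),
    },
    "panel_2": {
        "unsoph_percent": ([96.15, 88.46, 76.92, 73.08, 61.54,
                            76.92, 84.62, 57.69, 46.15, 65.38], 72.70),
        "unsoph_naturalness": ([2.12, 3.77, 2.08, 5.19, 4.54,
                                4.46, 3.77, 4.73, 3.27, 4.23], 3.82),
        "exp_percent": ([92.86, 64.29, 71.43, 64.29, 71.43,
                         85.71, 100.00, 50.00, 57.14, 71.43], 72.90),
        "exp_naturalness": ([1.64, 4.14, 2.50, 4.71, 3.79,
                             2.14, 2.71, 4.00, 3.29, 3.50], 3.24),
    },
}


def check_reported_means(data, tolerance=0.05):
    """Map (panel, column) to (computed mean, printed mean, within tolerance)."""
    results = {}
    for panel, columns in data.items():
        for column, (values, reported) in columns.items():
            computed = mean(values)
            within = abs(computed - reported) <= tolerance
            results[(panel, column)] = (round(computed, 3), reported, within)
    return results


if __name__ == "__main__":
    for key, (computed, reported, within) in check_reported_means(APPENDIX).items():
        print(key, computed, reported, "ok" if within else "MISMATCH")
```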