JCPSLP
Volume 14, Number 2 2012
Journal of Clinical Practice in Speech-Language Pathology
adverbs, adjectives). Eleven of the Top 20 words, and
64 of the Top 100 words were the same across the two
modalities.
Differences were found across the two datasets, with words like ‘yeah’ and
‘just’ ranking much higher in the spoken samples than the written, and words
like ‘today’ and ‘birthday’ ranking higher in the written samples than the
spoken. The 2- and 3-word sequences also revealed differences, with sequences
like ‘I am going’ and ‘I went to’ ranking higher in the written samples than
the spoken, and sequences like ‘I don’t know’ and ‘And that’s all’ ranking
higher in the spoken samples than the written.
Age comparisons using the Top 100 word lists for the
two modalities revealed that the overlap was greatest for
the 5-year-old children, with 42 words shared across both
word lists. The overlap for the 6-year-old children was 34
words, and for the 7-year-old children it was 33 words.
Implications and future directions
This study highlighted the similarities and differences in
spoken and written vocabulary use in typically developing
NZ children. The word lists generated can be used to
support the face-to-face and written communication
development of NZ children who use AAC. These word lists
are particularly relevant for children in the first three years of
formal schooling as it is likely that the vocabulary used
Analysis and results
The analyses were conducted using the Computerized Language
Analysis (CLAN; MacWhinney, 2009) program. The total
number of words (TNW) and total number of different words
(TNDW) are presented in Table 1. As shown in Table 1, the
vocabulary used in the written samples was more diverse
(type-token ratio, TTR = 0.10) than that used in the spoken
samples (TTR = 0.06). In both datasets, the most frequently
occurring words accounted for a large proportion of the
total words produced. As shown in Table 2, the proportions
accounted for by the most frequently occurring 10, 50, and
100 words were very similar across the two datasets.
Table 1. Summary statistics for written and spoken samples

Measure                                      Written samples   Spoken samples
Total number of words                        27,643            109,710
Total number of words / number of samples    23                508
Total number of different words              2,799             6,052
Type token ratio (TTR)                       0.10              0.06
Note: TTR = Total number of different words / Total number of words
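
As a check on the arithmetic, the TTR values can be recomputed from the counts in Table 1, taking TTR as the number of different words divided by the total number of words. The sketch below is illustrative only; the function name `type_token_ratio` is an assumption and is not part of CLAN.

```python
# Counts are taken from Table 1; the helper name is hypothetical.
def type_token_ratio(total_different_words: int, total_words: int) -> float:
    """Return TNDW / TNW, the share of word tokens that are distinct types."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return total_different_words / total_words


written_ttr = type_token_ratio(2799, 27643)
spoken_ttr = type_token_ratio(6052, 109710)

print(f"Written TTR: {written_ttr:.2f}")  # 0.10
print(f"Spoken TTR:  {spoken_ttr:.2f}")   # 0.06
```

Both results round to the values reported in Table 1.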
Table 2. Proportion of total words represented by most frequently occurring
10, 50, and 100 words

                                       Proportion of total number of words
Word list                              Written samples   Spoken samples
Most frequently occurring 10 words     27%               32%
Most frequently occurring 50 words     52%               56%
Most frequently occurring 100 words    64%               66%
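
The proportions in Table 2 express how much of a sample is accounted for by its most frequent words. A minimal sketch of that calculation follows, assuming a sample has already been split into a flat list of lower-cased words. The function name `top_n_coverage` and the toy sample are assumptions for illustration and are not data from the study.

```python
from collections import Counter


def top_n_coverage(tokens, n):
    """Share of all tokens accounted for by the n most frequent word types."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(tokens)


# Toy sample for illustration only (16 words, 8 different words).
sample = "i went to the park and i went to the shop and i like the park".split()

for n in (1, 3, 5):
    print(f"Top {n}: {top_n_coverage(sample, n):.0%} of all words")
```

Applied separately to the written and spoken samples, the same calculation would yield figures of the kind reported in Table 2.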
Tables 3, 4, and 5 outline the 20 most frequently occurring words, 2-word
sequences, and 3-word sequences. The words marked with an asterisk occurred
in the ‘Top 20’ lists for both written and spoken datasets. Eight of the
Top 10 words were the same across datasets. These words were: ‘I’, ‘and’,
‘the’, ‘to’, ‘a’, ‘my’, ‘it’, and ‘we’. These were all structure words
(pronouns, articles, conjunctions, prepositions) as opposed to content
words (nouns, verbs,
Table 3. Twenty most frequently occurring words

Rank   Written samples   Spoken samples
1.     I*                and*
2.     and*              the*
3.     the*              I*
4.     to*               a*
5.     a*                to*
6.     my*               it*
7.     is                my*
8.     it*               we*
9.     we*               in*
10.    going             on*
11.    on*               you
12.    was*              got
13.    am                that
14.    went              one
15.    are               then
16.    in*               of
17.    she               he
18.    have              because
19.    me                was*
20.    like              yeah
Note. * Word occurred in Top 20 for both spoken and written modalities.
The Top 100 Word List can be obtained by contacting the first author.
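
The overlap counts reported for the Top 10 and Top 20 lists can be reproduced directly from the words in Table 3. The sketch below transcribes both lists in rank order. The helper name `shared_words` is an assumption for illustration.

```python
# Top 20 lists transcribed from Table 3, in rank order.
written_top20 = [
    "I", "and", "the", "to", "a", "my", "is", "it", "we", "going",
    "on", "was", "am", "went", "are", "in", "she", "have", "me", "like",
]
spoken_top20 = [
    "and", "the", "I", "a", "to", "it", "my", "we", "in", "on",
    "you", "got", "that", "one", "then", "of", "he", "because", "was", "yeah",
]


def shared_words(list_a, list_b, n):
    """Words found in the first n entries of both ranked lists (list_a order)."""
    top_b = set(list_b[:n])
    return [word for word in list_a[:n] if word in top_b]


top10_shared = shared_words(written_top20, spoken_top20, 10)
top20_shared = shared_words(written_top20, spoken_top20, 20)

print(len(top10_shared), top10_shared)  # 8 shared words in the Top 10
print(len(top20_shared), top20_shared)  # 11 shared words in the Top 20
```

The counts of 8 and 11 agree with the figures given in the text and with the asterisks in Table 3.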
Table 4. Twenty most frequently occurring 2-word sequences

Rank   Written samples   Spoken samples
1.     going to          and then
2.     I am              and I*
3.     and I*            on the*
4.     went to           and we*
5.     I went            in the*
6.     am going          and my*
7.     to the*           I don’t
8.     it was*           and the
9.     we are            it was*
10.    I like*           got a
11.    in the*           don’t know
12.    it is             and it
13.    she is            to the*
14.    are going         and he
15.    I got             and you
16.    on the*           my mum
17.    I have            have to
18.    and she           go to
19.    and we*           and they
20.    and my*           I like*
Note. * Two-word sequence occurred in Top 20 for both spoken and
written modalities.
Table 5. Twenty most frequently occurring 3-word sequences

Rank   Written samples   Spoken samples
1.     I am going        I don’t know
2.     am going to       and then we
3.     I went to         you have to
4.     are going to      and then I
5.     we are going      we had to
6.     went to the       and that’s all
7.     is going to       and I got*
8.     and she is        and then you
9.     I got a           we went to
10.    going to have     and it was*
11.    going to the      and my mum
12.    I have a          and my dad
13.    and I got*        go to the
14.    and it was*       and then he
15.    and we are        play with my
16.    going to play     and we had
17.    I had a           play on the
18.    it was fun        to go to
19.    and I am          I’ve got a
20.    in the weekend    when I was
Note. * Three-word sequence occurred in Top 20 for both spoken and
written modalities.
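
The 2- and 3-word sequences in Tables 4 and 5 are counts of adjacent words. A minimal sketch of such a count is given below. It assumes that sequences are counted within each utterance and never across utterance boundaries, which may differ from how the CLAN analysis was configured. The function name `ngram_counts` and the toy utterances are assumptions for illustration and are not data from the study.

```python
from collections import Counter


def ngram_counts(utterances, n):
    """Count n-word sequences within each utterance (not across utterances)."""
    counts = Counter()
    for utterance in utterances:
        words = utterance.lower().split()
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


# Toy utterances for illustration only.
utterances = [
    "I am going to the park",
    "I am going to play",
    "and then we went to the park",
]

bigrams = ngram_counts(utterances, 2)
trigrams = ngram_counts(utterances, 3)

print(bigrams.most_common(5))
print(trigrams.most_common(3))
```

Ranking the resulting counts separately for the written and spoken samples would give lists of the kind shown in Tables 4 and 5.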