The Development of Scoring Criteria for a New Picture Naming Task

Authors: Ferzin Mahava, Christine Sheppard, Laura Monetta, Vanessa Taler


Objective: The purpose of the study was to develop a scoring system for a novel naming task suitable for assessing naming performance in younger (18-30 years) and older (65+ years) adults in monolingual English, monolingual French, and English-French bilingual groups. This novel naming task will serve as an important health service to help diagnose and assess cognitively impaired older individuals, while also serving as an educational tool for healthcare providers.

Materials and Methods: The Naming Task consists of 120 images presented in the same randomized order to all participants; the images are shown on a white background on a computer screen using PowerPoint. Participants are instructed to name the image displayed. Monolinguals completed the test in their native language, and bilinguals completed the test in three administrations: English only, French only, and bilingual. Scoring criteria were established based on the responses from testing.

Results: Strict and lenient scoring criteria developed for the Naming Task are presented. Eight items were removed from the original Naming Task due to poor image quality and/or clarity, inability to name the image, or too many alternate responses. Performance in monolingual English and French was similar in younger and older adults for strict and lenient scoring. Bilinguals performed better with bilingual administration and worse with French administration, where scores were the lowest of all age and language groups.

Conclusion: The Naming Task appears to be suitable for monolingual French and English individuals. Results suggest that a bilingual administration should be used when testing English-French bilinguals.


Introduction

Despite the overwhelming increase of bilingualism in Canada, there are no appropriate tools to assess language abilities in older English-French bilingual speakers. A new Naming Task will serve as a tool for healthcare providers to assess naming abilities in bilingual adults. This may be important when assessing older adults for medical conditions that impact language abilities, such as dementia and aphasia. The purpose of the present study is to develop a scoring system for a novel naming task that is suitable for assessing naming performance in monolingual English, monolingual French, and English-French bilinguals. Once the scoring criteria are developed, this novel naming task will serve as an important health service to help diagnose and assess cognitively impaired older individuals.

Two types of scoring criteria were developed for the Naming Task: strict and lenient scores. Strict scores represented the formal name for an item, while lenient scores included acceptable synonyms or slang terms. The analysis presented in this paper will determine which names are used the most often for each item and establish a clear set of guidelines for strict and lenient scoring in both English and French. Performance across groups will be compared on the strict and lenient scoring criteria, in order to examine the impact of language administration on bilingual performance and to determine if the test is suitable for all language groups.

Literature Review

Over the past decade, research has begun exploring the impact of bilingualism on cognition, especially in the areas of executive function and language. This research has demonstrated that, relative to monolinguals, bilingual individuals show superior performance on tasks of executive function (e.g., inhibition of task-irrelevant information) (Adesope, Lavin, Thompson, & Ungerleider, 2010; Bialystok, 2009; Bialystok, Craik, Green, & Gollan, 2009), but poorer performance on language tasks (e.g., picture naming tasks) (Gollan, Montoya, Fennema-Notestine, & Morris, 2005; Roberts, Garcia, Desrochers, & Hernandez, 2002). In addition, bilingualism can be seen as a protective factor, as research with an immigrant sample living in Toronto has suggested that bilingualism may delay the onset of dementia by five years in older adults (Bialystok, Craik, & Freedman, 2007; Craik, Bialystok, & Freedman, 2010).

The Boston Naming Test (BNT) is a widely used clinical picture-naming task, where patients are asked to name the image displayed (Kaplan, Goodglass, & Weintraub, 1983). Overall, individuals show a decline in naming ability as they age (Kaplan et al., 1983), specifically after the age of 70 (Brouillette et al., 2011). Research examining the utility of the BNT with bilinguals has shown that monolinguals tend to outperform bilinguals and the level of difficulty for the test likely differs between languages (Roberts et al., 2002). For example, in a study comparing English-speaking monolinguals, bilingual Spanish-English speakers, and bilingual English-French speakers, both bilingual groups scored significantly worse than the monolingual English participants (Roberts et al., 2002). Furthermore, bilinguals have demonstrated difficulty with verbal fluency, frequent tip-of-the-tongue states, and longer picture naming latencies (Bialystok, 2009), even when completing the task in their dominant language (Gollan & Acenas, 2004). Additional studies have indicated that bilinguals perform worse on naming tasks such as the BNT, both in measures of accuracy (Bialystok, Craik, & Luk, 2008; Kohnert, Hernandez, & Bates, 1998) and response time (Gollan et al., 2005; Gollan, Fennema-Notestine, Montoya, & Jernigan, 2007; Ivanova & Costa, 2008; Roberts et al., 2002).

Research with French Canadians suggests that the French translation of the BNT does not account for cultural appropriateness, which is important when administering the test in a language other than the one in which it was originally developed (Roberts & Doucet, 2011). Specifically, research suggests that the French translation of the BNT is not acceptable for assessing naming abilities in English-French bilinguals or in monolingual French individuals (Roberts & Doucet, 2011; Sheppard, Kousaie, Monetta, & Taler, 2016). It has been suggested that when there is a large inconsistency in naming certain items, these items should be removed or the items should be changed in their order of difficulty (Roberts & Doucet, 2011). For example, research with older adults from Quebec City indicated that there were 13 BNT items with multiple acceptable synonyms (e.g., “seahorse” can either be “hippocampe” or “cheval de mer”) and an additional six items that had no clear acceptable response (e.g., “globe”), as native speakers in French disagree on the name of the item (Roberts & Doucet, 2011). Additional research comparing monolingual English and French speakers to English-French bilinguals on the BNT demonstrated that a French administration of the task consistently yielded poorer scores, even in the French monolingual group (Sheppard et al., 2016). Furthermore, after matching for underlying naming ability, differential item functioning analyses suggested that a significant number of items functioned differently across the three participant groups and in different languages of administration (Sheppard et al., 2016), suggesting that the BNT is not equivalent in English and French.

Materials and Methods


Participants

Six groups of participants were included in this study: younger (n = 44) and older (n = 64) monolingual English speakers, younger (n = 30) and older (n = 30) monolingual French speakers, and younger (n = 48) and older (n = 52) bilingual English-French speakers. Young adults were aged 18 to 30 and older adults were aged 65 or older. Monolingual English participants and bilingual English-French participants were recruited and tested in the Ottawa-Gatineau region, while monolingual French speakers were recruited and tested in Quebec City. Younger adults were recruited through word of mouth and local undergraduate populations, while older adults were recruited through advertisements in community centres, grocery stores, and newspapers. Monolingual participants had either limited or no exposure to languages other than their native language. Bilinguals had limited exposure to languages other than French and English. All bilingual participants were proficient in both English and French before the age of 13 and self-reported their proficiency in French and English using a 5-point Likert scale (see Table 1) on measures of auditory comprehension, reading, speaking, and writing.


Table 1. Mean ± standard deviation of self-reported proficiency by modality in English and French for younger (n = 48) and older (n = 52) bilingual participants. Ratings followed a 5-point Likert scale (1 = no ability; 5 = native-like ability).

Naming Task

The Naming Task consists of 120 images, 100 of which were selected from the coloured Snodgrass set (Rossion & Pourtois, 2004) and the remaining 20 were developed by Dr. Taler, the lead researcher in this study. The Snodgrass images were selected based on their array of difficulty and strong name agreement, while the additional images were created based on the same colour scheme as the Snodgrass set, but with a higher level of naming difficulty. The images were organized in the same randomized order for all participants and were shown on a white background displayed on a computer screen using PowerPoint. Participants were instructed to identify the image on the screen and the research assistant was instructed to record all answers given by the participant.

Neuropsychological Battery

Participants completed a neuropsychological battery, including the forward and backward digit span subtests of the Wechsler Adult Intelligence Scale-Third Edition (Wechsler, 1997); the Montreal Cognitive Assessment (Nasreddine et al., 2005); a version of the Stroop colour-word interference test (Stroop, 1935) in which the number of items produced in 45 seconds was recorded in each of the three conditions (word reading, colour naming, and incongruent colour naming); the 64-item Wisconsin Card Sorting Test (Grant & Berg, 1948); and category (animal) and letter (FAS) verbal fluency tasks (Benton & Hamsher, 1976). Monolingual participants completed the verbal fluency tasks in their native language and bilingual participants completed the tasks in English, in French, and in an administration where they could respond in either language. The neuropsychological battery was administered to demonstrate that all study participants had normal cognitive function. See Table 2 for demographics and neuropsychological performance across all groups.


Table 2. Demographic and neuropsychological performance by participant group (mean ± standard deviation). Verbal fluency scores for bilingual groups are reported for the administration in which participants could answer in either language. MoCA = Montreal Cognitive Assessment; Digit Span = digit span subtests of the Wechsler Adult Intelligence Scale-Third Edition; WCST = 64-item Wisconsin Card Sorting Test; FAS = letter verbal fluency; Animals = category verbal fluency.


Procedure

All monolingual participants completed the testing in one session of two hours, while bilingual participants completed the testing in two sessions of two hours each. All bilingual participants completed the Naming Task in three administrations: English only, French only, and either-language where they could respond in either English or French. Two language administrations were completed in the first testing session, while the third administration was completed in the second testing session.

The study procedures adhered to federal guidelines for protection of human research participants and received ethical approval from the Research Ethics Boards at the Bruyère Research Institute, Laval University, and the University of Ottawa. Participants were remunerated $10/hour for all testing completed and provided informed consent prior to participating.

Development of Scoring Criteria

Dr. Taler developed preliminary scoring criteria for the Naming Task in English and French; these scoring criteria formed the basis of the strict and lenient scoring protocol that was developed for this study. First, the data from each participant were scored based on the preliminary scoring criteria, wherein one point was awarded for each correct answer. Percentages were then calculated for each image based on the number of participants who named the image correctly. During this process, alternative answers provided by participants were recorded. Two independent reviewers went through each item to determine the strict and lenient scoring criteria. The strict scoring criteria were selected based on the most frequent response provided by participants (i.e., a minimum of 50%) and/or the most formal or known name used in society. Lenient responses were selected based on synonyms (e.g., “ironing board” vs. “ironing table”), clarity of the image (e.g., “violin” vs. “viola”), culturally relevant slang terms (e.g., “baby carriage” vs. “pram”), and shortened names for the image (e.g., “green pepper” vs. “pepper”). The two independent researchers then met to discuss their findings. Discrepancies were resolved through discussion and all established scoring criteria were verified by three additional researchers. See Appendix A for a list of strict and lenient responses for each item.
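The scoring steps described above can be illustrated with a minimal Python sketch. It is not the authors' scoring software: the scoring key, the participant responses, and all function names are hypothetical, and the assignment of each example response to the strict or lenient list is assumed from the order in which the examples are given in the text. The sketch awards one point per correct answer, computes the percentage of participants naming an item correctly, and finds the most frequent response for an item.

```python
from collections import Counter

# Hypothetical scoring key. Item names come from the examples in the text;
# which answer is strict and which is lenient is an assumption.
SCORING_KEY = {
    "ironing board": {"strict": {"ironing board"}, "lenient": {"ironing table"}},
    "green pepper": {"strict": {"green pepper"}, "lenient": {"pepper"}},
    "baby carriage": {"strict": {"baby carriage"}, "lenient": {"pram"}},
}

# Hypothetical responses from three participants (item -> spoken answer).
PARTICIPANTS = [
    {"ironing board": "Ironing board", "green pepper": "pepper", "baby carriage": "pram"},
    {"ironing board": "ironing table", "green pepper": "green pepper", "baby carriage": "stroller"},
    {"ironing board": "ironing board", "green pepper": "bell pepper", "baby carriage": "baby carriage"},
]


def normalize(response):
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(response.lower().split())


def score_response(item, response):
    """Return (strict_point, lenient_point) for one response to one item."""
    key = SCORING_KEY[item]
    answer = normalize(response)
    strict = int(answer in key["strict"])
    # Lenient scoring accepts every strict answer plus the listed alternatives.
    lenient = int(bool(strict) or answer in key["lenient"])
    return strict, lenient


def score_participant(responses):
    """Sum strict and lenient points over a dict of item -> response."""
    strict_total = 0
    lenient_total = 0
    for item, response in responses.items():
        strict, lenient = score_response(item, response)
        strict_total += strict
        lenient_total += lenient
    return strict_total, lenient_total


def percent_correct(item, all_responses, criterion="strict"):
    """Percentage of participants who named `item` correctly."""
    index = 0 if criterion == "strict" else 1
    points = [score_response(item, r[item])[index] for r in all_responses]
    return 100 * sum(points) / len(points)


def modal_response(item, all_responses):
    """Most frequent normalized response and its share of participants."""
    counts = Counter(normalize(r[item]) for r in all_responses)
    answer, n = counts.most_common(1)[0]
    return answer, n / len(all_responses)


if __name__ == "__main__":
    for responses in PARTICIPANTS:
        print(score_participant(responses))
    print(modal_response("ironing board", PARTICIPANTS))
```

In this sketch a response that reaches at least half of the participants, as returned by `modal_response`, would be a candidate strict answer; the final decision in the study was made by the reviewers, not by a rule like this one.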


Results

Items Recommended for Removal

Eight items were recommended for removal in English and French: stirrup, gavel, beetle, barn, blouse, and flute were removed due to the clarity and/or quality of the image; rickshaw was removed because no younger or older monolingual French participants could name the image; and necklace was removed because there were too many alternative names for this image (“pearls”, “string of pearls”, “pearl necklace”, and “necklace”).

Overall Task Performance

Figures 1 and 2 present an overall summary of task performance by age and language group according to strict and lenient scoring criteria. The largest difference in naming abilities between older and younger adults is seen in the bilingual French administration groups. Overall, older adults performed better than younger adults in every language category except one: in the monolingual French group, younger participants scored an average of one item higher than older participants under both strict and lenient scoring.

Figure 1. Average number of images named under strict scoring criteria by age and language group.

Figure 2. Average number of images named under lenient scoring criteria by age and language group.

For both younger and older adult groups, monolingual English participants had the highest overall scores on the task, with an average of 99 correct items using strict scoring and 106 correct items using lenient scoring, out of 120 items. Bilingual English-French participants were able to correctly name an average of 92 and 94 (strict and lenient scoring, respectively) of the items when completing the test in English; however, this increased to 95 and 102 (strict and lenient scoring, respectively) when responses were accepted in either language. The majority of bilingual participants in the bilingual administration responded in English (i.e., 52% of older adults and 62% of younger adults). The average number of items named correctly did not improve by more than five items in any group when lenient scoring was added.

Results by Item

Table 3 presents the percentage of participants who correctly identified each item under strict and lenient scoring.


Table 3. Percentage of correct item responses for strict and lenient scoring for participants in monolingual and bilingual groups. ME = Monolingual English; MF = Monolingual French; YA = Younger adults; OA = Older adults; St = strict; Len = lenient; Eng = English Administration; Fre = French Administration; Bil = Bilingual Administration.

Analysis 1: Strict and Lenient Scoring Differences. Group performance improved by one to five items once the lenient criteria were taken into consideration. The following items showed higher percentages once lenient scoring was included, in both English and French for all language groups: spool of thread, ottoman, candelabra, leopard, eagle, ironing board, bow, coat, and salt shaker. Additionally, a number of items scored higher once lenient scores were included in English only: grasshopper, record player, beetle, light switch, mitten, colander, and sled; and in French only: hippocampe, truelle, and poivron.
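The comparison in Analysis 1 amounts to subtracting the strict percentage from the lenient percentage for each item and group. The following Python sketch shows one way this could be done. The group codes follow the Table 3 legend, but the table layout, the function names, and every number are invented for illustration and are not the study's results.

```python
# Hypothetical percentages laid out like Table 3:
# item -> group -> (strict %, lenient %). The values are invented.
TABLE = {
    "ottoman": {"ME-YA": (40.0, 85.0), "ME-OA": (55.0, 90.0)},
    "eagle": {"ME-YA": (70.0, 95.0), "ME-OA": (80.0, 80.0)},
    "cannon": {"ME-YA": (90.0, 90.0), "ME-OA": (95.0, 95.0)},
}


def lenient_gain(table):
    """Percentage-point gain from strict to lenient, per item and group."""
    return {
        item: {group: lenient - strict for group, (strict, lenient) in groups.items()}
        for item, groups in table.items()
    }


def items_improved(table, require_all_groups=False):
    """Items whose percentage rises under lenient scoring.

    With require_all_groups=True an item must improve in every group;
    otherwise improvement in at least one group is enough.
    """
    combine = all if require_all_groups else any
    gains = lenient_gain(table)
    return sorted(
        item for item, by_group in gains.items()
        if combine(gain > 0 for gain in by_group.values())
    )


if __name__ == "__main__":
    print(items_improved(TABLE))
    print(items_improved(TABLE, require_all_groups=True))
```

The `require_all_groups` switch mirrors the distinction drawn in the text between items that improved for all language groups and items that improved only in some.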

Analysis 2: Language Group Differences. Bilingual participants performed more poorly on the task than monolingual participants in their respective languages. The difference was most extreme when comparing the monolingual French participants and the bilingual-French administration. While there was a similar pattern of results shown with the monolingual English participants and the bilingual-English administration, the performance differences were not as great (i.e., smaller difference between groups) or consistent (i.e., not as many items displaying group differences). It should be noted that there are a small number of items where bilingual English-French speakers scored better than the monolingual groups. In English, these items include cannon, celery, and flute. In French, these items include cyclopousse, lèvres, wagon, and bec Bunsen.

Analysis 3: Age Differences. The following is a list of items that had large generational differences, where younger adults scored higher than older adults: necklace, centaur, stroller, gorilla, tambourine, trumpet, and racoon. However, overall, older adults scored higher than younger adults in all languages and language administration groups.


Discussion

The purpose of this study was to develop scoring criteria for a new bilingual naming task, as it will serve as an important health service for cognitively impaired older adults. Older and younger participants were tested using preliminary scoring criteria to determine if the test was appropriate for both English- and French-speaking individuals. Although the task can easily be administered to all groups, there are differences in how each group of participants performs based on their age group, language group, and, for the bilingual participants, language of administration.

Allowing lenient scoring improved the average number of correct responses by one to five items per group, with most groups improving by two items. An advantage of having both strict and lenient scoring criteria is that poorer performance on certain items is more likely to be related to item difficulty or language difficulty, as the lenient criteria take into consideration acceptable synonyms, culturally relevant slang terms, and shortened names for the item. Adding lenient scoring improves the quality of the Naming Task because it demonstrates that although participants may not use the formal name for the item, they still know what the image represents and can name the item using terms they are familiar with. Some items (e.g., cheetah and leopard) were given two strict answers because the image was representative of both names, and participants may not be able to accurately distinguish between them. Some items (e.g., necklace) were removed because there were too many possible responses, making the item difficult to score.

Based on the quality of the image, a number of items were recommended for removal. Removal criteria were determined based on the responses provided by the participants, which indicated that these items were ambiguous and thus not a good visual representation of the item in question. Furthermore, additional items were recommended for removal because they had a large number of alternate names, making them difficult to score.

There were also large language group differences, with monolingual English participants outperforming every other language group, and the bilingual French administration group performing the most poorly of all the groups. Interestingly, the monolingual French group vastly outperformed the bilinguals in the French administration. This difference might be related to the fact that the bilingual participants were selected from the Ottawa region, which is largely English-dominant. Even though all of the bilinguals had good self-reported proficiency in both languages, the environment in which they live and work may be more English-dominant than would be expected for bilinguals in Quebec City, where monolingual French participants were selected and tested.

Finally, there were a number of items where older adults outperformed the younger adults. This finding could be attributed to generational differences (Schmitter-Edgecombe, Vesneski, & Jones, 2000), or the idea that older adults may have a greater vocabulary (Hawkins et al., 1993; Sheppard et al., 2016). There may have been a number of items that older adults, but not younger ones, have been exposed to, explaining the difference between age groups (e.g., metronome). The items where there was a very large difference between older and younger adults were not necessarily recommended for removal; however, further analysis of these items is required to determine if the generational differences are significant enough to alter the results of the test for future participants.

Future research should seek to understand why certain language groups, primarily monolingual English individuals, outperform others, and to determine how these discrepancies can be resolved to allow for the Naming Task to serve as an appropriate tool for bilingual older adults. More analysis is required to determine which images should be removed as a consequence of the inequality between language groups and age groups. Research should further focus on data collection with monolingual and bilingual patients with mild cognitive impairment and Alzheimer’s disease, to test the validity of the scoring criteria.


Conclusion

The present study established strict and lenient scoring criteria for an English-French picture-naming task. The Naming Task will serve as a health service for both English and French individuals to assess cognitive impairment and can be used as a suitable alternative to the BNT. The Naming Task appears to be suitable for monolingual French and English individuals. However, results are unclear when comparing bilingual to monolingual participants. Results suggest that when possible, a bilingual administration should be used when testing English-French speaking individuals, as responses will be stated in the participant’s dominant language, which is affected by their language environment.


Acknowledgements

This research was supported by an Alzheimer Society of Canada Research Grant awarded to Vanessa Taler, Laura Monetta, and Shanna Kousaie (Grant #1423). The authors declare no conflict of interest. We would like to thank Linda Garcia for co-supervising this honours project. We would also like to thank Julien Blacklock, Chloe Corbeil, Dominique Fijal, Laura Thompson, Chalice Walker, Anne-Marie Lavoie, and Maude Lemieux for their assistance with data collection, as well as Jihan Nassrallah for her assistance in developing the strict and lenient scoring criteria.


References

Adesope, O. O., Lavin, T., Thompson, T., & Ungerleider, C. (2010). A systematic review and meta-analysis of the cognitive correlates of bilingualism. Review of Educational Research, 80(2), 207-245. doi:10.3102/0034654310368803

Benton, A. L., & Hamsher, K. (1976). Multilingual aphasia examination manual. Iowa City, IA: University of Iowa.

Bialystok, E. (2009). Bilingualism: The good, the bad, and the indifferent. Bilingualism: Language and Cognition, 12(1), 3-11. doi:10.1017/S1366728908003477

Bialystok, E., Craik, F. I., & Freedman, M. (2007). Bilingualism as a protection against the onset of symptoms of dementia. Neuropsychologia, 45(2), 459-464. doi:10.1016/j.neuropsychologia.2006.10.009

Bialystok, E., Craik, F. I., Green, D. W., & Gollan, T. H. (2009). Bilingual minds. Psychological Science in the Public Interest, 10(3), 89-129. doi:10.1177/1529100610387084

Bialystok, E., Craik, F., & Luk, G. (2008). Cognitive control and lexical access in younger and older bilinguals. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34(4), 859-873. doi:10.1037/0278-7393.34.4.859

Brouillette, R. M., Martin, C. K., Correa, J. B., Davis, A. B., Han, H., Johnson, W. D., … & Keller, J. N. (2011). Memory for names test provides a useful confrontational naming task for aging and continuum of dementia. Journal of Alzheimer’s Disease, 23(4), 665-671. doi:10.3233/JAD-2011-101455

Craik, F. I., Bialystok, E., & Freedman, M. (2010). Delaying the onset of Alzheimer disease: Bilingualism as a form of cognitive reserve. Neurology, 75(19), 1726-1729. doi:10.1212/WNL.0b013e3181fc2a1c

Gollan, T. H., & Acenas, L. A. (2004). What is a TOT? Cognate and translation effects on tip-of-the-tongue states in Spanish-English and Tagalog-English bilinguals. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30(1), 246-269. doi:10.1037/0278-7393.30.1.246

Gollan, T. H., Fennema-Notestine, C., Montoya, R. I., & Jernigan, T. L. (2007). The bilingual effect on Boston Naming Test performance. Journal of the International Neuropsychological Society, 13(2), 197-208. doi:10.1017/S1355617707070038

Gollan, T. H., Montoya, R. I., Fennema-Notestine, C., & Morris, S. K. (2005). Bilingualism affects picture naming but not picture classification. Memory & Cognition, 33(7), 1220-1234. doi:10.3758/BF03193224

Grant, D. A., & Berg, E. A. (1948). A behavioural analysis of degree of reinforcement and ease of shifting to new responses in a Weigl-type card sorting problem. Journal of Experimental Psychology, 38(4), 404-411. doi:10.1037/h0059831

Hawkins, K. A., Sledge, W. H., Orleans, J. F., Quinlan, D. M., Rakfeldt, J., & Huffman, R. E. (1993). Normative implications of the relationship between reading vocabulary and Boston Naming Test performance. Archives of Clinical Neuropsychology, 8(6), 525-537. doi:10.1093/arclin/8.6.525

Ivanova, I., & Costa, A. (2008). Does bilingualism hamper lexical access in speech production? Acta Psychologica, 127(2), 277-288. doi:10.1016/j.actpsy.2007.06.003

Kaplan, E., Goodglass, H., & Weintraub, S. (1983). Boston Naming Test. Philadelphia, PA: Lea & Febiger.

Kohnert, K. J., Hernandez, A. E., & Bates, E. (1998). Bilingual performance on the Boston Naming Test: Preliminary norms in Spanish and English. Brain and Language, 65(3), 422-440. doi:10.1006/brln.1998.2001

Nasreddine, Z. S., Phillips, N. A., Bédirian, V., Charbonneau, S., Whitehead, V., Collin, I., . . . Chertkow, H. (2005). The Montreal Cognitive Assessment, MoCA: A brief screening tool for mild cognitive impairment. Journal of the American Geriatrics Society, 53(4), 695-699. doi:10.1111/j.1532-5415.2005.53221.x

Roberts, P. M., & Doucet, N. (2011). Performance of French-speaking Quebec adults on the Boston Naming Test. Canadian Journal of Speech-Language Pathology and Audiology, 35(3), 254-267.

Roberts, P. M., Garcia, L. J., Desrochers, A., & Hernandez, D. (2002). English performance of proficient bilingual adults on the Boston Naming Test. Aphasiology, 16(4-6), 635-645. doi:10.1080/02687030244000220

Rossion, B., & Pourtois, G. (2004). Revisiting Snodgrass and Vanderwart’s object pictorial set: The role of surface detail in basic-level object recognition. Perception, 33(2), 217-236. doi:10.1068/p5117

Schmitter-Edgecombe, M., Vesneski, M., & Jones, D. W. (2000). Aging and word-finding: A comparison of spontaneous and constrained naming tests. Archives of Clinical Neuropsychology, 15(6), 479-493. doi:10.1016/S0887-6177(99)00039-6

Sheppard, C., Kousaie, S., Monetta, L., & Taler, V. (2016). Performance on the Boston Naming Test in bilinguals. Journal of the International Neuropsychological Society, 22(3), 350-363. doi:10.1017/S135561771500123X

Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18(6), 643-662. doi:10.1037/h0054651

Wechsler, D. (1997). Wechsler Adult Intelligence Scale – Third Edition. San Antonio, TX: The Psychological Corporation. 


Appendix A. English and French Strict and Lenient Scoring Criteria

