Validation of Korean Version of the Oxford Cognitive Screen (K-OCS), a Post Stroke-Specific Cognitive Screening Tool
Abstract
Objective
This study evaluated the reliability, validity, and diagnostic accuracy of the recently developed Korean version of the Oxford Cognitive Screen (K-OCS).
Methods
Between November 2021 and December 2023, we recruited 72 patients with stroke from our hospital who agreed to participate in the study. The patients were repeatedly tested using K-OCS by the same or different assessors to estimate inter- and intra-rater reliability. To demonstrate the validity and usability of K-OCS, the test results of screening tools currently used in clinical practice, including the Korean-Mini Mental State Examination and the Korean version of the Montreal Cognitive Assessment, were used in comparison analyses.
Results
The subtests of K-OCS demonstrated excellent inter-rater reliability (intra-class correlation coefficient [ICC]=0.914–0.998) and test–retest reliability (ICC=0.913–0.994). We found moderate-to-strong correlations for convergent validity for the subtests (r=0.378–0.979, p<0.01), and low-to-moderate discriminant validity correlations. The optimal cut-offs estimated for the subtests of the K-OCS showed a good-to-high range of specificity (94.8%–100%). The positive predictive value was 58.2%–100% and the negative predictive value was 65.6%–98.4%. Sensitivity was estimated at 25.6%–86.9%.
Conclusion
The results of this study indicate that K-OCS is a reliable and valid tool for screening cognitive impairment in patients post-stroke.
INTRODUCTION
More than 70% of the patients with stroke experience post-stroke cognitive impairment (PSCI), depending on the type of stroke, definition, and time point of assessment [1,2], and 32% of the survivors experience persistent cognitive decline up to 3 years after stroke [3]. Assessment of potential PSCI is important, as the phenomenon may be reversible in the early phase and presence of co-morbid conditions may affect cognition [4]. Numerous international clinical treatment guidelines for stroke recommend that survivors be routinely assessed for cognitive impairment using cognitive screening tools, followed by a comprehensive assessment according to the results [5,6]. In terms of measuring cognitive impairment after stroke, improvements in the screening tools are required to reflect better insights into the disease pathophysiology and symptoms [7]. The Mini-Mental State Examination (MMSE) [8] and Montreal Cognitive Assessment (MoCA) [9] have been developed to screen for dementia in general and are currently being used to evaluate post-stroke cognitive function, but their application in the field of stroke care faces numerous challenges. As a representative example, difficulties may emerge after lesions in the brain language domain owing to limitations in the expressive or receptive language abilities of the patients with stroke [10,11]. In practical terms, approximately 20% of stroke patients are unable to undergo dementia screening [12].
The Oxford Cognitive Screen (OCS) [13] addresses cognitive functioning in a comprehensive manner across several domains to identify impairments beyond the limitations of assessment from language, spatial-temporal, memory, or executive functions commonly affected by stroke. OCS employs a multiple-choice format to enhance the participation of patients with language impairments and vertical layouts to maximize the evaluation of patients who have neglect symptoms. The tool allows for rapid screening within approximately 15 minutes. Other notable advantages of OCS are its applicability at the bedside and its ease of use for patients with hemiplegia that spares a single upper limb. As for the criteria to determine impairment, the OCS uses 5th-percentile cut-offs. The test’s utility is further enhanced by incorporating a “visual snapshot” (Fig. 1), which facilitates communication of the assessment results. The results in the Korean version (K-OCS) are summarized in a snapshot report that displays scores alongside brief descriptions of each subtask, making it easy to understand the patient’s cognitive performance at a glance. The snapshot report serves not only as a tool to indicate areas of impairment but also to record additional notes. For instance, if a patient experiences difficulties only in reading but performs well on other language tasks, only the relevant segment of the “domain” is shaded and explanatory comments can be provided in the adjacent area (e.g., “Deficit in reading only - score 11/15”). Observations of the patient’s behavior and medical condition, such as “T-tube limitations or not wearing reading glasses” or “Excited emotional status,” can also be described to provide additional information.
The initial development of OCS in the United Kingdom was followed by normative studies in various countries, including Italy, Hong Kong, Denmark, Russia, Spain, Portugal, Netherlands, Australia, and Germany [14-22]. Recently, a normative study for K-OCS that considered and adapted cultural and linguistic characteristics has been published (https://process.innovation.ox.ac.uk/clinical/p/ocs/questionnaire/1). The K-OCS was translated and reverse-translated according to a systematic translation procedure in a previous study, adapted to align with the linguistic and cultural characteristics of Korea, and ultimately approved by the original author [23]. Detailed information on how to translate and score can be found in the K-OCS normative study [23]. To serve as a useful cognitive assessment tool for stroke, its reliability in patients with stroke must be established. In addition, securing a validity consistent with the results of existing evaluations would be essential. The original OCS has become recognized as a screening test for evaluating PSCIs because of its strong reliability and validity [13], and the latter has been demonstrated in comparison analyses with existing screening tools [10,11]. Additionally, reports have revealed the neurological validity of OCS by demonstrating its ability to detect cognitive impairments characteristic of hemispheric lesions [10].
This study aimed to estimate the feasibility of using K-OCS, which may be a new avenue for post-stroke cognitive screening. For this purpose, we conducted a prospective clinical study and analyzed inter-rater reliability, test–retest reliability, convergent and discriminant validities, specificity, sensitivity, and predictive values, using the results of the MMSE, MoCA, and other functional evaluation tests as comparators.
METHODS
Participant selection
This study enrolled patients with stroke who were hospitalized or visited the outpatient department in a university-affiliated institute, the CHA Bundang Medical Center, for comprehensive rehabilitation from November 2021 to December 2023. The study protocol was approved by the Institutional Review Board of CHA Bundang Medical Center (2021-09-001-001) and registered at clinicaltrials.gov (NCT 06367920). All patients received written and oral information about the study and provided their consent before study participation.
The inclusion criteria were a diagnosis of ischemic or hemorrhagic stroke based on the results from brain magnetic resonance imaging or computed tomography, adults aged 20 years and older, time points of at least 72 hours after stroke onset, and ability to attend for at least 20 minutes, as determined by the medical team. Written informed consent was provided by the patient or a legal representative in case of difficulty in understanding the consent form. The exclusion criteria were a previous history of stroke and pre-existing psychiatric or neurological conditions.
Evaluations for cognition and neurological function
An expert clinical psychologist (Researcher E.C.) and two trained psychology graduate students performed the cognitive evaluations. The assessors were educated on the basic theory underlying OCS, including its purpose, methods to use the tool, and interpretation of test results, through a manual. In parallel, the assessors practiced administering the test by watching examination videos provided by the original UK research team (available at https://www.ocs-test.org/ocs-background/how-to). Subsequently, each trained assessor completed five practice sessions using the actual test tools under the supervision of a licensed clinical psychologist (E.C.) and received training in scoring and interpreting the results. The tests were administered to each patient sequentially, with a minimum time gap of 30 min between tests, following the order of the Korean version of the MoCA (K-MoCA) [24], Korean-MMSE (K-MMSE) [25], and K-OCS [23]. We ensured that tasks directly involving memory recall (e.g., word lists) were not performed consecutively, thereby reducing the likelihood of carryover effects. Patient interviews or non-verbal tasks were conducted between assessments to minimize the possibility of unnecessary interference caused by administering similar tests consecutively or by overlapping verbal tasks. The Birmingham Cognitive Screen (BCoS) imitation task [26] was administered by the clinical psychologist for comparison with the result of the K-OCS imitation task. Lastly, to assess general neurological impairments, rehabilitation medicine physicians (Researcher I.L.) examined the patients using the Korean National Institutes of Health Stroke Scale (K-NIHSS) [27].
In this institute, the enrolled patients were periodically evaluated for physical and cognitive function as a routine practice for timely and appropriate rehabilitation. Data generated during the routine clinical work-up, such as test results of the “line bisection test” [28] and “Functional Independence Measurement (FIM)” [29] administered by licensed and trained occupational therapists, were also included for analysis in available cases. Detailed information on the matching of each subset between the tests is provided in Supplementary Table S1.
Procedures for reliability, validity, and diagnostic accuracy analyses of K-OCS
As a first step, the inter-rater reliability of K-OCS was assessed by three raters. The first rater (E.C.) conducted the K-OCS evaluation and recorded the assessment process on video. The second and third raters independently scored the video recordings. Test–retest (intra-rater) reliability was assessed by conducting a re-evaluation within 5 days of the initial assessment.
To assess convergent and discriminant validity correlations, the K-OCS scores were compared to those of conventional cognitive screening tools (K-MMSE and K-MoCA). Evaluation of convergent validity included within-test and between-test assessments. Within-test convergent validity was evaluated by correlating scores from different components of the same assessment tool. Between-test convergent validity involved comparing the scores of the assessment tool with those of established cognitive tests known to measure similar constructs. To confirm discriminant validity, we conducted an analysis to demonstrate a low correlation between OCS and other instruments measuring different constructs. Lastly, diagnostic accuracy was assessed by analyzing the specificity, sensitivity, positive predictive value (PPV), and negative predictive value (NPV). These analyses were performed using the evaluation results of not only K-MoCA and K-MMSE but also of other tests, including the line bisection test, NIHSS, and FIM for all available subtask items.
Measures to assess neuropsychological and functional abilities
K-OCS [23] comprises a test manual, stimulus booklet, evaluator scoring sheet, and examinee assessment sheet. With 10 items, it assesses five cognitive domains: language, attention and executive function, memory, numerical processing, and praxis. K-MoCA [24] was designed to identify mild cognitive impairment in older adults with memory deficits. This assessment comprises 30 points and evaluates spatial-temporal function, frontal lobe executive function, short-term memory, attention, concentration and working memory, language function, and time and place orientation. K-MMSE is an adapted test tailored to the Korean context; it assesses time and place orientation, attention, immediate and delayed recall, constructional ability, and language ability on a 30-point scale [25]. BCoS was developed to identify different forms of praxic deficits using procedures designed to include patients with aphasia and/or spatial neglect [26]. In this study, the results of the BCoS imitation task were compared with those of the K-OCS imitation task. The line bisection test is a quick measure to detect the presence of unilateral spatial neglect [28]. K-NIHSS [27] is widely used to score severity in acute stroke and allows evaluation of neurological impairment across 10 categories: consciousness, gaze, visual field, strength of face, upper limb, and lower limb, motor control, sensory function, language, articulation, and perception. FIM is a reliable evaluation tool to measure performance in daily life activities in patients with disability using two subscale scores: motor and cognitive [29].
Data and statistical analysis
The collected data were analyzed using IBM SPSS 21 (IBM Corp.) and R 4.1.3. Descriptive analysis was used for the demographic data. Intra-class correlation coefficients (ICCs) were calculated to evaluate the inter- and intra-rater consistency for the total score, each domain, and each task of the K-OCS assessment.
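To make the reliability statistic concrete, the following minimal sketch computes one common ICC form, ICC(2,1) (two-way random effects, absolute agreement, single rater), from a participants-by-raters score table. The study does not state which ICC model was used in SPSS or R, so the choice of form, the function name `icc_2_1`, and the example ratings are assumptions made purely for illustration.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: list of rows, one row per participant, one column per rater.
    The ICC form is an assumption; the paper does not specify its model.
    """
    n = len(scores)          # number of participants
    k = len(scores[0])       # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-participant mean square
    msc = ss_cols / (k - 1)                # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical ratings: 6 participants scored by 4 raters
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(ratings), 3))
```

Because this form penalizes systematic differences between raters, it gives a lower value than a consistency-only ICC when one rater scores uniformly higher or lower than the others.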
Using data from K-OCS, K-MMSE, K-MoCA, and other tests, several components were checked to assess the validity of K-OCS. Convergent and discriminant validity correlations between K-OCS and the other tests were assessed using the Spearman correlation coefficient. Diagnostic accuracy measures of K-OCS, including specificity, sensitivity, PPV, and NPV, were computed using the reportROC (Receiver Operating Characteristic) package. Using normative data from the K-OCS study [23], impairment cut-off scores were set based on the fifth percentile score for each subtask of K-MoCA. Subtest scores of K-OCS and K-MoCA were compared in the language, memory, numeric ability, and executive function domains. Specificity≥80% confirms effective exclusion of non-impairment cases; sensitivity≥80% indicates strong detection of cognitive impairment; PPV≥70% is reliable in high-prevalence settings; and NPV≥70% ensures accuracy in excluding the condition [30]. Two subtasks were not available for the diagnostic accuracy analysis: the cutoff point of the “recall” subtask in the “memory” domain was 0, and there was no corresponding MoCA subtest for the “heart cancellation” in the “visual attention” domain of OCS.
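The four diagnostic accuracy measures can be sketched from a 2×2 table once each score is dichotomized at its cut-off. The sketch below is not the reportROC implementation; it assumes that a score strictly below the cut-off counts as impaired, and the function name, scores, and cut-offs are hypothetical.

```python
def diagnostic_accuracy(index_scores, index_cutoff, ref_scores, ref_cutoff):
    """Sensitivity, specificity, PPV, and NPV of an index test vs. a reference.

    Assumption: a score strictly below its cut-off is classified as impaired.
    Returns None for a measure whose denominator is zero.
    """
    tp = fp = fn = tn = 0
    for idx, ref in zip(index_scores, ref_scores):
        index_impaired = idx < index_cutoff
        ref_impaired = ref < ref_cutoff
        if index_impaired and ref_impaired:
            tp += 1
        elif index_impaired and not ref_impaired:
            fp += 1
        elif not index_impaired and ref_impaired:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # impaired cases detected
        "specificity": ratio(tn, tn + fp),  # unimpaired cases cleared
        "ppv": ratio(tp, tp + fp),          # positives that are truly impaired
        "npv": ratio(tn, tn + fn),          # negatives that are truly unimpaired
    }


# Hypothetical subtask scores for six participants
index_scores = [5, 3, 8, 2, 9, 7]        # index-test subtask (e.g., K-OCS)
ref_scores = [20, 10, 25, 12, 11, 26]    # reference-test subtask (e.g., K-MoCA)
result = diagnostic_accuracy(index_scores, 4, ref_scores, 15)
print(result)
```

In this toy example the index test misses one impaired participant but raises no false alarms, which reproduces in miniature the pattern of high specificity with lower sensitivity.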
RESULTS
In total, 72 participants were recruited (46 male and 26 female). The ages ranged from 31 to 93 years, with a mean age of 62.35±13.80 years, and their education level ranged from 1 to 19 years, with a mean of 13.03±3.60 years. The demographic variables and characteristics related to brain damage in the stroke cohort, including the time elapsed after stroke, are presented in Table 1. The time since stroke onset was less than 3 months in 22 participants (30.6%), between 3 and 12 months in 16 participants (22.2%), and more than 12 months in 34 participants (47.2%). To evaluate inter-rater reliability, 24 participants were randomly selected and their performances in the K-OCS test were video-recorded. Of these, 17 patients were re-tested for evaluation of intra-rater reliability.
Reliability verification
Three raters participated in the inter-rater reliability analysis using a sample of 24 participants. Reliability is classified as follows: a coefficient value of 0–0.4 indicates poor, 0.4–0.6 fair, 0.6–0.75 good, and 0.75–1 excellent reliability [31]. The results suggested excellent inter-rater reliability for the scoring of most subtests in K-OCS (Table 2). Test–retest reliability was analyzed in a sample of 17 participants with an average interval of 2.88±0.82 days between evaluations. We obtained ICC values between 0.913 and 0.994 (p<0.001), which also demonstrated an excellent level of agreement (Table 3).
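The classification bands quoted above can be written as a small lookup. The stated ranges share their boundary values (for example, 0.4 ends one band and starts the next), so this sketch assumes each band includes its lower bound; the function name is hypothetical.

```python
def classify_reliability(icc):
    """Map an ICC value to the qualitative band used in the text.

    Assumption: each band includes its lower bound (0.40 is "fair",
    0.60 is "good", 0.75 is "excellent").
    """
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Lower and upper ends of the test-retest ICC range reported in this study
for value in (0.913, 0.994):
    print(value, classify_reliability(value))
```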
Validity verification
Data from 72 participants were included in the analysis to verify convergent and discriminant validity correlations. K-OCS and existing tools that assess similar functions demonstrated a moderate-to-strong association (r range, 0.378–0.979, p<0.01; Table 4). We obtained relatively high correlations for K-OCS subscores: imitation (r=0.979), orientation (r=0.849), calculation (r=0.698), and executive mixed tasks (r=0.671). In contrast, relatively low correlations were noted for the executive total (r=|0.386|), recall MoCA (r=0.378), recall MMSE (r=0.430), recognition MoCA (r=0.455), recognition MMSE (r=0.461), episodic memory MoCA (r=0.498), and number writing (r=0.455) tasks (all p<0.01). In the executive total task, a higher total score indicates poorer performance; thus, the raw correlation is negative. To facilitate comparison with the other subtests, the coefficient is reported as an absolute value, which avoids misinterpretation of a negative sign that results only from the reverse scoring system. The results show that correlations are higher for subtasks with content and methods similar to those of existing assessments, while subtasks that highlight the unique features of K-OCS are associated with lower correlations. As indicated in Supplementary Table S2, the analysis of correlations between cognitive subtests within K-OCS revealed predominantly strong associations, except for the orientation (r=0.387, p<0.01) and the orientation multiple-choice items (r=0.304, p<0.01). As a check on the validity correlation analysis, we used the FIM motor subscale score as a negative control for discriminant validity. Most of the 15 K-OCS subtasks showed no correlation with this score, and only a few correlations were noted (Supplementary Table S3).
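The handling of the reverse-scored executive task can be illustrated with a small Spearman computation. This is a minimal sketch, not the analysis code of the study: it ranks both variables (averaging tied ranks), correlates the ranks, and then drops the sign for reporting, which is one reading of the absolute-value convention described above. All names and scores are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: higher reference score = better performance,
# higher reverse-scored task score = poorer performance
reference = [1, 2, 3, 4, 5]
reverse_scored = [10, 8, 6, 4, 2]
rho = spearman_rho(reference, reverse_scored)
print(rho, abs(rho))  # signed coefficient and the magnitude used for reporting
```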
Diagnostic accuracy
In this study, we assessed diagnostic accuracy by comparing the results of the K-OCS cognitive subtests with those of a representative screening test, K-MoCA, which has been widely used in PSCI screening. Specificity, sensitivity, PPV, and NPV were computed and the corresponding values are presented in Table 5. Specificity in the subtasks ranged from 94.8% to 100%, indicating a low false-positive probability in K-OCS. Sensitivity was relatively low for specific tasks such as recognition (32.2%), episodic memory (25.6%), and calculations in multiple-choice questions (35.2%). The PPV for K-OCS, representing the probability of cognitive impairment given a positive result, was notably high for episodic memory and calculation tasks (100%), whereas executive mixed (68.5%) and picture naming tasks (58.2%) showed relatively lower values (Table 5). The NPV of K-OCS, representing the probability that cognitive impairment is absent given a negative result, was notably high for orientation MCQ (98.4%), picture naming (97.1%), and executive mixed tasks (90.2%). In contrast, recognition (65.6%) and episodic memory tasks (69.5%) had comparatively lower values.
DISCUSSION
The World Health Organization’s Wilson-Jungner criteria, published in 1968, recommend cognitive and emotional screening assessments for all patients with stroke. Additionally, the global focus is on cognitive impairment and rehabilitative interventions after stroke, emphasizing the need to assess function across different cognitive domains [32]. In this study, we reported data on the reliability and validity of K-OCS, which seems appropriate for screening for cognitive impairment in stroke. Verification of the reliability and validity of an assessment tool is crucial for ensuring accuracy and dependability of the measurements in research and clinical practice.
The inter-rater ICC values of K-OCS were 0.914–0.998 across the subtests, indicating excellent inter-rater reliability [33]. For comparison with prior research results, the Hong Kong-OCS study yielded values of 0.952–1 [15] and the Spanish-OCS study, of 0.790–1 [18]. The test–retest reliability values were 0.913–0.994, which also indicates excellent test–retest reliability. The original OCS test–retest reliability values were 0.331–0.776 [13], the Hong Kong-OCS study reported values of 0.578–0.989 [15], and the Russian-OCS study, of 0.48–0.96 [17]. There is a notable difference between the present results and the previous ones, which showed a significant increase in the re-test score for language [13,15,17]. The average interval between test and retest in this study was 2.88 days, compared with 3.3 days in the original OCS study [13], in which sampling was performed at an average of 6 days post-stroke. In contrast, in this study, only 30.6% of participants were sampled within 3 months post-stroke. This difference in time since onset has likely contributed to the narrower variations in the test–retest scores, particularly in the language domain, in the present study. This is consistent with previous findings, which indicate that patients with acute stroke rapidly recover their language expression abilities early after stroke [34].
Regarding validity, the range of the correlation coefficients between K-OCS subtasks and the corresponding test results of other assessments was |0.378–0.979|, which represents an acceptable level of convergent validity in reference to the values in the original OCS (|0.346–0.902|) [13], Hong Kong-OCS (|0.368–0.732|) [15], Russian-OCS (|0.330–0.950|) [17], and Spanish-OCS (|0.466–0.908|) [18]. The convergent correlations within K-OCS for each subtask value were 0.304–0.676. Similarly, the original OCS study reported values of 0.260–0.680 [13]. Overall, convergent validity appeared somewhat lower than that in other gold-standard neurocognitive assessments. However, according to prior research, MMSE may underestimate non-verbal domains such as memory when evaluating language expression deficits following stroke, and MoCA has also been reported to have limitations in distinguishing between cognitive decline in Alzheimer’s disease and PSCI [35]. The relatively low convergent validity for some K-OCS subtasks may be attributed to the unique ability of this test to measure neuropsychological functions that could not be distinguished by existing cognitive-screening tools. To minimize the impact of aphasia, OCS includes a multiple-choice response format, and to reduce the influence of neglect symptoms, it presents stimuli in vertical layouts. This innovative approach was designed to address the limitations of existing assessment tools by reducing previously recognized confounding effects and enhancing the accuracy of cognitive screening [32,36]. Therefore, compared with existing tools, OCS seems to have enabled a more comprehensive understanding of the intended target component to be measured by enriching insights about the evaluation process.
In this study, discriminant validity was assessed as the correlation between the FIM motor subscale and the subtasks of K-OCS; among 15 tasks, we found a moderate degree of correlation in only four (sentence reading, orientation, orientation MCQ, and hearts cancellation). In the original OCS study, subtasks from the Barthel Index and OCS were used, and correlations were reported only for Broken Hearts [13].
In terms of diagnostic accuracy, relatively low sensitivity scores were observed in certain subtasks, such as recognition (32.2%), episodic memory (25.6%), and the multiple-choice question on calculations (35.2%). These findings may be due to the K-OCS design that differs from traditional assessment tools in aspects that include use of multiple-choice responses and of cued rather than simple recall.
Regarding the study’s limitations, we note that post-injury duration for the largest group of patients (47.2%) exceeded one year, with 30.6% within three months, and 22.2% between three and 12 months (Table 1). Given that K-OCS is internationally recommended as a screening tool for patients with acute stroke, future research should prioritize validating its use in a large-scale population with acute stroke within the first three months of injury. Another consideration is the absence of a parallel test for K-OCS. Patients with acute stroke often exhibit rapid changes in cognitive function over a short period. Therefore, developing a parallel test with the same content but different items is necessary to enable short-term follow-up. Furthermore, development of parallel forms of testing would be essential to avoid learning effects of repetitive evaluation in patients with rapid recovery.
In conclusion, according to the results of this study, K-OCS overall demonstrated a fair level of validity, supporting its recommendation for clinical use in Korea and conforming to its designation as a stroke-specialized tool with distinct characteristics from those in existing tools. As a tool that is friendly toward common post-stroke symptoms such as aphasia and neglect, K-OCS enables a rapid and comprehensive assessment of patients’ cognitive strengths and weaknesses. Therefore, K-OCS can be applied to patients with stroke in clinical settings, and we expect that future research will continue to validate its clinical utility.
Notes
CONFLICTS OF INTEREST
No potential conflict of interest relevant to this article was reported.
FUNDING INFORMATION
This study was supported by a Special Research Fund for ARM Development of the Korean Academy of Rehabilitation Medicine in 2023; the Institute for Information & Communications Technology Planning & Evaluation, funded by the Ministry of Science and ICT of Korea (2021-0-00742); and the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health and Welfare, Republic of Korea (grant number: HR22C1605).
AUTHOR CONTRIBUTION
Conceptualization: Cho E. Methodology: Cho E. Formal analysis: Cho E. Funding acquisition: Kim MY. Project administration: Cho E, Kim MY, Choi S. Visualization: Cho E. Writing – original draft: Cho E, Kim MY, Lim I, Kim R. Writing – review and editing: Cho E, Kim MY, Choi S, Demeyere N. Approval of final manuscript: all authors.
SUPPLEMENTARY MATERIALS
Supplementary materials can be found via https://doi.org/10.5535/arm.240099.
Supplementary Table S1.
Overview of measures for convergent validity
Supplementary Table S2.
Convergent validity within K-OCS subtasks (n=72)
Supplementary Table S3.
Discriminant validity (n=72)