INTRODUCTION
Neural activity is electrical, so cerebral responses to stimuli can be measured as brainwaves. In particular, the waveform obtained from the cerebral electrical activity evoked by repeated stimuli that carry specific information is called the event-related potential (ERP) [1]. It is measured non-invasively as an electrophysiological response to stimuli arising in the cerebral cortex, by averaging across trials. This potential is known to be generated during cerebral perception or recognition [2,3].
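The averaging step can be illustrated with a minimal Python sketch on simulated data. Everything in it is hypothetical and chosen only for illustration: a Gaussian "P300-like" deflection at 300 ms, white background noise, a 250 Hz sampling rate, and the function names `simulate_epochs` and `average_epochs`. None of these values come from this study. The sketch shows why averaging works: activity time-locked to the stimulus survives, while background activity that is uncorrelated across trials shrinks roughly in proportion to 1/√N.

```python
import numpy as np

def simulate_epochs(n_trials, fs=250, duration=0.8, amp_uv=10.0, noise_uv=10.0, seed=0):
    """Hypothetical single-trial epochs: a Gaussian bump peaking at 300 ms plus white noise (µV)."""
    rng = np.random.default_rng(seed)
    t = np.arange(0, duration, 1 / fs)                      # time after stimulus onset (s)
    template = amp_uv * np.exp(-((t - 0.300) ** 2) / (2 * 0.05 ** 2))
    noise = rng.normal(0.0, noise_uv, size=(n_trials, t.size))
    return t, template, template + noise                    # epochs: (n_trials, n_samples)

def average_epochs(epochs):
    """ERP estimate: the point-wise mean across trials."""
    return epochs.mean(axis=0)

t, template, epochs = simulate_epochs(n_trials=60)
erp = average_epochs(epochs)

# Residual noise = deviation from the known (simulated) underlying waveform.
single_trial_noise = np.std(epochs[0] - template)
averaged_noise = np.std(erp - template)
peak_latency_ms = 1000 * t[np.argmax(erp)]

print(f"residual noise: single trial {single_trial_noise:.1f} µV, "
      f"60-trial average {averaged_noise:.1f} µV")
print(f"peak latency of the averaged waveform: {peak_latency_ms:.0f} ms")
```

With 60 trials, the residual noise of the average should be close to one eighth (1/√60) of the single-trial noise, which is what makes a deflection of a few microvolts measurable at all.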
Clinically, the ERP can be used in patients with dementia, depression, Parkinson disease, hydrocephalus, schizophrenia, metabolic encephalopathy, and central auditory processing disorder [4,5].
The ERP waveform is made up of various components, e.g., N100, N200, P300, N400, and P600. The best known is the P300, a positive potential with an average latency of 300 ms, which is known to be useful in assessing cognitive disorders [6-8]. The amplitude of P300 is related to the amount of neural resources used to process information, i.e., memory updating, and its latency reflects the time the brain takes to register the meaning of the stimulus. The P300 is considerably more prominent for rare target stimuli than for frequent non-target stimuli [9].
In 1993, Segalowitz and Barnes [10] suggested that the total variance of an ERP measure can be expressed as a function of 1) trait variance, which represents stable subject characteristics such as age, gender, cognitive capacity, working memory, intelligence, learning disability, or dementia; 2) stimulus variance, which is systematically manipulated in the experimental paradigm, such as target frequency or intensity; 3) variance due to the subject's psychological or physiological state independent of stimulus properties, such as arousal and nicotine or caffeine use; and 4) measurement error. Of these, state variance and measurement error are critical in determining the basic reliability of the ERP measure itself and set an upper bound on its validity.
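One way to write this down, offered as a classical test theory reading rather than as formulas given by Segalowitz and Barnes, is sketched below. The additive first line assumes the four sources are uncorrelated; the reliability ratio assumes the stimulus condition is held fixed and treats state variance as error, because it changes from one occasion to the next. The last line is the standard attenuation bound: a measure X cannot correlate with any criterion Y more strongly than the square root of its own reliability.

```latex
\sigma^2_{\mathrm{total}} = \sigma^2_{\mathrm{trait}} + \sigma^2_{\mathrm{stimulus}} + \sigma^2_{\mathrm{state}} + \sigma^2_{\mathrm{error}}

R_X = \frac{\sigma^2_{\mathrm{trait}}}{\sigma^2_{\mathrm{trait}} + \sigma^2_{\mathrm{state}} + \sigma^2_{\mathrm{error}}}
\qquad \text{(stimulus condition held fixed)}

|\rho_{XY}| \le \sqrt{R_X R_Y} \le \sqrt{R_X}
```

For example, under these assumptions a measure with a reliability of 0.53 could correlate at most about √0.53 ≈ 0.73 with any external criterion, however well that criterion itself is measured.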
In previous studies of normal subjects, P300 latency was highly reliable, but the reliability of P300 amplitude showed comparatively inconsistent results. Moreover, test-retest reliability was assessed on only two occasions rather than over multiple repeated tests [10-12].
The aim of this study was to investigate the intra-rater reliability of P300 amplitude and latency in normal young adults, assessed as the agreement of test scores from one occasion to the others when the test is repeated multiple times. A further aim was to establish these measures as reference values if they proved highly reliable.
MATERIALS AND METHODS
Subjects
This study was conducted on men and women in their 20s and 30s (age range, 25 to 34 years) who were working as medical doctors at the general hospital, had no hearing or central nervous system problems, had no medical history of psychotic disorder or active substance abuse, and volunteered to participate. The subjects were 17 men and 13 women with a mean age of 28.2±2.99 years (Table 1). Written informed consent was obtained from all participants.
Methods
ERP was measured on three separate occasions by the same researcher, after which the reliability of P300 latency and amplitude was analyzed. Auditory stimuli were delivered bilaterally through headphones using an oddball paradigm. In this paradigm, subjects are first familiarized with the experimental conditions without needing to pay attention and are then asked to attend and distinguish target stimuli from non-target stimuli, which elicits the ERP. The test was conducted with a Medelec Synergy electromyography machine (VIASYS HealthCare, Surrey, UK). The non-target (standard) stimulus, a 1,000 Hz low tone, was presented once per second (1 pulse per second), and the target stimulus was a 2,000 Hz high tone. The ratio of non-target to target stimuli (the oddball ratio) was 4 to 1. The target stimulus was presented 60 times to each subject at an intensity of 70 dB and a stimulus rate of 1 per second. Sweep speed and sensitivity were set at 500 ms/division and 2 µV/division, respectively, and data were recorded using signal averaging. The signal was initially band-pass filtered at 0.1-50 Hz.
Each subject heard both stimuli and received a full explanation before the test. During the test, subjects lay relaxed on their backs on a medical bed with their eyes closed. The stimuli were presented in random order so that subjects could not predict when the target stimulus would occur, and subjects were asked to silently count the target stimuli to keep their attention on the task.
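A randomized stimulus sequence with these parameters can be sketched in Python as follows. This is a hypothetical illustration, not the stimulus generator of the Medelec Synergy system: it assumes the 4:1 oddball ratio means exactly 240 standards per 60 targets, that the 1 per second rate applies to every stimulus, and that a plain shuffle is acceptable (some laboratories add constraints such as forbidding consecutive targets, which the text does not mention). The names `oddball_sequence` and `onset_times` are made up for this sketch.

```python
import random

def oddball_sequence(n_targets=60, standards_per_target=4, seed=None):
    """Return a randomized list of 'T' (2,000 Hz target) and 'S' (1,000 Hz standard) tones."""
    rng = random.Random(seed)
    seq = ["T"] * n_targets + ["S"] * (standards_per_target * n_targets)
    rng.shuffle(seq)  # random order so the subject cannot predict the next target
    return seq

def onset_times(n_stimuli, rate_hz=1.0):
    """Stimulus onset times in seconds for a fixed presentation rate."""
    return [i / rate_hz for i in range(n_stimuli)]

seq = oddball_sequence(seed=1)
onsets = onset_times(len(seq))
print(len(seq), seq.count("T"), seq.count("S"), onsets[-1])  # 300 60 240 299.0
```

Under these assumptions each session contains 300 stimuli and lasts about 5 minutes, and each of the 60 targets contributes one epoch to the averaged P300.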
To reduce interference from electrode responses, we placed silver/silver chloride (Ag/AgCl) electrodes on the head. Reference electrodes were placed on both mastoid processes (A1 and A2), the active electrode at Cz, and the ground electrode at the glabella.
Reliability was evaluated using the intraclass correlation coefficient (ICC). The ICC measures the consistency or agreement of two or more measurements of the same subjects, made by different raters or on different occasions, so it is suitable for assessing test-retest reliability. We applied the modern ICC definitions. The ICC is the ratio of the between-subject variance to the sum of the between-subject variance and the error (residual) variance, on the assumption that there is no interaction between raters. Suppose that the observation $x_{ij}$ for the $i$th subject in the $j$th measurement is

$$x_{ij} = \mu + r_i + c_j + rc_{ij} + e_{ij},$$

where $i = 1, \ldots, n$ and $j = 1, \ldots, k$; $\mu$ (the population mean of all observations) is constant; $r_i$ (the subject effects) are random, independent, and normally distributed with mean 0 and variance $\sigma_r^2$; $c_j$ (the measurement effects) are random, independent, and normally distributed with mean 0 and variance $\sigma_c^2$; $rc_{ij}$ (the interaction effects between subjects and measurements) are random, independent, and normally distributed with mean 0 and variance $\sigma_{rc}^2$; and $e_{ij}$ (the residual effects) are random, independent, and normally distributed with mean 0 and variance $\sigma_e^2$. The single-score and average-score ICCs for consistency in two-way models are

$$\mathrm{ICC}(C,1) = \frac{\sigma_r^2}{\sigma_r^2 + \sigma_{rc}^2 + \sigma_e^2} \quad \text{and} \quad \mathrm{ICC}(C,k) = \frac{\sigma_r^2}{\sigma_r^2 + (\sigma_{rc}^2 + \sigma_e^2)/k},$$

and the single-score and average-score ICCs for absolute agreement in two-way models are

$$\mathrm{ICC}(A,1) = \frac{\sigma_r^2}{\sigma_r^2 + \sigma_c^2 + \sigma_{rc}^2 + \sigma_e^2} \quad \text{and} \quad \mathrm{ICC}(A,k) = \frac{\sigma_r^2}{\sigma_r^2 + (\sigma_c^2 + \sigma_{rc}^2 + \sigma_e^2)/k}$$

(Tables 2, 3). A larger ICC indicates higher reliability. Data were analyzed with SPSS ver. 18.0 (SPSS, Chicago, IL, USA), and the ICCs were obtained from its Reliability Analysis procedure.
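The four ICCs can be estimated from the mean squares of a two-way ANOVA with one observation per cell. The Python sketch below uses the estimators commonly attributed to McGraw and Wong; it is an illustration under that assumption, not a reproduction of the SPSS procedure used in this study, and `icc_two_way` is a hypothetical helper name. Because there is only one observation per subject and session, the interaction and residual terms cannot be separated and are pooled into the error mean square.

```python
import numpy as np

def icc_two_way(x):
    """Two-way ICCs from an n-subjects x k-measurements array.

    Returns single-score (1) and average-score (k) ICCs for consistency (C)
    and absolute agreement (A), estimated from two-way ANOVA mean squares.
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between subjects
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between measurements
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # interaction + residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return {
        "C,1": (msr - mse) / (msr + (k - 1) * mse),
        "C,k": (msr - mse) / msr,
        "A,1": (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n),
        "A,k": (msr - mse) / (msr + (msc - mse) / n),
    }

# Classic 6-subject x 4-rater example from Shrout and Fleiss (1979); not data from this study.
ratings = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
for name, value in icc_two_way(ratings).items():
    print(f"ICC({name}) = {value:.2f}")
```

On this textbook example the sketch gives about 0.71, 0.91, 0.29, and 0.62 for ICC(C,1), ICC(C,k), ICC(A,1), and ICC(A,k). The large gap between consistency and agreement arises because the raters differ systematically in their mean scores, which the agreement forms count as error. For this study's design, `x` would be a 30 × 3 array of P300 latencies or amplitudes (subjects × sessions).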
RESULTS
The mean P300 latency was 300.9±36.8, 315.9±39.8, and 317.1±33.1 ms in the first, second, and third sessions, respectively, giving an overall average latency of 311.3±37.0 ms (Table 4). For reference, a previous study of normal Koreans reported normal P300 latencies of 324.10±15.55 and 327.77±15.63 ms in subjects in their 20s and 30s, respectively [13]. In this study, the average latency of subjects in their mid-20s to mid-30s was shorter than both reference values.
The mean P300 amplitude was 5.25±3.22, 4.59±3.24, and 5.02±3.66 µV in the first, second, and third sessions, respectively, giving an overall average amplitude of 4.95±3.35 µV (Table 4).
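The overall values can be cross-checked from the session summaries. The Python sketch below assumes that all 30 subjects were measured in every session and that the overall mean and SD were computed over all 90 measurements; `combine_sessions` is a hypothetical helper. The pooled variance adds the within-session spread to the spread of the session means around the grand mean.

```python
import math

def combine_sessions(means, sds, n_per_session):
    """Grand mean and SD of all observations pooled across equal-sized sessions."""
    k = len(means)
    total_n = k * n_per_session
    grand_mean = sum(means) / k
    within = sum((n_per_session - 1) * s ** 2 for s in sds)                 # spread inside sessions
    between = sum(n_per_session * (m - grand_mean) ** 2 for m in means)    # spread of session means
    return grand_mean, math.sqrt((within + between) / (total_n - 1))

lat = combine_sessions([300.9, 315.9, 317.1], [36.8, 39.8, 33.1], n_per_session=30)
amp = combine_sessions([5.25, 4.59, 5.02], [3.22, 3.24, 3.66], n_per_session=30)
print(f"latency {lat[0]:.1f} ± {lat[1]:.1f} ms; amplitude {amp[0]:.2f} ± {amp[1]:.2f}")
```

Under these assumptions the session summaries reproduce the reported 311.3±37.0 ms and 4.95±3.35, which is consistent with (though not proof of) the overall figures being computed across all sessions pooled.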
The reliability of P300 latency was excellent (ICC, 0.81), whereas that of P300 amplitude was fair to good (ICC, 0.53) (Table 4). The statistical hypothesis tests for the ICC of P300 latency are described in Tables 3 and 5, and those for the ICC of P300 amplitude in Tables 3 and 6.
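The verbal labels "excellent" and "fair to good" match cutoffs often attributed to Fleiss (below 0.40 poor, 0.40 to 0.75 fair to good, above 0.75 excellent). The text does not state which cutoffs were used, so the Python sketch below is an assumption; `fleiss_label` is a hypothetical name.

```python
def fleiss_label(icc):
    """Verbal label for an ICC using Fleiss-style cutoffs (assumed, not stated in the text)."""
    if icc < 0.40:
        return "poor"
    if icc <= 0.75:
        return "fair to good"
    return "excellent"

for measure, icc in [("P300 latency", 0.81), ("P300 amplitude", 0.53)]:
    print(f"{measure}: ICC {icc:.2f} -> {fleiss_label(icc)}")
```

With these cutoffs, the reported ICCs of 0.81 and 0.53 fall into the "excellent" and "fair to good" bands, respectively.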
DISCUSSION
The study of human cognitive processes and functions is highly important for understanding high-level mental activity and the sequelae of brain diseases, as well as for planning treatment. Measuring ERPs, which are generated by specific stimuli through the brain mechanisms involved in cognitive processing, has made such studies more objective and reliable.
Recent studies have examined endogenous potentials, especially the P300, which is evoked not directly by physical stimuli but by the resulting mental activity, such as memory, perception, and attention [14]. In 2003, Hodo [15] reported that, compared with normal controls, patients with schizophrenia had lower P300 amplitude and longer P300 latency.
In this study, P300 latency in the auditory oddball paradigm was highly reliable, with high test score agreement (ICC, 0.81). This high reliability helps P300 latency detect differences caused by trait variance, such as diseases like schizophrenia and dementia, and supports its use as a psychophysiological screening tool in clinical and experimental studies. P300 amplitude showed fair to good reliability (ICC, 0.53). Thus, P300 latency is presumed to be more reliable than P300 amplitude in clinical applications.
Regarding the reliability of P300 in the auditory oddball paradigm, in 1993 Segalowitz and Barnes [10] assessed test-retest reliability twice, 2 years apart, in 19 growing adolescents and obtained ICCs of 0.76 for latency and 0.61 for amplitude. They examined temporal stability together with test score agreement to determine whether the P300 remained stable through developmental changes and how much it varied over time. In contrast, in this study the test was conducted three times without an interval in adults in their 20s and 30s, focusing on low measurement error and test score agreement.
In 2000, Lee et al. [16] tested 30 patients with encephalopathy four times and found that P300 latency was highly reliable (alpha, 0.977). However, they did not measure P300 amplitude, and their subjects were patients with encephalopathy; their study therefore differs in methodology from this study of normal subjects.
In 2006, Hall et al. [11] assessed test-retest reliability twice, 7 to 56 days apart, in 19 monozygotic twins and obtained ICCs of 0.88 for latency and 0.85 for amplitude. However, their study differed from this study in sample size, number of sessions, and time interval. Moreover, each subject in this study received 60 target stimulus trials per session, whereas 80 were given per session in their study; as the number of target stimuli increases, error variance decreases [17]. In the study of Hall et al. [11], both P300 amplitude and latency showed excellent reliability, whereas in this study amplitude was less reliable than latency. Amplitude varies with the subject's arousal state: P300 latency is only about 20 ms longer with severe drowsiness, whereas amplitude changes dramatically as the subject's arousal level drops [18]. This may explain why amplitude was less reliable than latency over repeated tests.
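The effect of trial count can be made concrete with the usual idealization that background noise is independent across trials and has constant variance, which real EEG only approximates. Under that assumption, the residual noise SD of an average of N trials falls as 1/√N, so moving from 60 to 80 target trials gives:

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}},
\qquad
\frac{\sigma_{\bar{x}}(N = 80)}{\sigma_{\bar{x}}(N = 60)} = \sqrt{\frac{60}{80}} \approx 0.87
```

That is, about 13% less residual noise SD (25% less residual variance) per averaged waveform, which is one plausible contributor to the higher amplitude reliability reported by Hall et al.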
In 1986, Polich [12] assessed test-retest reliability twice in 100 college students (mean age, 20.4 years) and reported that P300 amplitude was highly reliable (Pearson correlation, 0.71). However, because the data were presented as Pearson correlations, they reflect the stability of subjects' relative standing rather than score agreement. In contrast, this study focused on score agreement, as reflected by the ICC.
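The difference between the two statistics can be shown with a small hypothetical Python example in which every subject's retest latency is exactly 20 ms longer. The subjects keep their rank order perfectly, so the Pearson correlation is 1, but the scores do not agree, so the absolute-agreement ICC is well below 1. The latencies are invented for illustration, and `icc_a1` is a hypothetical helper implementing the usual two-way ANOVA estimator of ICC(A,1).

```python
import numpy as np

def icc_a1(x):
    """Single-score absolute-agreement ICC, ICC(A,1), for an n x k array (two-way ANOVA estimates)."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    g = x.mean()
    msr = k * ((x.mean(axis=1) - g) ** 2).sum() / (n - 1)    # between subjects
    msc = n * ((x.mean(axis=0) - g) ** 2).sum() / (k - 1)    # between sessions
    resid = x - x.mean(axis=1, keepdims=True) - x.mean(axis=0) + g
    mse = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

session1 = np.array([300.0, 310.0, 320.0, 330.0, 340.0])  # hypothetical latencies (ms)
session2 = session1 + 20.0                                  # every subject 20 ms later on retest
r = np.corrcoef(session1, session2)[0, 1]
icc = icc_a1(np.column_stack([session1, session2]))
print(f"Pearson r = {r:.2f}, ICC(A,1) = {icc:.2f}")
```

Here the Pearson correlation is 1.00 while ICC(A,1) is about 0.56, because the constant 20 ms shift counts as disagreement for the ICC but is invisible to the correlation.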
As noted above, this study involved more repeated tests than previous studies of test-retest reliability in normal subjects and focused on test score agreement. Latency showed excellent reliability, and amplitude, although less reliable than latency, showed fair to good reliability. This is the significance of this study. In other words, measurement error under the experimental conditions and normal variation in P300 latency and amplitude due to subject state, such as fatigue and arousal level, were not large enough to push intra-rater reliability out of its acceptable range in normal adults.
In 1997, Kim et al. [13] compared normal adults in their 10s to 60s by age. They reported that ERP latency was shortest in subjects aged 19 and tended to lengthen with age. Amplitude did not change significantly among subjects in their 10s, 20s, and 30s but tended to decrease rapidly in those aged 50 and older. To minimize interference from age-related variance in P300 latency [19], we recruited healthy young subjects in their mid-20s to mid-30s. In this study, the average latency was 311.3±37.0 ms. Compared with the reference values of Kim et al. [13] (324.10±15.55 and 327.77±15.63 ms in adults in their 20s and 30s), the average latency in this study was shorter and its standard deviation larger. This may be explained by several differences, e.g., sample size, number of tests, filter settings, and stimulus variance.
This study has some limitations. It is uncertain whether the same reliability would be obtained in other studies with similar experimental conditions and stimulus designs, because not all factors controlling the variability of ERP components can be identified. Further studies are therefore needed on factors affecting reliability, including laboratory temperature and task difficulty.