Extensive Multilabel Classification of Brain MRI Scans for Infarcts Using the Swin UNETR Architecture in Deep Learning Applications

Article information

Ann Rehabil Med. 2024;48(4):271-280
Publication date (electronic) : 2024 August 22
doi : https://doi.org/10.5535/arm.230029
1Department of Physical Medicine and Rehabilitation, Seoul Daehyo Rehabilitation Hospital, Yangju, Korea
2Department of Emergency Medicine, Pohang SeMyeong Christianity Hospital, Pohang, Korea
Correspondence: Jaeho Oh Department of Physical Medicine and Rehabilitation, Seoul Daehyo Rehabilitation Hospital, 36 Goeumnam-ro, Yangju 11492, Korea. Tel: +82-31-894-1000 Fax: +82-31-894-1077 E-mail: bibliother@naver.com
Received 2023 December 7; Revised 2024 June 21; Accepted 2024 July 18.

Abstract

Objective

To distinguish infarct location and type with the utmost precision using the advantages of the Swin UNEt TRansformers (Swin UNETR) architecture.

Methods

The research employed a two-phase training approach. In the first phase, the Swin UNETR model was trained using the Ischemic Stroke Lesion Segmentation Challenge (ISLES) 2022 dataset, which included cases of acute and subacute infarcts. The second phase involved training with data from 309 patients. The 110 categories result from classifying infarcts based on 22 specific brain regions, with each region divided into right and left sides and each side including four types of infarcts (acute, acute lacunar, subacute, subacute lacunar). The unique architecture of Swin UNETR, integrating elements of both the transformer and U-Net designs with a hierarchical transformer computed with shifted windows, played a crucial role in the study.

Results

During Swin UNETR training with the ISLES 2022 dataset, batch loss decreased to 0.8885±0.1897, with training and validation dice scores reaching 0.4224±0.0710 and 0.4827±0.0607, respectively. The optimal model weights had a validation dice score of 0.5747. In the patient data model, batch loss decreased to 0.0565±0.0427, with final training and validation accuracies of 0.9842±0.0005 and 0.9837±0.0010, respectively.

Conclusion

The results of this study surpass the accuracies of similar studies, but the model shows signs of overfitting, highlighting the need for future efforts to improve generalizability. Such detailed classifications could significantly aid physicians in diagnosing infarcts in clinical settings.

GRAPHICAL ABSTRACT

INTRODUCTION

Evolution of deep learning architectures: from perceptrons to Swin UNEt TRansformers

Artificial neurons are designed to mimic the way biological neurons fire signals when they receive a sufficient number of inputs from other neurons [1]. In 1943, McCulloch and Pitts [1] introduced an early artificial neuron model, known as the threshold logic unit (TLU). This model compares a weighted sum of input signals to a threshold to determine the neuron's output, marking the beginning of artificial neural networks [1]. In 1957, Rosenblatt [2] invented the perceptron using a modified TLU that applies a step function to the weighted sum of the inputs. To address the limitations of perceptrons highlighted by Minsky and Papert [3], particularly their inability to solve exclusive-OR classification problems, the multilayer perceptron (MLP) was developed by stacking multiple perceptrons [4]. Rumelhart et al. [4] developed the groundbreaking backpropagation algorithm, and convolutional neural networks (CNNs) were later developed, inspired by the consecutively layered structure of neurons and the local receptive fields of the optic nerve [5]. For years, deep CNNs have led in various visual recognition tasks [6]. However, they have limitations in learning long-range dependencies, which are crucial for segmenting lesions of different shapes and sizes [7], and excellent results require supervised training of a large network with millions of parameters, whereas in the medical field it is difficult to obtain even thousands of training images [8]. The U-Net, a fully convolutional network, was proposed to compensate for these difficulties. It features a contracting path for context capture and a symmetric expanding path for precise localization, maximizing the use of annotated samples through data augmentation [8].

In 2017, Google researchers introduced the transformer architecture, revolutionizing neural machine translation by using attention mechanisms instead of convolutional layers [9]. Building on this, Hatamizadeh et al. [5] reformulated volumetric medical image segmentation as a sequence-to-sequence prediction problem, introducing UNEt TRansformers (UNETR), which combines a transformer encoder with a U-shaped network. This structure effectively captures multi-scale patterns and delivers precise semantic segmentation. To overcome the challenges arising from the differences between language and vision, such as large variations in the scale of visual entities and the high resolution of pixels in images, Liu et al. [10] proposed a hierarchical transformer using shifted windows. Inspired by the success of vision transformers, Hatamizadeh et al. [7] proposed a novel segmentation model termed Swin UNEt TRansformers (Swin UNETR). Swin UNETR's hierarchical transformer efficiently processes high-resolution images, capturing the fine details necessary for accurate lesion classification in brain magnetic resonance imaging (MRI). Its U-shaped design with skip connections enables exact lesion localization, while the integration of attention mechanisms significantly enhances feature focus, crucial for identifying different lesion types.

The necessity of this study

In the acute phase of stroke care, determining the location of the infarct is essential for clinical decisions, such as patient triage, stroke mechanism investigation, and additional therapies [11]. While lesion-symptom mapping studies have advanced our understanding of brain-behavior relationships, modern imaging techniques have now surpassed the capabilities of traditional lesion analysis [12]. Voxel-based lesion-symptom mapping (VLSM) techniques involve comparing neurobehavioral scores across patients by analyzing the presence or absence of lesions on a voxel-by-voxel basis [12]. VLSM methods have proven useful in elucidating the impact of infarct location on motor [13], language [14], and cognitive recovery [15]. For example, the location of the infarct was the most significant factor influencing positive cognitive outcomes [15]. Given the significance of infarct location, this study aimed to distinguish infarct location and type with the utmost precision using the advantages of the Swin UNETR architecture. Therefore, based on radiological interpretations, the datasets were divided into 110 categories, and multilabels were assigned so that multiple categories could apply to a single case. Due to the limited resources available to an individual researcher, this study was planned within the scope feasible through Google Colab [16].

METHODS

This study was conducted in two phases using the Ischemic Stroke Lesion Segmentation Challenge (ISLES) 2022 open dataset and brain MRI data from 309 patients. First, we trained the Swin UNETR model on the ISLES 2022 dataset to create segmentation masks, focusing on differentiating infarctions from normal parenchyma. Second, the model was trained using data from actual patients with official radiological interpretations to learn to classify the regions already identified as infarcts.

Training Swin UNETR with the ISLES 2022 dataset

ISLES 2022 dataset

ISLES is a specialized competition held to promote and advance methods for automated segmentation of ischemic stroke lesions, with a slightly different focus each year since 2015. The 2022 challenge concentrated on the segmentation of infarcts in multimodal MRI scans, encompassing both acute strokes (0 to 7 days post-onset) and subacute strokes (1 to 3 weeks post-onset) [17]. It aimed to identify not only large infarct lesions but also multiple embolic infarcts across 400 cases [17]. Included participants were 18 years or older and had undergone brain MR imaging for diagnosed or suspected stroke, with imaging comprising at least fluid-attenuated inversion recovery (FLAIR) and diffusion-weighted imaging (DWI) sequences [17]. Image acquisition was performed on one of the following devices: Philips Achieva 3T MRI scanner (Philips Healthcare), Philips Ingenia 3T MRI scanner (Philips Healthcare), Siemens Verio 3T MRI scanner (Siemens Healthineers), Siemens MAGNETOM Avanto 1.5T MRI scanner (Siemens Healthineers), or Siemens MAGNETOM Aera 1.5T MRI scanner (Siemens Healthineers) [17]. In this study, the ISLES 2022 datasets were divided into training, validation, and inference datasets in an 8:1:1 ratio. Data augmentation techniques included random 90-degree image rotation, random flipping across the axial, sagittal, and coronal planes, and random adjustments to scale and intensity. Permission to use these data was granted by the ISLES 2022 organizer, Ezequiel de la Rosa.
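Such a pipeline can be composed with MONAI dictionary transforms. The following is a minimal sketch under assumed keys ("image", "label") and illustrative probabilities, not the exact configuration used in this study:

```python
# Illustrative augmentation pipeline mirroring the transforms described above.
# The dictionary keys and probabilities are assumptions of this sketch.
from monai.transforms import (
    Compose, LoadImaged, EnsureChannelFirstd,
    RandRotate90d, RandFlipd, RandScaleIntensityd, RandShiftIntensityd,
)

train_transforms = Compose([
    LoadImaged(keys=["image", "label"]),            # read NIfTI volumes
    EnsureChannelFirstd(keys=["image", "label"]),   # move the channel axis first
    RandRotate90d(keys=["image", "label"], prob=0.5, spatial_axes=(0, 1)),  # random 90-degree rotation
    RandFlipd(keys=["image", "label"], prob=0.5, spatial_axis=0),  # flip across the sagittal plane
    RandFlipd(keys=["image", "label"], prob=0.5, spatial_axis=1),  # flip across the coronal plane
    RandFlipd(keys=["image", "label"], prob=0.5, spatial_axis=2),  # flip across the axial plane
    RandScaleIntensityd(keys="image", factors=0.1, prob=0.5),      # random intensity scaling
    RandShiftIntensityd(keys="image", offsets=0.1, prob=0.5),      # random intensity shift
])
```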

Swin UNETR implementations

Swin UNETR was implemented using PyTorch-Ignite [18] and MONAI [19] and trained on Google Colab [16] with A100 graphics processing units. The model was created with a feature size of 48, which is compatible with the self-supervised pre-trained weights [20], and the Swin UNETR encoder was initialized from those pre-trained weights [21]. Training used the adaptive moment estimation with decoupled weight decay (AdamW) optimizer [22] with an initial learning rate of 1e-5 and a weight decay of 1e-1, minimizing dice focal loss over about 24,000 iterations. We set the maximum number of epochs to 400 and configured the patience, the number of events to wait without improvement before stopping training [23], to 40. This approach halts training when further improvements are unlikely, helping to prevent overfitting and save computational resources. A detailed description of the Swin UNETR architecture is provided in Supplementary Table S1.
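A minimal sketch of this configuration, following the MONAI tutorial from which the pre-trained weights originate [20,21]; the region-of-interest size, channel counts, and loss arguments are assumptions:

```python
# Sketch of the training setup described above. The ROI size, channel counts,
# and loss arguments are illustrative assumptions; feature_size=48 matches the
# self-supervised pre-trained encoder weights.
import torch
from monai.networks.nets import SwinUNETR
from monai.losses import DiceFocalLoss

model = SwinUNETR(
    img_size=(96, 96, 96),   # assumed ROI size (deprecated/ignored in newer MONAI versions)
    in_channels=1,
    out_channels=2,          # infarct vs. normal parenchyma
    feature_size=48,         # compatible with the pre-trained weights [20]
)

# Initialize the encoder from the published self-supervised weights [21].
weights = torch.load("model_swinvit.pt")
model.load_from(weights=weights)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=1e-1)
loss_fn = DiceFocalLoss(to_onehot_y=True, softmax=True)  # dice focal loss [24]
```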

Loss function

The training loss function was dice focal loss, which computes both dice loss (DL) and focal loss and returns the weighted sum of these two losses, calculated voxel-wise [24]. Focal loss is an extension of binary cross entropy (BCE) loss between the target and the input probabilities [24]. The dice coefficient (DICE) for binary classification can be written as:

$$\mathrm{DICE}_t = \frac{2\sum_{i=1}^{N} p_{it}\,g_{it} + \varepsilon}{\sum_{i=1}^{N} p_{it} + \sum_{i=1}^{N} g_{it} + \varepsilon} \tag{1}$$

The variable $g_{it}$, which can be either 0 or 1, denotes the ground truth label of class $t$ for pixel $i$, with $N$ representing the total number of pixels in the image [25]. The variable $p_{it}$ represents the output probability, which falls within the range of 0 to 1, and the term $\varepsilon$ is employed to prevent the numerical issue of division by zero [25]. The DL formula is as follows, where $\omega_t$ represents the weight corresponding to each class $t$ [25].

$$\mathrm{DL} = \sum_{t} \omega_t \left(1 - \mathrm{DICE}_t\right) \tag{2}$$

The focal dice loss (FDL) is defined by applying the exponent $1/\beta$ to $\mathrm{DICE}_t$ for each class, where the exponent parameter $\beta$ is greater than or equal to 1 [25].

$$\mathrm{FDL} = \sum_{t} \omega_t \left(1 - \mathrm{DICE}_t^{1/\beta}\right) \tag{3}$$

Training loss was recorded every 100 iterations and at the end of every epoch.
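Equations (1)-(3) translate directly into a few lines of PyTorch. The sketch below is illustrative rather than the exact MONAI implementation [24]; the tensor shapes, class weights, and β value are assumptions:

```python
# Direct translation of equations (1)-(3); shapes, weights, and beta are assumptions.
import torch

def focal_dice_loss(probs, target, weights, beta=2.0, eps=1e-6):
    """probs, target: (num_classes, num_voxels); weights: (num_classes,)."""
    intersection = (probs * target).sum(dim=1)  # sum_i p_it * g_it
    dice = (2 * intersection + eps) / (probs.sum(dim=1) + target.sum(dim=1) + eps)  # eq. (1)
    return (weights * (1 - dice ** (1.0 / beta))).sum()  # eq. (3); beta=1 recovers eq. (2)

# Toy example: 2 classes over 5 voxels.
p = torch.tensor([[0.9, 0.8, 0.1, 0.2, 0.7], [0.1, 0.2, 0.9, 0.8, 0.3]])
g = torch.tensor([[1.0, 1.0, 0.0, 0.0, 1.0], [0.0, 0.0, 1.0, 1.0, 0.0]])
w = torch.tensor([0.5, 0.5])
print(focal_dice_loss(p, g, w))
```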

Evaluation metrics

Mean dice was used as the evaluation metric, calculating the dice score from full-size tensors and averaging over batches, class channels, and iterations [26]. The training and validation mean dice scores were calculated using the validation and inference datasets, respectively.
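For reference, a mean dice score of this kind can be computed with MONAI's DiceMetric class; the tensor shapes and binarization in this sketch are assumptions, and the study used the corresponding MONAI handler [26]:

```python
# Illustrative mean dice computation; tensor shapes (batch, channel, D, H, W)
# and the 0.5 binarization threshold are assumptions of this sketch.
import torch
from monai.metrics import DiceMetric

dice_metric = DiceMetric(include_background=True, reduction="mean")
y_pred = (torch.rand(2, 2, 8, 8, 8) > 0.5).float()  # binarized predictions
y = (torch.rand(2, 2, 8, 8, 8) > 0.5).float()       # ground-truth masks
dice_metric(y_pred=y_pred, y=y)        # accumulate scores for one iteration
print(dice_metric.aggregate().item())  # mean over batches, channels, iterations
dice_metric.reset()
```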

Training a classification model with patients’ brain MRI data

Patient data

We included subjects aged 18 years or older who underwent MR imaging for diagnosed stroke at Pohang SeMyeong Christianity Hospital. Images were acquired using a Siemens MAGNETOM Vida 3T MRI scanner (Siemens Healthineers) and included DWI, apparent diffusion coefficient, and FLAIR modalities in all cases. Because the ISLES 2022 dataset focuses exclusively on segmentation of acute and subacute infarcts, cases with only chronic and old infarcts were excluded. Cases presenting only with intracranial hemorrhage, subarachnoid hemorrhage, intraventricular hemorrhage, or other lesions, without infarct lesions, were also omitted. This study aimed to classify as meticulously as possible based on the interpretations of the Department of Radiology. For instance, although the regions indicated by the basal ganglia, the caudate, and the lenticulostriate artery territory overlap, they were distinguished in radiological interpretations, leading to their establishment as independent categories among the 110 in total (Table 1).

Classification and frequency of infarcts

In these categories, borderzone and lacunar infarcts followed the definitions set by the Department of Radiology. Borderzone infarcts are characterized as ischemic lesions typically found at the junction of two major arterial territories [27], and lacunar infarcts are defined as small brain lesions (0.2 to 15 mm³) [28].

The datasets from 309 patients were randomly divided into training, validation, and inference datasets in an 8:1:1 ratio, with data augmentation techniques similar to those used for the ISLES 2022 dataset. As the objective of this study was multilabel classification, all categories relevant to a single case were assigned and subsequently transformed into one-hot encoded vectors for further processing.
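The encoding step can be sketched as follows; the category names are hypothetical placeholders, not the study's actual 110 categories:

```python
# Hypothetical multilabel encoding over the 110 categories: every category that
# applies to a case is set to 1 in a fixed-length vector.
import torch

CATEGORIES = ["Rt_MCA_acute", "Lt_MCA_acute", "Rt_pons_acute_lacunar"]  # placeholders; 110 in the study
INDEX = {name: i for i, name in enumerate(CATEGORIES)}

def encode(labels):
    """Turn the set of labels attached to one case into a binary vector."""
    vec = torch.zeros(len(CATEGORIES))
    for name in labels:
        vec[INDEX[name]] = 1.0
    return vec

# A single case can carry several categories at once:
print(encode(["Rt_MCA_acute", "Rt_pons_acute_lacunar"]))  # tensor([1., 0., 1.])
```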

We confirm that the use of these clinical data was in full compliance with the ethical standards of the Pohang SeMyeong Christianity Hospital's Institutional Review Board (IRB no. PSMCHIRB-2024-16-1), ensuring adherence to the necessary protocols for patient privacy and data security.

Classification model implementation

The classification model employs a Swin UNETR backbone, loading the weights with the best checkpoint mean dice score of 0.5747. The model inputs are tailored for image regions of interest, and the model is designed to classify images into a predefined number of classes. Post-transformer processing includes an adaptive average pooling and layer normalization sequence, ensuring a consistent and stabilized feature representation before classification. The final classification is achieved through a fully connected linear layer that maps the extracted and normalized features to the respective class probabilities. The maximum number of epochs was set to 400, and the patience was configured to 50.
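One plausible realization of this head is sketched below: the deepest Swin UNETR encoder feature map is pooled, normalized, and projected to 110 logits. Reading the backbone's swinViT output and the layer sizes shown are our assumptions, not a verbatim reproduction of the study's code:

```python
# Sketch of a classification head on a Swin UNETR backbone: adaptive average
# pooling -> layer normalization -> fully connected layer. Using the deepest
# swinViT feature map (feature_size * 16 channels) is an assumption.
import torch
import torch.nn as nn
from monai.networks.nets import SwinUNETR

class SwinClassifier(nn.Module):
    def __init__(self, num_classes=110, feature_size=48):
        super().__init__()
        self.backbone = SwinUNETR(
            img_size=(96, 96, 96), in_channels=1, out_channels=2,
            feature_size=feature_size,
        )
        hidden = feature_size * 16                  # channel width of the deepest stage
        self.pool = nn.AdaptiveAvgPool3d(1)         # collapse the spatial dimensions
        self.norm = nn.LayerNorm(hidden)            # stabilize the pooled features
        self.fc = nn.Linear(hidden, num_classes)    # map features to class logits

    def forward(self, x):
        feats = self.backbone.swinViT(x)[-1]        # deepest transformer feature map
        pooled = self.pool(feats).flatten(1)        # (batch, hidden)
        return self.fc(self.norm(pooled))           # multilabel logits

logits = SwinClassifier()(torch.rand(1, 1, 96, 96, 96))
print(logits.shape)  # torch.Size([1, 110])
```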

Loss function

Our chosen loss function was BCE loss between the target and the input probabilities [24], calculated element-wise over the label vector.

$$\mathrm{BCE} = -\frac{1}{M}\sum_{m=1}^{M}\left[y_m \log\!\left(h_\theta(x_m)\right) + (1 - y_m)\log\!\left(1 - h_\theta(x_m)\right)\right] \tag{4}$$

where $M$ is the number of training samples, $y_m$ the ground truth label for training sample $m$, $x_m$ the input for training sample $m$, and $h_\theta$ the hypothesis (model) with weights $\theta$ [29]. Training loss was recorded every 5 iterations, and training and validation accuracies were calculated using the respective datasets at the end of each epoch.
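In the multilabel setting, equation (4) is applied independently to each of the 110 labels. A minimal sketch, assuming raw logits and PyTorch's numerically stable BCEWithLogitsLoss:

```python
# BCE loss per equation (4) for multilabel targets. Feeding raw logits to
# BCEWithLogitsLoss (sigmoid applied internally) is an assumption of this sketch.
import torch
import torch.nn as nn

loss_fn = nn.BCEWithLogitsLoss()
logits = torch.randn(4, 110)                     # model outputs for a batch of 4
targets = torch.randint(0, 2, (4, 110)).float()  # one-hot encoded label vectors
print(loss_fn(logits, targets))                  # mean BCE over samples and labels
```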

Evaluation metrics

We calculated the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) values by comparing the predictions of the classification model with the ground truth labels. Accuracy for our multilabel data was derived as [30]:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$
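A sketch of equation (5) for multilabel outputs, counting element-wise agreements across all labels; the 0.5 threshold on the sigmoid probabilities is an assumption:

```python
# Element-wise multilabel accuracy per equation (5); the threshold is an assumption.
import torch

def multilabel_accuracy(logits, targets, threshold=0.5):
    preds = (torch.sigmoid(logits) >= threshold).float()
    tp = ((preds == 1) & (targets == 1)).sum()  # true positives
    tn = ((preds == 0) & (targets == 0)).sum()  # true negatives
    fp = ((preds == 1) & (targets == 0)).sum()  # false positives
    fn = ((preds == 0) & (targets == 1)).sum()  # false negatives
    return (tp + tn).float() / (tp + tn + fp + fn).float()

logits = torch.randn(4, 110)
targets = torch.randint(0, 2, (4, 110)).float()
print(multilabel_accuracy(logits, targets))
```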

RESULTS

Outcomes of training Swin UNETR using the ISLES 2022 dataset

During training of Swin UNETR with the ISLES 2022 dataset, the batch loss showed considerable fluctuation over 23,200 iterations but exhibited a decreasing trend as epochs progressed, reaching a value of 0.8885±0.1897 (Table 2, Fig. 1A). The training mean dice score was 0.4224±0.0710, indicating a general upward trend (Table 2, Fig. 1B). The validation mean dice score reached 0.4827±0.0607 (Table 2, Fig. 1C), increasing with the progression of epochs but then declining after 100 epochs. Based on these results, we selected the model weights with a validation mean dice score of 0.5747 as the best for further training and validation on actual patient brain MRI images. Training and validation were halted by an early stopping function set with a patience of 50 epochs.


Fig. 1.

Performance evaluation of Swin UNETR using ISLES 2022 data. (A) Training batch loss, recorded every 100 iterations. The x-axis indicates the number of iterations during the training process, and the y-axis denotes the batch loss, reflecting the discrepancy between the model's predictions and the actual data for each iteration on the training dataset. The graph initially shows considerable fluctuations but trends downward as training advances, reaching a value of 0.8885±0.1897. (B) Training mean dice score over epochs. The x-axis signifies the number of epochs, and the y-axis represents the mean dice score computed on the validation dataset. The mean dice score, indicating the degree of similarity between predicted and actual segmentations, generally trends upward, achieving a value of 0.4224±0.0710. (C) Validation mean dice score over epochs. The x-axis denotes the number of epochs, while the y-axis indicates the mean dice score on the inference dataset. The mean dice score initially rises with the progression of epochs, peaking before declining after around 100 epochs. The highest recorded validation mean dice score was 0.5747, which was chosen for subsequent training and validation on patient brain MRI images. The overall mean dice score was 0.4827±0.0607. Swin UNETR, Swin UNEt TRansformers; ISLES, Ischemic Stroke Lesion Segmentation Challenge; MRI, magnetic resonance imaging.

Outcomes of training a classification model using the patients’ dataset

For the classification model trained with patient data, batch loss significantly decreased from the start and then plateaued after 1,000 iterations, reaching 0.0565±0.0427 (Table 3, Fig. 2A). Training accuracy remained high, close to the maximum value for up to 25 epochs, then slightly decreased, showing a final value of 0.9842±0.0005 (Table 3, Fig. 2B). Validation accuracy similarly stayed high up to 21 epochs before decreasing, with a final value of 0.9837±0.0010 (Table 3, Fig. 2C). The training and validation phases were also stopped using an early stopping function with a patience of 50 epochs.


Fig. 2.

Performance evaluation of the classification model using patient data. (A) Training batch loss over training iterations. The x-axis indicates the number of iterations during the training process, and the y-axis represents the batch loss, which measures the discrepancy between the model's predictions and the actual data for each iteration on the training dataset. The graph demonstrates a substantial initial reduction in batch loss, which then stabilizes after approximately 1,000 iterations, reaching a final value of 0.0565±0.0427. (B) Training accuracy over epochs. The x-axis represents the number of epochs, and the y-axis indicates the accuracy on the validation dataset. The accuracy remains high throughout the process, approaching its maximum value within the first 25 epochs and slightly decreasing thereafter, concluding with a final value of 0.9842±0.0005. (C) Validation accuracy over epochs. The x-axis represents the number of epochs, while the y-axis denotes the accuracy on the inference dataset. The validation accuracy remains high up to 21 epochs before showing a slight decline, ending with a final value of 0.9837±0.0010.

DISCUSSION

In summary, the Swin UNETR-based classification model, which was trained on brain MRI scans for infarct segmentation and categorized infarcts into 110 distinct classes, achieved an accuracy of 0.9837±0.0010. This significantly surpasses the accuracies reported in similar studies. For instance, Subudhi et al. [31] achieved an accuracy of 90.23% in classifying three types of stroke according to the Oxfordshire Community Stroke Project classification. Additionally, Cetinoglu et al. [32] reported 93% accuracy in classifying three vascular territories (anterior cerebral artery, middle cerebral artery, and watershed).

The superior performance of Swin UNETR over the MobileNetV2 and EfficientNet-B0 CNN models used in the two studies above can be attributed to two main factors. First, architectural differences are critical. Cetinoglu et al. [32] used modified versions of MobileNetV2 and EfficientNet-B0; as convolutional networks, these have limitations in modeling long-range information due to the limited kernel size of their convolution layers. In contrast, Swin UNETR outperformed previous winning methodologies such as a 3D segmentation network with residual connections [33], no-new-Net [34], and Multimodal Brain Tumor Segmentation Using Transformer [35] in the research by Hatamizadeh et al. [7]. Second, Swin UNETR's pretraining on a specialized dataset of computed tomography (CT) scans likely contributed to its enhanced accuracy. The pretrained Swin UNETR encoder used weights derived from a cohort of 5,050 CT scans [20], whereas MobileNetV2 and EfficientNet-B0 were pretrained on the ImageNet dataset [36], a general visual database not specific to medical images. This difference in pretraining context and specificity is believed to have played a role in the improved accuracy.

However, our study has limitations. The decrease in validation accuracy after about 20 epochs, as observed in Fig. 2, suggests overfitting. This issue is somewhat inevitable given the large number of classification categories (110) relative to the small number of patient MRI datasets (n=309). Although the categories were delineated based on radiological interpretations, merging classes that share overlapping areas could be attempted to reduce the total number of classes and thereby mitigate overfitting. However, as stated in our research objectives, as physiatrists dealing with brain imaging and patient symptoms, our primary goal was to distinguish infarct location and type with the utmost precision. Due to time and resource constraints, we were unable to perform hyperparameter tuning, nor could we engage in the repetitive tasks of adjusting and experimenting with the number of categories. Future work, including acquiring more MRI data and employing regularization, dropout, and other techniques, is expected to improve generalizability.

Additionally, training exclusively on the ISLES 2022 dataset, which focuses on acute and subacute infarcts, did not include chronic or old infarcts. With the acquisition of more patient data in the future, it would be possible to conduct studies on chronic and old lesions.

Currently, several commercially available AI-based software solutions for stroke demonstrate diagnostic accuracies for brain infarcts with MRI and other imaging techniques, with sensitivities and specificities of 44% to 83% and 57% to 93% for tissue ischemia, 80% to 96% and 90% to 98% for large vessel occlusion, and 96% and 95% for hemorrhage detection [37]. For instance, in detecting and quantifying the ischemic core and penumbra, the e-ASPECTS software can outperform non-stroke experts and is at least as effective as stroke experts in applying the Alberta Stroke Programme Early CT Score (ASPECTS) for patients with acute ischemic stroke [38]. This paper is based on an open dataset of 400 cases and brain MRI scans from 309 actual patients who visited the emergency room at Pohang SeMyeong Christianity Hospital. Although this is a relatively small dataset for deep learning applications, we achieved excellent results through augmentation techniques. Rehabilitation medicine, which prioritizes both brain MRI images and patients' symptoms, is ideally suited for such research, yet clinical physicians' participation in such studies remains low. By presenting our data collection methods and sample sizes, we hope this paper will spark future interest and research among clinicians unfamiliar with deep learning and uncertain about how to begin research in this area. Embracing deep learning in rehabilitation medicine can significantly transform patient care: it enhances diagnostic accuracy, personalizes treatment plans, improves efficiency, and ultimately leads to better patient outcomes.

Conclusion

The classification model based on Swin UNETR architecture achieved high accuracy for 110 classes of acute and subacute brain infarcts through a two-stage learning process. This study highlights the potential for achieving detailed classification using open datasets as a supplement to limited patient data. This method is especially useful in scenarios where segmentation is constrained by resources. Such detailed classifications could significantly aid physicians in diagnosing infarcts in clinical settings.

Notes

CONFLICTS OF INTEREST

No potential conflict of interest relevant to this article was reported.

FUNDING INFORMATION

None.

AUTHOR CONTRIBUTION

Conceptualization: Oh J. Methodology: Oh J, An H. Formal analysis: Oh J. Project administration: Oh J, An H. Visualization: Oh J. Writing – original draft: Oh J. Writing – review and editing: Oh J, An H. Approval of final manuscript: all authors.

SUPPLEMENTARY MATERIALS

Supplementary materials can be found via https://doi.org/10.5535/arm.230029.

Supplementary Table S1.

Detailed Layer Architecture of the SwinUNETR Model

arm-230029-Supplementary-Table-1.pdf

References

1. McCulloch WS, Pitts W. A logical calculus of the ideas immanent in nervous activity. Bull Math Biol 1943;5:115–33.
2. Rosenblatt F. The perceptron: a probabilistic model for information storage and organization (1958). In : Lewis HR, ed. Ideas that created the future: classic papers of computer science The MIT Press; 2021. p. 183.
3. Minsky M, Papert S. Perceptrons (1969). In : Anderson JA, Rosenfeld E, eds. Neurocomputing, volume 1: foundations of research. The MIT Press; 1988. p. 675.
4. Rumelhart DE, Hinton GE, Williams RJ. Learning internal representations by error propagation. In : Rumelhart DE, McClelland JL, ; PDP Research Group, eds. Parallel distributed processing, volume 1: explorations in the microstructure of cognition: foundations The MIT Press; 1986. p. 676–9.
5. Hatamizadeh A, Tang Y, Nath V, Yang D, Myronenko A, Landman B, et al. UNETR: transformers for 3D medical image segmentation. Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV); 2022 Jan 3-8; Waikoloa (HI), USA. New York City (NY): IEEE; 2022.
6. LeCun Y, Kavukcuoglu K, Farabet C. Convolutional networks and applications in vision. Proceedings of the 2010 IEEE International Symposium on Circuits and Systems; 2010 May 30-Jun 2; Paris, France. New York City (NY): IEEE; 2010.
7. Hatamizadeh A, Nath V, Tang Y, Yang D, Roth HR, Xu D. Swin UNETR: Swin transformers for semantic segmentation of brain tumors in MRI images. Proceedings of the 7th International MICCAI Brainlesion Workshop, BrainLes 2021; 2021 Sep 27; Virtual Event. Cham: Springer; 2022.
8. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2015; 2015 Oct 5-9; Munich, Germany. Cham: Springer; 2015.
9. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017); 2017 Dec 4-9; Long Beach (CA), USA. Red Hook (NY): Curran Associates, Inc.; 2017.
10. Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, et al. Swin transformer: hierarchical vision transformer using shifted windows. Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV); 2021 Oct 10-17; Montreal (QC), Canada. New York City (NY): IEEE; 2022.
11. Etherton MR, Rost NS, Wu O. Infarct topography and functional outcomes. J Cereb Blood Flow Metab 2018;38:1517–32.
12. Bates E, Wilson SM, Saygin AP, Dick F, Sereno MI, Knight RT, et al. Voxel-based lesion-symptom mapping. Nat Neurosci 2003;6:448–50.
13. Lo R, Gitelman D, Levy R, Hulvershorn J, Parrish T. Identification of critical areas for motor function recovery in chronic stroke subjects using voxel-based lesion symptom mapping. Neuroimage 2010;49:9–18.
14. Goldenberg G, Spatt J. Influence of size and site of cerebral lesions on spontaneous recovery of aphasia and on success of language therapy. Brain Lang 1994;47:684–98.
15. Munsch F, Sagnier S, Asselineau J, Bigourdan A, Guttmann CR, Debruxelles S, et al. Stroke location is an independent predictor of cognitive outcome. Stroke 2016;47:66–73.
16. Google Colab [Internet]. Google [cited 2023 Dec 7]. Available from: https://colab.google.
17. Hernandez Petzsche MR, de la Rosa E, Hanning U, Wiest R, Valenzuela W, Reyes M, et al. ISLES 2022: a multi-center magnetic resonance imaging stroke lesion segmentation dataset. Sci Data 2022;9:762.
18. PyTorch-Ignite [Internet]. PyTorch-Ignite Contributors [cited 2023 Dec 7]. Available from: https://pytorch-ignite.ai.
19. MONAI [Internet]. MONAI Consortium [cited 2023 Dec 7]. Available from: https://monai.io.
20. Project-MONAI / tutorials [Internet]. GitHub, Inc.; 2022 [cited 2023 Dec 7]. Available from: https://github.com/Project-MONAI/tutorials/blob/main/3d_segmentation/swin_unetr_btcv_segmentation_3d.ipynb.
21. Weights [Internet]. GitHub, Inc. [cited 2023 Dec 7]. Available from: https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/model_swinvit.pt.
22. Loshchilov I, Hutter F. Decoupled weight decay regularization. arXiv:1711.05101 [Preprint]. 2019 [cited 2023 Dec 7]. Available from: https://doi.org/10.48550/arXiv.1711.05101.
23. EarlyStopping [Internet]. PyTorch-Ignite Contributors [cited 2023 Dec 7]. Available from: https://pytorch.org/ignite/generated/ignite.handlers.early_stopping.EarlyStopping.html.
24. Loss functions [Internet]. MONAI Consortium [cited 2023 Dec 7]. Available from: https://docs.monai.io/en/stable/losses.html.
25. Wang P, Chung ACS. Focal dice loss and image dilation for brain tumor segmentation. Proceedings of the 4th International Workshop on Deep Learning in Medical Image Analysis, DLMIA 2018, and the 8th International Workshop on Multimodal Learning for Clinical Decision Support, ML-CDS 2018; 2018 Sep 20; Granada, Spain. Cham: Springer; 2018.
26. Event handlers. [Internet]. MONAI Consortium [cited 2023 Dec 7]. Available from: https://docs.monai.io/en/stable/handlers.html.
27. Mangla R, Kolar B, Almast J, Ekholm SE. Border zone infarcts: pathophysiologic and imaging characteristics. Radiographics 2011;31:1201–14.
28. Fisher CM. Lacunes: small, deep cerebral infarcts. Neurology 1998;50:841.
29. Ho Y, Wookey S. The real-world-weight cross-entropy loss function: modeling the costs of mislabeling. IEEE Access 2019;8:4806–13.
30. Accuracy [Internet]. PyTorch-Ignite Contributors [cited 2023 Dec 7]. Available from: https://pytorch.org/ignite/generated/ignite.metrics.Accuracy.html.
31. Subudhi A, Sahoo S, Biswal P, Sabut S. Segmentation and classification of ischemic stroke using optimized features in brain MRI. Biomed Eng Appl Basis Commun 2018;30:1850011.
32. Cetinoglu YK, Koska IO, Uluc ME, Gelal MF. Detection and vascular territorial classification of stroke on diffusion-weighted MRI by deep learning. Eur J Radiol 2021;145:110050.
33. Myronenko A. 3D MRI brain tumor segmentation using autoencoder regularization. Proceedings of the 4th International MICCAI Brainlesion Workshop, BrainLes 2018; 2018 Sep 16; Granada, Spain. Cham: Springer; 2019.
34. Isensee F, Jaeger PF, Kohl SAA, Petersen J, Maier-Hein KH. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods 2021;18:203–11.
35. Wang W, Chen C, Ding M, Yu H, Zha S, Li J. TransBTS: multimodal brain tumor segmentation using transformer. Proceedings of the 24th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2021; 2021 Sep 27-Oct 1; Strasbourg, France. Cham: Springer; 2021.
36. ImageNet [Internet]. Stanford Vision Lab, Stanford University, Princeton University [cited 2023 Dec 7]. Available from: https://www.image-net.org/index.php.
37. Wardlaw JM, Mair G, von Kummer R, Williams MC, Li W, Storkey AJ, et al. Accuracy of automated computer-aided diagnosis for stroke imaging: a critical evaluation of current evidence. Stroke 2022;53:2393–403.
38. Mokli Y, Pfaff J, Dos Santos DP, Herweh C, Nagel S. Computer-aided imaging analysis in acute ischemic stroke - background and clinical applications. Neurol Res Pract 2019;1:23.


Table 1.

Classification and frequency of infarcts

Location Rt. Lt. Location Rt. Lt. Location Rt. Lt.
ACA 3 3 MCA 32 29 PCA 4 4
(1)
<1> <1>
[1]
Acho. 2 1 SCA 4 3 PICA 3 5
Frontal 6 10 Temporal 4 2 Parietal 9 3
(5) (3) (3) (3) (5) (2)
<1>
Occipital 9 4 Borderzone 6 6 Cerebellum 10 2
(5) (3) (10) (7)
<1> <1> <1> <1> <3>
[1] [1]
Precentral 3 3 Paracentral 1 Postcentral 1
(1) (5) (2) (1) (2)
[1]
BG 7 5 LSA 9 8 Caudate 1 1
(1) (2) (1)
<1> <1>
ICA 2 CR 5 8 PVWM 3
(6) (5)
Parahippocampus Hippocampus 1 1 LGB 1 1
(1) (1) (1) (1)
<1> <1>
Pons 10 7 Thalamus 3 6 PLIC 3 1
(1) (2) (3) (1) (1)
<1>
[2] [1]
Medulla 4 Centrum semiovale 3 Corpus callosum 1
(6) (1) (2)
<1>
[1] [1]
Normal 20
Acute embolic infarct 20
<1>
Vermis 4
(1)

The number of acute infarcts is indicated without brackets, the number of acute lacunar infarcts is represented by numbers within parentheses, the number of subacute infarcts is denoted by numbers within angle brackets, and the number of subacute lacunar infarcts is signified by numbers within square brackets.

Rt, right; Lt, left; ACA, anterior cerebral artery territory; MCA, middle cerebral artery territory; PCA, posterior cerebral artery territory; Acho., anterior choroidal artery territory; SCA, superior cerebellar artery; PICA, posterior inferior cerebellar artery territory; Frontal, frontal lobe; Temporal, temporal lobe; Parietal, parietal lobe; Occipital, occipital lobe; Border zone, characteristic locations at the junction between two main arterial territories; Precentral, precentral gyrus; Paracentral, paracentral gyrus; Postcentral, postcentral gyrus; BG, basal ganglia; LSA, lenticulostriatal artery territory; ICA, internal carotid artery territory; CR, corona radiata; PVWM, periventricular white matter; LGB, lateral geniculate body; PLIC, posterior limb of internal capsule.

Table 2.

Statistical summary of training and validation metrics for Swin UNETR using ISLES dataset

Count Minimum Maximum Mean Median SD
Train batch loss 232.0 0.1650 1.0830 0.8885 0.9819 0.1897
Train mean dice 116.0 0.1607 0.5660 0.4224 0.4276 0.0710
Validation mean dice 116.0 0.2335 0.5747 0.4827 0.4911 0.0607

Swin UNETR, Swin UNEt TRansformers; ISLES, Ischemic Stroke Lesion Segmentation Challenge; SD, standard deviation.

Table 3.

Statistical summary of training and validation metrics for the classification model using patients dataset

Count Minimum Maximum Mean Median SD
Train batch loss 2,050.0 0.0083 0.5565 0.0565 0.0456 0.0427
Train accuracy 51.0 0.9815 0.9847 0.9842 0.9844 0.0005
Validation accuracy 51.0 0.9811 0.9844 0.9837 0.9840 0.0010

SD, standard deviation.