Background: Predicting treated language improvement (TLI) and transfer to the untreated language (cross-language generalization, CLG) after speech-language therapy in bilingual individuals with post-stroke aphasia is crucial for personalized treatment planning. This study evaluated machine learning (ML) models for predicting TLI and CLG, identified the key predictive features (e.g., patient severity, demographics, and treatment variables), and assessed whether those features align with clinical evidence.
Methods: Forty-eight Spanish-English bilingual individuals with post-stroke aphasia received 20 sessions of semantic feature-based naming treatment in either their first or second language. Comprehensive language, cognitive, and bilingual experience background assessments were administered pre- and post-treatment. Sixteen curated features spanning demographics, language abilities, cognition, and bilingual experience were used as inputs to six ML algorithms to classify treatment responders vs. non-responders (TLI) and CLG vs. no CLG.
Results: The top two ML models achieved F1 scores of 0.767 ± 0.153 for TLI and 0.790 ± 0.172 for CLG. Interpretability analyses revealed that aphasia severity in the treated language, education, and cognitive performance were key predictors of TLI. Aphasia severity in the untreated language and cognitive performance emerged as influential predictors of CLG. These findings aligned with expectations based on prior literature.
Conclusions: For the first time, ML models reveal that factors such as patient severity and demographics predict TLI and CLG after therapy in Spanish-English bilingual individuals with post-stroke aphasia. Considering both treated and untreated language severity, as well as cognitive assessment performance, when forecasting treatment outcomes in an underserved population such as Spanish-English bilingual stroke survivors can meaningfully impact their short-term and long-term clinical care.
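
As an illustration of how a summary such as "F1 = 0.767 ± 0.153" could be produced, the sketch below computes a binary F1 score per evaluation fold and then reports the mean and spread across folds. It rests on assumptions the abstract does not state: that the ± value is a standard deviation across cross-validation folds, and that the sample (n - 1) standard deviation is used. The function names `f1_score` and `summarize_folds` and the toy labels are hypothetical and are not the study's data or code.

```python
from statistics import mean, stdev


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class (e.g., 'responder' or 'CLG')."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def summarize_folds(folds):
    """Return (mean, sample standard deviation) of per-fold F1 scores.

    `folds` is a list of (y_true, y_pred) pairs, one per held-out fold.
    Assumption: the spread is a sample standard deviation across folds.
    """
    scores = [f1_score(y_true, y_pred) for y_true, y_pred in folds]
    return mean(scores), stdev(scores)


# Toy, made-up labels for three folds (1 = responder, 0 = non-responder).
toy_folds = [
    ([1, 1, 0, 0], [1, 1, 0, 0]),
    ([1, 1, 0, 0], [1, 0, 0, 0]),
    ([1, 0, 0, 1], [1, 1, 0, 1]),
]

f1_mean, f1_sd = summarize_folds(toy_folds)
print(f"F1 = {f1_mean:.3f} ± {f1_sd:.3f}")
```

With only 48 participants, each held-out fold contains few cases, so a large fold-to-fold spread of the kind reported above is expected under this reading.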