This article investigates the effectiveness of three ensemble learning techniques (stacking, bagging, and boosting) in predicting credit scores and loan defaults across three distinct datasets. The models are compared on accuracy, precision, recall, and F1 score to assess their performance on both binary and multi-class classification tasks. Among the models, stacking achieved the highest overall performance, reaching 82% accuracy on multi-class credit scoring, compared with 72% achieved by bagging on binary classification. Bagging, however, was less effective at predicting loan defaults. Boosting, while generally weaker at handling imbalanced data and complex multi-class problems, still produced acceptable results in certain scenarios. The findings suggest that stacking and bagging are particularly well suited to credit scoring and loan default prediction, making them valuable tools for financial institutions. The study also highlights the importance of addressing class imbalance and applying feature engineering to improve model performance. Future research should focus on improving model explainability and developing advanced techniques for handling data complexity.
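
As a rough illustration of the kind of evaluation setup described above, and not the study's actual pipeline, the sketch below compares bagging, boosting, and stacking classifiers on a synthetic, imbalanced binary dataset using scikit-learn and reports accuracy, precision, recall, and F1 score. The dataset parameters, base learners, and hyperparameters are all assumptions chosen for illustration only.

```python
# Minimal sketch (not the study's pipeline): compare bagging, boosting, and
# stacking on a synthetic, imbalanced binary classification problem and report
# accuracy, precision, recall, and F1 score. All parameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Synthetic stand-in for a loan default dataset (minority "default" class ~15%).
X, y = make_classification(
    n_samples=5000, n_features=20, weights=[0.85, 0.15], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "bagging": BaggingClassifier(n_estimators=100, random_state=42),
    "boosting": GradientBoostingClassifier(random_state=42),
    "stacking": StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
            ("gb", GradientBoostingClassifier(random_state=42)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
    ),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(
        f"{name}: "
        f"acc={accuracy_score(y_test, y_pred):.3f} "
        f"prec={precision_score(y_test, y_pred):.3f} "
        f"rec={recall_score(y_test, y_pred):.3f} "
        f"f1={f1_score(y_test, y_pred):.3f}"
    )
```

On imbalanced data such as this, precision, recall, and F1 score are more informative than accuracy alone, which is why the abstract reports all four metrics.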

