Prediction of main components in clove essential oil using optimized machine learning models


Uzun Y., SALTAN F. Z.

Journal of Essential Oil-Bearing Plants, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Publication Date: 2026
  • DOI: 10.1080/0972060x.2026.2614372
  • Journal Name: Journal of Essential Oil-Bearing Plants
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Keywords: Artificial intelligence, Essential oil, Eugenia caryophyllata, Machine learning, Pharmacognosy
  • Anadolu University Affiliated: Yes

Abstract

Gas chromatography-mass spectrometry (GC-MS) is a crucial method for analyzing essential oils, but it remains time-consuming and resource-intensive. This study proposes a machine learning (ML)-based framework as a rapid pre-screening tool to predict the dominant chemical component of clove (Eugenia caryophyllata L.) essential oil from readily available metadata. The goal is not to replace GC-MS but to complement it by enabling faster preliminary assessments and guiding targeted analyses. Five ML models (Random Forest, SVM, XGBoost, KNN, and Decision Tree) were optimized via hyperparameter tuning and compared using accuracy, R², MSE, RMSE, and MAE. The XGBoost model, evaluated with 10-fold cross-validation, performed best, with a test accuracy of 0.9565 and an R² score of 0.9718, outperforming the other four models. The KNN model performed worst, with an accuracy of 0.5652. The study concludes that XGBoost, with its ensemble learning and hyperparameter optimization capabilities, is the most suitable of the tested models for predicting the primary components of clove essential oil. Future research could explore deep learning approaches trained on larger datasets to improve prediction accuracy.
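
The five metrics named in the abstract can be illustrated with a short worked example. This is a minimal sketch, not the authors' evaluation code: it assumes the dominant-component classes are integer-encoded so that R², MSE, RMSE, and MAE are defined on the predictions (the abstract does not say how these were computed), and the labels in `y_true` and `y_pred` are made up for illustration.

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: the square root of MSE."""
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean absolute error between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical integer-encoded class labels (assumption: 0, 1, 2 stand for
# three possible dominant components; these are not data from the study).
y_true = [0, 1, 2, 1, 0, 2, 1, 0]
y_pred = [0, 1, 2, 1, 0, 1, 1, 0]

metrics = {
    "accuracy": accuracy(y_true, y_pred),
    "R2": r2(y_true, y_pred),
    "MSE": mse(y_true, y_pred),
    "RMSE": rmse(y_true, y_pred),
    "MAE": mae(y_true, y_pred),
}

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With one wrong prediction out of eight, accuracy is 0.875, while MSE and MAE both equal 0.125 because the single error is off by one class index. Note that the error-based metrics depend on the arbitrary integer assigned to each class, so they are only meaningful if the encoding itself carries an ordering.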