Machine learning-based screening of MCF-7 human breast cancer cells and molecular docking analysis of essential oils from Ocimum basilicum against breast cancer

Tan Khanh Nguyen, Thao Nguyen Le Nguyen, Kiet Nguyen, Huynh Van Thi Nguyen, Linh Thuy Thi Tran, Thanh Xuan Thi Ngo, Phu Tran Vinh Pham, Manh Hung Tran

ABSTRACT: Machine learning models are a powerful tool for discovering candidate compounds active against the breast cancer cell line MCF-7. Among the algorithms compared using the Lazy Predict Python package, Random Forest gives the highest accuracy (83.57%) in identifying potential compounds. Essential oil constituents from the leaves and stems of Ocimum basilicum, identified by GC-MS analysis, are selected as the test dataset. Eight of them (alpha-pinene, trans-beta-ocimene, estragole, alpha-cubebene, gamma-muurolene, delta-cadinol, gamma-cadinene, and beta-ocimene) are predicted to be active against MCF-7. The anticancer mechanisms of these compounds are analyzed by molecular docking simulation, based on the structure-activity relationship between the candidates and two protein targets, BRCA1 and BRCA2. This study shows that our model can screen for bioactive compounds targeting the MCF-7 cell line and provides a basis for further research into substances derived from Ocimum basilicum as potential novel treatments for breast cancer.
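
The model-selection step summarized in the abstract (score several classifiers on held-out compounds and keep the most accurate one) can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses only the Python standard library rather than Lazy Predict, the helper names `accuracy` and `rank_models` are hypothetical, and the labels and predictions are made-up toy values chosen only to show the arithmetic.

```python
from typing import Dict, List, Tuple


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of compounds whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def rank_models(
    y_true: List[int], predictions: Dict[str, List[int]]
) -> List[Tuple[str, float]]:
    """Return (model name, accuracy) pairs sorted from best to worst."""
    scores = [(name, accuracy(y_true, y_pred)) for name, y_pred in predictions.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical held-out labels: 1 = active against MCF-7, 0 = inactive.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical predictions from three classifiers (toy values, not study data).
predictions = {
    "RandomForest": [1, 0, 1, 1, 0, 1, 1, 0],
    "LogisticRegression": [1, 0, 0, 1, 0, 1, 1, 0],
    "KNeighbors": [0, 0, 1, 1, 1, 1, 1, 0],
}

ranking = rank_models(y_true, predictions)
best_name, best_accuracy = ranking[0]
print(f"Best model: {best_name} ({best_accuracy:.2%} accuracy)")
```

In a real screen, the predictions would come from classifiers trained on molecular descriptors or fingerprints of compounds with measured MCF-7 activity, and the selected model would then be applied to the GC-MS-identified constituents as the test set.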