Prediction of Pharmaceutical Residue Presence in Aquatic Systems Using Graph-Based Deep Learning Models
 
1
Department of Computer Science, Faculty of Science and Technology, Sultan Moulay Slimane University, Campus Mghilla, Beni Mellal, 23000, Morocco
 
2
Ethnopharmacology and Pharmacognosy, Faculty of Sciences and Techniques Errachidia, Moulay Ismail University of Meknes, BP 509, Boutalamine, Errachidia 52000, Morocco.
 
3
Department of Computer Science, Polydisciplinary Faculty, Sultan Moulay Slimane University, Beni Mellal, 23000, Morocco.
 
4
Department of Computer Science, Faculty of Science, Chouaib Doukkali University, El Jadida, Morocco.
 
 
Corresponding author
AYOUB BELAIDI   

Department of Computer Science, Faculty of Science and Technology, Sultan Moulay Slimane University, Campus Mghilla, Beni Mellal, 23000, Morocco
 
 
 
ABSTRACT
Pharmaceutical residues discharged into aquatic systems constitute an emerging environmental threat and pose considerable challenges to conventional monitoring strategies. Analytical methods such as LC-MS/MS, although precise, remain costly, time-consuming, and unsuitable for large-scale continuous monitoring. The objective of this study was to develop a deep learning classification model that predicts the presence or absence of pharmaceutical residues in water samples from both molecular characteristics and environmental parameters. A dataset collected from various aquatic environments (rivers, wastewater treatment plant effluents, groundwater) was filtered, annotated, and transformed into a binary classification set in which the target value corresponded to detection (1) or non-detection (0) of the pharmaceutical product. The molecular structures were converted into atomic graphs using RDKit, which allowed three graph-based models to be applied: a Graph Neural Network (GNN), a Graph Attention Network (GAT), and a Message Passing Neural Network (MPNN). Contextual information (matrix, therapeutic group, analyte type, location, and sampling period) was integrated alongside the molecular representations. The graph-based models performed well. The MPNN achieved the best scores, with an accuracy of 92.8%, an F1-score of 0.92, and an AUC of 0.96. The GAT achieved 90.3% accuracy, an F1-score of 0.90, and an AUC of 0.94, while the GNN obtained 84.2%, 0.89, and 0.84, respectively. Integrating molecular features with environmental metadata improved performance by more than 12% compared with models using only molecular representations. Performance remained influenced by class imbalance, regional variability, and the incomplete nature of certain environmental variables. This approach does not replace instrumental analyses, but it constitutes a promising complementary tool that can reduce exclusive reliance on analytical measurements and guide water monitoring more effectively. To our knowledge, this is one of the first studies to simultaneously integrate molecular graphs and environmental metadata for the binary prediction of pharmaceutical contamination in natural waters.
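The combination of a molecular graph with environmental metadata described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the graph for ethanol is written by hand rather than produced by RDKit, the vocabularies `ATOM_TYPES` and `MATRICES` and all function names are hypothetical, and a single untrained sum-aggregation step stands in for a learned MPNN layer.

```python
# Illustrative sketch only: a toy atomic graph plus one metadata field.
# All names and vocabularies below are assumptions, not taken from the study.
ATOM_TYPES = ["C", "O"]                          # hypothetical atom vocabulary
MATRICES = ["river", "effluent", "groundwater"]  # hypothetical sampling matrices


def one_hot(value, vocabulary):
    """Encode a categorical value as a one-hot list of floats."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def message_passing_step(node_features, edges):
    """One round: each atom adds the sum of its neighbours' features to its own."""
    updated = [list(f) for f in node_features]
    for a, b in edges:  # each bond is undirected, so messages flow both ways
        for k in range(len(node_features[a])):
            updated[a][k] += node_features[b][k]
            updated[b][k] += node_features[a][k]
    return updated


def readout(node_features):
    """Sum-pool the atom vectors into one molecule-level vector."""
    return [sum(column) for column in zip(*node_features)]


def build_input(atoms, bonds, matrix):
    """Concatenate the pooled graph vector with the one-hot metadata vector."""
    nodes = [one_hot(atom, ATOM_TYPES) for atom in atoms]
    nodes = message_passing_step(nodes, bonds)
    return readout(nodes) + one_hot(matrix, MATRICES)


# Ethanol (SMILES "CCO") as a hand-written graph: atoms C0-C1-O2.
features = build_input(["C", "C", "O"], [(0, 1), (1, 2)], "river")
print(features)
```

In a trained model the resulting vector would be passed to a classifier that outputs detection (1) or non-detection (0). The other contextual fields named in the abstract (therapeutic group, analyte type, location, sampling period) could be appended in the same way.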