HepatoPred-EL

Prediction of chemical hepatotoxicity using ensemble learning methods

About HepatoPred-EL Server

What is the HepatoPred-EL server?

HepatoPred-EL (hepatotoxicity prediction using ensemble learning methods) is a web server that classifies compounds as hepatotoxicants or non-hepatotoxicants using only their two-dimensional structures. It integrates an ensemble learning model to predict the hepatotoxicity of chemicals.

How was the ensemble model integrated in the HepatoPred-EL server formed?

The ensemble model was developed by fusing a series of basic models. The basic models were generated by combining three machine learning algorithms, SVM (Support Vector Machine), RF (Random Forest), and XGBoost (Extreme Gradient Boosting), with twelve molecular fingerprints calculated with the PaDEL-Descriptor software, on a dataset containing 1241 diverse compounds. This gave 36 basic classifiers (3 algorithms × 12 fingerprints). The best n (n = 36, 35, 34, 33, …, 4, 3, 2) basic classifiers, ranked by predictive performance, were fused by averaging their predicted probabilities, which yielded 35 candidate ensemble models. The best of these, the Top-5 ensemble model, was chosen as the final classifier. Its five basic classifiers are RF with the SubFPC and KRC fingerprints and SVM with the CDK, CDKExt, and Pubchem fingerprints.
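The fusion step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the server's actual code: the helper `top_n_ensemble` is hypothetical, the ranking scores and probabilities are made-up placeholders, and the metric used to rank the basic classifiers is not specified in the text.

```python
from statistics import mean


def top_n_ensemble(cv_scores, probabilities, n):
    """Average the probabilities of the n best-ranked basic classifiers.

    cv_scores:     {classifier name: score used for ranking (assumed metric)}
    probabilities: {classifier name: predicted probability for one compound}
    Returns the selected classifier names and their mean probability.
    """
    if not 2 <= n <= len(cv_scores):
        raise ValueError("n must be between 2 and the number of basic classifiers")
    # Rank classifiers from best to worst by their score.
    ranked = sorted(cv_scores, key=cv_scores.get, reverse=True)
    selected = ranked[:n]
    return selected, mean(probabilities[name] for name in selected)


# Placeholder values for illustration only (not results from the server).
cv_scores = {"RF_SubFPC": 0.75, "RF_KRC": 0.74, "SVM_CDK": 0.73, "SVM_CDKExt": 0.70}
probabilities = {"RF_SubFPC": 0.8, "RF_KRC": 0.6, "SVM_CDK": 0.7, "SVM_CDKExt": 0.2}

selected, avg = top_n_ensemble(cv_scores, probabilities, n=3)
print(selected, round(avg, 3))
```

Repeating this for every n from 36 down to 2 would give the 35 candidate ensembles, from which the best-performing one can be picked.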

How well do the models integrated in the HepatoPred-EL server perform?

The predictive performance of the basic models and the ensemble models was evaluated by 5-fold cross-validation with 100 repeats. The ensemble models outperformed all the basic models. The performance indicators for the Top-5 ensemble model are listed in the following table:

Model            Accuracy (%)   Sensitivity (%)   Specificity (%)   AUC (%)
Top-5 Ensemble   71.1±2.6       79.9±3.6          60.3±4.8          76.4±2.6
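For readers unfamiliar with these indicators, the sketch below shows how accuracy, sensitivity, and specificity are conventionally computed from true and predicted labels. It assumes that hepatotoxicants are the positive class (label 1); the function name and the example labels are hypothetical and are not taken from the server's data.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity as percentages.

    Assumed labels: 1 = hepatotoxicant (positive), 0 = non-hepatotoxicant.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": 100 * (tp + tn) / len(pairs),
        "sensitivity": 100 * tp / (tp + fn),  # fraction of toxicants found
        "specificity": 100 * tn / (tn + fp),  # fraction of non-toxicants found
    }


# Made-up labels for illustration only.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```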

How to use the HepatoPred-EL server?

Users can draw a chemical structure on the Ketcher canvas or enter the SMILES strings of their chemicals in the text box. It is also possible to upload a file containing the compounds to be predicted. Accepted file formats are SMILES (with the extension .smiles or .smi), sdf, mol, and mol2. Up to 1000 molecules can be processed at one time. Users can select one or more models to make predictions in a single run.
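Before uploading, a batch file can be checked locally against the limits stated above. The sketch below is a hypothetical helper, not part of the server. It assumes a SMILES file holds one molecule per line, optionally followed by a name, and it parses SMILES files only.

```python
from pathlib import Path

# Extensions and molecule limit as stated in the text above.
ALLOWED_EXTENSIONS = {".smiles", ".smi", ".sdf", ".mol", ".mol2"}
MAX_MOLECULES = 1000


def check_smiles_upload(filename, text):
    """Return the SMILES strings found in `text`, or raise ValueError."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file extension: {ext!r}")
    if ext not in {".smiles", ".smi"}:
        raise ValueError("this sketch only parses SMILES files")
    # Assumption: one molecule per non-empty line, SMILES in the first column.
    smiles = [line.split()[0] for line in text.splitlines() if line.strip()]
    if len(smiles) > MAX_MOLECULES:
        raise ValueError(f"{len(smiles)} molecules exceed the limit of {MAX_MOLECULES}")
    return smiles


batch = check_smiles_upload("batch.smi", "CCO ethanol\n\nc1ccccc1 benzene\n")
print(batch)
```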

How to interpret the output from HepatoPred-EL server?

The HepatoPred-EL server only accepts compounds that contain more than 3 carbon atoms. No prediction is made for compounds that do not meet this requirement; they are listed in the “Failure predictions” section of the output page. For accepted compounds, the ensemble model classifies each compound as a hepatotoxicant or a non-hepatotoxicant. The averaged probability and the classification label are listed in the “Average” and “Class” columns of the output table, and the probability from each basic model is also provided. Probability values range from 0 to 1. If the averaged probability is greater than 0.5, the compound is considered a hepatotoxicant; otherwise, it is considered a non-hepatotoxicant.
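The decision rule described above can be summarized in a few lines. This is a minimal sketch, not the server's implementation: the function `interpret_result` is hypothetical, the carbon count is assumed to be computed elsewhere (for example by a cheminformatics toolkit), and the probabilities are made-up values.

```python
def interpret_result(carbon_count, basic_probabilities):
    """Mimic one row of the output table, or return None for a failed compound."""
    # Compounds need more than 3 carbon atoms; others get no prediction.
    if carbon_count <= 3:
        return None
    average = sum(basic_probabilities) / len(basic_probabilities)
    # Strictly greater than 0.5 means hepatotoxicant.
    label = "hepatotoxicant" if average > 0.5 else "non-hepatotoxicant"
    return {"Average": average, "Class": label}


# Made-up probabilities from five basic models, for illustration only.
row = interpret_result(6, [0.8, 0.6, 0.7, 0.4, 0.5])
borderline = interpret_result(6, [0.5, 0.5])
rejected = interpret_result(2, [0.9])
print(row, borderline, rejected)
```

Note that a probability of exactly 0.5 falls on the non-hepatotoxicant side, because the rule requires the value to be greater than 0.5.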