Different chemical space and appropriate validation+good datasets to explore

LukaLisica · July 2, 2025, 2:49pm

What is currently considered the best approach when a model is trained in one chemical space but is supposed to evaluate test data from a different chemical space? Also, are there any recommended analyses for determining how similar the chemical spaces of the training and test sets are?

Additionally, I saw that aircheck offers datasets for many different target proteins, but we are unable to determine which datasets are of the best quality and which targets are the most important. Preferably, we would like something that has complete molecular information as opposed to only fingerprints.

Lastly, our results tended to be too optimistic when we evaluated on the training data, because the test data is in a chemical space that is very different from the training data. Are there any ways to get a more realistic estimate of prediction accuracy before doing the experimental validation?

Thank you for your time

New member, Luka Lisica
