Showing posts from January, 2015

How to create your own replica of the SureChEMBL patent-chemistry dataset

Introduction: Why replicate SureChEMBL? SureChEMBL is a patent chemistry dataset and set of web services that provides a rich source of information to the drug discovery research community. It was previously owned, developed, and sold by Macmillan, but was recently handed over to the European Bioinformatics Institute (EMBL/EBI) and is now free for everyone to use.

SureChEMBL can already be accessed online, so why would a locally hosted replica be needed?

To answer that question, I'll give the reasons provided by a pharmaceutical company who recently commissioned me to develop a SureChEMBL data replication facility:
1) Firewall restrictions can be avoided - companies involved in drug discovery are often working with substructures or other related search queries which may lead to highly lucrative discoveries. As such, researchers are often prohibited from using external web services, even secure services such as SureChEMBL, as a risk-mitigation strategy. Downloading data files - e.g…