Knowledge of compound bioactivity against drug targets underpins the discovery of new drugs. However, bioactivity databases are currently sparse; for example, the ChEMBL dataset is just 0.05% complete, and proprietary pharma databases are similarly sparse. We will describe a novel deep learning algorithm that captures correlations within protein activity data, as well as between molecular descriptors and protein activities, in order to impute the missing activities. Unlike many deep learning methods, this approach can be trained on sparse and variable data, as is typical of the data available in drug discovery. We will present examples illustrating the application of these deep learning networks to impute missing activities in the sparse input data, as well as to make predictions for new compounds based on molecular descriptors alone. The results will be compared with those of conventional machine learning methods such as random forests and Gaussian processes.
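To make the notion of sparsity concrete, the sketch below computes the completeness of a small compound-by-protein activity matrix and scores predictions against the measured entries only. It is a generic illustration and not the algorithm described in this abstract: the function names `completeness` and `masked_mse`, the use of NaN to mark missing values, and the toy numbers are all assumptions made for the example.

```python
import numpy as np


def completeness(activities):
    """Fraction of compound-protein entries that have a measured value."""
    return float(np.mean(~np.isnan(activities)))


def masked_mse(predicted, activities):
    """Mean squared error over measured entries only; NaN marks a missing value."""
    observed = ~np.isnan(activities)
    if not observed.any():
        return 0.0
    residual = predicted[observed] - activities[observed]
    return float(np.mean(residual ** 2))


# Toy matrix (invented values): 3 compounds (rows) x 4 protein targets (columns).
activities = np.array([
    [6.0, np.nan, np.nan, 7.5],
    [np.nan, 5.0, np.nan, np.nan],
    [np.nan, np.nan, np.nan, 8.0],
])

# Hypothetical model output: a constant prediction for every entry.
predicted = np.full(activities.shape, 6.0)

print(completeness(activities))           # 4 of the 12 entries are measured
print(masked_mse(predicted, activities))  # error on the 4 measured entries only
```

Restricting the loss to measured entries is one common way to let a model learn from an incomplete matrix without first filling the gaps with placeholder values.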