Forcefields should be consistent with all available data
Most datasets are heterogeneous, offline, and static
Sensitive to nonbonded parameters
Simple ensemble average geometric interpretation
$$\rho = \left\langle \frac{M}{V} \right\rangle$$
$$\epsilon = 1 + \frac{4\pi}{3} \frac{\langle \mu \cdot \mu \rangle - \langle \mu \rangle \cdot \langle \mu \rangle}{V k_B T}$$
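A minimal NumPy sketch of the two ensemble averages above, assuming per-frame box volumes in nm^3 and per-frame total box dipoles in e·nm. The function names, argument units, and the use of the mean volume in the dielectric expression are assumptions for illustration, not the benchmark's actual code. The dielectric formula on the slide is in Gaussian units; the sketch uses the equivalent SI form, in which $4\pi/3$ becomes $1/(3\epsilon_0)$.

```python
import numpy as np

# Physical constants in SI units.
E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
K_B = 1.380649e-23           # Boltzmann constant, J/K
N_A = 6.02214076e23          # Avogadro constant, 1/mol


def mean_density(total_mass, volumes):
    """Return rho = <M / V> in g/mL.

    total_mass: mass of everything in the box, in g/mol (assumed unit).
    volumes: per-frame box volumes in nm^3 (assumed unit).
    """
    volumes = np.asarray(volumes, dtype=float)
    grams = total_mass / N_A
    millilitres = volumes * 1e-21  # 1 nm^3 = 1e-21 mL
    return float(np.mean(grams / millilitres))


def static_dielectric(dipoles, volumes, temperature):
    """Return the dipole-fluctuation estimate of the static dielectric constant.

    dipoles: array of shape (n_frames, 3), total box dipole in e*nm (assumed unit).
    volumes: per-frame box volumes in nm^3; their mean is used for V.
    temperature: temperature in kelvin.
    """
    mu = np.asarray(dipoles, dtype=float) * E_CHARGE * 1e-9  # convert to C*m
    mean_volume = np.mean(volumes) * 1e-27                   # convert to m^3
    mean_mu = mu.mean(axis=0)
    # <mu . mu> - <mu> . <mu>
    fluctuation = np.mean(np.sum(mu * mu, axis=1)) - np.dot(mean_mu, mean_mu)
    return float(1.0 + fluctuation / (3.0 * EPS0 * mean_volume * K_B * temperature))
```

For example, `mean_density(18015.28, [30.0, 30.0])` describes a hypothetical box of 1000 water molecules at a fixed 30 nm^3 volume.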
OpenMM 6.3
GAFF / AM1-BCC (Antechamber + OpenEye)
Converge each density to 0.0002 g/mL ($\approx$ experimental error)
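One way to check the 0.0002 g/mL convergence target is a block-averaged standard error on the density time series. The sketch below is an illustration only: the block-averaging estimator, the function names, and the default of 10 blocks are assumptions, and the slides do not say which uncertainty estimator was actually used.

```python
import numpy as np


def block_standard_error(samples, n_blocks=10):
    """Estimate the standard error of the mean from block averages.

    samples: 1-D sequence of per-frame densities (g/mL).
    n_blocks: number of contiguous blocks (hypothetical default).
    """
    samples = np.asarray(samples, dtype=float)
    block_len = len(samples) // n_blocks
    if block_len < 1:
        raise ValueError("need at least one sample per block")
    # Drop trailing frames that do not fill a whole block.
    trimmed = samples[: block_len * n_blocks]
    block_means = trimmed.reshape(n_blocks, block_len).mean(axis=1)
    return float(block_means.std(ddof=1) / np.sqrt(n_blocks))


def is_converged(density_samples, threshold=0.0002, n_blocks=10):
    """Return True once the estimated standard error falls below the threshold."""
    return bool(block_standard_error(density_samples, n_blocks) < threshold)
```

In a production loop one would extend the simulation in chunks and call `is_converged` after each chunk, stopping when it returns `True`.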
$$\alpha = 1.53 n_C + 0.17 n_H + 0.57 n_O + 1.05 n_N + 2.99 n_S + 2.48 n_P + 0.22 n_F + 2.16 n_{Cl} + 0.32$$
$$\epsilon_{corrected} = \epsilon_{MD} + 4 \pi N \frac{\alpha}{\langle V \rangle}$$
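The correction above is a short calculation once the element counts are known. The sketch below assumes $\alpha$ is in cubic angstroms, that $N$ is the number of molecules in the box, and that $\langle V \rangle$ is supplied in the same volume unit as $\alpha$; the slide does not state these units, and the function names are hypothetical.

```python
import math

# Per-element coefficients and constant term, copied from the formula above.
ELEMENT_COEFFICIENTS = {
    "C": 1.53, "H": 0.17, "O": 0.57, "N": 1.05,
    "S": 2.99, "P": 2.48, "F": 0.22, "Cl": 2.16,
}
CONSTANT_TERM = 0.32


def empirical_polarizability(element_counts):
    """Return alpha for one molecule from its element counts.

    element_counts: mapping such as {"C": 1, "H": 4, "O": 1}.
    Elements missing from the formula raise KeyError rather than being ignored.
    """
    return CONSTANT_TERM + sum(
        ELEMENT_COEFFICIENTS[element] * count
        for element, count in element_counts.items()
    )


def corrected_dielectric(eps_md, n_molecules, alpha, mean_volume):
    """Return eps_MD + 4 pi N alpha / <V>.

    alpha and mean_volume must share the same volume unit
    (assumed here: cubic angstroms).
    """
    return eps_md + 4.0 * math.pi * n_molecules * alpha / mean_volume
```

For methanol (CH4O), `empirical_polarizability({"C": 1, "H": 4, "O": 1})` evaluates the sum 1.53 + 4(0.17) + 0.57 + 0.32.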
Scale up, real-time simulation, web frontend
Perform new experiments in automated wetlab
Bayesian (MCMC) forcefield / experimental design
Polarizable forcefields
Small-molecule forcefields need help
ThermoML is a NIST-supported, machine-readable, and growing set of physicochemical data
We built a semi-automated benchmark of densities and dielectric constants in ThermoML
Empirical polarizability model improves comparisons to measured dielectric constants
Julie Behr (MSKCC)
Patrick Grinaway (MSKCC)
Bas Rustenburg (MSKCC)
John Chodera (MSKCC)
Kenneth Kroenlein (NIST)