Benchmarking Small Molecule Forcefields at the Scale of NIST

Kyle A. Beauchamp, Choderalab@MSKCC
Slides here: goo.gl/rKGhzZ

Forcefields: Are we there yet?

Water and protein: getting better

Lindorff-Larsen, PLOS One, 2012.

Small molecule forcefields need work

Fennell, Mobley. J. Phys. Chem. B. 2014

Data access is killing forcefields

  • Forcefields should be consistent with all available data
  • Most datasets are heterogeneous, offline, and static

See also work by Wang, Pande, Swope, Case, MacKerell, Ponder, Best, Hummer, Bruschweiler.

WANTED: Reliable, machine-readable, open archive of physicochemical measurements!

NIST Thermodynamics Research Center

Data Capture at NIST/TRC: ThermoML

ThermoML is rapidly growing

Figure from Chirico, J. Chem. Eng. Data, 2013.

Can we leverage ThermoML for forcefield validation?

Density and dielectric constants as forcefield tests

  • Sensitive to nonbonded parameters
  • Simple ensemble averages with a geometric interpretation

$$\rho = \langle \frac{M}{V} \rangle$$

$$\epsilon = 1 + \frac{4\pi}{3} \frac{\langle \mu \cdot \mu \rangle - \langle \mu \rangle \cdot \langle \mu \rangle}{V k_B T}$$

See also van der Spoel, JCTC, 2011 and Fennell, 2012.
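
The two estimators above can be sketched in plain Python. This is a minimal illustration, not the benchmark's actual code: the function names, the input units (box dipoles in e·nm, volumes in nm^3) and the use of the mean box volume for $V$ are all assumptions. The dielectric formula is written in SI form, where the Gaussian prefactor $4\pi/3$ becomes $1/(3\epsilon_0)$.

```python
# Physical constants in SI units.
E_CHARGE = 1.602176634e-19    # elementary charge, C
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m
K_B = 1.380649e-23            # Boltzmann constant, J/K
AVOGADRO = 6.02214076e23      # 1/mol


def mean(values):
    return sum(values) / len(values)


def density_g_per_ml(box_mass_g_per_mol, volumes_nm3):
    """rho = <M / V>; box_mass_g_per_mol is the summed molar mass of the box."""
    mass_g = box_mass_g_per_mol / AVOGADRO
    # 1 nm^3 = 1e-21 mL
    return mean([mass_g / (v * 1e-21) for v in volumes_nm3])


def static_dielectric(dipoles_e_nm, volumes_nm3, temperature_k):
    """eps = 1 + (<mu.mu> - <mu>.<mu>) / (3 eps0 <V> kB T), SI form."""
    n = len(dipoles_e_nm)
    mean_mu = [sum(mu[i] for mu in dipoles_e_nm) / n for i in range(3)]
    mean_mu_sq = mean([sum(c * c for c in mu) for mu in dipoles_e_nm])
    fluctuation = mean_mu_sq - sum(c * c for c in mean_mu)
    # Convert (e nm)^2 -> (C m)^2 and nm^3 -> m^3.
    fluctuation_si = fluctuation * (E_CHARGE * 1e-9) ** 2
    volume_si = mean(volumes_nm3) * 1e-27  # assumption: mean volume stands in for V
    return 1.0 + fluctuation_si / (3.0 * EPSILON_0 * volume_si * K_B * temperature_k)
```

A box whose total dipole never fluctuates returns exactly $\epsilon = 1$, which is the limiting case behind the underestimates for nonpolar liquids.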

How many measurements are there?

Munging ThermoML with pyxb and pandas

Benchmarking neat liquid densities and dielectric constants

  • OpenMM 6.3
  • GAFF / AM1-BCC (Antechamber + OpenEye)
  • Converge each density to 0.0002 g / mL ($\approx$ expt. error)

PME + Langevin 1 fs + Monte Carlo Barostat + Fixed HBond Constraints + 1000 molecules per box
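
One way to decide when a density has converged to 0.0002 g / mL is a block-averaged standard error. The slides do not say which error estimator was used, so the block-averaging approach, the function names and the default of 10 blocks below are assumptions made for illustration.

```python
import math
import statistics


def block_standard_error(samples, n_blocks=10):
    """Standard error of the mean from non-overlapping block averages."""
    block_size = len(samples) // n_blocks
    if n_blocks < 2 or block_size < 1:
        raise ValueError("need at least 2 blocks with 1 sample each")
    block_means = [
        statistics.mean(samples[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]
    # Sample standard deviation of the block means, divided by sqrt(n_blocks).
    return statistics.stdev(block_means) / math.sqrt(n_blocks)


def is_converged(densities_g_per_ml, tolerance=0.0002, n_blocks=10):
    """True once the estimated density error drops below the tolerance."""
    return block_standard_error(densities_g_per_ml, n_blocks) < tolerance
```

In a production loop, `is_converged` would be checked periodically on the density time series and the simulation extended until it returns `True`.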

Densities are in the ballpark

Beauchamp et al, In Preparation.

Static dielectric constants are consistently underestimated

Beauchamp et al, In Preparation.

Fixed charges fail to capture polarizability

Observed: $\epsilon \approx 2.2$, Predicted: $\epsilon \approx 1.001$, $\Delta \Delta G_{solv} \approx$ 2 kcal / mol

Atom counting predicts molecular polarizability to within 2%

$$\alpha = 1.53 n_C + 0.17 n_H + 0.57 n_O + 1.05 n_N + 2.99 n_S + \\ 2.48 n_P + 0.22 n_F + 2.16 n_{Cl} + 0.32 $$

$$\epsilon_{corrected} = \epsilon_{MD} + 4 \pi N \frac{\alpha}{\langle V \rangle}$$

Sales, 2002
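
The atom-counting model and the dielectric correction above are simple enough to sketch directly. The coefficients are copied from the formula on this slide; the function names are hypothetical, and the assumption that $\alpha$ and $\langle V \rangle$ share the same volume units (for example cubic angstroms) is ours, since the slide does not state units.

```python
import math

# Per-element coefficients and intercept, as given in the formula above.
COEFFICIENTS = {
    "C": 1.53, "H": 0.17, "O": 0.57, "N": 1.05,
    "S": 2.99, "P": 2.48, "F": 0.22, "Cl": 2.16,
}
INTERCEPT = 0.32


def polarizability(atom_counts):
    """Molecular polarizability from element counts, e.g. {"C": 6, "H": 6}."""
    unknown = set(atom_counts) - set(COEFFICIENTS)
    if unknown:
        raise ValueError("no coefficient for: %s" % sorted(unknown))
    return INTERCEPT + sum(COEFFICIENTS[el] * n for el, n in atom_counts.items())


def corrected_dielectric(eps_md, n_molecules, alpha, mean_volume):
    """eps_corrected = eps_MD + 4 pi N alpha / <V>.

    alpha and mean_volume must use the same volume units (assumption).
    """
    return eps_md + 4.0 * math.pi * n_molecules * alpha / mean_volume
```

For example, `polarizability({"C": 6, "H": 6})` evaluates the formula for benzene, and the result can be passed to `corrected_dielectric` together with the simulated $\epsilon_{MD}$, the number of molecules, and the mean box volume.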

Empirical atomic polarizability corrections reduce bias

Beauchamp et al, In Preparation.

Continuous integration of forcefields

Live update stream now available at trc.nist.gov/RSS/

https://github.com/choderalab/ThermoPyL https://github.com/choderalab/LiquidBenchmark

Where do we go from here?

  • Scale up, real-time simulation, web frontend
  • Perform new experiments in automated wetlab
  • Bayesian (MCMC) forcefield / experimental design
  • Polarizable forcefields

Conclusions

  • Small molecule forcefields need help
  • ThermoML is a NIST-supported, machine-readable, and growing set of physicochemical data
  • We built a semi-automated benchmark of densities and dielectric constants in ThermoML
  • An empirical polarizability model improves agreement with measured dielectric constants

https://github.com/choderalab/ThermoPyL https://github.com/choderalab/LiquidBenchmark

Funding and Acknowledgments

  • Julie Behr (MSKCC)
  • Patrick Grinaway (MSKCC)
  • Bas Rustenburg (MSKCC)
  • John Chodera (MSKCC)
  • Kenneth Kroenlein (NIST)

Also Vijay Pande, Lee-Ping Wang, Peter Eastman, Robert McGibbon, Jason Swails, David Mobley, Christopher Bayly, Michael Shirts, and the Chodera lab.