I recently had the great honor of attending my eldest son’s Ph.D. thesis defense. It was one of those mommy moments — you know, “my son the doctor.”
Although I’d discussed my son’s work with him for years, I didn’t expect to understand much of the advanced chemistry or protein engineering parts of his thesis defense presentation. But I was surprised to see that one element of his talk was exceedingly familiar.
He explained something about metabolic pathways and a successful manipulation of an enzyme to make it function in a new pathway and context, the kind of scientific first that leads Caltech to give a guy a Ph.D. But he was also interested in improving the method so it could be more widely used.
He showed a diagram of a complex and fragile process. He explained that it’s slow, painstaking, error-prone, difficult to reproduce, and requires expert knowledge. He wanted to make the process easier, so more people could do it more quickly and reliably. Now, that’s quite familiar.
And it was followed by this slide. I have one like it in one of my decks.
And this one. Again, familiar, except for the mutants.
This is a world that I know very well.
It turns out that one problem in protein engineering is an enormous library of options that cannot all be investigated. To narrow the options to the most likely candidates for success, my son studied the intrinsic rules that he and other experts in his subspecialty use to prioritize among options and identify candidates that are most likely to be successful. The result was a heuristic — a collection of very well informed intuitions about the relationships between protein structure and function.
Fortunately, my son learned a bit of C and Windows PowerShell before heading to grad school, which made it very easy for him to learn Python. He’s also a math whiz, although I can’t take any credit for that.
He translated the scientific method that the experts use, their combined heuristic, into an algorithm: a series of Python conditionals (IF statements) with vector math (from several sophisticated math modules) that examines the physical structure of a target protein. The Python script filters a huge, intractable library of options down to a few candidates that are most likely to be successful. His validation with a representative test set of proteins showed that the method was very effective. So effective that my son’s thesis adviser, an eminent scientist, declared in her introduction that he had “solved the problem.”
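If you want to picture what "conditionals plus vector math" can look like, here is a minimal sketch. To be clear, this is not my son's script, and the rules in it are not real protein science. The `Candidate` class, the `filter_candidates` function, the two rules (a distance cutoff and a direction check), and the threshold values are all hypothetical, invented only to show the general shape of a rule-based filter. It uses plain Python rather than any specialized math module.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec = Tuple[float, float, float]


@dataclass(frozen=True)
class Candidate:
    """A hypothetical option from the library (not a real protein model)."""
    name: str
    position: Vec   # made-up 3D coordinate of the candidate site
    direction: Vec  # made-up vector for which way the site "points"


def subtract(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def norm(a: Vec) -> float:
    return math.sqrt(dot(a, a))


def cosine(a: Vec, b: Vec) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    denom = norm(a) * norm(b)
    return dot(a, b) / denom if denom else 0.0


def filter_candidates(
    candidates: List[Candidate],
    target: Vec,
    max_distance: float = 10.0,  # hypothetical cutoff
    min_cosine: float = 0.5,     # hypothetical cutoff
) -> List[Candidate]:
    """Keep candidates that pass every rule, nearest to the target first."""
    kept = []
    for c in candidates:
        to_target = subtract(target, c.position)
        distance = norm(to_target)
        # Rule 1 (assumed): the candidate must be close to the target.
        if distance > max_distance:
            continue
        # Rule 2 (assumed): the candidate must point roughly at the target.
        if cosine(c.direction, to_target) < min_cosine:
            continue
        kept.append((distance, c))
    kept.sort(key=lambda pair: pair[0])
    return [c for _, c in kept]


# A tiny invented library of four options.
target = (0.0, 0.0, 0.0)
library = [
    Candidate("A", (3.0, 0.0, 0.0), (-1.0, 0.0, 0.0)),   # near, points at target
    Candidate("B", (20.0, 0.0, 0.0), (-1.0, 0.0, 0.0)),  # too far away
    Candidate("C", (0.0, 4.0, 0.0), (0.0, 1.0, 0.0)),    # near, points away
    Candidate("D", (0.0, 0.0, 2.0), (0.0, 0.0, -1.0)),   # nearest, points at target
]

shortlist = filter_candidates(library, target)
print([c.name for c in shortlist])  # ['D', 'A']
```

Each rule is just an IF statement that throws out an option. Stack enough well-chosen rules together and a library that is too big to investigate shrinks to a short list that someone can actually test.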
The result is a classic automation: a Python script back end with a web page front end. It works quickly and reliably. You still need to be a scientist to use it, but you don’t need to be an expert in this subspecialty. When it completes, you need a biochemist and a classic method to test the options that it recommends, but the automation turns a ridiculously expensive and immersive task into a sane and practical one.
After the champagne and several parties, I asked my son (yes, my son, the doctor) about the technique. “It’s basically what you do, Mom, applied to science.”
June Blender is a technology evangelist at SAPIEN Technologies, Inc. and a Windows PowerShell MVP. You can reach her at email@example.com and follow her on Twitter at @juneb_get_help. Thanks to Jackson K.B. Cahn, Ph.D. for his cooperation and permission to use his slides.