Engineering bioinformatics in seconds, not hours

Cross-posted on the Color blog.

10,000 Year Clock, Long Now

It was winter 2014. Pharrell had just dropped Happy, the Rosetta probe landed on a comet, President Obama was opening diplomatic relations with Cuba

…and here at Color, the bioinformatics team had a problem. Our pipeline — the data processing system that crunches raw DNA data from our lab into the variants we report to patients — was slow. 12 to 24 hours slow.

This wasn’t a problem in and of itself (bioinformatics pipelines routinely run for hours or even days), but it was a royal pain for development. We’d write new pipeline code, start it running, go home, and return the next morning to find it had crashed halfway through because we’d missed a semicolon. Argh. Or worse, since we hadn’t launched yet, our live pipeline would hit similar bugs in production R&D samples, which would delay them until we could debug, test, and deploy the fix. No good.


Building a public research database out of spare parts

Cross-posted on the Color blog.

A couple months ago, we launched a public research database with DNA, health history, and more from 50,000 of our clients. You might be surprised at how little work it took us: under four person-months total. Read on to hear how we designed and built it, went above and beyond the usual privacy safeguards, and did it all in the blink of an eye.

At first glance, Color Data may seem far from unique. ClinVar, gnomAD, TCGA, 1000 Genomes, and others all address similar goals: sharing anonymized genotype and phenotype data with academic researchers to help them advance science and knowledge. We’re in an unusual position at Color, though, in that we have a large population with both sequenced DNA and self-reported phenotype that has opted to share it with researchers. Even better, our population is a bit more diverse across ethnicity, age, health history, and other characteristics than many other research datasets.



We lost our faithful cat Snoopy a few weeks ago, just before the new year. He’d been sick for a while, technically kidney failure and feline mast cell cancer, really just old age. The vet gave him just six weeks, but we nursed him along with steroids and fluids, and he managed a good five or six months beyond that. In the end, it was his time, but it was still tough to let him go. We miss him.

Gina got Snoopy and Charlie at the same time, barely after they were weaned. They’d lived together at the shelter, and she’d only planned to get one, but she couldn’t bear to break them up. They were as close as brothers; for all they knew, they were brothers.

Snoopy constantly groomed Charlie and looked after him, but there was always only one true love of Snoopy’s life: Gina. She was his mama. He followed her around the house, sat on her legs while she worked from home, kept her company while she gardened and cooked, and slept on top of her in bed at night. He was her fast companion, her kitten, her buddy.


Goodbye Facebook, Goodbye Google+

I deleted all of my Facebook posts last week. I deleted my Google+ posts too. They were pretty much all posted here on my web site too, so nothing was truly lost, but I still feel a bit lighter, somehow.

Plenty of ink has been spilled on the problems with big social media and the companies behind it. There’s an entire movement of people leaving social networks for various reasons. Many of them have expressed their concerns, often quite loudly and eloquently, so I don’t really need to repeat them here. Consider yourselves lucky.


Stop paying the ETL tax

Also on the Color blog.

I want to say one word to you. Just one word. Are you listening? … ETL.

Let me guess: that didn’t set your imagination on fire. Even in software engineering and data science, it’s not exactly a household term. Nor are the more modern terms data platform or data engineering. If you do know what they are, chances are you don’t have strong opinions. You know they’re out there, people do them, and that may be the end of it.

ETL stands for Extract, Transform, Load. It’s how you get your data from your primary OLTP database, which serves your application, into an OLAP data warehouse designed for analysis, business intelligence, and data science.
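If the acronym still feels abstract, here is a minimal sketch of the three steps in Python. Everything in it is an assumption made for illustration: the `orders` and `daily_revenue` tables, their columns, and the use of two in-memory SQLite databases as stand-ins for a real OLTP database and a real OLAP warehouse. It is not how any particular ETL product works, just the general shape.

```python
import sqlite3


def extract(oltp):
    """Extract: read raw rows from the application (OLTP) database."""
    # Hypothetical table and columns, chosen only for this example.
    return oltp.execute(
        "SELECT customer, amount_cents, created_at FROM orders"
    ).fetchall()


def transform(rows):
    """Transform: roll individual orders up into revenue per customer per day."""
    totals = {}
    for customer, amount_cents, created_at in rows:
        day = created_at[:10]  # 'YYYY-MM-DD' prefix of an ISO 8601 timestamp
        key = (customer, day)
        totals[key] = totals.get(key, 0) + amount_cents
    return [(c, d, cents / 100) for (c, d), cents in sorted(totals.items())]


def load(warehouse, facts):
    """Load: write the reshaped rows into the analysis (OLAP) store."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS daily_revenue "
        "(customer TEXT, day TEXT, revenue REAL)"
    )
    warehouse.executemany("INSERT INTO daily_revenue VALUES (?, ?, ?)", facts)
    warehouse.commit()


def demo():
    """Run the three steps end to end against throwaway in-memory databases."""
    oltp = sqlite3.connect(":memory:")
    oltp.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, "
        "amount_cents INTEGER, created_at TEXT)"
    )
    oltp.executemany(
        "INSERT INTO orders (customer, amount_cents, created_at) VALUES (?, ?, ?)",
        [
            ("alice", 1250, "2018-03-01T09:15:00"),
            ("alice", 750, "2018-03-01T17:40:00"),
            ("bob", 500, "2018-03-02T11:05:00"),
        ],
    )
    warehouse = sqlite3.connect(":memory:")
    load(warehouse, transform(extract(oltp)))
    return warehouse.execute(
        "SELECT customer, day, revenue FROM daily_revenue ORDER BY customer, day"
    ).fetchall()
```

Calling `demo()` copies three order rows out of the stand-in application database and leaves two summary rows in the stand-in warehouse, one per customer per day. A real pipeline adds scheduling, incremental loads, schema changes, and failure handling on top, which is where most of the work tends to go.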

Whatever your product is, it’s hopefully a core competence for your company. It’s a key differentiator. For many of us, data science and analysis are also key differentiators. ETL, however, is not. It looks basically the same everywhere, and does basically the same thing. These are all signposts that generally point toward buying or reusing, not building from scratch. Doing this kind of thing yourself just won’t move the needle.


Good technology

My electric toothbrush is good technology.

It has one button. The button turns it on. It vibrates for 30 seconds, buzzes, then repeats three more times. It has no other controls.

It works one way: the standard, ADA-recommended way. That’s what most people want. If I want to brush longer, I restart it when it’s done. If I want to stop early, I press the button again.

It has one display, an LED battery indicator. The LED has three parts: low, medium, high. The only other way it communicates is by buzzing.

It serves a purpose. Its ultrasonic vibrations clean my teeth better than manual brushing. It’s also easier to reach every tooth surface when I’m not physically brushing.

It is not configurable. It has no screen. It is not smart. It has no WiFi or Bluetooth. It has no app. It is not a platform. All it has is one button, one LED, and one mode of operation.

My electric toothbrush is good technology. More technology should be good technology.


I don’t hang out on the internet

I use Facebook. Not a ton, but I use it. I tweet, I Instagram, I read blogs. I do much of my work on GitHub. I’m on mailing lists, IRC channels, StackOverflow. Not LinkedIn, but that’s an exception. I say all this to show that I spend plenty of time on the Internet. More than my fair share.

And yet. If I hang out with people on the Internet, I generally already know them in real life. This puts me a bit at odds with online communities like open source, the IndieWeb, and others. I participate in them now and then, but I sometimes find it hard to relate to their needs and interests. They’re online communities, and I don’t really…commune…online.

This is not remarkable. For most people, it’s actually the norm, although that’s changing quickly as more and more of the world gets online. We’re well past the halfway point! It’s a bit unusual for nerds like me, though, since discovering the internet has long been a rite of passage for us.


How Buildings Learn

I just finished Stewart Brand’s How Buildings Learn, a thought-provoking and deeply inspiring book. I’ve been a fan of Stewart’s for a long time, through the Whole Earth Catalog, The Well, Long Now, and de-extinction, and his writing doesn’t disappoint. I’m not particularly interested in architecture, construction, or interior design, but he manages to bypass all three and find homespun philosophy and powerful insights in topics as mundane as siding materials and gardening techniques. Highly recommended.

Here are a few favorite quotes. On why buildings change (page 238):

The three things that change a building most are markets, money, and water. If you would ensure a building’s longevity, protect it from markets and water, and feed it money, but not too much and not too little.

On as-builts (page 239):

As-builts are building plans that show in detail exactly what was built, which is always significantly different from what was in the original plans. Without accurate as-builts, says Chuck Charlton, “An electrical failure can have you wandering through the building shotgunning circuit breakers and shinnying down the chases.” … If the as-builts aren’t updated constantly, each bit of repair or remodeling, each new contractor, each change of property management makes the plans more misleading.

On state-owned property (page 163):

When the landlord is the state, as it was in communist lands, you get the ultimate in negative maintenance. All visitors to the mortally rundown buildings of Eastern European nations have tales like Brian Eno’s: “My wife and I were checking in to a hotel in Moscow. Our host showed us to our room, and began switching on the lights. As he turned on the one by the door, a great tongue of flame issued forth from a light fitting in the ceiling. He calmly switched it off again and said, ‘Don’t use that one.’” Since no one owned the light, why should anyone fix it? A command economy displaces responsibility even further outside the building than a market economy does.

On getting rich quick (page 165):

“People want to get rich quick.” The other side of the coin is, Go broke quick. Real estate is the classic case of soar and collapse, of tycoons going bankrupt and taking shortsighted banks with them. Work done in haste is necessarily shoddy, a house of cards. On a go-fast schedule there is no margin for a single error, and error is inevitable. High risk, high loss.

The opposite strategy is much surer, because the errors are piecemeal and correctable. When you proceed deliberately, mistakes don’t cascade, they instruct. Low risk plus time equals high gain. This strategy treats the fundamentals of the living investment with attention and respect. The lesson of realty laced with reality is: “Get rich slow.”

On temporary fixes (page 369):

Beware: in the real world “temporary” is permanent most of the time. If the cheap trial worked, it will be left alone, no matter how funky it is. If it failed, it’s embarrassing to fix. Life rushes on to more pressing or interesting problems.

On HOAs (page 153):

…new communities seek to pre-empt any such adaptivity by repressive, fiercely enforced “covenants, conditions, and restrictions.” These are the dread “CC & Rs” that homeowners’ associations use to control such details as what colors you may paint your house, what pets (and in some cases what children) you may keep, how your lawn will look, your roof, your fence, your driveway (no campers, trucks, or car repair), your backyard (no drying laundry or unstacked firewood). Any neighbor might report you. What if you ignore or defy such rulings? The homeowners’ association can take your house or send you to jail. Joel Garreau points out that these organizations have all the powers of government—the ability to tax, to legislate, and to police—without the usual restrictions of democratic representation or being answerable to the US Constitution.

Garreau contrasts a new development such as Irvine, California, to the once-deplored original Levittowns that were created for postwar families back in 1949: The old Levittowns are now interesting to look at; people have made additions to their houses and planted their grounds with variety and imagination. Unlike these older subdivisions, Irvine has deed restrictions that forbid people from customizing their places with so much as a skylight…Owners of expensive homes in Irvine commonly volunteer stories of not realizing they had pulled into the driveway of the wrong house until their garage-door opener failed to work.

This degree of institutionalization of real estate value over use value is odious enough as an invasion of privacy, but it also prevents buildings from exercising their unique talent for getting better with time.


Pessimistic induction

One of my favorite ideas from recent memory is pessimistic induction. As usual, a quick search finds plenty of smarter people who thought of it before me and easily refuted it. Even so, it’s oddly compelling, a gem of a misconception.

Looking back at history, most of our ideas, even the best ones, have turned out to be wrong. Very few have stood the test of time. Newtonian physics, the rational economic actor, and fat vs sugar are just a few famous examples.

It’s easy to think that we’re at the final culmination of our entire historical arc of science, art, and civilization. We may have been wrong in the past, but we’ve rooted out our mistakes, corrected them, and we now have everything figured out. It’s such a common misconception that it has its own name: the end of history illusion.

That name is apt, though: it really is just an illusion. The present day may feel special, but it’s usually just like every day before it. Many things are good, some things are bad, and our currently accepted scientific ideas are almost certainly wrong, to one degree or another. This is the pessimistic induction.

It’s chillingly elegant, but it has a fatal flaw. Yes, today’s best science may likely be wrong, but right and wrong are rarely black and white. Modern physics is famously incomplete, but working physicists would still say that the standard model and string theory and holographic universe are better ideas than Newtonian physics. We may not be perfectly right about everything, or maybe even anything, but we’re probably more right than we used to be.

That’s a comforting thought. The universe may be cold and indifferent, but we can admit our flaws and still hew ever closer to understanding. Onward.


What I work on

I had a conversation with a good friend recently that crystallized something I’d always felt strongly, at a gut level, but never thought through: how I choose what to work on.

When I look for a new job, I think about project, people, compensation, role, company, commute, etc. I’ve tried focusing on different factors over time, and I’ve found that for me, project is often the most important. I’ll suffer with low pay, long train rides, or a role I’m overqualified for if I’m working on something I care about and believe in.

I prefer tools over products. Systems over tools. Protocols over systems. Problems over users. Wicked over tame. Research over application. Many of these are stereotypical engineer cliches, but they boil down to an interesting theme: I prefer to work in areas where the goals and incentives don’t change much over time.

I don’t know where I developed this tendency toward the long term, but it’s a big personal motivation. The time scales I’m thinking about are centuries and millennia, not years or decades. I could just as well replace time with generations. I’m fine with not shipping code often, or not making any progress for longer stretches, if I know the problem will still be around and my work will still apply down the road.

What does this mean? Well, scratch most products – consumer, enterprise, or other. Some of them last centuries, but not many. Scratch applications and services in general. I’m happy to do work that’s used in a product or service, but usually only if there’s an underlying problem with a longer lifespan.

The two main areas that fit are research and infrastructure. Academic departments and conferences rise and fall, but the central goal of research has stayed the same forever: pursuit of truth, knowledge, and understanding. That won’t change anytime soon.

Infrastructure, on the other hand, is worlds removed. Construction workers in hard hats on building sites don’t overlap much with tweedy professors in ivory towers. They do have one thing in common, though: their goals are consistent over time. If you want to cross a river today, you build a bridge, just like a thousand years ago. We still need roads to get from one place to another. Plumbing to carry water and sewage. Electricity and communication grids may be newer, but we’ll need energy and communication in a thousand years just like we do now.

When I look at the projects I’ve enjoyed most in my career, they fit the bill. Sharding databases and later Paxos etc: classic infrastructure. Networking: infrastructure and applied research (OpenFlow). Color Genomics: applied research. App Engine: infrastructure as a product. Even the side projects I’m looking into now fit: climate change, p-hacking, the reproducibility crisis.

Why do I care if goals change over time? I’m not sure. Some of it may be the natural human desire to leave a legacy. If I work on big, long-standing problems, I’m more likely to be remembered after I die. I don’t spend much time thinking about legacy, but it could still be lurking in my subconscious.

A modern variation is “changing the world.” It’s a well worn phrase here in Startupland, but for me personally, it’s always seemed hopelessly ambitious. I have no illusions that I’m personally going to change the world in any significant way. Maybe a little, if I’m lucky, but not a lot.

Another Silicon Valley buzzword is “impact.” Everyone wants to work on something impactful. Most people use it to mean a bold new product, or a big user base, or innovating and disrupting an industry. I want to have impact, sure, but I want to do it by moving the needle on a big, important, long term problem. Growth hacking and TechCrunch coverage aren’t part of my personal equation.

Research and infrastructure aren’t unique. There are plenty of other areas where goals and incentives stay the same over time. Art, clearly. Philanthropy, education, entertainment, health care, public policy…the list goes on. I’d get restless if I were a teacher or actor or nurse and didn’t do anything new, but there’s plenty of opportunity to push on big problems in those fields on the front lines. I’d hate being a campaign manager, but I could easily do a stint as a policy wonk at a think tank.

This may not mean much to you, or even to me. After all, it was guiding my career decisions long before I thought it through and wrote it up. Still, now I know…and knowing is half the battle!

Don’t get the wrong idea: I’m still loving it at Color Genomics! I’m not going anywhere. On the contrary, we’re actively looking for good people. If you want to work on something meaningful and challenging, drop me a line!

Also, scratching my own itches is one big exception to this rule. If software is the tool of the knowledge worker, I’m lucky to be a toolsmith. I’ve written and modified lots of software over the years to solve my own problems. Some took significant time and effort, like granary and P4, and some have real user bases, like Bridgy and huffduff-video.

Even so, these tools have always felt practical, utilitarian, even a bit disposable. I don’t consider them a big part of my career or life’s work. I won’t need them forever, and they’ll all grow old and die eventually. That’s OK.