Apocalypse…you see that, and you know….I’m just going to lay down and die. I’m not gonna thrive in that environment! I type for a living. I’ll be food for other, stronger men.
The invention of theories depends on our talents and other fortuitous circumstances such as a satisfactory sex life.
…and here at Color, the bioinformatics team had a problem. Our pipeline — the data processing system that crunches raw DNA data from our lab into the variants we report to patients — was slow. 12 to 24 hours slow.
This wasn’t a problem in and of itself — bioinformatics pipelines routinely run for hours or even days — but it was a royal pain for development. We’d write new pipeline code, start it running, go home, and return the next morning to find it had crashed halfway through because we’d missed a semicolon. Argh. Or worse, since we hadn’t launched yet, our live pipeline would hit similar bugs in production R&D samples, which would delay them until we could debug, test, and deploy the fix. No good.
A couple months ago, we launched a public research database with DNA, health history, and more from 50,000 of our clients. You might be surprised at how little work it took us: under four person-months total. Read on to hear how we designed and built it, went above and beyond the usual privacy safeguards, and did it all in the blink of an eye.
At first glance, Color Data may seem far from unique. ClinVar, gnomAD, TCGA, 1000 Genomes, and others all address similar goals: sharing anonymized genotype and phenotype data with academic researchers to help them advance science and knowledge. We’re in an unusual position at Color, though, in that we have a large population with both sequenced DNA and self-reported phenotype that has opted to share it with researchers. Even better, our population is a bit more diverse across ethnicity, age, health history, and other characteristics than many other research datasets.
More 4-year-old quotes:
“71 is weird because the 1 is like boop and the 7 is like boop boop!”
“Have you ever heard of hard socks?”
“Wow, no. What are they? Are they like shoes?”
“No no no. Ok let me explain it to you. They’re flappy but they don’t flap. You wear them when you ride a bicycle but only at night.”
“Whoa. Have you ever worn hard socks?”
“Of course not. But I really want to someday!”
Technically correct: the best kind of correct.
Brooke: “OK so this is not really a good question Mama.”
As a computer scientist, I love it. It’s a fundamental breakthrough: the first open participation distributed consensus algorithm ever. That’s a big deal.
As an engineer, I can’t responsibly recommend it to anyone for anything. It’s slow, unreliable, immature, hard to use, and functionality impoverished. It’s a database that hates you.
We lost our faithful cat Snoopy a few weeks ago, just before the new year. He’d been sick for a while, technically kidney failure and feline mast cell cancer, really just old age. The vet gave him just six weeks, but we nursed him along with steroids and fluids, and he managed a good five or six months beyond that. In the end, it was his time, but it was still tough to let him go. We miss him.
Gina got Snoopy and Charlie at the same time, barely after they were weaned. They’d lived together at the shelter, and she’d only planned to get one, but she couldn’t bear to break them up. They were as close as brothers; for all they knew, they were brothers.
Snoopy constantly groomed Charlie and looked after him, but there was always only one true love of Snoopy’s life: Gina. She was his mama. He followed her around the house, sat on her legs while she worked from home, kept her company while she gardened and cooked, and slept on top of her in bed at night. He was her fast companion, her kitten, her buddy.
Don’t solve a problem just because you can imagine it. Wait until there is a problem, then go after it. If you over-anticipate, you will design freedom out of the system.
- Stewart Brand
I deleted all of my Facebook posts last week. I deleted my Google+ posts too. They were pretty much all posted here on my web site too, so nothing was truly lost, but I still feel a bit lighter, somehow.
Plenty of ink has been spilled on the problems with big social media and the companies behind it. There’s an entire movement of people leaving social networks for various reasons. Many of them have expressed their concerns, often quite loudly and eloquently, so I don’t really need to repeat them here. Consider yourselves lucky.
Let me guess: that didn’t set your imagination on fire. Even in software engineering and data science, it’s not exactly a household term. Nor are the more modern terms data platform or data engineering. If you do know what they are, chances are you don’t have strong opinions. You know they’re out there, people do them, and that may be the end of it.
ETL stands for Extract, Transform, Load. It’s how you get your data from your primary OLTP database, which serves your application, into an OLAP data warehouse designed for analysis, business intelligence, and data science.
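If the three steps sound abstract, here is a toy sketch of what they might look like, using two SQLite databases to stand in for the OLTP database and the warehouse. Everything in it is made up for illustration: the `orders` table, the `daily_revenue` table, and the function names are assumptions, not anything from a real system.

```python
import sqlite3


def extract(oltp):
    """Extract: pull raw rows out of the application (OLTP) database."""
    return oltp.execute(
        "SELECT id, customer, amount_cents, created_at FROM orders"
    ).fetchall()


def transform(rows):
    """Transform: roll individual orders up into revenue per customer per day."""
    totals = {}
    for _order_id, customer, amount_cents, created_at in rows:
        day = created_at[:10]  # assumes ISO 8601 timestamps, e.g. 2020-01-01T09:00:00
        key = (day, customer)
        totals[key] = totals.get(key, 0) + amount_cents
    # Convert cents to dollars only after summing, to keep the arithmetic exact.
    return [(day, customer, cents / 100) for (day, customer), cents in sorted(totals.items())]


def load(warehouse, facts):
    """Load: write the aggregated rows into the analytics (OLAP) database."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS daily_revenue ("
        "day TEXT, customer TEXT, revenue REAL, PRIMARY KEY (day, customer))"
    )
    # INSERT OR REPLACE keeps the job safe to re-run over the same data.
    warehouse.executemany("INSERT OR REPLACE INTO daily_revenue VALUES (?, ?, ?)", facts)
    warehouse.commit()


def run_etl(oltp, warehouse):
    load(warehouse, transform(extract(oltp)))
```

You would call `run_etl(oltp, warehouse)` with two `sqlite3` connections. A real pipeline also has to worry about incremental extraction, schema changes, scheduling, and failures, none of which this sketch attempts.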
Whatever your product is, it’s hopefully a core competence for your company. It’s a key differentiator. For many of us, data science and analysis are also key differentiators. ETL, however, is not. It looks basically the same everywhere, and does basically the same thing. These are all signposts that generally point toward buying or reusing, not building from scratch. Doing this kind of thing yourself just won’t move the needle.