Data Engineering Runs on Judgment Calls
Two projects, two completely different domains, same lesson. The actual work was sitting in ambiguity and making decisions I could defend. The SQL took care of itself.
The actual work of data engineering, at least from what I have seen, is making judgment calls.
You make a lot of them on a single project. Each one changes what the numbers look like downstream. The SQL matters, of course, but that is usually the easier part. Joins are joins. You learn the dialect, figure out a few optimizer quirks, and keep moving.
The harder part is when someone gives you a dataset, a loose business question, and no clean definition of what “right” means.
I worked on two projects over the last couple years that made this clear to me. They were in completely different domains, with different stakeholders, but the same basic pattern showed up in both: the person building the data model ends up making decisions that shape what everyone else sees.
"You'll figure it out"
The first project was a research analytics dashboard built on a claims database with around 40 million patients. The goal was to track treatment patterns for a specific condition.
There was no formal spec. No requirements document. No approved drug list. I had a connection string and a broad objective.
The first real question was simple, but not easy: who counts as a patient?
Diagnosis codes are messy. Does one diagnosis qualify someone? Do they need two? Does the setting matter? A hospital visit and an outpatient visit do not always mean the same thing. The research team had opinions, but the criteria had never been written down because no one had needed to define them at that level before.
So I made the call.
A patient qualified if they had at least two diagnosis codes in a rolling 12-month window, or one diagnosis code plus a condition-specific prescription fill. I ran the counts, compared them against published prevalence rates for the population, and the numbers were in a reasonable range.
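A rough SQL sketch of that rule, with placeholder table and column names rather than the real schema, and dialect-specific date arithmetic:

```sql
-- Hypothetical tables: diagnoses, rx_fills, condition_dx_codes, condition_drugs.
WITH dx AS (
    SELECT d.patient_id, d.diagnosis_date
    FROM diagnoses d
    JOIN condition_dx_codes c ON c.diagnosis_code = d.diagnosis_code
),
two_dx AS (
    -- at least two qualifying diagnoses within a rolling 12-month window
    SELECT DISTINCT a.patient_id
    FROM dx a
    JOIN dx b
      ON b.patient_id = a.patient_id
     AND b.diagnosis_date > a.diagnosis_date
     AND b.diagnosis_date <= a.diagnosis_date + INTERVAL '12' MONTH
),
dx_plus_fill AS (
    -- or one diagnosis plus a condition-specific prescription fill
    SELECT DISTINCT d.patient_id
    FROM dx d
    JOIN rx_fills f ON f.patient_id = d.patient_id
    JOIN condition_drugs cd ON cd.drug_code = f.drug_code
)
SELECT patient_id FROM two_dx
UNION
SELECT patient_id FROM dx_plus_fill;
```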
That became the standard for the project.
There was no answer key. The best I could do was make a decision that was clear, explainable, and backed by evidence. In work like this, “defensible” matters a lot.
Building a drug list from scratch
The dashboard was meant to show what drugs patients were taking and in what order. To do that, I had to build the drug list myself.
I started with what I knew about prescribing in the U.S., then mapped that to a foreign market where brand names were different, some drugs were not approved, and several common treatments were ones I had never heard of before. I ended up with about 30 molecules across acute and preventative categories.
That split created another problem.
The same molecule can mean different things depending on how it is prescribed. A triptan taken as needed is usually acute. A beta-blocker taken daily is usually preventative. The source data had a flag for this, which sounded great until I looked at the actual values. It was populated maybe 60% of the time, and even then it was not always reliable.
So I built rules based on fill patterns.
If a patient filled a prescription consistently, especially more than once every 30 days, I treated it as likely preventative. If fills were sporadic or clustered around short windows, I treated it as more likely acute. When the data was still ambiguous, I defaulted to how the drug is most commonly described in the clinical literature.
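In SQL, the heuristic looks something like the sketch below. The tables and columns are hypothetical, and the exact gap thresholds are illustrative stand-ins for the cutoffs, not the project's real ones:

```sql
-- Classify each patient-drug pair from its fill cadence (Postgres-style date math).
WITH gaps AS (
    SELECT patient_id, drug_code, fill_date,
           fill_date - LAG(fill_date) OVER (
               PARTITION BY patient_id, drug_code ORDER BY fill_date
           ) AS days_since_prior_fill
    FROM rx_fills
),
per_drug AS (
    SELECT patient_id, drug_code,
           COUNT(*)                   AS n_fills,
           AVG(days_since_prior_fill) AS avg_gap_days
    FROM gaps
    GROUP BY patient_id, drug_code
)
SELECT p.patient_id, p.drug_code,
       CASE
           WHEN p.n_fills >= 3 AND p.avg_gap_days <= 30 THEN 'preventative'  -- steady refills
           WHEN p.n_fills <= 2 OR  p.avg_gap_days  > 60 THEN 'acute'         -- sporadic fills
           ELSE r.default_class  -- fall back to the clinical-literature default
       END AS inferred_class
FROM per_drug p
JOIN drug_reference r ON r.drug_code = p.drug_code;
```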
This is also how I ended up reading clinical papers late at night trying to understand prescribing behavior. Slightly outside the original job description, but necessary. You cannot build classification logic for something you do not understand.
Every decision changes the numbers
Another question came up around lines of therapy.
If a patient takes Drug A for six months, switches to Drug B for three months, and then goes back to Drug A, how many lines of therapy is that?
I counted it as two.
My reasoning was that returning to a drug the patient had already tried was not really a new treatment strategy. It was a re-initiation. Collapsing those cases also made the survival analysis cleaner and easier to explain.
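Once you decide on the rule, the collapse itself is simple to express. A sketch, assuming a hypothetical therapy_episodes table with one row per drug episode:

```sql
-- Number lines by the first time a patient starts each drug; going back to a
-- drug they already tried re-uses that line instead of opening a new one.
WITH first_use AS (
    SELECT patient_id, drug_code, MIN(episode_start) AS first_start
    FROM therapy_episodes
    GROUP BY patient_id, drug_code
)
SELECT patient_id, drug_code, first_start,
       ROW_NUMBER() OVER (
           PARTITION BY patient_id ORDER BY first_start
       ) AS line_of_therapy
FROM first_use;
-- Drug A -> Drug B -> back to Drug A comes out as two lines, not three.
```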
Then came discontinuation.
If there is a 45-day gap between prescription fills, did the patient stop therapy? What about 60 days? Or 90? Claims data gives you fill dates, but it does not tell you what the doctor said or what the patient actually did at home.
I set the cutoff at 60 days, with a grace period based on days supply.
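One way to express that cutoff in SQL, again with hypothetical column names and Postgres-style date arithmetic:

```sql
-- A fill ends therapy if the next fill does not arrive within the days
-- supplied (the grace period) plus the 60-day cutoff.
WITH ordered AS (
    SELECT patient_id, drug_code, fill_date, days_supply,
           LEAD(fill_date) OVER (
               PARTITION BY patient_id, drug_code ORDER BY fill_date
           ) AS next_fill_date
    FROM rx_fills
)
SELECT patient_id, drug_code, fill_date,
       CASE
           WHEN next_fill_date IS NULL
                OR next_fill_date > fill_date + days_supply + 60
           THEN 1 ELSE 0
       END AS discontinued_after_this_fill
FROM ordered;
```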
Every one of these decisions moved the numbers. Patient counts changed. Duration changed. Treatment sequencing changed. None of the choices were perfect. The point was to make the assumptions visible enough that someone could challenge them.
The dashboard that ran on tribal knowledge
The second project was different. I was rebuilding a legacy BI dashboard for a pharma sales team, with about 200 people using it every week.
The old tool had separate views that did not really talk to each other. Sales by one dimension. Sales by another. Geography in a separate place. Each view made sense on its own.
The new dashboard needed everything in one data model.
The moment you do that, every old inconsistency becomes visible. A total for one physician does not match between two views. A regional number looks off. A completion metric disagrees with the activity count.
Most of these issues were not new. The old tool just made them harder to compare.
The business rules behind the dashboard lived in a few people’s heads. Discovery became a loop: build the model, run the numbers, compare against the old dashboard, find a mismatch, ask why, get an incomplete answer, keep asking until someone who had been around long enough remembered the rule.
Then I would implement the rule and find the next mismatch.
One example was physician interactions. If a rep visited one physician and also spoke with three other attendees, the dashboard displayed four interactions. But for completion percentage, that same visit counted as one completed activity.
Same event. Two different counts. Both correct, depending on the metric.
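In the data model, both numbers come out of the same rows, just aggregated differently. A minimal sketch over a hypothetical attendee-level table:

```sql
-- interaction_attendees: one row per attendee per visit (illustrative name).
SELECT
    COUNT(*)                 AS interactions_displayed,  -- one per attendee
    COUNT(DISTINCT visit_id) AS completed_activities     -- one per visit
FROM interaction_attendees;
```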
I found it because two sections of the dashboard did not match, then spent about a day and a half tracing the difference back to that rule.
Being the only data person
I was solo on the data layer for that project. Every data question came to me. There was no one else reviewing my SQL or checking my joins.
When the numbers were wrong, there was one person to look at. When they were right, no one noticed. That is pretty much how reporting works.
After a few bad refreshes went live and caused confused emails from the sales team, I built QC gates into the process.
Before any refresh hit production, automated checks had to pass. YTD sales should not go down. Month-over-month changes should stay within historical ranges. Row counts should not drop sharply without a known reason. If a check failed, the refresh stopped and the dashboard kept showing yesterday’s data.
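Two of those gates, sketched against hypothetical staging and production tables; a non-empty result means the check failed. The 10% threshold is illustrative:

```sql
-- 1. YTD sales should never go down between refreshes.
SELECT s.region, p.ytd_sales AS ytd_live, s.ytd_sales AS ytd_new
FROM staging_sales_summary s
JOIN prod_sales_summary p ON p.region = s.region
WHERE s.ytd_sales < p.ytd_sales;

-- 2. Row counts should not drop sharply without a known reason.
SELECT s.n AS rows_new, p.n AS rows_live
FROM (SELECT COUNT(*) AS n FROM staging_sales_fact) s
CROSS JOIN (SELECT COUNT(*) AS n FROM prod_sales_fact) p
WHERE s.n < 0.9 * p.n;
```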
The checks themselves were not complicated. Any data engineer could write them.
But deciding to block production behind them was the important part. I had to choose that stale data was better than wrong data. That was probably the best decision I made on the project.
The thing that caught me off guard about both projects was how little of the hard work was purely technical. You spend years getting better at SQL, Python, and statistics, and then the job ends up being mostly about sitting with ambiguity and making calls you can defend when someone pulls on the thread.