Skip to content

Befriending the Maybe

One thing you’ll notice about data science is its unpredictable nature. Imagine yourself as an inventor, where you know what you want to achieve but have no clear path to get there. It's a journey filled with trial and error, twists and turns, and plenty of unexpected challenges. Let’s dive into an example to illustrate this.

Embracing the Unexpected

Data science projects are like navigating through uncharted territory. You start with a clear goal, but the path is often murky. The key is to embrace ambiguity and be prepared for the unexpected. Let’s break down a project to see this in action.

The Project: Identifying High Maintenance Costs

Picture a factory with various operations, each incurring maintenance costs. The task is to identify which operations had over-budget maintenance costs in the previous year. Sounds straightforward, right? Just compare costs. But as you’ll see, it’s not that simple.

You might think the solution is simple: compare high and low maintenance costs. But jumping in too quickly without thorough understanding leads you in circles. You find out the obvious – expensive projects have high costs, and cheap ones have low costs. This approach doesn’t provide any new insights.

The first step in any data science project is defining the problem accurately. This involves slowing down and thinking through the details before jumping into solutions. Referencing frameworks like CRISP-DM (Cross Industry Standard Process for Data Mining) can be helpful. This approach emphasizes understanding the business and data before diving into modeling.

Collaborating with Subject Matter Experts

Collaboration is crucial. Even if you’re working alone, bouncing ideas off colleagues or bosses can provide new perspectives. In this project, involving subject matter experts (SMEs) was key. The finance team seemed like the obvious choice since they deal with budgets. But there was a twist – the operations within the factory didn’t have individual budgets, only the whole factory did.

The Setback and Persistence

Hitting a roadblock is common. In this case, the lack of individual budgets seemed like a dead end. The easy way out would be to do the best with the available data. But persistence is important. So, another group of SMEs, the chemical engineers, was consulted. They were busy professionals volunteering their time, often without their bosses' knowledge, highlighting the importance of being respectful and prepared for these meetings.

Finding an Alternative Solution

Working with the chemical engineers, a creative solution emerged. Without individual budgets, attention turned to the available data – a parts list. The idea was that similar parts lists might indicate similar functions and maintenance needs. If the equipment functions similarly, their maintenance budgets might be comparable. This unconventional approach turned out to be effective.


Embracing ambiguity in data science means being ready for unexpected challenges and willing to explore new, creative solutions. It's a journey that requires patience, collaboration, and a willingness to think outside the box.