Words by Richard Bumann
Data Science Consultant at ERNI
Manage expectations, prove and challenge results, communicate outcomes. In each stage of your data science project, these are some of the unspoken expectations your team and leaders will have of you.
Navigating these challenges is not easy, so let’s take a closer look at what pitfalls to avoid, what tough decisions to make and whose engagement is the most important during each stage of a data science project.
The individual stages of your data science project as described below don’t necessarily happen in a strictly sequential order. You can move back and forth between them or repeat the cycle a number of times to tackle new challenges. Always make a conscious and well-founded decision before proceeding to the next stage, because each transition changes both who the stakeholders are and how you need to manage them.
Business understanding: Make sure all stakeholders understand the business goals of your project
During the initial stage, stakeholders need to understand not only the benefits of the project but also what it cannot deliver. Define and limit the scope of the project carefully.
Stakeholders in this phase:
Business end users, business analysts and data scientists.
Obstacles:
If the vision or idea of what should be achieved is too broad, it has to be narrowed down. Keep in mind that the framing conditions are multifaceted: beyond the business end users, they include more than just people, e.g., legal bodies or security regulations.
Difficult but correct decisions:
Abort the project if the business idea is not feasible or the benefits are not viable.
Engagement of stakeholders:
Illustrate the possibilities with well-designed examples and set realistic expectations.
Data understanding: Let’s discuss available data
In this phase, you’ll be mapping the data landscape and discussing data storage and possibilities to integrate and merge data. It is also important to assess the quality and completeness of the data.
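A first look at quality and completeness can be as simple as counting missing values and repeated rows per data source. The following is a minimal sketch in plain Python, not a prescribed method: the function name `profile_completeness`, the sample records and the column names are all hypothetical, and treating empty strings as missing is an assumption to confirm with the business analysts.

```python
from collections import Counter


def profile_completeness(rows, columns):
    """Return the share of missing values per column and the duplicate row count."""
    total = len(rows)
    missing = {}
    for col in columns:
        # Assumption: None and empty or whitespace-only strings count as missing.
        n_missing = sum(
            1
            for row in rows
            if row.get(col) is None or str(row.get(col)).strip() == ""
        )
        missing[col] = n_missing / total if total else 0.0
    # Count rows that exactly repeat an earlier row across the given columns.
    counts = Counter(tuple(row.get(col) for col in columns) for row in rows)
    duplicates = sum(n - 1 for n in counts.values())
    return {"rows": total, "missing_share": missing, "duplicate_rows": duplicates}


# Hypothetical sample records, for illustration only.
customers = [
    {"customer_id": 1, "country": "CH", "revenue": 1200.0},
    {"customer_id": 2, "country": "", "revenue": None},
    {"customer_id": 3, "country": "DE", "revenue": 830.5},
    {"customer_id": 3, "country": "DE", "revenue": 830.5},
]

report = profile_completeness(customers, ["customer_id", "country", "revenue"])
print(report)
```

A summary like this gives business analysts, data scientists and IT a shared, concrete basis for discussing data gaps before any modelling starts.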
Stakeholders in this phase:
Business analysts, data scientists or data analysts, data engineers, IT.
Obstacles:
(a) Miscommunication between business analysts and data analysts; (b) miscommunication between data scientists and data engineers, leading to poor identification of necessary and available data sources; (c) a strained relationship between IT and the data team because of differing goals; (d) missed opportunities to uncover poor data quality and data gaps.
Engagement of stakeholders:
Illustrate the benefits of the data project. When working with IT, involve them in the process early so they don’t feel left out and you don’t end up confronting them with last-minute demands such as “extract data for us immediately”.
Data preparation: Getting the data in shape
In this phase, make sure you have the right data in the best quality possible.
Stakeholders in this phase:
Data scientists, data analysts and the wider data team, business stakeholders.
Obstacles:
Incomplete or ‘dirty’ data, missing resources from the IT department, or missing engineering support that could help access data and improve its quality.
What can go wrong:
(a) If the data scientists don’t talk to the business and other professional stakeholders, they might miss important facts needed to prepare and clean the data for a proper analysis; (b) if cleaning is needed, there might not be enough resources to clean the data for the project (which should be done by professionals). Data teams need to include cleaning in their planning, and occasionally involve data engineers to clean data at the source.
Engagement of stakeholders:
Clearly communicate how important a clean database is for a correct analysis. Make sure resources from IT are assigned to the cleaning work.
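To make the cleaning effort tangible for stakeholders, it can help to log what each cleaning step changes. The sketch below is one possible illustration in plain Python under stated assumptions: the function `clean_records`, the set `MISSING_MARKERS` and the sample records are hypothetical, and both the list of missing-value markers and the rule of keeping the first record per key are decisions that should be confirmed with the business.

```python
# Assumed markers for "no value"; the real list should come from the business.
MISSING_MARKERS = {"", "n/a", "na", "null", "-"}


def clean_records(rows, key):
    """Trim text fields, unify missing markers, and drop records with a repeated key."""
    cleaned, seen = [], set()
    log = {"missing_normalized": 0, "duplicates_dropped": 0}
    for row in rows:
        new_row = {}
        for col, value in row.items():
            if isinstance(value, str):
                value = value.strip()
                if value.lower() in MISSING_MARKERS:
                    value = None
                    log["missing_normalized"] += 1
            new_row[col] = value
        # Assumption: the first record seen for a key wins; later ones are dropped.
        if new_row[key] in seen:
            log["duplicates_dropped"] += 1
            continue
        seen.add(new_row[key])
        cleaned.append(new_row)
    return cleaned, log


# Hypothetical raw records, for illustration only.
raw = [
    {"customer_id": 1, "country": " CH "},
    {"customer_id": 2, "country": "N/A"},
    {"customer_id": 2, "country": "DE"},
]

cleaned, log = clean_records(raw, key="customer_id")
print(cleaned)
print(log)
```

Note that the sample deliberately drops a record that carries more information than the one it keeps. Whether that is acceptable is exactly the kind of question the data team cannot answer without talking to the business.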