Michael Schroeder
ERNI Switzerland

Introduction

Data quality is a crucial factor in data-driven projects, and its significance is closely tied to the research question at hand. Good data quality is a relative concept: it must always be evaluated against the requirements of the data-driven model. Data and model should be seen as a tandem, with their characteristics often described using the four Vs:

Volume

This refers to the quantity of data available. In our project, high-resolution satellite images covering the whole of Switzerland amount to over 10 TB of data, which incurs significant storage costs. To address this, we decided to reduce the image resolution. Processing the individual images and aligning them to the desired resolution and position requires appropriate big data tools.
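To make the resolution reduction concrete, here is a minimal sketch of one common approach: averaging non-overlapping blocks of pixels. The article does not say which resampling method or tooling the project used, so the function name `downsample`, the block-averaging strategy and the toy tile are all assumptions for illustration. A real pipeline at this data volume would rely on dedicated geospatial and big data tools.

```python
from statistics import fmean


def downsample(raster, factor):
    """Reduce resolution by averaging non-overlapping factor x factor blocks.

    `raster` is a list of equally long rows of numeric pixel values.
    Block averaging is an assumed method, not the project's documented one.
    """
    rows, cols = len(raster), len(raster[0])
    if rows % factor or cols % factor:
        raise ValueError("raster dimensions must be divisible by factor")
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            # Collect all pixels of the current block and average them.
            block = [
                raster[r + i][c + j]
                for i in range(factor)
                for j in range(factor)
            ]
            out_row.append(fmean(block))
        out.append(out_row)
    return out


# Hypothetical 4 x 4 tile reduced to 2 x 2 (a 16-fold cut becomes 4-fold here).
tile = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, 6, 6],
    [2, 2, 6, 6],
]
coarse = downsample(tile, 2)
print(coarse)
```

Halving the resolution in each direction cuts the number of stored pixels to a quarter, which is where the storage saving comes from.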

Variety

Different data sources and types present challenges in data-driven projects. In our case, we integrate various data sources, including satellite images (in raster format), road accident data (spatial and temporal points), road networks (spatial lines), and traffic load data (sparse spatial and temporal data). Conceptual competence is required to combine these different data types effectively and assess their relevance to the original question.
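As an illustration of combining two of these data types, the sketch below assigns accident points to the nearest road segment and counts accidents per segment. This is only one possible way to join point and line data; the coordinates, segment identifiers and function names are hypothetical, and planar coordinates are assumed. The article does not describe how the project performed this step.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        # Degenerate segment: both end points coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_segment(point, segments):
    """Return the id of the segment closest to the given point."""
    return min(
        segments,
        key=lambda sid: point_segment_distance(point, *segments[sid]),
    )


# Hypothetical road segments (id -> two end points) and accident locations.
segments = {"A": ((0, 0), (10, 0)), "B": ((0, 5), (10, 5))}
accidents = [(2, 1), (8, 4), (5, 6)]

counts = {sid: 0 for sid in segments}
for location in accidents:
    counts[nearest_segment(location, segments)] += 1
print(counts)
```

With many segments, a spatial index would replace the linear search used here.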

Velocity & Veracity

Velocity and veracity also play crucial roles in ML-based projects. Processing and adapting data in a timely manner (velocity) and assessing the accuracy and reliability of the data (veracity) are essential to the overall success and performance of the ML algorithm.

 

Data metric definition

Once the research question is clear and the data has been cleansed, the next step is to effectively utilise the data. Many projects encounter difficulties at this stage, either by choosing an algorithm that does not align with the data or by selecting one that cannot address the research question adequately. Therefore, we need to define a risk metric applicable to the project based on the available data.

Not all accidents are equal: they differ in severity. To account for this, we normalised the number of accidents by factors such as road infrastructure and traffic volume. Additionally, accidents resulting in severe injuries or fatalities carry greater weight. Based on these factors, we defined a risk index for each road segment, which we then categorised into five levels that serve as our target variable.
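A risk index of this kind could be sketched as follows. The severity weights, the exposure term (segment length times daily traffic volume) and the rank-based split into five equally sized levels are all assumptions made for illustration; the article does not give the actual formula or thresholds.

```python
# Hypothetical severity weights: severe and fatal accidents count more.
SEVERITY_WEIGHTS = {"light": 1.0, "severe": 5.0, "fatal": 10.0}


def risk_index(accidents, length_km, daily_vehicles):
    """Severity-weighted accident count, normalised by an assumed exposure."""
    weighted = sum(SEVERITY_WEIGHTS[kind] * n for kind, n in accidents.items())
    exposure = length_km * daily_vehicles  # assumed normalisation factor
    return weighted / exposure


def to_levels(values, n_levels=5):
    """Map each value to a level 1..n_levels by rank (equal-sized groups).

    Ties keep their input order because Python's sort is stable.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    levels = [0] * len(values)
    for rank, i in enumerate(order):
        levels[i] = rank * n_levels // len(values) + 1
    return levels


# Hypothetical road segments with accident counts per severity class.
road_segments = [
    {"accidents": {"light": 4}, "length_km": 2.0, "daily_vehicles": 1000},
    {"accidents": {"light": 1, "fatal": 1}, "length_km": 1.0, "daily_vehicles": 1000},
    {"accidents": {"severe": 2}, "length_km": 5.0, "daily_vehicles": 4000},
    {"accidents": {"light": 10}, "length_km": 1.0, "daily_vehicles": 10000},
    {"accidents": {}, "length_km": 3.0, "daily_vehicles": 500},
]

indices = [risk_index(**segment) for segment in road_segments]
levels = to_levels(indices)
print(levels)
```

Note how the fourth segment has the most accidents but only a middle risk level, because its high traffic volume spreads those accidents over far more vehicles.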

 

Algorithm development

After defining our target variable, the next step is to choose a suitable algorithm. There are no strict guidelines for this decision, as it varies case by case. Depending on the problem's complexity, involving an experienced data scientist is crucial. The choice of algorithm class (e.g., regression, classification, clustering) should align with the defined research question, while model selection and parametrisation require expertise and experience. It is important to ensure that the model has an appropriate size (e.g., number of parameters) so that it can distinguish artifacts from genuine statistical patterns. Furthermore, mitigating systematic biases that arise when data distributions differ between training, testing and model operation poses an additional challenge.
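Before a distribution difference can be mitigated, it has to be detected. One simple check, shown here as a sketch and not as the project's method, is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical cumulative distributions of a feature in the training data and in the data seen during operation. The sample values and variable names are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the maximum absolute difference between the two empirical
    cumulative distribution functions: 0.0 means identical empirical
    distributions, 1.0 means the samples do not overlap at all.
    """
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a_sorted) | set(b_sorted))
    return max(
        abs(
            bisect_right(a_sorted, x) / len(a_sorted)
            - bisect_right(b_sorted, x) / len(b_sorted)
        )
        for x in points
    )


# Hypothetical values of one feature in training data and in operation.
train_values = [1, 2, 3, 4]
live_values = [3, 4, 5, 6]

shift = ks_statistic(train_values, live_values)
print(shift)
```

A value near zero suggests the feature looks similar in both data sets; larger values are a signal to investigate, although any alert threshold would have to be chosen per project.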

 

Conclusion

Data quality is paramount in data-driven projects, and its assessment must be tailored to the specific question at hand. Overcoming challenges related to data quality, volume, variety, and algorithm selection is vital for successful project outcomes. By addressing these challenges in our case study, we strive to develop an effective algorithm that can leverage diverse data sources and provide meaningful insights for decision-making purposes.

 
