Data on its own is meaningless. What matters is how the data you collect is analysed and evaluated, and which of your business processes and products can be improved through data.
ERNI helps many customers that are leaders in their industries. The most pervasive data trend that can be observed across all sectors is the collection of data focused on value (smart data), rather than sheer volume. You can sometimes stumble upon something meaningful or discover an occasional gem by drilling down into unstructured, large sets of data. However, some of the most successful data projects we’ve participated in do the exact opposite.
Data science initiatives can cover a very broad spectrum. At the lower end of expectations, you can simply aim to improve your processes by introducing fact-based, informed decision making to fuzzy areas. At the higher end, you can kick-start major product innovations, introduce new business models and even venture into uncharted business territories and new industries, while ERNI makes sure you have a scalable and sustainable solution.
This article showcases some of the recent data projects we’ve worked on. For newcomers to data science, it is a useful starting point that will help you navigate the possibilities of data analysis, AI and cognitive services. For seasoned data practitioners, we aim to share our expertise, inspire new ideas and projects, and foster discussion.
Whatever your background, look at the value generation process as a whole and understand your data from the perspective of business processes and value chain management. This way, you’ll make sure every data point you collect actually matters.
1. Prediction of Air Quality Based on Traffic
ERNI was asked to use current road traffic information in order to provide a weather station with predictions about air quality (particulate matter PM10).
The biggest issue was collecting up-to-date traffic data from a reliable source. Our team evaluated numerous approaches for deriving road traffic from secondary data (e.g., delays, traffic jams or historic counts).
The information collected was combined to generate a local traffic rating. In order to establish a model that could predict air quality, we used tools such as Microsoft Azure Machine Learning and Google TensorFlow, along with pandas and Python.
Results: In this pilot study, the model accurately predicts air quality within a 10 km vicinity, and our cloud-hosted predictor facilitates simple integration into the weather station’s software or alternative applications.
2. Operating 10 000 instruments around the world through smart use of data
Our customer sells analytic instruments to laboratories all over the world. Its customers expect no less than minimal downtime, proactive maintenance, flexible support and robust data security. Maintaining and operating close to 10 000 instruments worldwide is a highly demanding task for this customer. We assisted in building a state-of-the-art infrastructure for predictive maintenance and operational excellence.
In the beginning, we captured and standardised all worldwide maintenance processes to the widest extent possible.
Requirements engineering
The standardisation allowed us to derive concise requirements for a data collection platform that aggregates and centrally stores valuable data from instruments located all over the world.
Several pilot studies on predictive maintenance were carried out to determine the most beneficial scenarios that would guarantee operational excellence and simplify our customer’s maintenance processes. As the next step, a machine learning and artificial intelligence platform is being established. The platform will allow prompt local analysis of instruments and enable central processing using the complete data set.
3. Predicting failures in Semiconductor Manufacturing
The next customer is a global manufacturer of semiconductors. Its chips are shipped in reels of 5 000 to 10 000 pieces, and if a production lot contains too many faulty pieces, the buyer rejects the whole reel. That can be a big issue for the manufacturer, as the cost of the raw material is very high and the margin is low. On top of that, rejections are bad for a company’s reputation.
Error detection in semiconductor manufacturing is based on simple statistics. Our customer gave us the task of inspecting the results of the final tests carried out on individual chips to detect the lots/reels with a high probability of rejection.
We began with a thorough analysis of the available literature and an assessment of data quality. Because of the tight deadline, we distributed the analysis among three international teams and synchronised them in an iterative and agile manner. The teams exchanged problems and findings every week.
The teams used different analytic tools and approaches. The tools applied included Microsoft Azure Machine Learning Studio, Scala and Apache Spark, RapidMiner, KNIME and Dataiku. On the algorithmic side, we used simple statistics, k-means clustering, artificial neural networks, tree ensembles for classification and different types of anomaly detectors.
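As a rough illustration of what such simple statistics can look like, the Python sketch below computes the share of failed chips per lot from final-test results and lists the lots above a threshold. The data layout, the function names and the threshold are assumptions made for this example; the source does not state the customer's actual rejection criteria.

```python
from collections import defaultdict


def lot_failure_rates(test_results):
    """Map each lot id to the fraction of its chips that failed the final test.

    test_results: iterable of (lot_id, passed) pairs, one pair per chip.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for lot_id, passed in test_results:
        totals[lot_id] += 1
        if not passed:
            failures[lot_id] += 1
    return {lot: failures[lot] / totals[lot] for lot in totals}


def lots_at_risk(test_results, max_failure_rate):
    """Return the sorted ids of lots whose failure rate exceeds the threshold."""
    rates = lot_failure_rates(test_results)
    return sorted(lot for lot, rate in rates.items() if rate > max_failure_rate)


if __name__ == "__main__":
    # Made-up final-test results: lot A has 2 failures, lot B has 10.
    results = (
        [("A", True)] * 98 + [("A", False)] * 2
        + [("B", True)] * 90 + [("B", False)] * 10
    )
    # The 5% threshold is a hypothetical value chosen for this example.
    print(lots_at_risk(results, max_failure_rate=0.05))
```

A fixed threshold like this only flags lots after the final test; it does not estimate the probability of rejection from other test measurements.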
When the analysis was finished, we suggested to the customer a predictor for failing lots based on tree ensembles and a 1-SVM (one-class support vector machine) for anomaly detection. The predictor includes expert feedback and updates itself based on new data.
The next steps include industrialising the classifier and anomaly detector and integrating them into the manufacturing engineering system. We also plan to apply more sophisticated analysis to earlier production steps, so that issues are detected not only during the final test but also much earlier in production. Ultimately, this approach will reduce costs and save resources needed for testing.
4. Performance Optimisation for an IoT System
The increasing connectedness of devices drives a manufacturer of ventilation systems to extend the proprietary bus system for its embedded actuators with cloud access. The new system suffers from poor control performance and from disturbances on the bus that can neither be reproduced nor analysed.
The ERNI solution uses Python to collect and evaluate 1 million system values and bus telegrams from 20 systems daily. The data is filtered, reduced and stored in a time-series database (InfluxDB). Grafana is used as a flexible, powerful and user-friendly tool to visualise and correlate data as well as to define KPIs and set up alarms.
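The filter, reduce and store steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's pipeline: the reading layout, the one-minute averaging, the measurement name `bus_value` and the tag name `system` are all hypothetical. The output strings follow InfluxDB's line protocol (measurement, tags, fields, nanosecond timestamp); actually writing them to a database is left out.

```python
from collections import defaultdict
from statistics import mean


def reduce_to_minutes(readings):
    """Average raw readings per system and per minute.

    readings: iterable of (timestamp_s, system_id, value) tuples with integer
    timestamps in seconds. Readings whose value is None are filtered out.
    """
    buckets = defaultdict(list)
    for timestamp_s, system_id, value in readings:
        if value is None:
            continue  # filter: drop missing values
        minute_start = timestamp_s - timestamp_s % 60
        buckets[(system_id, minute_start)].append(value)
    return [
        (minute_start, system_id, mean(values))
        for (system_id, minute_start), values in sorted(buckets.items())
    ]


def to_line_protocol(points, measurement="bus_value"):
    """Format reduced points as InfluxDB line protocol strings.

    Assumes system ids contain no spaces, commas or equals signs, which
    would otherwise need escaping.
    """
    return [
        f"{measurement},system={system_id} "
        f"value={float(value)} {minute_start * 1_000_000_000}"
        for minute_start, system_id, value in points
    ]


if __name__ == "__main__":
    # Made-up readings for two systems.
    raw = [(0, "s1", 1.0), (30, "s1", 3.0), (61, "s1", 5.0), (10, "s2", None)]
    for line in to_line_protocol(reduce_to_minutes(raw)):
        print(line)
```

Reducing the data before storage keeps the database small; the trade-off is that short disturbances within one averaging window are no longer visible.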
ERNI supported our customer in detecting error patterns and analysing suspicious trends. Ultimately, the bus performance could, in certain situations, be increased tenfold, the control algorithms became predictable and stable, and 90% of the known bus disturbances were eliminated.
5. Business-Value Prediction
Our customer connects sellers and buyers of SMEs. In order to optimise its portfolio and marketing investments, it wants to predict the value and popularity of any given company.
ERNI carefully selected significant data from the customer’s complex database, then cleaned and simplified the historic data. The team used a model based on the Microsoft Azure Machine Learning platform to predict time-to-sale and selling price.
The results are visualised and integrated into a mobile app. This lean and easy-to-use app significantly helps the customer evaluate an offer quickly.