by Soňa Pochybová

On 20th June 2019, at the ERNI Development Day in Bern, Switzerland, a new service named AI & Data Science was introduced. This indicates that ERNI would like to start enhancing its portfolio in these fields. But what do these terms stand for? In this blog post, I would like to give a short overview of what it means to be a Data Scientist and which areas you need to study, should you choose to become one.

As someone who studied and practiced experimental High-Energy Physics, I hold this subject close to my heart, and I am very glad ERNI is moving in this direction. There is nothing more thrilling than seeing information emerge from apparent chaos, when the data start telling one of the many stories hidden inside.

So let’s start …

Who are Data Scientists and what do they do

The term Data Scientist was coined in 2008 by D.J. Patil and Jeff Hammerbacher, the respective leads of data analytics at LinkedIn and Facebook. The term stands for people who:

Bring structure to unstructured data

To give an example, imagine you have a problem that requires collecting date information from various texts. A date can take a multitude of formats: 21-03-2019, 03/21/2019 and 21st March 2019 all describe the same date. Handling each of these formats separately throughout your analysis is highly impractical. The job of a data scientist in this case would be to:

  • Identify all possible formats of a date in the data set
  • Define an optimal, unified structure, e.g. a mapping: {Day: 21, Month: 3, Year: 2019}
  • Store all date values from the data set in this unified structure
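
The three steps above can be sketched in Python. This is only a minimal illustration under a few assumptions: the format list covers just the three example formats mentioned, month names are assumed to be in English, and the names KNOWN_FORMATS and normalize_date are made up for this sketch.

```python
import re
from datetime import datetime

# Step 1: the formats identified in the data set.
# Assumption: only the three example formats from the text occur.
KNOWN_FORMATS = ["%d-%m-%Y", "%m/%d/%Y", "%d %B %Y"]

# Matches the ordinal suffix in strings such as "21st" or "3rd".
ORDINAL_SUFFIX = re.compile(r"(?<=\d)(st|nd|rd|th)\b")


def normalize_date(text):
    """Steps 2 and 3: convert a date string into the unified mapping."""
    cleaned = ORDINAL_SUFFIX.sub("", text.strip())
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(cleaned, fmt)
        except ValueError:
            continue  # not this format, try the next one
        return {"Day": parsed.day, "Month": parsed.month, "Year": parsed.year}
    raise ValueError(f"Unrecognised date format: {text!r}")


for raw in ["21-03-2019", "03/21/2019", "21st March 2019"]:
    print(raw, "->", normalize_date(raw))
```

Note that a value such as 03/04/2019 is read month-first here purely because of the chosen format list. Deciding how such ambiguous values should be interpreted is part of identifying the formats in the data set.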

Define which questions need answering

Having data and having a problem you want to solve is one thing. But knowing how the data may help you solve your problem is another. Skilled Data Scientists know how to approach the data in order to solve the problem at hand and prepare the data for the upcoming analysis in a way best suited for the question.

For example, suppose you want to offer a suitable discount to a customer shopping at your store. To do so, you might want to know the answers to the following questions (among others):

  • Is the customer male or female?
  • How old are they?
  • Are they married?
  • Do they have kids?
  • How often do they buy the sorts of goods available at your store?
  • What are their hobbies?
  • How do similar customers react to similar incentives?
  • How high does the discount need to be?

Analyze data

Once the questions are defined, you can design the process of getting the answers from the data using various analysis techniques. Returning to the example above, you can explore the customer’s purchase history (e.g. from a loyalty card) to answer many of the questions posed, and then decide on an offer based on the portfolio of products at your store. You may also want to predict how successful your decision will be, based on the reactions of similar customers, and how it in turn increases your sales.
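
As a very small sketch of what exploring a purchase history can look like, the Python snippet below counts purchases per product category for one customer. The records, the customer IDs and the function name favourite_category are all hypothetical and serve only as an illustration. A real analysis would work on far more data and use richer techniques.

```python
from collections import Counter

# Hypothetical loyalty-card records: (customer_id, product_category).
purchases = [
    ("c1", "sports"), ("c1", "sports"), ("c1", "books"),
    ("c2", "toys"), ("c2", "toys"), ("c2", "sports"),
]


def favourite_category(records, customer_id):
    """Return the category the customer bought most often, or None."""
    counts = Counter(cat for cid, cat in records if cid == customer_id)
    if not counts:
        return None  # no purchase history for this customer
    return counts.most_common(1)[0][0]


print(favourite_category(purchases, "c1"))
print(favourite_category(purchases, "c2"))
```

The most frequent category is one possible input for choosing which product to discount for that customer.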

Drive strategic decision making

Once the first results are ready, it is good to visualize them in a way that makes your message clear and understandable. In our example, you may want to show how sales rise as a result of offering a targeted discount, compared to offers not tailored to customer behaviour. This can influence the store’s discount strategy, with the aim of increasing sales volumes.

These activities can be briefly summarized in the Data Science lifecycle:

[Figure: Data Science lifecycle]

Trades of a Data Scientist

Looking at the description above, it is clear that Data Scientists have to be curious and strongly data- and results-oriented. On top of that, they need to be highly technically skilled and have a strong background in statistics and linear algebra. They also need great communication and presentation skills. It’s a lot, but no worries! If you’re driven and you see yourself having a career in Data Science one day, this is knowledge you can build up. To give you some guidance, focus on the following:

  • Statistics and linear algebra
  • R, Python (NumPy, scikit-learn, pandas, TensorFlow)
  • Apache Spark, Dask, cloud platforms
  • SQL, NoSQL
  • Data visualisation tools (Bokeh, Matplotlib)
  • Data analytics tools (Tableau, Qlik Sense)
  • Collaborative development tools (GitHub, Jupyter Notebook)

So, start learning and enjoy the journey …

