Marcel Steger, Data Scientist | Machine Learning Engineer | Data Engineer | GCP | AWS | Computer Vision, based in Vienna

Marcel Steger

available

Last updated: 15.03.2024

Data Scientist | Machine Learning Engineer | Data Engineer | GCP | AWS | Computer Vision

Degree: M.Sc. Data Science
Languages: German (native) | English (business fluent)

Keywords

Data Science, Information Engineering, Python, Robotics, Deep Learning, Matplotlib, Amazon RDS, Java, JavaScript, APIs (+ 50 more keywords)

Attachments

Marcel-Steger-CV-2023_011223.pdf

Skills

Previous Project Roles:
Data Scientist, Data Engineer, Machine Learning Engineer

Programming Languages:
Python, JavaScript, C#

Excerpt of regularly used Python Packages:
pandas, numpy, pytest, attrs, tensorflow, keras, pytorch, opencv, pyodbc, sklearn, matplotlib, seaborn, flask, scipy, huggingface, nltk, multiprocessing, dask, pyspark, trimesh, pyzbar

Data Science Skills:
  • Deep Learning (NLP with BERT models; CNNs: SqueezeNet, YOLO, ResNet; GANs; Neural Style Transfer; autoencoders)
  • Classical (non-DL) Machine Learning: random forests, logistic/linear regression, clustering algorithms
  • Data Visualization: Google Data Studio, matplotlib, seaborn, ggplot
  • Feature Extraction: PCA, feature selection methods (entropy-based, model-based, correlation-based)
  • Data Engineering: BigQuery, AWS RDS, PostgreSQL, MSSQL, OpenVPN

Target platforms:
  • Amazon Web Services (RDS, EC2, VPC, SageMaker)
  • Google Cloud Platform (Compute Engine, Cloud Scheduler, Google API, Triggering, Google Looker Studio)
  • Raspberry Pi

Code Versioning, CI/CD and other Engineering Tools:
Docker, git, bitbucket, github, gitlab, Confluence, Jira, Slack

Bio:
I am a Data Scientist with a strong background in software engineering. My applied experience ranges from (ML-based and classical) Computer Vision in robotics and NLP in the entertainment industry all the way to Data Engineering in the cloud. Additionally, I have analysed and improved data-related workflows in the automotive, medical, robotics, music, insurance, sports, and sales industries.
 

Formal Education:
My educational background, an M.Sc. in Data Science concluded with distinction, gives me a strong mathematical and theoretical foundation for Data Science and AI methods.

Certifications:
  • Deep Learning Specialization by DeepLearning.AI
  • Natural Language Processing with Classification and Vector Spaces by DeepLearning.AI

Project History

04/2023 - 11/2023
Data Lead
(pharmaceuticals and medical technology, 50-250 employees)

Tech stack of the project:
Python, AWS RDS, AWS MWAA, Airflow, PostgreSQL, dbt, GitHub, Plotly & Dash

Accomplishments:
• owned ETL workflows built on Airflow (AWS MWAA)
• developed application code for a data dashboard
• modularized the existing codebase
• delivered deadline-critical releases to clients

04/2022 - 09/2022
Senior GCP Data Engineer
Machine Learning Reply GmbH (internet and information technology, 10-50 employees)

Accomplishments:
* created and maintained a cross-cloud data pipeline
* developed, delivered, and deployed the solution as a Docker image
* created a new Google Looker Studio community visualization in JavaScript: a Sankey diagram

Main Technologies of the project:
GCP Compute Engine, Docker, BigQuery, Google Looker Studio, AWS VPC, AWS RDS, OpenVPN, Postgres, JavaScript

Project Description:
The scope of the project was Google Looker Studio with a BigQuery data source. The requirement was to create a flow diagram displaying app usage, for which a Sankey diagram appeared to be the most suitable solution. However, due to the limitations of the existing Sankey diagram implementations among Looker Studio's Community Visualizations (notably, edges could not be colored and recursive connections were not supported), it was decided to develop a custom-tailored solution in the form of a Looker Studio Community Visualization.

Another requirement was integrating a database on an AWS RDS instance, located within a private VPC and reachable only through an OpenVPN gateway, into BigQuery. To keep the data in BigQuery synchronized with this private AWS database, a solution was developed that first connects via OpenVPN to the private AWS DB instance, queries the latest data upon connection, then connects to BigQuery and appends it to the respective BigQuery tables. The Docker image was deployed as a Compute Engine instance and triggered via Cloud Scheduler, allowing for easy upscaling in case of increasing memory requirements.
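The sync cycle described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function names, the injected `fetch_rows`/`append_rows` callables, and the `updated_at` watermark column are all hypothetical, and the real pipeline talked to Postgres over OpenVPN and to BigQuery via its client library.

```python
from datetime import datetime, timezone

def incremental_query(table: str, ts_column: str, last_ts: datetime) -> str:
    """Build the SQL that fetches only rows newer than the last synced timestamp."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE {ts_column} > '{last_ts.isoformat()}' "
        f"ORDER BY {ts_column}"
    )

def sync_table(fetch_rows, append_rows, last_ts: datetime) -> datetime:
    """One sync cycle: pull rows newer than last_ts from the source, append
    them to the sink, and return the new watermark timestamp.

    fetch_rows(last_ts) -> list of dicts with an 'updated_at' key
    append_rows(rows)   -> writes the rows to the destination table
    """
    rows = fetch_rows(last_ts)
    if not rows:
        return last_ts  # nothing new; keep the old watermark
    append_rows(rows)
    return max(r["updated_at"] for r in rows)
```

Injecting the two callables keeps the append-only logic testable without live OpenVPN, Postgres, or BigQuery connections.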


11/2021 - 03/2022
NLP Data Scientist
ITSP GmbH (internet and information technology, 50-250 employees)

Accomplishments:
* text preprocessing
* feature engineering on the processed text (lemmatization, stemming, counts, template matches, meta-features)
* trained a random forest to classify simple vs. complex chats
* used a BERT model for sentiment classification of customers/agents, with aggregation over time and visualization of the sentiment

Main Technologies of the project:
BERT, Python, Pandas, GitHub, Huggingface, random forest

Project Description:

The requirement of the project was to calculate the sentiment of complex chat interactions between customers and service agents. 
First, a classifier was trained that distinguishes between simple chat interactions between customers and agents (e.g. "I cannot log in") and more complex ones (e.g. "The laws in my country have recently changed, which is why I want to revisit the terms of my membership, see Paragraph 1 Section 2a...").
Once that was in place, the sentiment of complex chat interactions was calculated for both users and agents using state-of-the-art Transformer language models. Subsequently, the sentiment scores were aggregated using a smooth moving average, and the results were visualized. Finally, the visualization was integrated into a dashboard used by customer support managers to monitor whether chats turn too negative, enabling them to intervene.
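The aggregation step can be illustrated with a small sketch. The helper names, the mapping of classifier labels onto signed scores, and the window size are illustrative assumptions, not the project's actual parameters:

```python
import pandas as pd

def signed_score(label: str, score: float) -> float:
    # Map a sentiment-classifier output such as ("NEGATIVE", 0.92) onto [-1, 1].
    return score if label == "POSITIVE" else -score

def smoothed_sentiment(scores: pd.Series, window: int = 5) -> pd.Series:
    # Smooth per-message sentiment with a trailing moving average so that
    # single outlier messages do not dominate the dashboard curve.
    return scores.rolling(window, min_periods=1).mean()

# Example: three messages from one chat, scored by a sentiment model.
msgs = [("POSITIVE", 0.9), ("NEGATIVE", 0.8), ("NEGATIVE", 0.6)]
series = pd.Series([signed_score(label, score) for label, score in msgs])
smooth = smoothed_sentiment(series, window=2)
```

In a dashboard, a sustained dip of the smoothed curve below some threshold would be the signal for a support manager to intervene.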


05/2020 - 10/2021
R&D Computer Vision Engineer
Dental Manufacturing Unit GmbH (industry and mechanical engineering, 10-50 employees)

Accomplishments:
* brought a stable computer vision solution from R&D to market
* implemented a robust, efficient, and reliable algorithm for finding the best affine transformation matrix
* implemented a highly reliable and fast image segmentation algorithm
* determined optimal lighting conditions for image capture
* developed an extensive test suite for quality assurance
* developed and extended the CI/CD pipeline
* applied object-oriented programming throughout the project
* provided post-launch on-site early customer support for the vision system and the network communication

Main Technologies of the project:
OpenCV, Python, Flask, Linux, RaspberryPi, CI/CD, PyTorch, scipy
 

Project Description:
Ranging from a feasibility study to a PoC to the pre-production of the final machine, the project encompassed multiple aspects of designing the computer vision system for a robotics machine.
Given an image capture of a 3D object and its 3D mesh data, an algorithm was developed that segments the object from the background and calculates the affine rotation matrix that must be applied for the object in real-world space to match the orientation of the object in the 3D mesh data. Achieving this goal required

  1. camera calibration,
  2. optimizing camera settings for image acquisition,
  3. finding optimal lighting strategies,
  4. finding optimal machine design for image quality,
  5. 3D to 2D projection, world-to-image conversions (px to mm),
  6. a highly efficient and robust image segmentation algorithm (running in under one second on a Raspberry Pi), as well as
  7. a robust, reliable, and fast algorithm for solving for the optimal affine transformation matrix.

Additionally, to ensure the stability of the vision system, extensive regression and unit tests were developed.
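Solving for the optimal affine transformation (item 7 above) can be illustrated as a least-squares fit over corresponding point pairs. This is a generic sketch, not the project's actual algorithm, and it assumes the point correspondences between image and mesh are already established:

```python
import numpy as np

def estimate_affine(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares estimate of the 2x3 affine matrix M mapping src -> dst.

    src, dst: (N, 2) arrays of corresponding 2D points, N >= 3.
    Each destination point satisfies dst_i ≈ M @ [x_i, y_i, 1].
    """
    n = src.shape[0]
    A = np.hstack([src, np.ones((n, 1))])     # homogeneous coordinates, (N, 3)
    # Solve A @ M.T ≈ dst; lstsq handles each output axis as a least-squares problem.
    sol, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return sol.T                              # (2, 3) affine matrix
```

With more than three correspondences, the least-squares formulation averages out measurement noise instead of fitting three points exactly.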


09/2019 - 10/2021
Deep Learning Computer Vision Engineer
Dental Manufacturing Unit GmbH (industry and mechanical engineering, 10-50 employees)

Accomplishments:
* successfully implemented a CNN architecture with low latency and high accuracy
* deployed a TensorFlow model through ONNX in a C# codebase
* wrote unit, regression, and e2e tests

Main Technologies of the project:
Python, TensorFlow, OpenCV, ONNX, CUDA, .NET, C#
 

Project Description:
The goal of the project was to replace an existing non-ML-based computer vision solution for an image classification problem, which had been shown to be susceptible to changes in lighting and other external factors. Because the result of this algorithm was used directly in the machine logic, where a misclassification could lead to severe damage to the machine, the goal was 100% classification accuracy.

The project first required determining the optimal conditions for image acquisition (camera position, exposure, and gain). Data was then collected over the course of months across multiple machines in operation. Next, a custom CNN architecture was chosen and trained on GPU using image augmentation techniques, and regression tests were put in place. The resulting model was converted to the ONNX format and deployed within a C# application running directly on the machine.
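The profile does not describe how the 100% accuracy target was enforced at runtime; one common pattern, shown here purely as an assumption and not as the project's actual logic, is to gate the deployed model's predictions on confidence and fall back to a safe machine state when the network is unsure. In deployment the logits would come from the ONNX runtime inside the C# application; the sketch below uses Python for illustration:

```python
import numpy as np

SAFE_FALLBACK = -1  # sentinel: defer to a safe machine state instead of guessing

def softmax(logits: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over a 1D logit vector.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def gated_predict(logits: np.ndarray, threshold: float = 0.99) -> int:
    """Return the predicted class only when the model is highly confident;
    otherwise return SAFE_FALLBACK so the machine logic can halt safely."""
    probs = softmax(logits)
    top = int(np.argmax(probs))
    return top if probs[top] >= threshold else SAFE_FALLBACK
```

Rejecting low-confidence inputs trades a small amount of availability for the guarantee that the machine never acts on an uncertain classification.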


12/2018 - 08/2019
Data Scientist
Porsche Informatik (automotive and vehicle manufacturing, 250-500 employees)

Accomplishments:
* successfully implemented a Python module for big-data feature selection in PySpark
* documented the implemented feature and wrote a user guide
* thoroughly researched feature selection methods
* restructured, processed, and prepared data; exploratory data analysis
* parallelized code in R


Main Technologies of the project:
R, Python, PySpark, Bash
 

Project Description:
The goal of the project was to research solutions for (semi-)automated feature selection for Big Data use cases with highly distributed data in the automotive space. Given multiple company use cases (regression and classification), initial research was conducted into state-of-the-art feature selection approaches, and statistical simulation studies were carried out to assess their strengths and weaknesses. The data sets consisted of both discrete and continuous features, with highly correlated features as well as interactions in a high-dimensional feature space. Both classical filtering methods and stepwise methods based on machine learning approaches, such as regression and random forests, were among the researched methods. The most promising methods were then implemented as Python modules within the PySpark framework, following the standard API design known from other scientific Python packages such as sklearn.
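The sklearn-like API mentioned above can be sketched with a toy correlation-based selector. This is a stand-in for the actual PySpark implementation: the class name and threshold are illustrative, and it operates on plain NumPy arrays rather than distributed data.

```python
import numpy as np

class CorrelationSelector:
    """sklearn-style fit/transform selector: keep the features whose absolute
    Pearson correlation with the target exceeds a threshold."""

    def __init__(self, threshold: float = 0.3):
        self.threshold = threshold
        self.support_ = None  # boolean mask over features, set by fit()

    def fit(self, X: np.ndarray, y: np.ndarray):
        # One correlation coefficient per feature column.
        corr = np.array(
            [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
        )
        self.support_ = corr >= self.threshold
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        # Drop the columns that did not pass the threshold.
        return X[:, self.support_]
```

Following the familiar fit/transform convention lets such a selector slot into existing pipelines with no new API for users to learn.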


07/2018 - 11/2018
Data Scientist
AB Mikroelektronik (industry and mechanical engineering, 50-250 employees)

Accomplishments:
* visualization of KPIs
* performance improvements to an existing Shiny app
* creation of automated reports from production data

Main Technologies of the project:
R, R Shiny, SQL

Project Description:
Visualization of production data and creation of automated reports and a visual dashboard using R Shiny, plotly, knitr, and SQL.


Certificates

Natural Language Processing with Classification and Vector Spaces
2022
Deep Learning Specialization
2018

Willingness to Travel

Available remotely only