Available

Last update: 05.07.2023

IT Freelancer

Degree: Diplom-Mathematiker with doctorate (PhD in mathematics)
Hourly/daily rate: on request
Language skills: German (native) | English (business fluent) | French (basic)

Skills

Industry experience: Online Advertising, Automotive, Social Networks, Retail, Finance (FX/MM)

Professional focus: Development and implementation of software solutions in the data domain

Expertise:
  • Data Science/Data Mining
  • Machine Learning/Deep Learning/Natural Language Processing
  • Software Design Patterns
  • Programming paradigms:
    • Object-oriented programming
    • Functional programming
    • Imperative programming
  • Statistics & stochastics/numerical methods
  • Web technologies
Programming languages (incl. scripting languages/DSLs/database query languages):
  • Scala
  • Python
  • Kotlin
  • SQL
  • Bash
Databases:
  • MS SQL Server 
  • MySQL 
  • MongoDB 
Big data technologies:
  • Spark
  • Hadoop
  • Kafka
  • Cassandra
Tools:
  • Git (as well as GitHub/Bitbucket)
  • Confluence/Jira/Trello
  • Docker
Hands-on AWS experience:
  • EC2/EMR 
  • Kinesis 
  • Lambda 
  • S3 
Frameworks:
  • Spark MLlib 
  • scikit-learn 
  • NumPy 
  • Pandas 
  • Keras 
  • PyTest 
  • Flask 
  • JUnit 
  • ScalaTest/ScalaMock 
Other:
  • Apache Airflow
  • Apache Zeppelin
Certifications:
  • Confluent Developer Training: Building Kafka Solutions
  • EdX BerkeleyX – CS190.1x: Scalable Machine Learning
  • EdX Introduction to Big Data with Apache Spark
  • Coursera Functional Programming Principles in Scala
  • Coursera R Programming
  • Coursera Exploratory Data Analysis
  • Coursera Developing Data Products
  • Coursera Statistical Inference
  • Coursera Pattern Discovery in Data Mining
  • Coursera Cluster Analysis in Data Mining
  • Coursera Text Mining and Analytics
  • Coursera Applied AI with Deep Learning
  • Coursera Getting Started with Google Kubernetes Engine
  • Coursera Kotlin for Java Developers
Trainings:
  • AWS AI Bootcamp, Munich, 07/2017 (1 day)
  • Confluent Developer Training: Building Kafka Solutions, Dallas (TX), USA, 07/2017 (3 days)
  • Mesosphere DC/OS Crash Course, Container Days 2016, 06/2016
  • Spark training by Lightbend (formerly Typesafe), 11/2015 (2 days)
  • Advanced training by Databricks: Exploring Wikipedia with Spark, Spark Summit 2014 (1 day)
  • Flink hands-on training, Flink Forward 2014 (half a day)

Project history

06/2022 - 12/2022
Clustering of POI data
Leading German multinational automotive manufacturer (automotive and vehicle manufacturing, >10,000 employees)

As a data engineer, I developed a replacement for the existing clustering batch jobs for a software company belonging to a leading German multinational automotive manufacturer. The clustering uses location data and additional metadata from electric charging stations distributed worldwide.

As part of the company's team responsible for POI data management (POI=point of interest), I collaborated closely with the developers who previously worked on this task and utilized my expertise with Spark and Airflow to design and implement the new solution using Databricks on Azure.
 
  • In collaboration with another team member, I refactored the existing Airflow pipeline for the batch jobs and its organically grown, complex business logic. We also refactored the library code used for the clustering and preprocessing steps. Along the way, I added missing documentation and improved code quality with defensive-programming techniques.
  • Based on the acquired domain knowledge and the business requirements for the clustering, I built a new solution in the Azure cloud using Databricks. Tasks included integrating existing data from AWS S3 and Azure Blob Storage, developing library code, and implementing a Spark job for the geospatial clustering of charging station data.
  • While the team mainly works with Python, the new solution also uses the official open-source Scala library for graph clustering. Here I drew on my experience with Scala and JVM-based development to make the library callable from Python through a suitable wrapper class (see the sketch after this project entry).
  • At the beginning of the project, I also worked with the developers and testing team to eliminate bugs and increase the test coverage for an event-driven service. This Azure Functions-based service detects and removes certain privacy-related information within streams of vehicle signals. Redis is used to cache intermediary results detected in the event streams.
  • Other tasks: Contribution to code reviews, PI planning, testing, and documentation.

Technologies: Databricks on Azure, Azure Blob Storage, AWS S3, Python/Scala, PySpark/Spark, GraphX, Pandas, NumPy, Jenkins, Azure Functions, Redis, Poetry/SBT, Airflow, Git, Bitbucket
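
A minimal sketch of the Python-to-Scala wrapper pattern mentioned above, based on the py4j gateway that every Spark session exposes; the package path, class name, and parameters are illustrative assumptions, not the actual library API:

```python
# Hypothetical sketch (PySpark >= 3.3 assumed): calling a Scala clustering
# library from Python via the py4j JVM gateway of the Spark session.
from pyspark.sql import DataFrame, SparkSession


class GraphClusteringWrapper:
    """Thin Python facade over a Scala graph-clustering library on the driver JVM."""

    def __init__(self, spark: SparkSession):
        self._spark = spark
        # Reach the Scala object through the session's JVM gateway; the
        # package path com.example.clustering is an assumption.
        self._jcluster = spark._jvm.com.example.clustering.GraphClustering

    def cluster(self, df: DataFrame, max_distance_m: float) -> DataFrame:
        # Hand the underlying Java DataFrame to Scala, then wrap the Java
        # result back into a PySpark DataFrame.
        jdf = self._jcluster.cluster(df._jdf, float(max_distance_m))
        return DataFrame(jdf, self._spark)
```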

05/2021 - 03/2022
Data and scores delivery for CRM
Leading German media company (media and publishing, >10,000 employees)

Data engineer in the central team responsible for the data and scores used by automated customer relationship management.
  • Productionized machine learning models in the cloud for churn scoring, next-best actions, and the prediction of customer behavior.
  • Collaborated with data scientists on the development of these models and with another team responsible for recommendations, search, and APIs on the streaming portal.
  • Built and planned ETL pipelines for contract and usage data as well as for the computation of features consumed by machine learning models (see the sketch after this project entry).
  • Automated data exports and published scores consumed by other services and tools to the central event bus.

Technologies: Python, PySpark, AWS (S3, Kinesis, Athena, EMR, Glue), SQL, Pandas, Kubernetes, Terraform, Git, GitLab
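
A hedged sketch of the feature-computation pattern referenced in the list above; all paths, table names, and columns are invented for illustration:

```python
# Illustrative PySpark job: aggregate raw usage events into per-customer
# features for a churn model. Paths and column names are assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("churn-features").getOrCreate()

usage = spark.read.parquet("s3://example-bucket/usage/")          # hypothetical input
contracts = spark.read.parquet("s3://example-bucket/contracts/")  # hypothetical input

features = (
    usage.groupBy("customer_id")
    .agg(
        F.countDistinct("session_id").alias("sessions"),
        F.sum("usage_seconds").alias("usage_seconds"),
        F.max("event_ts").alias("last_activity_ts"),
    )
    .join(contracts.select("customer_id", "tenure_months"), on="customer_id")
)

# Downstream, the scoring models consume this feature table.
features.write.mode("overwrite").parquet("s3://example-bucket/features/churn/")
```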

12/2020 - 12/2020
Extension of the central Big Data Lake for the Reporting of Real Time Campaign Management
Multinational telecommunications company (telecommunications, >10,000 employees)

Supported the client's data engineering team with the conceptual and architectural preparation for extracting data from an external API into their AWS-based big data lake and Redshift data warehouse.
The API belongs to a SaaS platform for campaign management and customer analytics and covers Natural Language Processing (NLP) topics such as sentiment analysis and phrase/keyword detection in customer text comments.
Activities included:
  • Gathering and analyzing requirements from project stakeholders.
  • Clarifying NLP- and API-design questions with a contact person from the SaaS platform and with stakeholders.
  • Extending the documentation in Confluence.
  • Preparing the logical data model and the conceptual design of the ETL processing pipeline.
  • Preparing a proof of concept for streaming data from a Kafka cluster to the S3 layer in the AWS cloud using the Databricks Delta Lake framework (see the sketch after this project entry).

Technologies: Scala, Spark, AWS, Kafka, Delta Lake, SQL, Natural Language Processing, Redshift
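
The proof of concept followed the common Kafka-to-Delta ingestion pattern; below is a minimal PySpark rendering of it (the project itself used Scala; broker, topic, and bucket names are assumptions, and the Kafka and Delta Lake packages are presumed to be on the classpath):

```python
# Minimal Structured Streaming sketch: read a Kafka topic and append it to a
# Delta table on S3. Broker, topic, and S3 paths are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-delta-poc").getOrCreate()

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical broker
    .option("subscribe", "campaign-events")             # hypothetical topic
    .option("startingOffsets", "earliest")
    .load()
)

query = (
    raw.selectExpr("CAST(key AS STRING) AS key",
                   "CAST(value AS STRING) AS value",
                   "timestamp")
    .writeStream
    .format("delta")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/campaign-events/")
    .start("s3://example-bucket/raw/campaign-events/")
)
```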

08/2019 - 01/2020
Development of an enterprise-wide Cloud Data Lake with Analytical Interface (continued)
European fashion and lifestyle company (consumer goods and retail, 1,000-5,000 employees)

Software developer responsible for building an AWS-based cloud data lake organized into several zones, which also feeds a downstream Redshift data mart used by end users for analytics and queries via Tableau.
Activities included:
  • Used Apache Nifi during the initial proof-of-concept phase for the extraction from source systems to S3 and Kinesis.
  • Developed several software services with Scala and Spark for the automated extraction and transformation (ETL) of data from different source systems and databases into and within the AWS cloud.
  • Implemented transformation logic for the creation of fact and dimension tables (data modeling following Kimball).
  • Implemented ETL pipelines using Spark and Scala, based on specifications and blueprints from the data analytics department and on existing Tableau Prep flows.
  • Developed, adjusted, and deployed programmatic workflows for scheduling Spark jobs on EMR clusters via Apache Airflow (see the DAG sketch after this project entry).
  • Transferred knowledge and mentored coworkers in Scala and Spark during team programming.
  • Developed a custom mini-framework in Scala for working with Spark data frames in a type-safe fashion (built collaboratively via team programming). The framework facilitates the development and testing of the transformation components used in Spark jobs within ETL pipelines.
  • Integrated new data sources and replaced old ones.
  • Developed integration and unit tests, plus debugging and test execution.

Technologies: Scala, Spark, Python, SQL, AWS (EMR, S3, SSM, Kinesis, Redshift), Apache Nifi, Apache Airflow, Docker, Kubernetes, MS SQL Server, SAP HANA, ScalaTest/ScalaMock, Jenkins, JFrog Artifactory, Git, GitHub
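
The Airflow workflows mentioned above typically follow a submit-and-wait pattern against EMR; here is a hedged sketch (Airflow 2.4+ with the Amazon provider assumed; cluster id, jar path, and schedule are invented):

```python
# Illustrative Airflow DAG: submit a Spark step to a running EMR cluster and
# wait for its completion. Cluster id and artifact path are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator
from airflow.providers.amazon.aws.sensors.emr import EmrStepSensor

SPARK_STEP = [{
    "Name": "nightly-etl",
    "ActionOnFailure": "CONTINUE",
    "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": ["spark-submit", "--deploy-mode", "cluster",
                 "s3://example-bucket/jobs/etl.jar"],  # hypothetical artifact
    },
}]

with DAG("emr_spark_etl", start_date=datetime(2019, 8, 1),
         schedule="@daily", catchup=False) as dag:
    add_step = EmrAddStepsOperator(
        task_id="add_step",
        job_flow_id="j-EXAMPLECLUSTER",  # hypothetical EMR cluster id
        steps=SPARK_STEP,
    )
    wait_step = EmrStepSensor(
        task_id="wait_step",
        job_flow_id="j-EXAMPLECLUSTER",
        # EmrAddStepsOperator pushes the created step ids to XCom.
        step_id="{{ task_instance.xcom_pull(task_ids='add_step')[0] }}",
    )
    add_step >> wait_step
```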

01/2019 - 05/2019
Development of an enterprise-wide Cloud Data Lake with Analytical Interface
European fashion and lifestyle company (consumer goods and retail, 1,000-5,000 employees)

Initial phase of the engagement continued above (see the entry for 08/2019 - 01/2020); responsibilities, activities, and the technology stack were identical to those described there.

Certificates

AWS Certified Cloud Practitioner
2023

Willingness to travel

Available in: Germany
Preferred locations: Hamburg, Hanover, Nuremberg

Other information

-