Multiple projects in Big Data, Microservices, Cloud Architecture, Machine Learning, Software Development
Go, Scala, Java, Python, NodeJS.
Project 1: near-real-time data replication from MySQL to Redshift on AWS, built with Kafka and Spark Streaming in Scala and Python. The system also used Prometheus, Kibana, Sqoop, Flume and GitLab. Worked closely with other data engineers, the product owner and data scientists. Replicated very large tables in near real time. Team size: 6-7. Duration: 6 months.
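The core replication step can be sketched as a toy transform from a CDC change event to the delete-then-insert pattern commonly used for upserts on Redshift. The event shape, table and column names below are illustrative, not the project's actual schema:

```python
def change_event_to_sql(event):
    """Translate a CDC change event (a dict) into a parameterized
    DELETE+INSERT pair, a common upsert pattern on Redshift."""
    table = event["table"]
    key = event["key"]    # primary-key column name
    row = event["row"]    # column -> value mapping
    delete = f"DELETE FROM {table} WHERE {key} = %s;"
    cols = ", ".join(row)
    placeholders = ", ".join(["%s"] * len(row))
    insert = f"INSERT INTO {table} ({cols}) VALUES ({placeholders});"
    return delete, insert
```

In the real pipeline, events arrive continuously from Kafka and statements are applied in batches; this sketch only shows the per-event translation.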
Project 2: end-to-end data processing architecture and implementation, from log structuring, to storage in Redshift and S3, to machine learning with NLTK, scikit-learn and Python, to serving recommendations and analytics from Elasticsearch via a NodeJS API. Worked closely with the CEO, CTO, senior product managers and DevOps engineers to ship to production. A near-real-time recommendation system was deployed and proven effective in an independent A/B test. Helped interview, hire and onboard data engineers. Duration: 8 months. Team size: 1 core, 3 non-core.
Project 3: prepared training materials on Spark architecture and sketching algorithms, commissioned by a top-tier IT company; trained data scientists (now senior data scientists, managers and CTOs) in efficient algorithms for Big Data with Spark (Scala) and PySpark, using a hands-on, problem-solving approach.
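One of the sketching algorithms typically covered in such training, the Count-Min sketch, can be illustrated in a few lines of plain Python. The parameters and hashing scheme below are illustrative, not the training material itself:

```python
import hashlib

class CountMinSketch:
    """Approximate frequency counting in sub-linear space.
    Estimates never undercount; they may overcount on hash collisions."""

    def __init__(self, width=1000, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _hashes(self, item):
        # One independent-ish hash per row, derived from md5.
        for i in range(self.depth):
            h = hashlib.md5(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.width

    def add(self, item, count=1):
        for i, idx in enumerate(self._hashes(item)):
            self.table[i][idx] += count

    def estimate(self, item):
        # Take the minimum across rows to damp collision noise.
        return min(self.table[i][idx] for i, idx in enumerate(self._hashes(item)))
```

On a cluster, the same table layout makes the sketch mergeable, which is why it suits distributed aggregation in Spark.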
Project 4: participated in design thinking workshops as a data engineer to inform future data products. In a team of three, delivered a clickable proof-of-concept prototype for large financial data in just three days. The system ingested and structured gigabytes of financial data in Redshift; the aggregated results were moved to MySQL, from where a Go backend served them to a React app. Developed a simple idempotent pipeline in Bash. Deployed the infrastructure on AWS. Facilitated discussions between data scientists, backend engineers and financial data experts to inform the product.
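The idempotency idea behind the Bash pipeline (each step runs at most once, so the whole pipeline can be safely re-run after a failure) can be sketched in Python with marker files. Function and directory names are illustrative:

```python
from pathlib import Path

def run_step(name, action, done_dir="pipeline_state"):
    """Run `action` exactly once; a marker file makes re-runs no-ops."""
    marker = Path(done_dir) / f"{name}.done"
    if marker.exists():
        return False  # step already completed; safe to skip
    action()
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return True
```

Re-running the whole pipeline then only executes the steps that have not yet succeeded, which is the property that made the three-day prototype robust.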
Project 5: external member of an in-house team building a modern data pipeline tool supporting pipelines on AWS Redshift and EMR (Spark). Informed software and cloud architecture in technical discussions with developers, the team lead and the product owner. Implemented features, unit and integration tests, and end-user demos. Created and hosted Python and Anaconda packages on Artifactory. Set up test jobs on Jenkins. Supported peers with code reviews and debugging help. Duration: 4-5 months. Team size: 5-8.
Project 6: optimized slow database queries in Postgres. Optimized data storage via Change Data Capture and Slowly Changing Dimensions. Implemented a SWIFT message parser in Go via code generation. Implemented comprehensive integration tests. Delivered extensive documentation for project handover. Used Docker, Postgres and Go. Duration: 4 months. Team size: 4.
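The Slowly Changing Dimensions technique (type 2) mentioned above keeps full history by closing the old row and appending a new current version. A minimal sketch in plain Python, with illustrative column names and date handling:

```python
from datetime import date

def scd2_apply(history, updates, key="id", today=None):
    """Apply updates to a type-2 Slowly Changing Dimension table.

    `history` is a list of row dicts carrying `valid_from`/`valid_to`;
    the current version of each key has valid_to=None. A changed row is
    closed (valid_to set) and a new open version is appended."""
    today = today or date.today()
    current = {r[key]: r for r in history if r["valid_to"] is None}
    for upd in updates:
        old = current.get(upd[key])
        if old is not None:
            if all(old.get(c) == v for c, v in upd.items()):
                continue  # nothing changed; keep the open row as-is
            old["valid_to"] = today  # close the previous version
        history.append({**upd, "valid_from": today, "valid_to": None})
    return history
```

The same close-and-append logic is normally expressed in SQL against the warehouse; the Python version just makes the bookkeeping explicit.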
Project 7: large financial institution: extracted and presented visitor analytics from ELB logs in Jupyter at the request of a product owner.
Project 8: developed, in collaboration, an open-source repository for analysis of stock market data to showcase the data analytics capabilities of a consulting company. The repository is now featured on the AWS Open Data Registry. Used Jupyter, TensorFlow, scikit-learn and AWS SageMaker.
Project 9: evaluated cloud (AWS, Google Cloud, Azure), third-party and desktop OCR services. Implemented a zonal OCR prototype in a team of two using open-source image processing libraries (OpenCV) and cloud services.
Project 10: participated as a machine learning expert in design thinking workshops for building NLP-based and image search products. Facilitated discussion around product and engineering requirements and the application of state-of-the-art approaches. Delivered a 30-page report on related work, including output of the models on data similar to the customer's.
Demos:
I have developed multiple demos to showcase my abilities to prospective customers: https://stefansavev.com/demos.html