Senior Data Engineer - (Hadoop - Scala, Spark, Python)

Brussels - On-site
This project is archived and unfortunately no longer active.

Description

Required skills:

  • Experience with the analysis and creation of data pipelines, data architecture, ETL/ELT development, and the processing of structured and unstructured data
  • Proven experience working with data stored in RDBMSs, and experience with or a good understanding of NoSQL databases
  • Ability to write performant Scala code and SQL statements
  • Ability to design solutions that are fit for purpose while keeping options open for future needs
  • Ability to analyze data, identify issues (e.g. gaps, inconsistencies) and troubleshoot them
  • A true agile mindset: capable of and willing to take on tasks outside of their core competencies to help the team
  • Experience in working with customers to identify and clarify requirements
  • Strong verbal and written communication skills, good customer relationship skills
  • Strong interest in the financial industry and related data.

The following will be considered assets:

  • Knowledge of Python and Spark
  • Understanding of the Hadoop ecosystem including Hadoop file formats like Parquet and ORC
  • Experience with open-source technologies used in data analytics, such as Spark, Pig, Hive, HBase and Kafka
  • Ability to write MapReduce & Spark jobs
  • Knowledge of Cloudera
  • Knowledge of IBM Mainframe
  • Knowledge of agile development methods such as Scrum.
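
To give a flavour of the "MapReduce & Spark jobs" asset above, here is a minimal sketch of the classic MapReduce pattern (word count) expressed as a Spark job in Scala. The input and output paths are placeholders; in practice a job like this would be packaged and launched with spark-submit on the cluster.

```scala
import org.apache.spark.sql.SparkSession

// Minimal word-count job: the canonical MapReduce pattern in Spark.
// Paths are illustrative placeholders, not real project locations.
object WordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("word-count")
      .getOrCreate()

    val counts = spark.sparkContext
      .textFile("hdfs:///data/input/*.txt") // read lines from HDFS
      .flatMap(_.split("\\s+"))             // map: line -> words
      .map(word => (word, 1L))              // map: word -> (word, 1)
      .reduceByKey(_ + _)                   // reduce: sum counts per word

    counts.saveAsTextFile("hdfs:///data/output/word-counts")
    spark.stop()
  }
}
```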

Job description:

  • Identify the most appropriate data sources to use for a given purpose and understand their structures and contents, in collaboration with subject matter experts.
  • Extract structured and unstructured data from the source systems (relational databases, data warehouses, document repositories, file systems, etc.), prepare such data (cleanse, re-structure, aggregate, etc.) and load it onto Hadoop.
  • Actively support the reporting teams in the data exploration and data preparation phases.
  • Implement data quality controls and, where data quality issues are detected, liaise with the data supplier for joint root cause analysis.
  • Autonomously design and develop data pipelines and prepare the launch activities.
  • Properly document your code and share your knowledge with the rest of the team to ensure a smooth transition into maintenance and support of production applications.
  • Liaise with IT infrastructure teams to address infrastructure issues and to ensure that the components and software used on the platform are all consistent.
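
The extract-prepare-load tasks above could be sketched as a single Spark job in Scala: pull a table from an RDBMS over JDBC, apply simple cleansing, and land the result on Hadoop as Parquet. All connection details, table names and column names here are hypothetical placeholders, not details from the actual project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, trim}

// Sketch of one extract-prepare-load step. Every identifier
// (host, database, table, columns, paths) is a placeholder.
object IngestTrades {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ingest-trades").getOrCreate()

    // Extract: read the source table from the relational database over JDBC.
    val raw = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/finance")
      .option("dbtable", "public.trades")
      .option("user", "etl_user")
      .option("password", sys.env("DB_PASSWORD"))
      .load()

    // Prepare: cleanse and re-structure (drop incomplete rows,
    // normalise text fields, remove duplicate keys).
    val cleansed = raw
      .na.drop(Seq("trade_id", "trade_date"))
      .withColumn("counterparty", trim(col("counterparty")))
      .dropDuplicates("trade_id")

    // Load: write to HDFS as Parquet, partitioned for downstream reporting.
    cleansed.write
      .mode("overwrite")
      .partitionBy("trade_date")
      .parquet("hdfs:///datalake/raw/trades")

    spark.stop()
  }
}
```

Writing Parquet partitioned by date keeps the reporting teams' exploration queries cheap, since partition pruning avoids scanning the full history.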
Start
01/10/2020
Duration
12 months
From
Base 3
Posted
08.08.2020
Project ID:
1954878
Contract type
Freelance