Our client is a governmental organisation in the Local Government sector and seeks the services
of a Data Engineer to be based at their national office in Pretoria.
Job Purpose:
To design, develop, implement and maintain scalable and robust data integration interfaces and data models required by Analysts, Data Product Owners and Data Scientists. A formidable Data Engineer will demonstrate insatiable curiosity and outstanding interpersonal
skills. The role is also responsible for employing machine learning techniques to create and sustain structures that allow for the analysis of data, while remaining familiar with dominant programming and deployment strategies in the field, and for driving automation in data integration and management, as well as building and managing data pipelines to provide a foundation for all analytics programmes.
Responsibilities:
- Drive automation in data integration and management:
- Track data consumption patterns, preparation and integration tasks to identify the most common, repeatable tasks
- Prioritize opportunities for automation to minimize manual and error-prone processes
- Improve productivity across the data-science/analytics team
- Promote reuse of content and data through a centralized portal, catalog or other system
- May be required to monitor schema changes, perform intelligent sampling and caching, and learn and use AI-enabled metadata management techniques
- Build, manage and maintain data pipelines and architecture to provide a foundation for all analytics projects:
- Match appropriate data sources to identified analytics use cases
- Integrate data from multiple sources into a single source or system, and contribute to educational programmes that increase accessibility for the analytics team and users
- Ensure data pipelines comply with applicable regulations and organisational governance standards
- Maintain data pipelines by fixing technical issues related to team access
- Optimize data quality by flagging sources for review, filling gaps, developing proxy variables, etc., where appropriate and clarifying limitations with data science teams
- Communicate cross-functionally with data science teams and marketing analysts:
o Collaborate with IT and other departments to gain access to enterprise-wide data systems
o Work with IT leaders, platform specialists and others to identify unknown sources of data
or reconcile important differences in datasets
o Work closely with data scientists and marketing analysts to define data requirements for
analytics projects
o Continuously support efforts to upskill and educate data scientists, marketing analysts and others within the marketing organisation
- Participate in efforts to improve data governance and ensure compliance:
o Recommend methods to continuously optimise data collection processes as well as tagging
and the use of analytics tools
o Coordinate across regions/provinces/municipalities, where applicable, ensuring standardised data models and analytical methodologies
- Assemble large, complex data sets that meet functional/non-functional business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using Python and GCP/Azure ‘Big Data’ technologies.
- Build analytics tools that utilize the data pipeline to provide actionable insights into customer
acquisition, operational efficiency and other key business performance metrics.
- Work with stakeholders, including the Executive, Product, Data and Design teams, to assist with data-related technical issues and support their data infrastructure needs.
- Keep our data separated and secure across national boundaries through multiple data centres and GCP/Azure regions.
- Create data tools for analytics and data science team members that assist them in building and optimizing our product into an innovative industry leader.
- Work with data and analytics experts to strive for greater functionality in our data systems.
- Continuously learn and apply the latest fit-for-purpose, open-source and proprietary tools and technologies to achieve results, including some or all of the following:
- Cloud
- Microsoft Azure (must)
- AWS
- Google Cloud
- Database and Data Management
- Microsoft SQL Server
- MySQL
- PostgreSQL
- MongoDB
- Languages
- Python (must)
- R (must)
- SQL (must)
- Java
- C/C++
Outputs
1. Data modelling
Relational data modelling in traditional relational database management systems (Microsoft SQL Server, MySQL, PostgreSQL, etc.).
• Data modelling of transaction and master data.
• Data extraction from on-premises and hosted systems, repositories, and other third-party platforms using various ETL tools.
2. Coercing unstructured and semi-structured data into a structured form.
• Testing such structures to ensure that they are fit for use.
• Preparing raw data for manipulation by Data Scientists.
• Detecting and correcting errors in your work.
• Combining raw information from different sources.
3. Data pipelining knowledge: data extraction and transformation.
• Design, develop and test large stream data pipelines to ingest, aggregate, clean, and distribute data models ready for analysis.
• A deep understanding of data pipelining, streaming, and Big Data technologies, methods, patterns, and techniques.
4. Data transformation knowledge for reporting and analytics purposes.
• Leveraging best practices in continuous integration and delivery
Requisite knowledge
1. ETL
2. Advanced working knowledge of Python, experience working with relational databases and query authoring (SQL), as well as working familiarity with a variety of databases.
3. Experience building and optimizing data pipelines, architectures and data sets.
4. Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
5. Strong analytic skills related to working with unstructured datasets.
Skills and Abilities required to do the job
1. Build processes supporting data transformation, data structures, metadata, dependency and workload management.
2. A successful history of manipulating, processing and extracting value from large, disconnected datasets.
3. Strong project management and organizational skills.
4. Experience supporting and working with cross-functional teams in a dynamic environment.
5. General data management and analytics skills, including:
- Present results in consumable formats, including dashboards, reports and data visualizations
- Conduct quantitative research/analysis, generate insights, and perform data modelling and data engineering, often through specific languages and/or statistical methodologies (e.g., Python, SQL)
- Edit databases, mine data and perform queries (e.g., Hadoop, Hive, SQL, Teradata), and work with data warehouses and other relational database storage environments
- Apply advanced analytics and data science concepts and/or methodologies such as predictive analytics, data modeling, forecasting and machine learning where appropriate
6. Database management and/or platform knowledge, including:
- Operational data automation systems (e.g., ERP, OT and other data source platforms)
- Common spreadsheet applications (e.g., Excel), including pivot tables and charting
- Common BI tools and/or data analytics platforms (e.g., Excel, Power BI, Google Analytics)
7. Ability to execute against use cases for data science, including:
- Digital channel and platform optimisation and measurement
- Support the analysis and measurement of cross-functional (internal and municipal) areas to inform strategic decision-making (KPIs, metrics, etc.)
8. Cleaning and integrating multiple data sources to support analytics
Qualifications and experience
1. Bachelor’s or Honours degree in Computer Science or Engineering, or equivalent experience
2. 4+ years of experience in software development and data management disciplines, including data integration, modelling, optimisation and data quality, and/or other areas directly relevant to data engineering responsibilities and tasks
3. Project Management (classic and agile) experience
4. Proficient in Java, C/C++, or Python
Remuneration
R629k – R820k p/a total cost
How to apply
Please send your CV to Colin Khomeliwa at cv@khomeliwa.com on or before 16:00 on Monday, 28 February 2022. The job title must appear in the subject line of the e-mail.