
Andrii

Data engineer

Age: 25
City of residence: Lviv
Ready to work: Remotely

Contact information

The candidate has provided a phone number and email address.



Andrii Stasiuk | Senior Big Data Software Engineer
Lviv, Ukraine | [open contacts](see the Contact information section above) | [open contacts](see the Contact information section above)

Summary
Results-driven Senior Big Data Engineer with 6+ years of experience designing and optimizing
large-scale data pipelines (1TB+), data warehouses, and analytics platforms. Proven expertise in
Apache Spark, PySpark, Airflow, SQL optimization, and cloud infrastructure (AWS & GCP). Strong
background in performance tuning and data architecture. Skilled in leading teams, optimizing complex
ETL processes, and enhancing database/query performance for high-volume systems. Passionate
about building scalable, efficient, and secure data-driven solutions that drive business impact.

Technical skills
●​ Programming languages: Python
●​ Query languages: SQL, HQL
●​ Big Data: HDFS, MapReduce, PySpark, Pandas, Hive, Apache Airflow
●​ Databases: PostgreSQL, MSSQL, Redshift, BigQuery, Neo4j
●​ Tools: Docker, Kafka, Kubernetes
●​ Frameworks: Flask, Sanic, FastAPI, SQLAlchemy, Gino
●​ Cloud platforms: AWS (S3, EMR, EKS, Secrets Manager, Redshift, RDS, Glue, CloudWatch), GCP
(BigQuery)
●​ Platforms: Mac OS X, Windows XP/7/10, Linux
●​ Issue Tracking: Jira
●​ Testing tools: PyTest, Unittest, Asynctest
●​ Version control: Git
●​ Methodologies: Agile, Scrum, Kanban

Experience

Senior Software Engineer at GridDynamics

August 2022 - present

Internal project at a leading global technology company known for its innovative consumer
products and for shaping trends in the tech industry. The goal is to build a data warehouse for
program management data, together with a set of associated data pipelines, analytical
capabilities, and data apps. The data pipelines read data from multiple sources, transform it
into a clean, queryable structure, and write it to the data warehouse. On top of this data
warehouse we have built analytics products, data apps, and a web platform.
Responsibilities:
●​ Design the DWH
●​ Build ETL pipelines to ingest and process data on an hourly basis (see the sketch after this
entry)
●​ Develop applications for reporting and optimization processes related to Project / Jira
management
●​ Negotiate requirements, plan sprints, and prepare architecture designs
●​ Proactively monitor job execution and data quality using Splunk, Hubble, and Slack
integrations
●​ Ensure data security
●​ Debug, profile, and optimize ETL jobs

Achievements:
●​ Onboarded 200+ users to the DWH for data analysis
●​ Developed 15+ analytics applications, enhancing project tracking and decision-making for
200+ stakeholders

Technologies: Python 3.11, PySpark 3.3.0, AWS infrastructure, Jenkins, PostgreSQL, Kubernetes.
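
For illustration only (not the project's actual code): a minimal sketch of an hourly PySpark ETL step of the kind described above. The bucket, table, column names, and connection details are hypothetical.

```python
# Minimal hourly ETL sketch: read raw events, flatten them into a queryable
# structure, and load them into the warehouse. All names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("hourly_program_data_etl").getOrCreate()

# Ingest the latest hourly partition of raw project-management events.
raw = spark.read.json("s3://example-bucket/raw/jira_events/dt=2024-01-01/hour=12/")

# Transform into a flat, queryable structure for the DWH.
events = (
    raw.select(
        F.col("issue.key").alias("issue_key"),
        F.col("issue.fields.status.name").alias("status"),
        F.to_timestamp("timestamp").alias("event_ts"),
    )
    .dropDuplicates(["issue_key", "event_ts"])
)

# Load into the warehouse (PostgreSQL) via JDBC.
(
    events.write.format("jdbc")
    .option("url", "jdbc:postgresql://dwh-host:5432/dwh")
    .option("dbtable", "staging.jira_events")
    .option("user", "etl_user")
    .option("password", "***")
    .mode("append")
    .save()
)
```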

May 2025 - September 2025

Internal project at a leading global retail & technology organization, focused on building a
large-scale Product Configuration Management System integrated with multiple enterprise data
sources and modeling engines.​
The goal of the project was to create a flexible, scalable, and data-driven platform for configuring
complex furniture and home products, supporting thousands of product combinations, dependency
rules, pricing logic, visualization, and data governance.

Responsibilities:
●​ Developed modeling engines and Python core modules (TemplateBuilder, AxisBuilder,
TensorProcessor, ModelBuilder) to validate configuration tensors, clean model
dependencies, and generate configuration outputs.
●​ Implemented graph-based dependency resolution using Neo4j to manage product option
rules, invalid combinations, availability constraints, and dynamic configuration logic (see the
sketch after this entry).
●​ Designed and managed artifact storage workflows (S3/MinIO), including versioning,
validation, compression, and lifecycle management.
●​ Ensured data quality and system reliability with extensive Pytest suites (integration tests with
Oracle, Neo4j, MinIO; mocks; synthetic data; coverage metrics).
●​ Conducted performance optimization, profiling, removal of bottlenecks, and refactoring.

Achievements:
●​ Delivered a scalable product configuration engine capable of supporting thousands of
product combinations and complex rule chains across multiple brands.
●​ Reduced model processing time and improved system reliability through algorithmic
optimizations (dependency pruning, graph indexing, template diffing).
●​ Built a consistent mathematical framework for incomplete configurations, rule
evaluation, and parameter dependency logic.

Technologies: Python 3.12, Oracle DB, Neo4j, MinIO/S3, Docker, Kubernetes, Jenkins, Poetry,
Pandas, Pytest.
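
For illustration only: a minimal sketch of graph-based dependency resolution of the kind described above, using the official Neo4j Python driver. The node label, relationship type, property names, and example option id are hypothetical, not taken from the project.

```python
# Sketch: query Neo4j for options excluded by a selected product option.
# The :Option label, :EXCLUDES relationship, and ids are hypothetical.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def excluded_options(option_id: str) -> list[str]:
    """Return ids of options that become invalid when the given option is selected."""
    query = (
        "MATCH (o:Option {id: $option_id})-[:EXCLUDES]->(other:Option) "
        "RETURN other.id AS id"
    )
    with driver.session() as session:
        return [record["id"] for record in session.run(query, option_id=option_id)]

print(excluded_options("LEG_METAL_BLACK"))
driver.close()
```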

February 2024 - December 2024

Internal project at one of North America's largest retail and pharmacy chains, leveraging
advanced big data solutions to enhance business operations and decision-making. The goal
was to build and optimize a large-scale data warehouse for analytical and AI-driven insights.
This involved designing and maintaining robust data pipelines that ingest, transform, and store
data from multiple distributed data warehouses and data lakes (BigQuery instances) across
different intervals. The platform supported over 1TB of data, requiring efficient SQL query
optimization and data structure tuning. Additionally, a web-based analytics platform was
developed with integrated data science models to provide sales teams with actionable
recommendations and business insights.

Responsibilities:
●​ Develop and maintain scalable ETL pipelines using Apache Airflow and PySpark (see the
sketch after this entry)
●​ Optimize data processing and storage strategies for high-volume datasets (1TB+)
●​ Implement data ingestion workflows from multiple BigQuery instances and Google Cloud
Storage
●​ Design and maintain a well-structured, query-efficient data warehouse
●​ Improve SQL performance and optimize data models for analytical use cases
●​ Collaborate with data scientists to integrate ML-driven recommendations into a web platform
●​ Ensure data quality, consistency, and security across all data pipelines

Technologies: Python 3.11, PySpark, Apache Airflow, Google BigQuery, Google Cloud (Composer,
GCS, etc.)
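
For illustration only: a minimal sketch of a scheduled ingestion DAG of the kind described above, using Airflow with the google-cloud-bigquery client. The DAG id, projects, datasets, and tables are hypothetical.

```python
# Sketch: a daily Airflow DAG that materializes an aggregated snapshot from a
# source BigQuery project into the analytics warehouse. All names are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from google.cloud import bigquery


def ingest_sales_snapshot():
    client = bigquery.Client()
    sql = """
        CREATE OR REPLACE TABLE `analytics_dwh.sales_daily` AS
        SELECT store_id, SUM(amount) AS revenue, CURRENT_DATE() AS snapshot_date
        FROM `source_project.raw.sales`
        GROUP BY store_id
    """
    client.query(sql).result()  # block until the BigQuery job finishes


with DAG(
    dag_id="bigquery_sales_ingestion",
    start_date=datetime(2024, 2, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="ingest_sales_snapshot",
        python_callable=ingest_sales_snapshot,
    )
```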

November 2021 - July 2022

A large enterprise project aimed at proposing various approaches for effective communication
with clients, increasing the likelihood of successful product sales by the product owner. All
recommendations are generated by machine learning algorithms.

Responsibilities:
●​ Prepare required views on Redshift for the DS team based on data that the client has
in the DWH
●​ Verify data quality with Deequ (a Glue script scheduled daily to analyze the results; see the
sketch after this entry)
●​ Estimating features
●​ Performing code reviews
●​ Team lead for Big Data engineers: define the sprint/backlog scope, discuss problems with the
client, and communicate the outcomes to the team

Achievements:
●​ Extended the solution to more than 5 markets
●​ Planned and performed refactoring that significantly improved performance

Technologies: Python 3.9, PySpark 3.1.2, AWS (S3, Glue, Redshift), macOS.
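
For illustration only: a minimal sketch of a daily Deequ data-quality check of the kind described above, written with the PyDeequ wrapper (the actual Glue job may use Deequ's Scala API directly). The dataset path, table name, and columns are hypothetical.

```python
# Sketch: run daily Deequ checks over a warehouse table and inspect the results.
# Paths and column names are hypothetical.
import pydeequ
from pyspark.sql import SparkSession
from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationSuite, VerificationResult

spark = (
    SparkSession.builder
    .config("spark.jars.packages", pydeequ.deequ_maven_coord)
    .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
    .getOrCreate()
)

df = spark.read.parquet("s3://example-bucket/dwh/customer_offers/")

check = Check(spark, CheckLevel.Error, "Daily customer_offers checks")
result = (
    VerificationSuite(spark)
    .onData(df)
    .addCheck(
        check.hasSize(lambda size: size > 0)
        .isComplete("customer_id")
        .isUnique("offer_id")
    )
    .run()
)

# Surface check results so they can be reviewed every day.
VerificationResult.checkResultsAsDataFrame(spark, result).show(truncate=False)
```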

May 2021 - November 2021

Big Data project where the main goal was to implement and deploy a scalable system that
aggregates telemetry and fuel consumption data, analyzes it on a daily basis, and produces a
report that can be consumed by the analytics team.

Responsibilities:
●​ Design architecture for the project, prepare DAP on AWS
●​ Building and executing Spark Jobs
●​ Constructing Spark DataFrames and using them to write ad-hoc analytical jobs
●​ Debugging, profiling, and optimizing Spark application performance
●​ Creating DAGs (Airflow) in Python language
●​ Estimating features
●​ Performing code reviews

Technologies: Python 3.8, PySpark 3.1.2, Airflow 2.1.0, AWS (S3, EMR, EKS, RDS (MS
SQL), Secrets Manager), macOS.

Software Engineer at SoftServe

September 2020 - May 2021

A large enterprise project with microservice architecture intended to govern, create, maintain, use,
and analyze consistent, complete, contextual, and accurate master data information for all
stakeholders, such as line of business systems, data warehouses, and trading partners. It provides
a customizable framework of components that control the lifecycle management of master data,
quality and integrity of the data, and stateless services to control the consumption and distribution
of data.

Responsibilities:
●​ Implement new functionality
●​ Database management
●​ Cover the codebase with unit and integration tests
●​ Troubleshoot bugs
●​ Prepare design for new services
●​ Estimating features
●​ Performing code reviews
Achievements:
●​ Improved SQL query execution performance
●​ Implemented a new service to enrich information about entities that were added to
production
●​ Implemented a script to transfer specific data from one environment to another, avoiding
errors that can occur due to database constraints or triggers. This allows the team to quickly
start working with the system in a new environment using the data set up in the old one

Technologies: Python (3.6, 3.7), Flask, Sanic, PostgreSQL, Restful API, SQLAlchemy,
Gino, Docker, Marshmallow, Linux, Git, Kafka, Swagger

November 2019 - August 2020

A large enterprise project with microservice architecture intended to optimize the calculation of
rebates, chargebacks, and other fees used in US marketing. This complex platform for business
data management and analysis aims to assist customers in delivering managed services for the
healthcare industry.

Responsibilities:
●​ Implement new functionality
●​ Cover the codebase with unit and integration tests
●​ Troubleshoot bugs
●​ Performing code reviews

Achievements:
●​ Refactored functionality to use batch loading to avoid memory errors
●​ Held weekly meetings to improve code quality

Technologies: Python (3.4, 3.6, 3.7), Flask, Sanic, PostgreSQL, Restful API, Apache Spark,
Apache Hadoop, Docker, Marshmallow, Linux, Git, Microservice Architecture

May 2019 - October 2019

A news service where users can read, create, and share articles about sports events around the
world. Registered users can read and post messages, while unregistered users can only read
them. Users access the service through the website interface.

Responsibilities:
●​ Implement new functionality
●​ Performing code reviews
●​ Design system

Technologies: Python 3.7, Django, PostgreSQL, Linux, Git, React, SQLAlchemy

Education and Certificates
●​ 2023 - ongoing, Master’s degree, Institute of Economics and Management, Lviv Polytechnic
National University
●​ 2017 - 2021, Bachelor’s degree, Department of Artificial Intelligence Systems, Lviv
Polytechnic National University

Languages
●​ English - B2
●​ Ukrainian - Native Speaker
