17 Apr 2025, Thu

When I started as a data engineer nearly a decade ago, my days were dominated by building and maintaining extract, transform, load (ETL) pipelines. I’d meticulously craft processes to pull data from source systems, transform it through complex logic, and load it into data warehouses. These pipelines were our bread and butter – the fundamental building blocks of any data platform.

Fast forward to today, and something remarkable has happened: I’m building significantly fewer pipelines than before. And I’m not alone.

Across the industry, modern data teams are shifting away from traditional ETL pipeline development toward a fundamentally different approach to data architecture. This evolution isn’t just changing how we work; it’s redefining what it means to be a data engineer.

The Great Pipeline Reduction

The evidence of this shift is everywhere. In a recent survey of over 500 data professionals conducted by Fivetran, 62% reported building fewer custom data pipelines compared to two years ago. Meanwhile, job postings for data engineering roles increasingly emphasize skills in data architecture, governance, and platform engineering over pipeline development.

This doesn’t mean data movement has decreased – quite the opposite. Data volumes continue to grow exponentially. What’s changed is how we’re approaching data transformation and delivery.

Three key technological shifts are driving this evolution:

  1. Compute-on-query architectures that transform data at query time rather than in advance
  2. Streaming-first platforms that process data continuously rather than in batches
  3. Semantic modeling layers that abstract business logic from physical implementation

Together, these approaches are making traditional ETL pipelines increasingly unnecessary, fundamentally changing how data teams operate.

The Rise of Compute-on-Query

For decades, pre-computing transformations made perfect sense. When compute was expensive and predictable query patterns were the norm, transforming data in advance optimized both cost and performance.

But today’s cloud data platforms like Snowflake, Databricks, and BigQuery have flipped this equation. Their massively parallel processing capabilities make it feasible – often preferable – to transform data at query time rather than in advance.

Take DoorDash’s data architecture evolution as an example. Originally built on classic ETL patterns with Airflow orchestrating transformations, their platform required hundreds of pipelines and a team dedicated to maintenance. After shifting to a compute-on-query model with a metrics layer in 2022, they reduced their pipeline count by over 70% while actually increasing analytical capabilities.

“We went from spending most of our time building and fixing pipelines to focusing on data products that directly impact business decisions,” explains Sarah Chen, DoorDash’s Director of Data Engineering. “Our data is fresher, our platform is more flexible, and our engineering time is spent on higher-value work.”

This approach doesn’t eliminate transformation – it just changes when and how it happens. Rather than rigid pipelines executing transformations on fixed schedules, transformations occur dynamically when users need the data.
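
To make the idea concrete, here is a minimal sketch of query-time transformation, using Python’s built-in sqlite3 module as a stand-in for a cloud warehouse; the raw_orders table and its columns are invented for illustration. Instead of a scheduled job materializing a transformed table, the logic lives in a view and runs whenever someone queries it.

```python
import sqlite3

# Toy illustration of compute-on-query, with sqlite3 standing in for a
# cloud warehouse such as Snowflake or BigQuery. Table and column names
# (raw_orders, amount_cents, ...) are made up for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_orders (order_id INTEGER, customer_id INTEGER,
                             amount_cents INTEGER, status TEXT);
    INSERT INTO raw_orders VALUES
        (1, 101, 2500, 'completed'),
        (2, 102, 1200, 'cancelled'),
        (3, 101, 4300, 'completed');
""")

# Traditional ETL would pre-compute this result into a physical table on a
# schedule. With compute-on-query, the transformation is defined as a view
# and executes only when someone actually asks the question.
conn.execute("""
    CREATE VIEW customer_revenue AS
    SELECT customer_id, SUM(amount_cents) / 100.0 AS revenue
    FROM raw_orders
    WHERE status = 'completed'
    GROUP BY customer_id
""")

for row in conn.execute("SELECT * FROM customer_revenue ORDER BY revenue DESC"):
    print(row)  # results reflect whatever is in raw_orders right now
```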

The benefits are substantial:

  • Reduced maintenance overhead: Fewer pipelines mean fewer failure points and less debugging
  • Increased flexibility: Changes to transformation logic don’t require pipeline rewrites
  • Fresher data: Insights reflect the latest information rather than the last batch run
  • More efficient resource usage: Compute resources focus on data that’s actually being used

Streaming as the New Default

Alongside compute-on-query, streaming architectures are challenging the batch-oriented thinking that dominated traditional ETL.

Netflix’s migration from batch processing to a streaming-first architecture illustrates this shift dramatically. In 2018, most of their data platform relied on scheduled ETL jobs. By 2022, over 80% of their data processing had moved to streaming pipelines using Apache Kafka and a series of stateful processors.

“Batch processing is increasingly becoming the exception rather than the rule,” says Zhenzhong Xu, Netflix’s Data Platform Lead. “When you build systems that process events as they happen, you eliminate entire categories of batch ETL while delivering near real-time insights.”

This doesn’t mean all processing happens in real time – rather, it means data flows continuously through the platform, with transformation happening along the way rather than in scheduled batches.
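
Here is a deliberately simplified sketch of that continuous, stateful style of processing in plain Python, standing in for a Kafka consumer feeding a stateful processor; the event source and its fields are invented for illustration.

```python
from collections import defaultdict
from typing import Iterable

def order_events() -> Iterable[dict]:
    """Pretend event source; in production this would be a Kafka topic."""
    yield {"user_id": "u1", "amount": 25.0}
    yield {"user_id": "u2", "amount": 12.0}
    yield {"user_id": "u1", "amount": 43.0}

def process_stream(events: Iterable[dict]) -> None:
    # State is updated incrementally as each event arrives, instead of
    # being recomputed from scratch by a nightly batch job.
    revenue_by_user = defaultdict(float)
    for event in events:
        revenue_by_user[event["user_id"]] += event["amount"]
        # Downstream consumers (dashboards, alerts) can read this state
        # seconds after the event occurs.
        print(event["user_id"], round(revenue_by_user[event["user_id"]], 2))

process_stream(order_events())
```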

Key advantages include:

  • Reduced pipeline complexity: One continuous flow replaces multiple batch jobs
  • Near real-time data availability: Insights are available seconds or minutes after events occur
  • Simplified orchestration: Less need for complex scheduling and dependency management
  • More natural error handling: Issues are identified and addressed as they occur

The Semantic Layer Revolution

Perhaps the most profound shift in modern data architecture is the rise of semantic modeling layers – abstraction layers that separate business logic from physical implementation.

Companies like Airbnb, Spotify, and Uber have built sophisticated semantic layers that define metrics, entities, and relationships in a centralized, reusable way. When a business user needs information, the semantic layer generates optimized queries on the fly rather than pulling from pre-computed tables.

Airbnb’s Minerva platform exemplifies this approach. Instead of building ETL pipelines for each reporting need, they created a metrics layer that defines key business concepts. Analysts interact with these business concepts, and the system dynamically generates the necessary SQL.
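
The snippet below is a toy sketch of the same idea, not Minerva’s actual API: metrics are declared once as data, and SQL is generated on demand rather than baked into per-report pipelines. The metric definitions and table names are invented for illustration.

```python
from typing import Optional

# Metrics declared once, as data. Business users ask for "active_users";
# nobody hand-writes a pipeline per report.
METRICS = {
    "active_users": {
        "expression": "COUNT(DISTINCT user_id)",
        "table": "events",
        "filter": "event_type = 'session_start'",
    },
    "booking_value": {
        "expression": "SUM(amount)",
        "table": "bookings",
        "filter": "status = 'confirmed'",
    },
}

def compile_metric(name: str, group_by: Optional[str] = None) -> str:
    """Turn a declared metric into an executable SQL query on demand."""
    m = METRICS[name]
    select = f"{group_by + ', ' if group_by else ''}{m['expression']} AS {name}"
    sql = f"SELECT {select} FROM {m['table']} WHERE {m['filter']}"
    if group_by:
        sql += f" GROUP BY {group_by}"
    return sql

print(compile_metric("active_users", group_by="country"))
# SELECT country, COUNT(DISTINCT user_id) AS active_users
#   FROM events WHERE event_type = 'session_start' GROUP BY country
```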

“We went from hundreds of slightly different definitions of ‘active user’ or ‘booking value’ to single, consistent definitions used across the company,” explains Elena Grewal, Airbnb’s former Head of Data Science. “Our data engineers now focus on the platform and definitions rather than individual transformation pipelines.”

The impact is profound:

  • Consistency: Metrics are defined once and used consistently
  • Agility: New analytics needs don’t require new pipelines
  • Governance: Changes to definitions are managed centrally
  • Efficiency: Computation happens only when needed

The New Data Engineering: From Pipelines to Platforms

These shifts aren’t just changing technology – they’re transforming the data engineering role itself. As ETL pipelines become less central, data engineers are evolving into platform builders, architects, and enablers.

“Five years ago, 80% of our data engineering time went to building and maintaining pipelines,” says Marcus Wong, Principal Data Engineer at Shopify. “Today, it’s less than 30%. Instead, we focus on building self-service capabilities, ensuring data quality, and enabling domain teams to be data producers rather than just consumers.”

This evolution is reflected in how modern data teams structure their work:

  • Platform teams focus on core infrastructure, governance, and self-service capabilities
  • Domain-embedded engineers work with business units to model data and define metrics
  • Data product developers build specialized analytical applications that leverage the platform

The skills most valued in data engineers are shifting accordingly. Knowledge of specific ETL tools is becoming less important than understanding data modeling, distributed systems, and platform design.

Where Traditional ETL Still Matters

Does this mean traditional ETL is completely dead? Not quite. Several scenarios still call for purpose-built transformation pipelines:

1. Legacy System Integration

Organizations with complex legacy systems often need dedicated ETL processes to extract and normalize data before it enters modern platforms.

2. Highly Regulated Environments

Some industries require explicit data transformations for compliance reasons, with clear lineage and documented transformations at each step.

3. Performance-Critical Applications

For dashboards, applications, or reports used by hundreds or thousands of users, pre-computing transformations still makes sense for performance reasons.
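
As a counterpoint to the earlier query-time sketch, here is a minimal illustration of that pre-computation pattern, again using sqlite3 as a stand-in warehouse with invented table names: the aggregate is materialized once so heavy dashboard traffic reads a small table instead of re-scanning raw events on every request.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, viewed_at TEXT)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("home", "2025-04-16"), ("home", "2025-04-17"), ("pricing", "2025-04-17")],
)

# Classic ETL step: materialize the aggregate on a schedule so thousands of
# dashboard queries hit a small, pre-computed table rather than raw events.
conn.execute("""
    CREATE TABLE daily_page_views AS
    SELECT page, viewed_at AS day, COUNT(*) AS views
    FROM page_views
    GROUP BY page, viewed_at
""")

print(conn.execute("SELECT * FROM daily_page_views").fetchall())
```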

4. Edge Computing Scenarios

In environments with limited connectivity or compute resources, traditional ETL remains valuable for processing data before transmission.

But even in these scenarios, the approach is evolving. Modern ETL tends to be more modular, metadata-driven, and integrated with broader data platforms rather than standing alone.

Preparing for the Future

For data professionals navigating this changing landscape, several strategies can help:

1. Embrace Declarative Over Imperative

Focus on defining what data should look like rather than how to transform it. Tools like dbt, which separate transformation logic from execution, represent a step in this direction.

2. Develop Semantic Modeling Skills

Understanding how to model business concepts, metrics, and relationships independent of physical implementation will be increasingly valuable.

3. Learn Streaming Fundamentals

Even if you don’t work directly with streaming systems today, understanding event-driven architectures and stateful processing will be essential as more platforms move in this direction.

4. Focus on Enabling Self-Service

The most successful data engineers today build platforms that empower domain experts and analysts rather than creating bottlenecks through centralized processing.

5. Understand Compute Optimization

As transformation moves closer to query time, skills in query optimization, caching strategies, and efficient compute utilization become more important.
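
One small example of what that can look like in practice: memoizing the results of repeated identical queries so they don’t hit the warehouse twice. This is only a sketch of the caching idea; run_expensive_query below is a hypothetical placeholder for a real warehouse call.

```python
from functools import lru_cache
import time

def run_expensive_query(sql: str) -> list:
    # Hypothetical stand-in for seconds of warehouse compute.
    time.sleep(1)
    return [("row", 1)]

@lru_cache(maxsize=128)
def cached_query(sql: str) -> tuple:
    # lru_cache needs hashable values, so results come back as a tuple.
    return tuple(run_expensive_query(sql))

start = time.time()
cached_query("SELECT 1")   # pays the full compute cost
cached_query("SELECT 1")   # served from the in-process cache
print(f"two identical queries took {time.time() - start:.1f}s instead of ~2s")
```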

Conclusion: From Pipeline Builders to Data Enablers

The shift away from traditional ETL doesn’t diminish the importance of data engineering – it elevates it. Rather than being caught in endless cycles of pipeline development and maintenance, data engineers are becoming the architects of systems that make data universally accessible, trustworthy, and actionable.

This evolution represents a maturation of our field. Just as software engineering evolved from assembly language to high-level frameworks and cloud services, data engineering is moving from artisanal pipeline crafting to platform building and enabling.

The future data engineer will spend less time moving and transforming data and more time ensuring it’s available, accurate, and aligned with business needs. We’re becoming less like plumbers and more like city planners – designing ecosystems where data flows naturally to where it’s needed, when it’s needed.

So is traditional ETL dead? Perhaps not entirely. But its dominance is certainly waning, making room for a new era of data engineering that promises greater agility, fresher insights, and more strategic impact for the organizations we serve.

The pipeline-building data engineer isn’t disappearing – they’re evolving into something more powerful: the enablers of truly data-driven organizations.


How has your experience with ETL pipelines changed over the past few years? Are you building more or fewer pipelines? Share your perspective in the comments below.

#DataEngineering #ETLEvolution #ComputeOnQuery #StreamingArchitecture #SemanticLayer #ModernDataStack #DataPlatforms #DataTransformation #FutureOfETL #DataArchitecture #DataGovernance #RealTimeData #DataMesh #DataStrategy

By Alex
