Unify Your Financial Data Without Moving It: The Open Source Federated Query Engine
Financial institutions grapple with critical data trapped in disconnected legacy systems—from core banking to trading platforms. Aggregating this data for essential regulatory reporting is a slow, manual, and error-prone process, risking costly delays and compliance penalties.
This playbook introduces a cost-effective, cloud-native stack that queries your data directly where it lives. With the Trino federated query engine, you can run analytical SQL across every legacy system that has a Trino connector, in a single query, without undertaking a massive, costly ETL project. Paired with Apache Superset for visualization and the AWS Glue Data Catalog for unified metadata, this stack provides a single, virtual source of truth for faster, more accurate reporting.
Expected Outcomes
- Eliminate manual data aggregation and normalization tasks for reporting.
- Achieve a unified, near real-time view of data across all legacy systems.
- Accelerate regulatory reporting cycles and reduce the risk of compliance errors.
- Significantly lower total cost of ownership by avoiding expensive licensing for proprietary data integration tools.
- Empower business analysts with direct, self-service access to previously siloed information.
Core Tools in This Stack

Trino (formerly PrestoSQL)
Trino is a high-performance, distributed SQL query engine for big data analytics. It allows querying data where it lives, including Hadoop, S3, Cassandra, relational databases, and more, enabling a single query to access and join data from multiple disparate sources.
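As a rough illustration of what such a cross-source query could look like, the sketch below builds one SQL statement that joins two catalogs by their fully qualified `catalog.schema.table` names. The catalog, schema, table, and column names (`corebanking`, `trading`, `accounts`, `executions`, and so on) are hypothetical placeholders, not part of this playbook; the `run_query` helper assumes the third-party `trino` Python client is installed and a coordinator is reachable on port 8080.

```python
from datetime import date


def build_exposure_query(since: date) -> str:
    """Build a federated query joining two hypothetical Trino catalogs.

    `corebanking` and `trading` are assumed catalog names; each would map
    to a different source system through its own Trino connector.
    """
    # `since` is a date object, so isoformat() yields a safe YYYY-MM-DD literal.
    return (
        "SELECT a.account_id, a.customer_name, SUM(t.notional) AS total_notional\n"
        "FROM corebanking.public.accounts AS a\n"
        "JOIN trading.trades.executions AS t ON t.account_id = a.account_id\n"
        f"WHERE t.trade_date >= DATE '{since.isoformat()}'\n"
        "GROUP BY a.account_id, a.customer_name"
    )


def run_query(sql: str, host: str, user: str, port: int = 8080) -> list:
    """Run the query through Trino's DB-API client (requires `pip install trino`)."""
    from trino.dbapi import connect  # imported lazily; third-party dependency

    conn = connect(host=host, port=port, user=user)
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchall()


if __name__ == "__main__":
    print(build_exposure_query(date(2024, 1, 1)))
```

Because every table is addressed by catalog, the same statement works no matter which system sits behind each catalog; only the catalog configuration changes.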
Key Features
- Cross-Source Federated Queries
- Massively Parallel Processing (MPP) Architecture
- Separation of Compute and Storage
- Extensive Connector Ecosystem
- ANSI SQL Compliant
- High-Performance for Interactive Analytics
- Pluggable and Extensible
Ideal For
Company Size: Medium, Large
Industries: Technology & Software, Business & Professional Services, Retail & E-commerce, Creative & Media
Pricing
Model: Open Source, Commercial Support Available
Tier: Free
Ease of Use
Moderate

Apache Superset
Apache Superset is an open-source, modern data exploration and visualization platform that allows users of all skill levels to create interactive dashboards and beautiful visualizations from a wide variety of data sources.
Key Features
- Interactive Dashboards
- Wide Range of Visualizations
- SQL Lab
- No-Code Chart Builder
- Lightweight Semantic Layer
- Extensive Database Support
- Cloud-Native Architecture
- Extensible Security Model
Ideal For
Company Size: Small, Medium, Large
Industries: Technology & Software, Business & Professional Services, Retail & E-commerce, Creative & Media, Education & Non-Profit, Health & Wellness
Pricing
Model: Open Source
Tier: Free (Self-hosted)
Ease of Use
Moderate

AWS Glue Data Catalog
AWS Glue Data Catalog is a fully managed, persistent metadata store within the AWS ecosystem. It acts as a central repository for structural and operational metadata, enabling users to discover, search, and query data assets across various sources like Amazon S3, RDS, and Redshift. It is a core component of the broader AWS Glue serverless data integration service.
Key Features
- Automatic Schema Discovery with Crawlers
- Centralized Metadata Repository
- Serverless Architecture (No infrastructure to manage)
- Apache Hive Metastore Compatibility
- Integration with AWS Analytics (Athena, EMR, Redshift Spectrum)
- Fine-grained Access Control via AWS Lake Formation and IAM
- Schema Versioning and Evolution Tracking
- Built-in Data Quality Rules and Monitoring
Ideal For
Company Size: Small, Medium, Large
Industries: Technology & Software, Business & Professional Services, Retail & E-commerce, Creative & Media, Education & Non-Profit, Health & Wellness, Other
Pricing
Model: Pay-as-you-go, Free Tier
Tier: Variable
Ease of Use
Moderate
The Workflow
Integration Logic
- Trino Hive Connector
This integration configures Trino's Hive connector to use the AWS Glue Data Catalog as its central metastore. This allows Trino to discover table schemas and data locations for files residing in an S3-based data lake. Apache Superset connects to the Trino cluster as a standard database source. When a user in Superset builds a chart or runs a query, Superset sends the SQL request to Trino. Trino then consults the AWS Glue Catalog to plan the query, fetches the required data directly from Amazon S3, and executes the query across its distributed workers. The results are returned to Superset for visualization, enabling interactive analysis over vast datasets without moving the data into a traditional data warehouse.
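The wiring described above can be sketched in two small pieces: the catalog properties file that points Trino's Hive connector at Glue, and the SQLAlchemy URI that Superset uses to reach Trino. This is a minimal sketch, not a complete deployment. The catalog name `lake`, the host `trino.example.internal`, the user `analyst`, and the region are assumptions chosen for illustration. S3 access and credential settings are deliberately left out because they vary by Trino version and environment.

```python
from urllib.parse import quote


def glue_catalog_properties(region: str) -> str:
    """Render a Trino catalog file that uses AWS Glue as the Hive metastore.

    The result would typically be saved as etc/catalog/<catalog>.properties
    on every Trino node; the file name determines the catalog name.
    """
    lines = [
        "connector.name=hive",                    # use the Hive connector
        "hive.metastore=glue",                    # Glue instead of a Thrift metastore
        f"hive.metastore.glue.region={region}",   # region hosting the Glue catalog
    ]
    return "\n".join(lines) + "\n"


def superset_trino_uri(user: str, host: str, port: int, catalog: str) -> str:
    """Build the SQLAlchemy URI entered when adding Trino as a Superset database."""
    # Percent-encode the user so characters such as '@' cannot break the URI.
    return f"trino://{quote(user, safe='')}@{host}:{port}/{catalog}"


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(glue_catalog_properties("us-east-1"))
    print(superset_trino_uri("analyst", "trino.example.internal", 8080, "lake"))
```

With the catalog file in place, tables registered in Glue become queryable as `lake.<database>.<table>`, and Superset treats the Trino cluster like any other SQL database.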