Unify Your Financial Data Without Moving It: The Open Source Federated Query Engine

Financial institutions grapple with critical data trapped in disconnected legacy systems, from core banking to trading platforms. Aggregating this data for regulatory reporting is slow, manual, and error-prone, and it risks costly delays and compliance penalties.

This playbook introduces a cost-effective, cloud-native solution that queries your data directly where it lives. By leveraging the Trino federated query engine, you can run complex analytical queries across all your legacy systems simultaneously, without undertaking a massive, costly ETL project. Paired with Apache Superset for visualization and AWS Glue for a unified metadata catalog, this stack provides a single, virtual source of truth for faster, more accurate reporting.

Expected Outcomes

Eliminate manual data aggregation and normalization tasks for reporting.
Achieve a unified, near real-time view of data across all legacy systems.
Accelerate regulatory reporting cycles and reduce the risk of compliance errors.
Significantly lower total cost of ownership by avoiding expensive licensing for proprietary data integration tools.
Empower business analysts with direct, self-service access to previously siloed information.

Core Tools in This Stack

Trino (formerly PrestoSQL)

Trino is a high-performance, distributed SQL query engine for big data analytics. It allows querying data where it lives, including Hadoop, S3, Cassandra, relational databases, and more, enabling a single query to access and join data from multiple disparate sources.
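
As a rough illustration of a cross-source query, the sketch below builds a Trino SQL statement that joins a table in an S3-backed catalog with a table in a relational catalog. Trino addresses tables as catalog.schema.table. The catalog, schema, table, and column names used here (lake, corebank, trades, accounts, notional, and so on) are hypothetical and stand in for whatever a real deployment defines.

```python
from dataclasses import dataclass


def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


@dataclass(frozen=True)
class TableRef:
    """A fully qualified Trino table name: catalog.schema.table."""
    catalog: str
    schema: str
    table: str

    def fqn(self) -> str:
        parts = (self.catalog, self.schema, self.table)
        return ".".join(quote_identifier(p) for p in parts)


def build_exposure_query(trades: TableRef, accounts: TableRef) -> str:
    """Return SQL that joins two tables living in different catalogs."""
    return (
        "SELECT a.account_id, a.legal_entity, SUM(t.notional) AS total_notional\n"
        f"FROM {trades.fqn()} AS t\n"
        f"JOIN {accounts.fqn()} AS a ON t.account_id = a.account_id\n"
        "GROUP BY a.account_id, a.legal_entity"
    )


# Hypothetical catalogs: "lake" (files on S3) and "corebank" (a relational database).
query = build_exposure_query(
    TableRef("lake", "trading", "trades"),
    TableRef("corebank", "public", "accounts"),
)
print(query)
```

The resulting string could be pasted into any Trino client. The point of the sketch is only that both tables appear in one statement, each qualified by its own catalog.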

Key Features

Cross-Source Federated Queries
Massively Parallel Processing (MPP) Architecture
Separation of Compute and Storage
Extensive Connector Ecosystem
ANSI SQL Compliant
High Performance for Interactive Analytics
Pluggable and Extensible

Ideal For

Company Size: Medium, Large

Industries: Technology & Software, Business & Professional Services, Retail & E-commerce, Creative & Media

Pricing

Model: Open Source, Commercial Support Available

Tier: Free

Ease of Use

Moderate

Apache Superset

Apache Superset is an open-source, modern data exploration and visualization platform that allows users of all skill levels to create interactive dashboards and beautiful visualizations from a wide variety of data sources.

Key Features

Interactive Dashboards
Wide Range of Visualizations
SQL Lab
No-Code Chart Builder
Lightweight Semantic Layer
Extensive Database Support
Cloud-Native Architecture
Extensible Security Model

Ideal For

Company Size: Small, Medium, Large

Industries: Technology & Software, Business & Professional Services, Retail & E-commerce, Creative & Media, Education & Non-Profit, Health & Wellness

Pricing

Model: Open Source

Tier: Free (Self-hosted)

Ease of Use

Moderate

AWS Glue Data Catalog

AWS Glue Data Catalog is a fully managed, persistent metadata store within the AWS ecosystem. It acts as a central repository for structural and operational metadata, enabling users to discover, search, and query data assets across various sources like Amazon S3, RDS, and Redshift. It is a core component of the broader AWS Glue serverless data integration service.

Key Features

Automatic Schema Discovery with Crawlers
Centralized Metadata Repository
Serverless Architecture (No infrastructure to manage)
Apache Hive Metastore Compatibility
Integration with AWS Analytics (Athena, EMR, Redshift Spectrum)
Fine-grained Access Control via AWS Lake Formation and IAM
Schema Versioning and Evolution Tracking
Built-in Data Quality Rules and Monitoring

Ideal For

Company Size: Small, Medium, Large

Industries: Technology & Software, Business & Professional Services, Retail & E-commerce, Creative & Media, Education & Non-Profit, Health & Wellness, Other

Pricing

Model: Pay-as-you-go, Free Tier

Tier: Variable

Ease of Use

Moderate

The Workflow

graph TD
    subgraph "Open Source Federated Query Engine"
        direction LR
        N0["Trino (PrestoSQL)"]
        N1["Apache Superset"]
        N2["AWS Glue Data Catalog"]
        N1 -- "Sends SQL queries" --> N0
        N0 -- "Uses as metastore for schema" --> N2
        N0 -- "Returns query results" --> N1
    end
    classDef blue fill:#3498db,stroke:#2980b9,stroke-width:2px,color:#fff;
    classDef green fill:#2ecc71,stroke:#27ae60,stroke-width:2px,color:#fff;
    classDef orange fill:#f39c12,stroke:#d35400,stroke-width:2px,color:#fff;
    class N0 blue;
    class N1 blue;
    class N2 blue;

Integration Logic

Trino Hive Connector

This integration configures Trino's Hive connector to use the AWS Glue Data Catalog as its metastore, which lets Trino discover table schemas and data locations for files in an S3-based data lake. Apache Superset connects to the Trino cluster as a standard database source. When a user in Superset builds a chart or runs a query, Superset sends the SQL to Trino. Trino consults the Glue Data Catalog to plan the query, reads the required data directly from Amazon S3, and executes the query across its distributed workers. The results are returned to Superset for visualization, enabling interactive analysis over large datasets without first loading the data into a traditional data warehouse. Sources outside the data lake, such as relational databases, are typically added to Trino as separate catalogs through their own connectors, so a single query can join them with the S3 data.
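
The wiring described above can be sketched as two small pieces of configuration. This is a minimal sketch under stated assumptions, not a reference deployment. The property names connector.name, hive.metastore, and hive.metastore.glue.region are Trino Hive connector settings, and the trino:// string follows the SQLAlchemy URI form that Superset accepts for Trino. The region, host, port, user, and the catalog name lake are placeholders chosen for illustration.

```python
from urllib.parse import quote


def render_catalog_properties(glue_region: str) -> str:
    """Render the body of a Trino catalog file such as etc/catalog/lake.properties.

    The file name (here, the assumed "lake") becomes the catalog name in Trino.
    """
    props = {
        "connector.name": "hive",        # use the Hive connector
        "hive.metastore": "glue",        # read table metadata from AWS Glue
        "hive.metastore.glue.region": glue_region,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


def superset_trino_uri(user: str, host: str, port: int, catalog: str) -> str:
    """Build the SQLAlchemy URI entered when adding Trino as a database in Superset."""
    # Percent-encode the user and catalog so special characters stay URL-safe.
    return f"trino://{quote(user, safe='')}@{host}:{port}/{quote(catalog, safe='')}"


# Placeholder values; replace with those of the actual environment.
properties_text = render_catalog_properties("us-east-1")
uri = superset_trino_uri("analyst", "trino.example.internal", 8080, "lake")
print(properties_text)
print(uri)
```

S3 access settings, credentials, and authentication are deliberately left out of the sketch, since they depend on the Trino version and on how the cluster is secured.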

Unlock Your Financial Data Playbook

Streamline regulatory reporting and connect legacy systems to avoid costly compliance delays.
