By Sabri Skhiri
Eleven editions in, one of the biggest technology conferences in Central Europe changed its name to reflect the latest advances in the field: the BIG DATA TECHNOLOGY WARSAW SUMMIT became the DATA & AI WARSAW TECH SUMMIT. Our CTO, Sabri Skhiri, was there to gather insights, and now it’s time to share them: here is a rundown of the key trends, keynotes, talks, and innovations from the conference.

The Trends
This is not always the case, but this conference fell squarely within my area of expertise. The Data & AI Warsaw Tech Summit 2025 provided a comprehensive exploration of current trends and innovations in data and AI, featuring discussions on distributed event processing, agentic architectures, data governance, data architecture, and many other topics. The event offered opportunities to interact with industry leaders such as Stephan Ewen, founder of Restate and co-creator of Apache Flink, to explore data governance solutions with companies such as Dataedo, and to engage in insightful discussions with professionals from various fields, including content moderation specialists at LinkedIn, the Head of Data Platform at Bolt, and tech leaders at TUI and Booking.com. Altogether, it offered valuable insights into the rapid evolution of AI and encouraged participants to consider its broader context and implications.
Key Topics:
- Data Governance: Emphasized as a critical focus, with numerous talks and a strong presence of vendors in the exhibition hall.
- Generative AI (GenAI): Explored its disruptive impact on data science, enterprise landscapes, and the roles of computer scientists, including automation of tasks like testing at Ericsson and booking processes at TUI.
- High-Performance Data Architecture and Real-Time Processing: Addressed the need for robust architectures capable of handling real-time data efficiently.
- Data Architecture Implementations: Detailed discussions on data mesh and platforms such as Microsoft Fabric, Snowflake, Databricks, and dbt.
- Skill Development: Highlighted the importance of advancing skills to keep pace with evolving technologies, illustrated by a case study on agent selection using SVMs with input features derived from models like Claude Sonnet 3, which achieved a poor 75% accuracy (see the sketch after this list).
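As an illustration of that last case study, here is a minimal, hedged sketch of SVM-based agent selection. The talk did not detail the actual feature set, so the example assumes each request is represented by an LLM-derived feature vector; all data and agent names below are synthetic.

```python
# Hedged sketch: routing requests to one of several agents with an SVM.
# The "features" stand in for LLM-derived signals (embeddings, scores); they are synthetic here.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

X = rng.normal(size=(500, 8))        # 500 requests, 8 LLM-derived features each (synthetic)
y = rng.integers(0, 3, size=500)     # target agent: 0=sql, 1=search, 2=summarise (synthetic)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf").fit(X_train, y_train)
print(f"routing accuracy: {clf.score(X_test, y_test):.2f}")  # ~0.33 here, since labels are random
```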
Overall, the conference provided a rich platform for gaining fresh perspectives on data and AI, fostering a deeper understanding of the high-velocity advancements in the field.
The Keynotes
AI Fabric: Advanced Context Engineering for Smarter AI Solutions - AB Initio
At the Data & AI Warsaw Tech Summit 2025, Jonathan Sunderland, Strategic Consultant at Ab Initio, delivered a compelling keynote.
Traditional Enterprise Architecture
Sunderland began by outlining the conventional components of enterprise architecture:
- Data Storage Solutions: These include data lakehouses, data lakes, and data warehouses, which serve as repositories for vast amounts of structured and unstructured data.
- Internal Operational Systems: Core systems that manage daily business operations and ensure smooth internal workflows.
- APIs and External Services: Interfaces that facilitate interaction with external platforms, partners, and services, extending the enterprise’s capabilities.
- Business Process Orchestration: Tools and frameworks that integrate the aforementioned components, enabling efficient data flow and process automation, often through ETL processes and data pipelines.
Emergence of AI and Agentic Architectures
The integration of AI and agentic architectures introduces transformative opportunities:
- Enhancing Data Quality: AI algorithms can automate data validation and cleansing processes, leading to more accurate and reliable datasets.
- Data Classification and Metadata Generation: AI can automatically categorize data and generate metadata, improving data discoverability and management.
- Code Generation and Maintenance: AI-driven tools can assist in developing application code, debugging, and performing maintenance tasks, thereby accelerating the software development lifecycle.
- Integration of Internal Systems with Tools: AI facilitates seamless connections between disparate operational systems and tools, enhancing interoperability and efficiency.
- Automation of Business Processes: AI can design and implement workflows and data pipelines, reducing manual intervention and increasing process efficiency.

Challenges Posed by Large Language Models (LLMs)
Despite their potential, LLMs present several challenges:
- Non-Deterministic Outputs: LLMs can produce unpredictable results, which raises concerns about their reliability in enterprise applications.
- Accountability Issues: When AI generates code or makes decisions, determining responsibility becomes complex: it is unclear who is accountable for errors or unintended consequences arising from AI-generated outputs. The speaker used the “Will Smith” analogy to illustrate how a single action can destroy an entire career.
- Regulatory Compliance: Ensuring that AI systems adhere to regulations such as GDPR, ePrivacy, and other data protection laws is crucial. Missteps can lead to legal repercussions and damage to reputation.
Personal Reflections and Recommendations
Reflecting on Sunderland’s insights, it is evident that addressing these challenges requires a comprehensive methodology encompassing the following (a minimal sketch of a control point follows the list):
- Modeling Agentic Architectures: Establishing processes to define agent roles, responsibilities, tasks, and interactions. This includes creating detailed documentation from initial requirements to final design, ensuring clarity and alignment.
- Implementing Control Points in the Generative Process:
  - GenAI Operations: Continuous monitoring of AI behavior during operation to detect and address anomalies promptly.
  - Cybersecurity Measures: Conducting security assessments tailored to AI agents to protect against vulnerabilities and threats.
  - Legal Compliance Checks: Regular evaluations to ensure AI systems comply with relevant laws and regulations, mitigating legal risks.
- Establishing Control Points for Operational AI or Generated Code:
  - GenAI Operations: Ongoing oversight of AI behavior to maintain performance and reliability.
  - Cybersecurity Measures: Continuous security evaluations to safeguard against emerging threats.
- Developing a Comprehensive Practice Framework: Creating methodologies with clearly defined roles, responsibilities, documentation standards, and processes that are conducive to audits. This framework ensures that AI integration is systematic, transparent, and accountable.
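To make the notion of a control point more concrete, here is a minimal sketch of my own (not an Ab Initio feature): a gate that a generated artifact must pass before moving forward, combining automated checks with an optional human sign-off. The check functions are placeholders for real security, policy, or compliance services.

```python
# Hedged sketch of a "control point" in a generative pipeline.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ControlPoint:
    name: str
    checks: list[Callable[[str], bool]] = field(default_factory=list)
    requires_human_approval: bool = False

    def evaluate(self, artifact: str) -> bool:
        failed = [chk.__name__ for chk in self.checks if not chk(artifact)]
        if failed:
            print(f"[{self.name}] blocked by failed checks: {failed}")
            return False
        if self.requires_human_approval:
            print(f"[{self.name}] queued for human review")
            return False  # released only after an explicit approval elsewhere
        return True

# Placeholder checks; real ones would call scanners, policy engines, or compliance services.
def no_hardcoded_secrets(code: str) -> bool:
    return "password=" not in code.lower()

def has_header_comment(code: str) -> bool:
    return code.lstrip().startswith("#")

codegen_gate = ControlPoint(
    name="generated-code-review",
    checks=[no_hardcoded_secrets, has_header_comment],
    requires_human_approval=True,
)

generated = "# generated by an agent\ndef connect(db):\n    return db.open()\n"
codegen_gate.evaluate(generated)
```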
The speaker’s use of the Will Smith analogy was impactful, but what truly resonated with me was how the talk made us reconsider AI’s influence on the enterprise landscape. This is exactly what we are covering in the Survey Corps team to define our new service offering in this segment.
HPE Innovation: AI Accelerating a Country – Defining Sovereignty in a New Era of AI
Veta Lohovska, Chief Technologist and Principal Data Scientist for AI and Supercomputing at HPE, delivered this keynote.
Understanding Sovereign AI and Its Importance
Sovereign AI refers to a nation’s ability to independently develop, deploy, and manage artificial intelligence technologies within its own borders. This concept has gained prominence in the current geopolitical climate, where technological autonomy is increasingly linked to national security, economic competitiveness, and cultural preservation. By achieving sovereignty in AI, countries can ensure that their AI systems align with national values, comply with local regulations, and safeguard sensitive data from foreign influence. This autonomy enables nations to harness AI’s transformative potential while maintaining control over their technological destiny.
Key Pillars of Sovereign AI
Lohovska emphasized three critical components essential for establishing Sovereign AI:
- Infrastructure: Developing robust and scalable computing resources is foundational. Investing in high-performance computing (HPC) systems enables the processing of vast datasets necessary for AI applications. Such infrastructure supports advanced research and development, facilitating innovations across various sectors.
- Talent: Cultivating a skilled workforce proficient in AI and related technologies is vital. Educational initiatives and training programs are necessary to equip individuals with the expertise required to drive AI innovation and implementation effectively.
- Ecosystem: Fostering a collaborative environment that includes academia, industry, and government agencies encourages the sharing of knowledge, resources, and best practices. This synergy accelerates the development and deployment of AI solutions tailored to national needs.
LUMI: Empowering Europe’s AI Ambitions
A prime example of infrastructure supporting Sovereign AI is the LUMI supercomputer. Located in Kajaani, Finland, LUMI is one of Europe’s most powerful and energy-efficient supercomputers. It offers unparalleled computational capabilities, enabling researchers and enterprises to perform complex simulations and data analyses.
The LUMI AI Factory serves as an innovation hub, providing startups and small to medium-sized enterprises (SMEs) with access to cutting-edge AI resources. This initiative empowers European companies to develop and scale AI solutions, fostering a vibrant startup ecosystem. By leveraging LUMI’s capabilities, startups can accelerate product development, optimize operations, and enhance competitiveness in the global market.

Favourite Talks
My KPI is bigger than your KPI… but why? Integrating Airflow orchestration, infrastructure-as-code and documentation-as-code into a unified framework for Metrics Management
Marcin Cinciała and Karolina Zinner, Senior Data Engineers at Allegro, presented this session.
Context and Challenges
Allegro’s data ecosystem generates hundreds of numerical metrics weekly, delivered to decision-makers across the organization. While these metrics are integral to business operations, challenges have emerged:
- Inconsistent Documentation: Many metrics lacked proper documentation, leading to misunderstandings and misinterpretations.
- Redundant Efforts: Different teams often recreated similar metrics, resulting in inefficiencies and potential discrepancies.
- Monitoring Difficulties: Overseeing system usage across various platforms, including Excel, databases, and ETL processes, was cumbersome without a standardized approach.
Proposed Solution
To address these issues, the presenters advocated for a unified framework that consolidates metric collection throughout the development lifecycle of data pipelines (a minimal sketch of how the pieces fit together follows the list below). It integrates:
- Metrics Inventory Creation: Establishing a centralized repository to catalog all metrics prevents redundancy and ensures consistency.
- Standardized Documentation: Embedding Markdown documentation directly within the codebase allows for automatic HTML documentation generation using GitHub Actions. This approach ensures that documentation remains current and accessible.
- Infrastructure-as-Code (IaC): Defining infrastructure configurations as code promotes reproducibility and simplifies management.
- Airflow Orchestration: Utilizing Apache Airflow to orchestrate metric calculations ensures reliable and timely data processing.
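Below is a minimal sketch of how these pieces can fit together, assuming a recent Airflow 2.x; the metric, schedule, and owner are illustrative, not Allegro’s actual definitions. The Markdown lives next to the DAG, is rendered in the Airflow UI via doc_md, and the same text can feed the GitHub Actions job that publishes the HTML documentation.

```python
# Hedged sketch: a metric whose documentation, SLA, and orchestration live together as code.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

METRIC_DOC = """
### weekly_active_buyers
Number of distinct buyers with at least one purchase in the last 7 days.
Owner: analytics-platform team (illustrative). SLA: ready every Monday 06:00 UTC.
"""

def compute_weekly_active_buyers(**context):
    # Placeholder for the real warehouse query / processing step.
    print("SELECT COUNT(DISTINCT buyer_id) FROM purchases WHERE ...")

with DAG(
    dag_id="metric_weekly_active_buyers",
    schedule="0 5 * * MON",           # runs weekly, ahead of the SLA
    start_date=datetime(2025, 1, 1),
    catchup=False,
    doc_md=METRIC_DOC,                # documentation-as-code, rendered in the Airflow UI
    tags=["metrics"],
) as dag:
    PythonOperator(
        task_id="compute",
        python_callable=compute_weekly_active_buyers,
        doc_md=METRIC_DOC,
    )
```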
Implementation Details
- Unified Table Schemas: The analytics team provides standardized table schemas, data processing definitions, data quality checks, and Service Level Agreements (SLAs).
- Documentation Structure: A predefined structure for documentation is stipulated, complemented by streamlined, copy-paste templates to facilitate adoption.
- Sample Processing and Universal Data Quality Checks: Offering sample processing scripts and universal data quality checks helps teams maintain consistency and reliability (see the sketch after this list).
- Role Definition: Clear delineation of responsibilities ensures that specific teams are accountable for providing inputs, while others focus on verifying outputs and content.
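As an example of what a “universal” data quality check could look like, here is a hedged sketch covering a null-rate and a freshness threshold; the table layout, column names, and thresholds are purely illustrative.

```python
# Hedged sketch of a reusable data quality check for metric tables.
from datetime import datetime, timedelta, timezone

import pandas as pd

def check_metric_table(df: pd.DataFrame, ts_column: str,
                       max_null_rate: float = 0.01,
                       max_staleness: timedelta = timedelta(days=7)) -> list[str]:
    issues = []
    worst_null_rate = df.isna().mean().max()          # worst null rate across all columns
    if worst_null_rate > max_null_rate:
        issues.append(f"null rate {worst_null_rate:.1%} exceeds {max_null_rate:.1%}")
    staleness = datetime.now(timezone.utc) - df[ts_column].max()
    if staleness > max_staleness:
        issues.append(f"data is stale by {staleness}")
    return issues

# Tiny illustrative metric table.
df = pd.DataFrame({
    "metric_value": [120, 135, None],
    "computed_at": pd.to_datetime(["2025-03-24", "2025-03-31", "2025-04-07"], utc=True),
})
print(check_metric_table(df, "computed_at"))
```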
Benefits Realized
Implementing this integrated framework has led to:
- Codebase Cleanup: Redundant and outdated code has been identified and removed, streamlining the development environment.
- Time Savings: Comprehensive and up-to-date documentation reduces the time spent searching for information and understanding metric definitions.
- Improved Data Quality: Standardized processes and checks facilitate quicker identification and rectification of errors, enhancing overall data reliability.
I liked this talk because, by integrating Airflow orchestration, infrastructure-as-code, and documentation-as-code, Allegro has established a robust framework for metrics management that not only promotes efficiency but also directly contributes to data and AI governance. It is an excellent example of the “freedom in a box” concept invented at Toyota: teams are free to do whatever they want within a given, constrained environment. This approach is an efficient way to contribute to AI governance by exposing a clear and documented technical lineage of the data.
Automation in the credit risk model life cycle - ING Hubs
Marcin Jeżo, Expert Lead, and Damian Kowalik, Product Manager at ING Hubs Poland, presented this talk. They delved into the pivotal role of automation in enhancing efficiency, consistency, and compliance within the credit risk model lifecycle.
Context and Importance of Automation
In the evolving regulatory landscape, financial institutions face increasing pressure to maintain robust credit risk models that comply with stringent standards. Automation emerges as an interesting enabler, streamlining processes, reducing operational risks, and ensuring adherence to regulatory requirements. ING Hubs Poland, an international team based in Warsaw, focuses on building an innovative risk-focused community encompassing:
- Data and Tools: Developing and managing the technological infrastructure necessary for risk assessment.
- Risk Modelling: Creating models that accurately predict and assess credit risk.
- Model Risk Management: Overseeing the validation, approval, and monitoring of risk models to ensure their effectiveness and compliance.
The speakers concentrated on the first two components, emphasizing how automation can be integrated into these areas to enhance the credit risk model lifecycle.
Credit Risk Model Lifecycle Overview
The typical lifecycle of a credit risk model includes several stages:
- Planning: Defining objectives and requirements for the model.
- Data Collection: Gathering relevant data necessary for model development.
- Model Development: Creating the model using statistical and analytical techniques.
- Model Validation Pre-Approval: Assessing the model’s performance and accuracy before seeking approval.
- Model Approval by Regulator: Obtaining necessary approvals from regulatory bodies.
- Implementation: Deploying the model into the operational environment.
- Ongoing Monitoring: Continuously observing the model’s performance to ensure it remains effective.
- Periodical Validation: Conducting regular reviews to validate the model’s accuracy and relevance.
- Decommissioning: Retiring the model when it is no longer useful or relevant.
Focus on Model Monitoring and Automation
A critical phase in this lifecycle is Ongoing Monitoring, which involves:
- Data Collection: Accumulating data on the model’s performance.
- Data Handover: Transferring data between teams or systems for further analysis.
- Analytics and Testing: Evaluating the model’s outputs to detect any anomalies or deviations.
- Review: Assessing the findings and determining the necessary actions.
From these activities, quality improvement recommendations and new control points are identified. The speakers highlighted the introduction of automation in this phase to expedite delivery and enhance accuracy, and they adopted a structured approach (a minimal sketch of one automated monitoring check follows the list below):
- Standardization: Establishing uniform processes and protocols to create a solid foundation for automation.
- Automation: Implementing automated tools and systems to handle repetitive tasks, reducing manual intervention and the potential for errors.
- New Functionalities: Developing additional features that leverage automation to provide deeper insights, more robust risk assessments, and LLM-based support for understanding and monitoring regulations.
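As one concrete example of an automated monitoring check (my illustration, not ING’s actual tooling), the sketch below computes the Population Stability Index between development-time and production score distributions and maps it to the commonly used 0.1/0.25 thresholds.

```python
# Hedged sketch: Population Stability Index (PSI) as an automated ongoing-monitoring check.
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    cuts = np.quantile(expected, np.linspace(0, 1, bins + 1))
    cuts[0], cuts[-1] = -np.inf, np.inf                       # cover the full range
    e_frac = np.histogram(expected, cuts)[0] / len(expected)
    a_frac = np.histogram(actual, cuts)[0] / len(actual)
    e_frac = np.clip(e_frac, 1e-6, None)                      # avoid log(0)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(42)
dev_scores = rng.beta(2, 5, 10_000)       # scores observed at model development time (simulated)
prod_scores = rng.beta(2.5, 5, 10_000)    # slightly drifted production scores (simulated)

value = psi(dev_scores, prod_scores)
status = "stable" if value < 0.1 else "investigate" if value < 0.25 else "escalate"
print(f"PSI = {value:.3f} -> {status}")
```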
Value Brought by Automation
The integration of automation into the credit risk model lifecycle offers several benefits:
- Reusability: Automated components can be reused across different models and processes, enhancing efficiency.
- Standardized Workflow: Ensures consistency across various stages of the model lifecycle.
- Configurable Logic: Allows for flexibility in adapting models to meet specific requirements or changes in the regulatory environment.
- Collaborative Environment: Facilitates better communication and cooperation among teams involved in the model lifecycle.
Implementation Tools and Technologies
To achieve these automation objectives, ING Hubs Poland utilized:
- SAS for Modeling Logic: Employing SAS software to develop and manage complex statistical models.
- Cloud Platforms for Scalability and Storage: Leveraging cloud technologies to ensure scalable resources and secure data storage.
- Large Language Models (LLMs) and AI: Utilizing AI-driven solutions to interpret regulatory texts, suggest improvements, and provide deeper insights, thereby accelerating decision-making.
Conclusion
The session underscored the transformative impact of automation in the credit risk model lifecycle. By standardizing processes and integrating advanced technologies, financial institutions can achieve faster deliveries, improved compliance, and more robust risk assessments. This approach not only enhances operational efficiency but also ensures that models remain accurate, reliable, and aligned with regulatory expectations. What I really liked in this talk is the standardizing effect of automation. Just as SAP recommended a business process re-engineering plan alongside its deployments, we see here that standardization is absolutely necessary before automating anything. That is a very relevant lesson.

Architecting a Scalable and Cost-Effective Data Lakehouse
The talk addressed the complexities and considerations involved in developing a data lakehouse architecture that balances scalability with cost efficiency.
Understanding the Data Lakehouse
A data lakehouse combines elements of data lakes and data warehouses, aiming to provide the flexibility of data lakes with the structured querying capabilities of data warehouses. Key characteristics, illustrated in the sketch after this list, include:
- Decoupled Scalable Storage and Compute: Separating storage from compute resources allows for independent scaling, optimizing performance and cost.
- Open File Formats: Utilizing formats like Parquet or ORC ensures compatibility and avoids vendor lock-in.
- ACID Transactions: Ensuring data reliability and consistency through support for atomicity, consistency, isolation, and durability.
- Metadata Management and Governance: Implementing robust systems for data cataloging, lineage tracking, and access control to maintain data integrity and compliance.
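The PySpark sketch below illustrates these characteristics with Apache Iceberg: the warehouse path (object storage in production, a local directory here) is decoupled from the Spark compute, the data files are open-format Parquet, inserts and updates are ACID transactions, and the catalog retains the snapshot metadata needed for governance. The package version and paths are assumptions.

```python
# Hedged sketch of lakehouse basics with Spark + Apache Iceberg.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("lakehouse-sketch")
    .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2")
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "file:///tmp/lakehouse")  # object storage in production
    .getOrCreate()
)

# Open table format over open file format (Parquet), with ACID guarantees.
spark.sql("CREATE NAMESPACE IF NOT EXISTS lake.sales")
spark.sql("""
    CREATE TABLE IF NOT EXISTS lake.sales.orders (
        order_id BIGINT, customer_id BIGINT, amount DOUBLE, order_ts TIMESTAMP
    ) USING iceberg
""")

# Inserts and row-level updates are transactional; readers never see partial writes.
spark.sql("INSERT INTO lake.sales.orders VALUES (1, 42, 99.90, current_timestamp())")
spark.sql("UPDATE lake.sales.orders SET amount = 89.90 WHERE order_id = 1")

# Any Iceberg-aware engine (Trino, Flink, another Spark cluster) can read the same table,
# and the snapshot metadata supports auditing and time travel.
spark.table("lake.sales.orders").show()
spark.sql("SELECT snapshot_id, committed_at FROM lake.sales.orders.snapshots").show()
```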
Evaluating Suitability
While data lakehouses offer numerous advantages, they may not be suitable for every organization. Considerations include:
- Implementation Complexity: The architectural design and integration can be intricate, requiring substantial expertise.
- Data Volume: For organizations with smaller datasets, a lakehouse might be excessive and not cost-effective.
Recommendations for Adoption
To determine if a data lakehouse is appropriate:
- Proof of Concept (PoC): Conduct a PoC with anticipated data volumes to assess feasibility and performance.
- Data Access Needs: Evaluate whether direct access to raw data is necessary for your use cases.
- Total Cost of Ownership (TCO): Analyze costs associated with development, maintenance, and ETL processes to ensure alignment with budgetary constraints.
Designing the Architecture
Prior to selecting technologies, it’s crucial to engage in comprehensive data architecture design sessions. This approach ensures that the chosen solutions align with organizational needs and capabilities. Key considerations include:
- Functional Requirements: Determine the necessity for features such as machine learning development, MLOps, real-time processing, streaming capabilities, data mesh architecture, self-service analytics, and fine-grained access control.
- Organizational Constraints: Assess existing skill sets, current technology stacks, and platform preferences.
- Vendor Lock-In: Evaluate the implications of committing to a single vendor versus adopting an open-source approach.
Security and Compliance
Incorporating cybersecurity measures from the outset is imperative. Considerations include:
- Networking: Implement closed private networking to safeguard data in transit.
- Compute Resources: Utilize serverless compute options to enhance scalability and reduce operational overhead.
- Data Protection: Determine whether encryption protocols and robust Identity and Access Management (IAM) policies are needed to protect sensitive information.
DevOps Integration
Engaging DevOps teams from the beginning facilitates:
- Automation: Implementing Infrastructure as Code (IaC) and Continuous Integration/Continuous Deployment (CI/CD) pipelines to streamline development and deployment processes.
- Efficiency: Contrary to misconceptions, early DevOps involvement accelerates project timelines and enhances delivery performance.
Tech Stack Selection
Starting with managed platforms can simplify implementation. Examples include:
- Databricks: Databricks is a unified, cloud-based analytics platform designed for data engineering, data science, and machine learning. Built on Apache Spark, it combines the functionalities of data warehouses and data lakes into a “lakehouse” architecture, enabling organizations to process and analyze structured and unstructured data. Databricks provides collaborative notebooks, automated cluster management, and integrates with various cloud services to streamline data workflows.
- DBT (Data Build Tool): dbt is an open-source command-line tool that enables data analysts and engineers to transform data within their data warehouses by writing modular SQL queries. It focuses on the transformation step in the ELT (Extract, Load, Transform) process, allowing users to create, test, and document data models efficiently. By integrating software engineering best practices like version control and testing, dbt enhances collaboration and ensures data reliability.
- Microsoft Fabric: An enterprise-ready, end-to-end analytics platform that unifies data movement, processing, ingestion, transformation, and real-time event routing. It integrates services like Data Engineering, Data Factory, Data Science, Real-Time Intelligence, Data Warehouse, and Databases into a cohesive stack.
- Snowflake: A cloud-based data platform offering data storage, processing, and analytic solutions. It allows seamless data sharing across multiple clouds and supports a wide range of workloads, including data engineering, data lakes, and data science.
- Apache Iceberg: An open-source table format for large analytic datasets. It brings the reliability and simplicity of SQL tables to big data, enabling multiple engines like Spark, Trino, Flink, and Presto to work with the same tables concurrently.
In scenarios where organizations aim to retain open-source solutions, especially those transitioning from legacy systems like Hadoop, careful consideration of the existing ecosystem and desired level of openness is essential.
Conclusion
Architecting a scalable and cost-effective data lakehouse requires a strategic approach that prioritizes thorough planning, aligns with organizational needs, and integrates security and DevOps practices from the outset. That is what I liked about this talk: although it offered nothing new relative to our data architecture practice, it laid out a clear mental process for selecting the right stack.
AI-Driven Software Testing: Redefining Quality and Innovation in the Telecommunications Industry
Sahar Tahvili, Ph.D., of Ericsson, delivered this presentation.
Challenges in Software Testing
Dr. Tahvili outlined several challenges inherent in software testing within the telecommunications sector:
- Domain Knowledge: A deep understanding of the telecommunications domain is essential for effective testing.
- Programming Skills: Testers must possess strong programming abilities to develop and maintain test scripts.
- Software Testing Methodologies: Familiarity with various testing methodologies is crucial for selecting appropriate strategies.
- Multiple Test Environments: Managing and configuring diverse testing environments adds complexity.
- Manual Mapping of Test Scripts to Environments: The manual association of test scripts with corresponding environments is time-consuming and error-prone.
Optimization Through AI
To address these challenges, Dr. Tahvili proposed leveraging artificial intelligence to optimize several aspects:
- Test Environment Optimization: AI can dynamically allocate and configure testing environments based on current requirements.
- Dynamic Scheduling and Building of Test Environments: AI-driven systems can schedule tests and construct necessary environments in real-time.
- CI/CD Pipeline Selection: AI can assist in selecting the most suitable continuous integration and deployment pipelines for specific testing scenarios.
Case Studies
Dr. Tahvili presented two case studies illustrating the application of AI in software testing:
- Scheduling Test Channels Using AI: In this scenario, the team faced a high volume of manual test requests for CloudRAN, including decisions on platforms, environments, and simulators. An expert system based on reinforcement learning was developed to define optimal configurations and scheduling, taking resource availability into account. This approach streamlined the testing process and improved efficiency (a toy sketch of such a learned scheduling policy follows this list). Further details are available in their published paper.
- Automated Generation of Testing Pipelines: This case involved translating functional test requirements into specific test types (e.g., system-level tests, unit tests, high-availability tests). AI was employed to generate testing pipelines by combining appropriate test types, thereby automating the pipeline creation process and reducing manual effort.
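To give a flavour of the first case study (the real CloudRAN system is far richer and its details are not public), here is a toy epsilon-greedy scheduler that learns which test environment to prefer from observed rewards, restricted at each step to the environments that currently have free capacity.

```python
# Toy sketch of a learned scheduling policy; environment names and rewards are fabricated.
import random
from collections import defaultdict

ENVIRONMENTS = ["sim-a", "sim-b", "hw-lab-1", "hw-lab-2"]   # hypothetical test channels

class EpsilonGreedyScheduler:
    """Pick a test environment, favouring those that historically gave the best reward."""

    def __init__(self, epsilon: float = 0.1):
        self.epsilon = epsilon
        self.counts = defaultdict(int)     # how often each environment was chosen
        self.value = defaultdict(float)    # running average reward per environment

    def choose(self, available: list[str]) -> str:
        if random.random() < self.epsilon:                       # explore occasionally
            return random.choice(available)
        return max(available, key=lambda env: self.value[env])   # otherwise exploit

    def update(self, env: str, reward: float) -> None:
        self.counts[env] += 1
        self.value[env] += (reward - self.value[env]) / self.counts[env]  # incremental mean

random.seed(0)
scheduler = EpsilonGreedyScheduler()
for _ in range(200):
    # Only environments with free capacity are candidates (resource availability).
    available = [e for e in ENVIRONMENTS if random.random() > 0.3]
    if not available:
        continue
    env = scheduler.choose(available)
    # Reward could be negative turnaround time or pass rate; simulated here.
    reward = random.gauss(1.0 if env.startswith("sim") else 0.5, 0.1)
    scheduler.update(env, reward)

print({env: round(scheduler.value[env], 2) for env in ENVIRONMENTS})
```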
Common AI Challenges
While AI offers significant benefits, its implementation in software testing is not without challenges:
- Data Issues: Effective AI models require substantial, accurate data. Challenges include data scarcity, poor data management, and inaccuracies.
- Complexity and Lack of Explainability: AI models can be complex and often operate as “black boxes,” making it difficult to interpret their decisions and build trust among stakeholders.
- Architectural Considerations: Integrating AI into existing systems necessitates careful planning regarding AI strategies, cybersecurity risks, and cloud integration to ensure seamless and secure implementation.
Conclusion
Dr. Tahvili’s presentation highlighted the transformative potential of AI in redefining software testing within the telecommunications industry. By addressing existing challenges and optimizing testing processes, AI can significantly enhance quality and innovation. However, organizations must navigate the associated challenges thoughtfully to fully realize these benefits.

Graphs for real-time Fraud Detection and Prevention
Deepak Patankar and Mathijs de Jong from Booking.com presented this session. They discussed how Booking.com leverages graph technology to enhance its fraud detection mechanisms, addressing fraudulent activities such as stolen credit cards, marketing campaign abuse, fake hotels and reviews, account takeovers, and issues related to guest trust and safety. These activities not only cause financial losses but also damage the company’s reputation and erode customer trust. The team has also described this work in a Medium post.
Graph Modeling in Fraud Detection
Recognizing the interconnected nature of fraudulent activities, Booking.com incorporated graph technology into its fraud detection initiatives. In this model:
- Nodes represent entities like credit cards, email addresses, device IDs, and confirmed fraud instances.
- Edges depict relationships between these entities, illustrating how they interact or are associated.
This structure allows for the visualization and analysis of complex relationships that might indicate fraudulent behavior.
Types of Graphs Observed
The presenters highlighted different graph structures encountered:
- Simple Graphs: For example, a user connected to two credit cards, indicating straightforward relationships.
- Highly Connected Graphs: Such as those involving travel agencies with extensive interconnections, reflecting complex networks.
- Clustered Graphs: Multiple clusters indicating groups of interconnected entities, which could signify coordinated fraudulent activities.
Inferring Fraud through Graph Analysis
To detect fraud, Booking.com employs a combination of the following (a toy sketch of the first two ingredients follows the list):
- Descriptive Graph Features: Calculating metrics like node counts, temporal features, degree features, PageRank, and risk features to quantify the characteristics of the graph.
- Machine Learning Models: Training classifiers using these graph features to predict the likelihood of fraudulent activity.
- Rule-Based Systems: Implementing heuristic rules that leverage graph structures to identify suspicious patterns.
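Here is a toy sketch of the first two ingredients: descriptive features computed on a small entity subgraph with networkx, fed to a classifier (a RandomForest stands in, since the talk did not name the model family). The entities, edges, and labels are fabricated for illustration.

```python
# Toy sketch: graph features around an entity, used as input to a fraud classifier.
import networkx as nx
from sklearn.ensemble import RandomForestClassifier

def graph_features(g: nx.Graph, seed: str) -> list:
    """Descriptive features of the local graph: size, degrees, PageRank, links to known fraud."""
    pagerank = nx.pagerank(g)
    degrees = dict(g.degree())
    return [
        g.number_of_nodes(),
        g.number_of_edges(),
        degrees[seed],
        max(degrees.values()),
        pagerank[seed],
        sum(1 for _, data in g.nodes(data=True) if data.get("fraud", False)),
    ]

# Subgraph 1: a card whose device also touches a confirmed fraud case.
g1 = nx.Graph()
g1.add_nodes_from(["card:1", "device:7"])
g1.add_node("fraud:99", fraud=True)
g1.add_edges_from([("card:1", "device:7"), ("device:7", "fraud:99")])

# Subgraph 2: an isolated, benign-looking booking.
g2 = nx.Graph()
g2.add_edge("card:2", "email:a@example.com")

X = [graph_features(g1, "card:1"), graph_features(g2, "card:2")]
y = [1, 0]   # labels would come from confirmed chargebacks / investigations

clf = RandomForestClassifier(random_state=0).fit(X, y)
print(clf.predict(X))
```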
Technical Implementation
The technical stack utilized includes:
- JanusGraph with Cassandra Backend: Serving as the distributed graph database to store and manage the graph data.
- Gremlin Query Language: Part of the Apache TinkerPop framework, used for traversing and querying the graph.
The system performs breadth-first searches with optimizations such as batching and database indexing to ensure efficient data retrieval. (Author’s note: essentially a simple graph traversal query; a hedged example is sketched below.)
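For illustration, here is a hedged gremlinpython traversal performing such a bounded breadth-first expansion around a seed card; the connection string, labels, and property names are hypothetical and do not reflect Booking.com’s actual schema.

```python
# Hedged sketch: two-hop neighbourhood expansion with gremlinpython (pip install gremlinpython).
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __

conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")  # hypothetical endpoint
g = traversal().withRemote(conn)

# Expand up to two hops from the seed card, avoiding cycles, and collect the neighbourhood.
neighbourhood = (
    g.V().has("card", "card_id", "card:1")
     .repeat(__.both().simplePath())
     .times(2)
     .dedup()
     .valueMap(True)
     .toList()
)
print(len(neighbourhood))
conn.close()
```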
Performance and Impact
Given Booking.com’s scale, handling millions of reservations daily, the system is designed to meet stringent performance requirements:
- Availability: Achieving 99.95% uptime.
- Response Time: Completing 99% of requests in under a second.
- Feature Extraction: Graph features are extracted in less than 300 milliseconds, even for large graphs.
The integration of graph features into machine learning models has led to a 20% reduction in fraud costs, while heuristic methods utilizing graph features have achieved a 33% reduction. Additionally, these graph-based analyses serve as evidence in dispute resolutions, enhancing the company’s ability to address fraudulent claims effectively.
Conclusion
If you know me, you know that I have been a big fan of graph technology since 2011, and it is great to see such massive adoption today; I can give examples in pharma, telecom, banking, and beyond. The team mentioned that they are starting to work on GNNs to improve their results, and I did not miss the opportunity to suggest they use INGENIOUS. Indeed, an explainable graph embedding can help explain why a user was projected so close to a fraud node. FYI, we are also working on leveraging this explainable graph embedding approach in a graph RAG. In a graph RAG, if we want to use a GNN, we need to translate the extracted node and edge vectors back into nodes and edges and then build a textual context for the RAG. Some very recent methods (identified by our SOTA agentic architecture) try to learn a fusion space with contrastive learning (cf. the ASGARD research project, track YGGDRASIL). Using an explainable approach simplifies the process, since we just need to retrieve the nodes and edges that best explain the vectors identified as relevant.