[Each time slot offers a session in English, so you can attend March 27 all day in English.
Workshops on March 28 are in-person only, in Utrecht.]
In the last 12-18 months we have seen many different architectures emerge from many different vendors, each claiming to offer ‘the modern data architecture solution’ for the data-driven enterprise. These range from streaming data platforms and data lakes, to cloud data warehouses supporting structured, semi-structured and unstructured data, cloud data warehouses supporting external tables and federated query processing, lakehouses, data fabric, and federated query platforms offering virtual views of data and virtual data products on data in data lakes and lakehouses. In addition, all of these vendor architectures claim to support the building of data products in a data mesh. It is not surprising, therefore, that customers are confused about which option to choose.
However, in 2023 key changes have emerged, including much broader support for open table formats such as Apache Iceberg, Apache Hudi and Delta Lake across many other vendor data platforms. In addition, we have seen significant new milestones in extending the ISO SQL Standard to support new kinds of analytics in general-purpose SQL. AI has also advanced to work across any type of data.
The key question is: what does all this mean for data management? What is the impact on analytical data platforms, and what does it mean for customers? This session looks at this evolution and helps customers realise the potential of what is now possible and how they can exploit it for competitive advantage.
Ever since Google announced that its Knowledge Graph enabled searching for “things, not strings”, the term “knowledge graph” has been widely adopted to denote any graph-like network of interrelated typed entities and concepts that can be used to integrate, share and exploit data and knowledge.
This idea of interconnected data under common semantics is actually much older; the term is a rebranding of several other concepts and research areas (semantic networks, knowledge bases, ontologies, the semantic web, linked data, etc.). Google popularized the idea and made it more visible to the public and the industry, with the result that several prominent companies now develop and use their own knowledge graphs for data integration, data analytics, semantic search, question answering and other cognitive applications.
As the use of knowledge graphs continues to expand across various domains, the need for ensuring the accuracy, reliability, and consensus of semantic information becomes paramount. The intricacies involved in constructing and utilizing knowledge graphs present a spectrum of challenges, from data quality assurance to ensuring scalability and adaptability to evolving contexts.
In this talk, we will delve deeper into the significance of knowledge graphs as facilitators of large-scale data semantics. The discussion will encompass the core concepts, challenges, and strategic considerations that architects and decision-makers encounter while initiating and implementing knowledge graph projects.
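To make the idea of typed entities and explicit relationships more concrete, here is a minimal sketch using the rdflib Python library. The `ex:` namespace, the entities and the properties are invented purely for illustration and are not taken from the session.

```python
# Minimal, purely illustrative knowledge graph: typed entities ("things", not
# strings) connected by explicit relationships, expressed as RDF triples.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/")   # hypothetical namespace for this example

g = Graph()
g.bind("ex", EX)

# Typed entities and their relationships
g.add((EX.AcmeCorp, RDF.type, EX.Company))
g.add((EX.AcmeCorp, RDFS.label, Literal("Acme Corp")))
g.add((EX.Rotterdam, RDF.type, EX.City))
g.add((EX.AcmeCorp, EX.headquarteredIn, EX.Rotterdam))

# Query the graph: which companies are headquartered in Rotterdam?
results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?company WHERE {
        ?company a ex:Company ;
                 ex:headquarteredIn ex:Rotterdam .
    }
""")
for row in results:
    print(row.company)
```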
The session will cover:
MotherDuck is a new service that connects DuckDB to the cloud. It introduces the concept of “hybrid query processing”: the ability to execute queries partly on the client and partly in the cloud. The talk covers the motivation for MotherDuck and some of its use cases, as well as the main characteristics of its system architecture, which heavily uses the extension mechanisms of DuckDB. To provide context, the talk will also give a brief overview of the DuckDB architecture.
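As a rough illustration of what this hybrid model looks like from the client side, the sketch below runs a purely local DuckDB query and then indicates, in commented-out form, how a MotherDuck-hosted database might be reached. The `md:` connection string and the table names are assumptions made for this sketch, not details from the talk; consult the MotherDuck documentation for the actual connection and authentication syntax.

```python
# Illustrative sketch only. The "md:" connection string and the table names are
# assumptions for illustration, not taken from the talk.
import duckdb

# Plain, in-process DuckDB: everything below runs entirely on the client.
con = duckdb.connect()                                       # local, in-memory database
con.sql("CREATE TABLE clicks AS SELECT range AS id FROM range(1000)")
print(con.sql("SELECT count(*) FROM clicks").fetchone())     # (1000,)

# Hybrid query processing (assumed syntax): connecting with an "md:" URL would
# reach a cloud-hosted MotherDuck database, so a single query could combine
# local and cloud-resident tables, executing partly on the client and partly
# in the cloud.
# cloud = duckdb.connect("md:")     # assumed connection string; needs a MotherDuck token
# cloud.sql("SELECT ... FROM local_table JOIN my_db.cloud_table USING (id)")
```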
The emergence of generative AI has been described as a major breakthrough in technology. It has reduced the time to create new content and triggered a new wave of innovation that is impacting almost every type of software. New tools, applications and functionality are already emerging that are dramatically improving productivity, simplifying user experiences and paving the way for new ways of working. In this keynote session, Mike Ferguson, Europe’s leading IT industry analyst on Data Management and Analytics, looks at the impact generative AI is having on Data Management, BI and Data Science and what it can do to help shorten time to value.
Traditionally, data warehouses have been designed primarily to answer analytical questions. With the rise of data democratisation, the need to use data more broadly within organisations is growing. Data consumers want to make freer use of the available data, and historical data in data warehouses is becoming increasingly valuable as a source for training AI models. In this evolving landscape, integrating privacy by design into the architecture becomes essential. It should no longer be seen as an obstacle, but rather as a catalyst for this progress. Damhof’s quadrant model provides guidance here. By applying this approach, it becomes possible not only to meet the growing demands of data consumption and AI development, but also to lay a solid foundation on which innovation is encouraged.
– Data warehouses and their role in data science
– Privacy by Design as a catalyst
– The quadrant model combined with data virtualisation
– Cost reduction of experiments.
Erasmus University Rotterdam (EUR) is one of the largest academic institutions in the country. Its mission is ‘creating a positive societal impact’, and the United Nations Sustainable Development Goals serve as a compass for research and education alike. Given the variety and diversity of topics within EUR, an open, flexible, affordable and easy-to-use data & analytics solution is key to supporting data & AI projects. At the same time there are many internal and external factors to consider: the adoption of and migration to cloud solutions, the push for open science and open source, an ever faster changing technology landscape, and the breathtaking speed with which AI solutions are coming to market. Making future-proof choices in this environment is, as one can imagine, a daunting task. Nevertheless, choices have been made: a mix of open source and proprietary solutions, both on-premises and in the cloud, guided by modern software engineering principles. This session will highlight the following:
Data governance is the process of managing the availability, usability, integrity, and security of data in an organization. It is essential for ensuring that data is used ethically, responsibly, and in compliance with regulations and standards. Data governance also enables the development and deployment of AI systems that are aligned with the values, goals, and expectations of stakeholders and society. In this keynote, we will discuss how data governance can serve as a keystone for building ethical AI and digital trust. We will explore the challenges and opportunities of data governance in the context of AI, and present best practices and frameworks for implementing data governance in AI projects. We will also share examples and case studies of how data governance can help achieve ethical AI and digital trust outcomes. The keynote will conclude with recommendations and future directions for data governance in the AI era.
By the end of this session, you will be able to:
The Data Mesh approach is well on its way to becoming established as an alternative data management approach – one that does justice to the federative nature of most organizations and the need to place ownership of data as close as possible to the business domains, where data is actually created and used. However, the transformational impact of Data Mesh is potentially large, and many organizations have found it difficult to implement the approach in all of its dimensions at once. Why not take a lighter approach, reaping benefits one by one, rather than taking an unprepared deep dive into the Data Mesh rabbit hole?
Whether you call it a conceptual data model, a domain map, a business object model, or even a “thing model,” a concept model is invaluable to process and architecture initiatives. Why? Because processes, capabilities, and solutions act on “things” – Settle Claim, Register Unit, Resolve Service Issue, and so on. Those things are usually “entities” or “objects” in the concept model, and clarity on “what is one of these things?” contributes immensely to clarity on what the corresponding processes are.
After introducing methods to get people, even C-level executives, engaged in concept modelling, we’ll introduce and get practice with guidelines to ensure proper naming and definition of entities/concepts/business objects. We’ll also see that success depends on recognising that a concept model is a description of a business, not a description of a database. Another key – don’t call it a data model!
Drawing on almost forty years of successful modelling, on projects of every size and type, this session introduces proven techniques backed up with current, real-life examples. Topics include:
Whether you call it a conceptual data model, a domain model, a business object model, or even a “thing model,” the concept model is seeing a worldwide resurgence of interest. Why? Because a concept model is a fundamental technique for improving communication among stakeholders in any sort of initiative. Sadly, that communication often gets lost – in the clouds, in the weeds, or in chasing the latest bright and shiny object. Having experienced this, Business Analysts everywhere are realizing Concept Modelling is a powerful addition to their BA toolkit. This session will even show how a concept model can be used to easily identify use cases, user stories, services, and other functional requirements.
Appreciation of the value of concept modelling is also, surprisingly, taking hold in the data community. “Surprisingly” because many data practitioners had seen concept modelling as an “old school” technique. Not anymore! In the past few years, data professionals who have seen their big data, data science/AI, data lake, data mesh, data fabric, data lakehouse, etc. efforts fail to deliver the expected benefits have realised this is because those efforts were not based on a shared view of the enterprise and the things it cares about. That’s where concept modelling helps. Data management/governance teams are (or should be!) taking advantage of the current support for concept modelling. After all, we can’t manage what hasn’t been modelled!
The Agile community is especially seeing the need for concept modelling. Because Agile is now the default approach, even on enterprise-scale initiatives, Agile teams need more than some user stories on Post-its in their backlog. Concept modelling is being embraced as an essential foundation on which to envision and develop solutions. In all these cases, the key is to see a concept model as a description of a business, not a technical description of a database schema.
This workshop introduces concept modelling from a non-technical perspective, provides tips and guidelines for the analyst, and explores entity-relationship modelling at conceptual and logical levels using techniques that maximise client engagement and understanding. We’ll also look at techniques for facilitating concept modelling sessions (virtually and in person), applying concept modelling within other disciplines (e.g., process change or business analysis), and moving into more complex modelling situations.
Drawing on over forty years of successful consulting and modelling, on projects of every size and type, this session provides proven techniques backed up with current, real-life examples.
Topics include:
Learning Objectives:
Most companies today are storing data and running applications in a hybrid multi-cloud environment. Analytical systems tend to be centralised and siloed: data warehouses and data marts for BI, cloud storage data lakes for data science, and stand-alone streaming analytical systems for real-time analysis. These centralised systems rely on data engineers and data scientists working within each silo to ingest data from many different sources and engineer it for use in a specific analytical system or in machine learning models. There are many issues with this centralised, siloed approach, including multiple tools to prepare and integrate data, reinvention of data integration pipelines in each silo, and centralised data engineering teams that, with poor understanding of the source data, are unable to keep pace with business demands for new data.
To address these issues, a new approach called Data Mesh emerged in late 2019, aiming to accelerate the creation of data for use in multiple analytical workloads. Data Mesh is a decentralised, business-domain-oriented approach to data ownership and data engineering that produces a mesh of reusable data products, created once and shared across multiple analytical systems and workloads.
This half-day workshop looks at the development of data products in detail, and at how you can use a data marketplace to share, and govern the sharing of, data products across the enterprise to shorten time to value.
Learning Objectives:
Who is it for?
This seminar is intended for business data analysts, data architects, chief data officers, master data management professionals, data scientists, IT ETL developers, and data governance professionals. It assumes you understand basic data management principles and data architecture, and have a reasonable understanding of data cleansing, data integration, data catalogs, data lakes and data governance.
Detailed course outline
Most companies today are storing data and running applications in a hybrid multi-cloud environment. Analytical systems tend to be centralised and siloed: data warehouses and data marts for BI, cloud storage data lakes or Hadoop for data science, and stand-alone streaming analytical systems for real-time analysis. These centralised systems rely on data engineers and data scientists working within each silo to ingest data from many different sources and to clean and integrate it for use in a specific analytical system or in machine learning models. There are many issues with this centralised, siloed approach, including multiple tools to prepare and integrate data, reinvention of data integration pipelines in each silo, and centralised data engineering teams that, with poor understanding of the source data, are unable to keep pace with business demands for new data. In addition, master data is not well managed.
To address these issues, a new approach emerged in late 2019 aiming to accelerate the creation of data for use in multiple analytical workloads. That approach is Data Mesh. Data Mesh is a decentralised, business-domain-oriented approach to data ownership and data engineering that produces a mesh of reusable data products, created once and shared across multiple analytical systems and workloads. A Data Mesh can be implemented in a number of ways: using one or more cloud storage accounts, an organised data lake, a Lakehouse, a data cloud, Kafka, or data virtualisation. Data products can then be consumed in other pipelines for use in streaming analytics, in Data Warehouses or Lakehouse Gold Tables for business intelligence, in feature stores for data science, in graph databases for graph analysis, and in other analytical workloads.
This half-day workshop looks at the development of data products in detail. It also looks at the strengths and weaknesses of the Data Mesh implementation options for data product development. Which architecture is best for implementing it? How do you co-ordinate multiple domain-oriented teams and use common data infrastructure software like Data Fabric to create high-quality, compliant, reusable data products in a Data Mesh? Is there a methodology for creating data products? And how can you use a data marketplace to share and govern the sharing of data products? The objective is to shorten time to value while also ensuring that data is correctly governed and engineered in a decentralised environment. The workshop also looks at the organisational implications of Data Mesh and at how to create sharable data products for use as master data, in a data warehouse, in data science, in graph analysis and in real-time streaming analytics to drive business value. Technologies discussed include data catalogs, data fabric for collaborative development of data integration pipelines to create data products, DataOps to speed up the process, data orchestration automation, data observability and data marketplaces.
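There is no single standard for what a data product “contract” looks like, but the purely illustrative sketch below shows one possible shape: a small descriptor carrying the owning domain, schema, refresh schedule, quality checks and approved consumers. All names and fields are invented for this example and do not represent the methodology or tooling covered in the workshop.

```python
# Purely illustrative: one possible shape for a data product descriptor in a
# data mesh. All field names and example values are invented for this sketch.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Column:
    name: str
    dtype: str
    description: str = ""

@dataclass
class DataProduct:
    name: str                        # e.g. "customer_orders"
    owner_domain: str                # the business domain accountable for it
    schema: list[Column]
    refresh_schedule: str            # e.g. a cron expression
    quality_checks: list[Callable[[object], bool]] = field(default_factory=list)
    consumers: list[str] = field(default_factory=list)   # approved subscribers

    def validate(self, batch) -> bool:
        """Run every registered quality check against a batch of data."""
        return all(check(batch) for check in self.quality_checks)

# Example: a hypothetical data product owned by the "sales" domain.
orders = DataProduct(
    name="customer_orders",
    owner_domain="sales",
    schema=[Column("order_id", "BIGINT"), Column("order_total", "DECIMAL(12,2)")],
    refresh_schedule="0 2 * * *",
    quality_checks=[lambda batch: len(batch) > 0],
    consumers=["finance_dwh", "churn_feature_store"],
)
print(orders.validate([{"order_id": 1, "order_total": 99.95}]))   # True
```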
In today’s data-driven landscape, the concept of a knowledge graph has emerged as a pivotal framework for managing and utilizing interconnected data and information. Stemming from Google’s proclamation that shifted the focus from searching for strings to understanding entities and relationships, the term encapsulates a network of interconnected entities and concepts, facilitating data integration, sharing, and utilization within organizations.
Amid the widespread adoption of knowledge graphs across diverse domains, ensuring the accuracy, reliability, and consensus of semantic information becomes an imperative. The construction and utilization of these graphs present multifaceted challenges, ranging from ensuring data quality to scaling and adapting to evolving contexts.
Implementing a successful Knowledge Graph initiative within an organization demands strategic decisions before and during its execution. Often overlooked are critical considerations such as managing trade-offs between knowledge quality and other factors, prioritizing knowledge evolution, and allocating resources effectively. Neglecting these facets can lead to friction and suboptimal outcomes.
This half-day seminar delves into the technical, business, and organizational dimensions essential for data practitioners and executives embarking on a Knowledge Graph initiative. Offering insights gleaned from real-world case studies, the seminar provides a comprehensive framework that combines cutting-edge techniques with pragmatic advice. It equips participants to navigate the complexities of executing a knowledge graph project successfully.
Moreover, the session addresses pivotal strategic dilemmas encountered during the design and execution phases of knowledge graph projects, and outlines potential approaches to tackle these challenges, empowering attendees with actionable strategies to optimize their initiatives.
Learning Objectives
Who is it for?
Course Outline
The seminar will walk participants through 8 key stages of introducing, developing, delivering and evolving Knowledge Graphs in an organization. These are:
Stage 1 – “Knowing what you are getting into”
Stage 2 – “Setting up the stage”
Stage 3 – “Deciding what to build”
Stage 4 – “Giving it a shape”
Stage 5 – “Giving it substance”
Stage 6 – “Ensuring it’s good”
Stage 7 – “Making it useful”
Stage 8 – “Making it last”
Prefer online? Join the live video stream!
You can join us in Utrecht, The Netherlands, or online. Delegates also gain four months’ access to the conference recordings, so there’s no need to miss out on any sessions that run in parallel.
Payment by credit card is also available. Please mention this in the Comment field upon registration; further instructions for credit card payment can be found on our customer service page.