SQIREL Graph Database Systems

In this project, we study the design of graph database systems. The term 'graph' is used in its mathematical sense: data that takes the form of a network. While this is evident in social networks and telecommunication networks, graph analysis is also relevant for data stored in tables, since tables can express connections between data elements. The SQIREL project focuses on three specific aspects: efficient data structures for networks that change rapidly and continuously, the design of a query language for graph databases, and the integration of keyword queries, where the search function is based on the network structure.
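To make the point about tables concrete, here is a minimal sketch in Python of how rows in a table can be read as edges of a graph. The `persons` and `knows` tables, their contents, and the function names are hypothetical and chosen only for illustration; they are not taken from SQIREL or from any particular database system.

```python
from collections import defaultdict, deque

# Hypothetical tables, each row stored as a tuple.
persons = [(1, "Ann"), (2, "Bob"), (3, "Cem"), (4, "Dee")]  # (id, name)
knows = [(1, 2), (2, 3)]                                    # (person_id, friend_id)


def build_adjacency(edge_rows):
    """Interpret each row of an edge table as an undirected edge."""
    adjacency = defaultdict(set)
    for source, target in edge_rows:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def reachable(adjacency, start):
    """Breadth-first search: every node connected to `start`, including itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


adjacency = build_adjacency(knows)
network_of_ann = reachable(adjacency, 1)
names = sorted(name for pid, name in persons if pid in network_of_ann)
```

In this sketch, `names` holds everyone connected to Ann directly or indirectly. A question like this is awkward to ask of the tables alone but natural once the rows are viewed as a network.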

Graph database systems are becoming increasingly prominent. There are many crucial applications in areas such as security, logistics, and medical fraud detection, where detecting patterns in such graphs in real time, as soon as messages are posted, is important for making immediate decisions.

In early 2023, we spoke with the project leader of SQIREL, Prof. Dr. Peter Boncz, a senior researcher in the Database Architectures research group at CWI, who is responsible for the Machine Learning, Database Architectures, and Human-centered Data Analytics research group. Peter also holds the special chair of Large-Scale Analytical Data Management at the Vrije Universiteit Amsterdam. He is the architect of the database systems MonetDB and VectorWise (now: Actian Vector) and has been involved in five spin-off companies in the field of data management. Peter takes pride in Marcin Żukowski, one of the PhD candidates from CWI's Database Architectures group, who co-founded Snowflake (whose 2020 listing on the New York Stock Exchange was the largest software IPO ever at the time). He is also proud of having helped bring competitor Databricks to the Netherlands; Databricks invested over 100 million euros in its Amsterdam R&D branch in the past year.

Partners

The consortium consists of top academic researchers in the field of query processing and Information Retrieval (IR), along with two use-case partners and two technology partners: Neo Technology, the company behind the graph database Neo4j, and the aforementioned Databricks, which was founded by the creators of the popular open-source system Apache Spark and now offers it as a cloud service.

Neo4j is a market leader in graph database systems. Radboud University is the second academic partner, with a team led by Prof. Dr. Ir. Arjen de Vries. Arjen focuses on organizing nodes in the graph based on the linked content, such as keywords. He also works on creating an enriched graph, for example by recognizing entities in the associated text.

The use-case partners are Wizenoze and Spinque. Wizenoze uses the latest AI technology to build the largest global library of curated educational content and matches it with any curriculum. Spinque's technology answers millions of questions daily in domains such as e-commerce, government, enterprise search, and cultural heritage.

The Linked Data Benchmark Council (LDBC)

One component of SQIREL is leading the Linked Data Benchmark Council (LDBC, ldbcouncil.org), a non-profit collaboration of research institutions and industry around graph processing technologies. Its members, both organizations and individuals, come from industry and academia. "Almost all graph database companies are members of LDBC, including Neo4j, but also Amazon and Intel. The group collaborates to determine what benchmarks for graph database systems should entail. This allows us to compare each other's technology to achieve better performance and maturity of the technology."

Within SQIREL, work has been done on both a 'business intelligence benchmark' (a test for analytical graph queries) and a second version of an 'interactive benchmark.'

Two Languages

"All these graph databases currently speak a different language, which is naturally very challenging in benchmarks. Each system has its own query language, making it like comparing apples to oranges. It is in everyone's interest to come to a standard language. After our proposal 'G-CORE,' we have collaborated with ISO to develop two new languages as extensions to SQL."

The LDBC working groups in the project have worked on two graph query languages: the upcoming ISO SQL/PGQ and GQL languages, which are expected to be released in June 2023 and March 2024, respectively. "We are proud that in the SQIREL project, we are improving a global ISO standard, namely the universally used SQL query language."

DuckDB

In SQIREL, work has been done on a first practical implementation of SQL/PGQ.

In recent years, CWI has developed the database system DuckDB, which is becoming extremely popular, with over 2 million downloads per month. CWI spin-off company DuckDB Labs was founded in 2021 and then played a crucial role in the establishment of startup MotherDuck in 2022, which aims to connect DuckDB with the cloud. "The goal is for DuckDB users to be able to use and store graph data in DuckDB. In the lab, it looks promising, but it will be exciting to see if we can really make it by summer. The software still needs to be made usable to easily integrate SQL/PGQ into DuckDB, which will take about a year. We are also trying to make a deep integration with Graph Neural Network (GNN) packages."

A Graph Neural Network (GNN) is a type of artificial neural network that processes data represented as graphs.
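The core idea behind many GNNs is that each node repeatedly combines its own features with those of its neighbours. The sketch below shows one such round in plain Python, using simple averaging. It is an illustration under stated assumptions, not how any specific GNN package works: real GNNs add learned weights and non-linear functions, and the function name, node names, and feature values here are hypothetical.

```python
def message_passing_round(features, edges):
    """One aggregation round: each node's new feature is the mean of its
    own feature and the features of its direct neighbours.

    Simplification (assumption): no learned weights and no activation function.
    """
    neighbours = {node: [] for node in features}
    for a, b in edges:  # edges are treated as undirected
        neighbours[a].append(b)
        neighbours[b].append(a)

    updated = {}
    for node, value in features.items():
        values = [value] + [features[n] for n in neighbours[node]]
        updated[node] = sum(values) / len(values)
    return updated


# Hypothetical three-node path graph: a - b - c
features = {"a": 1.0, "b": 3.0, "c": 5.0}
edges = [("a", "b"), ("b", "c")]
result = message_passing_round(features, edges)
```

After one round, node `a` has moved towards its neighbour `b`, and `c` likewise, while `b` averages all three values. Repeating the round spreads information further across the graph.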

Future of SQL/PGQ

SQIREL has concluded, and the postdoc and PhD candidate have completed their work. However, the research continues.

"There is still considerable work to be done, and I am very curious about the reception in the database market. There is potential, and many organizations can benefit from it; a spin-off could very well happen."

Peter's long-term mission is a thriving ecosystem for database systems R&D in the Netherlands. "Not only should education be provided about data systems, but there should also be research and industry designing these systems."
