DATA Lab @ Northeastern

We build the algorithms and systems that power large-scale data management. Our recent projects cover query understanding, data provenance, data discovery, ranking with joins, storage and indexing, randomized data structures, data management on modern hardware (GPUs), graph processing, query optimization, managing data lakes, and reasoning over complex, uncertain networks. Our work is interdisciplinary: we collaborate with scientists at Northeastern and with database groups across the world.

History

The DATA Lab under its current name was created in 2017 when Prof. Wolfgang Gatterbauer moved to Northeastern University and joined forces with Prof. Mirek Riedewald, who had established a database research group there in 2009. ACM Fellow Prof. Renée Miller joined in 2018, followed by ACM Fellow and IEEE Fellow Prof. Ricardo Baeza-Yates in 2020. Prof. Miller moved to Waterloo in 2024, and Prof. Baeza-Yates to Barcelona in 2025. Prof. Prashant Pandey joined in 2025.

Regular Classes or Seminars

cs7240: Principles of scalable data management: theory, algorithms, and database systems (Gatterbauer): next time Spring 2026
cs7270/4973: Advanced Database Systems (Pandey): next time Fall 2025
cs7280/4973: Data Structures & Algorithms for Scalable Computing (Pandey): usually Spring semester
cs7480: Foundations and Applications of Information Theory (Gatterbauer): next time Fall 2025
cs6240: Parallel Data Processing in MapReduce (Riedewald)
cs3200: Database design (Gatterbauer)
DATA Lab seminar

News

[July 2025] At VLDB 2025 in London, we will present our GooseDB system for identifying optimal modifications to a top-k SQL query so that its output satisfies the desired ranking criteria. We will also present our new unified reverse data management algorithm, a single algorithm that unifies much of the prior work and recovers all PTIME solutions: Is ILP all you need for Deletion Propagation?
[July 2025] Congratulations to our newly minted PhD Dr. Neha Makhija! All the best for your future career as a tenure-track Assistant Professor at UMass Amherst!
[June 2025] Congratulations, Wolfgang, on all the recognitions at SIGMOD/PODS: in addition to your ACM SIGMOD research highlight with Cody, you were named distinguished AE at SIGMOD *and* distinguished reviewer at PODS.
[May 2025] Congratulations Diandre for being awarded the NSF Graduate Research Fellowship!
[May 2025] We will present 3 papers at SIGMOD/PODS 2025: (1) Adaptive Quotient Filters, (2) Zombie Hashing, (3) Resilience for Regular Path Queries
[May 2025] We will present our latest research paper on synthesizing and explaining ranking functions at ICDE 2025 in Hong Kong.
[Jan 2025] Our SIGMOD’24 Relational Patterns paper was selected as an ACM SIGMOD research highlight.
[Jan 2025] Hooray! Welcome Prashant Pandey to our group 🙂
[June 2024] Our SIGMOD’24 paper on Relational Diagrams and the Pattern Expressiveness of Relational Languages received an honorable mention.
[May 2024] Neha Makhija will present 2 papers at SIGMOD/PODS 2024 that show the power of polyhedral theory when applied to standard database problems: (1) A Unified Approach for Resilience and Causal Responsibility, (2) Minimally Factorizing the Provenance of Self-Join Free Conjunctive Queries. Wolfgang will discuss (3) Relational Diagrams and the Pattern Expressiveness of Relational Languages.
[June 2024] Congratulations to our newly minted PhD Dr. Nikos Tziavelis! All the best for your future career as a tenure-track Assistant Professor at UC Santa Cruz!
[May 2024] Congratulations Diego for being awarded the NSF Graduate Research Fellowship!
[August 2023] Agapi, Mirek, Neha, Nikos, and Wolfgang will spend 4 months at the Simons Institute at UC Berkeley to attend the program on Logic and Algorithms in Database Theory and AI
