DATA Lab @ Northeastern

We build the algorithms and systems that power large-scale data management. Our recent projects cover query understanding, data provenance, data discovery, ranking with joins, storage and indexing, randomized data structures, data management on modern hardware (GPUs), graph processing, query optimization, managing data lakes, and reasoning over complex, uncertain networks. Our work is interdisciplinary, and we collaborate with scientists at Northeastern and with database groups across the world.

History

The DATA Lab under its current name was created in 2017 when Prof. Wolfgang Gatterbauer moved to Northeastern University and joined forces with Prof. Mirek Riedewald, who had established a database research group there in 2009. ACM Fellow Prof. Renée Miller joined in 2018, followed by ACM Fellow and IEEE Fellow Prof. Ricardo Baeza-Yates in 2020. Prof. Miller moved to Waterloo in 2024, and Prof. Baeza-Yates to Barcelona in 2025. Prof. Prashant Pandey joined in 2025.

Regular Classes or Seminars

cs7240: Principles of scalable data management: theory, algorithms, and database systems (Gatterbauer): next time Spring 2026
cs7280/4973: Advanced Database Systems (Pandey): next time Fall 2025
cs7280/4973: Data Structures & Algorithms for Scalable Computing (Pandey): usually Spring semester 
cs7480: Foundations and Applications of Information Theory (Gatterbauer): next time Fall 2025
cs6240: Parallel Data Processing in MapReduce (Riedewald)
cs3200: Database design (Gatterbauer)
DATA Lab seminar

News