The DATA Lab @ Northeastern University is one of the leading research groups in data management and data systems. We study the principles of algorithms and systems that scale to large amounts of data, as well as data understanding, data organization, and data discovery.
Our work has impacted the foundations of data integration and curation, as well as large-scale and parallel data-centric computing. We have studied open data, including open government data, and developed methods to make open data more usable and accessible. Recent research projects include query visualization, data provenance, data discovery, data lake management, and scalable approaches to inference over uncertain and networked data. Our work is interdisciplinary, and we collaborate with scientists at Northeastern and database groups across the world. Together with Northeastern, we have a deep commitment to diversity and inclusion and their role in building communities and fostering learning and discovery. And we are growing!
History
The DATA Lab under its current name was created in 2017 when Prof. Gatterbauer moved to Northeastern University and joined forces with Prof. Riedewald, who had established a database research group there in 2009. ACM Fellow Prof. Miller joined in 2018, followed by ACM Fellow and IEEE Fellow Prof. Baeza-Yates in 2020. While enjoying rapid growth in size and reputation, the lab and its predecessor have always made sure to offer a welcoming and intellectually stimulating environment for a diverse and talented group of PhD students and postdocs.
Open Positions
Our College is growing, with several open positions in all areas, including data management and data science, and at all levels (assistant, associate, or full). For faculty positions, see the College ads.
We are actively looking for new PhD students with a strong background in data management, algorithms, theory, or systems. For details, please see our page on research opportunities.
Collaborations with Sciences and Industry
For more than 15 years, Prof. Mirek Riedewald has been collaborating with scientists from various domains. This work includes summarization techniques for digital libraries; data mining and exploratory analysis in collaboration with the Cornell Lab of Ornithology; speeding up high-dimensional simulations (for combustion); data and provenance management for astronomy and high-energy physics; and reconstruction, tracing, and connection analysis of massive collections of high-resolution brain images. We also developed new technology for pattern analysis with industrial partners.
If your research team or company has reached a point where data management and analysis have become a bottleneck, please contact us. We are excited to learn about real-world applications that will lead to opportunities for novel research, joint proposals for funding, or consulting. Example areas include scientific applications, graph analysis, medical data, and cloud computing.
Regular Classes or Seminars
cs7240: Principles of scalable data management: theory, algorithms, and database systems (Gatterbauer)
cs6240: Parallel Data Processing in MapReduce (Riedewald)
cs3200: Database design (Gatterbauer/Miller)
DATA Lab seminar (Gatterbauer/Riedewald)
News
- [September 2023] Welcome to the DATA Lab, Diego! Diego joins the lab from Puerto Rico as a PhD student.
- [August 2023] Agapi, Mirek, Neha, Nikos, and Wolfgang will spend four months at the Simons Institute at UC Berkeley to attend the program on Logic and Algorithms in Database Theory and AI.
- [May 2023] We will present a fourth paper at VLDB 2023: (4) Why Not Yet: Fixing a Top-k Ranking that Is Not Fair to Individuals.
- [March 2023] Nikos will present his new concept of “quantile queries” at PODS 2023.
- [Feb. 2023] We will present three papers at VLDB 2023: (1) Integrating Data Lake Tables. (2) Explaining Dataset Changes for Semantic Data Versioning with Explain-Da-V. (3) Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning.
- [Feb. 2023] We will present two papers at SIGMOD 2023: (1) SANTOS: Relationship-based Semantic Table Union Search. (2) FlexER: Flexible Entity Resolution for Multiple Intents. In addition, we have a tutorial titled “Table Discovery in Data Lakes: State-of-the-art and Future Directions”. Aamod and Roee are also presenting their demo paper, DIALITE: Discover, Align and Integrate Open Data Tables, at SIGMOD 2023.
- [Jan. 2023] We are hosting the Northeast Database Day this year at Northeastern on March 10 at ISEC.
- [Jan. 2023] Neha will return to Relational AI, Aamod will return to IBM Research, and Grace is going to Microsoft Research for summer internships.
- [Sep. 2022] Nikos has received a Google PhD fellowship (Structured Data and Database Management). What a wonderful acknowledgement of him as a top database researcher!