The Data Lab @ Northeastern University is one of the leading research groups in data management and data systems. Our work spans the breadth of data management, from the foundations of data integration and curation to large-scale and parallel data-centric computing. Recent research projects include query visualization, data provenance, data discovery, data lake management, and scalable approaches to perform inference over uncertain and networked data. Our work is interdisciplinary, and we collaborate with scientists at Northeastern and database groups across the world. With Northeastern, we have a deep commitment to diversity and inclusion and its role in building communities and fostering learning and discovery. And we are growing!
Open Positions
Our College is growing, with several positions in all areas, including data management and data science, at all levels (assistant, associate, or full). For faculty positions, see the College ads.
We are actively looking for new PhD students with a strong background in data management, algorithms, theory, or systems. For details, please see our page on research opportunities.
Collaborations with Sciences and Industry
For more than 15 years, Prof. Mirek Riedewald has been collaborating with scientists from various domains. This includes summarization techniques for digital libraries, data mining and exploratory analysis in collaboration with the Cornell Lab of Ornithology, speeding up high-dimensional simulations (for combustion), data and provenance management for astronomy and high-energy physics, and reconstruction, tracing, and connection analysis of massive collections of high-resolution brain images. We also developed new technology for pattern analysis with industrial partners.
If your research team or company has reached a point where data management and analysis have become a bottleneck, please contact us. We are excited to learn about real-world applications that will lead to opportunities for novel research, joint proposals for funding, or consulting. Example areas include scientific applications, graph analysis, medical data, and cloud computing.
Recent or Upcoming Courses
Spring 2021: cs7240: Principles of scalable data management: theory, algorithms, and database systems
Spring 2021: cs6240: Parallel Data Processing in MapReduce
Fall 2020: cs3200: Database design
News
- [Feb 2021] New preprint of Nikos’ work on ranked enumeration over theta join queries.
- [Jan 2021] Congratulations to Aristotelis and Laura for their paper on homograph detection in data lakes.
- [Dec 2020] New preprint of work on direct access to ranked answers of conjunctive queries.
- [Aug. 2020] Congratulations to Neha, Nikos, Laura, and co-authors for their paper on metadata visualizations for data lakes, to appear at VIS 2020.
- [July 2020] We are excited about our new NSF grant on data discovery and table alignment. Thank you NSF!
- [July 2020] Renée Miller (and her colleagues Ron Fagin, Phokion Kolaitis, Lucian Popa, and Wang-Chiew Tan) receive the 2020 Alonzo Church Award for Outstanding Contributions to Logic and Computation.
- [June 2020] Two more PVLDB 2020 papers accepted on Knowledge Translation and Table Discovery from CSV Files.
- [April 2020] Nikos’ exciting work on optimal ranked enumeration is accepted for VLDB 2020.
- [March 2020] Five papers accepted to SIGMOD/PODS 2020: one on organizing data lakes for navigation, one on near-optimal band-joins, one on query understanding through diagrams, one on algebraic amplification, and one on complexity results for resilience. In addition, we have a tutorial on any-k enumeration, and Miller will be leading a SIGMOD keynote session “Toward Exploring, Understanding, and Searching a Billion Data Sets” with colleagues Natasha Noy (Google) and Awez Syed (Informatica). Congratulations everyone!
- [Feb 2020] SIGMOD 2020 tutorial on worst-case optimal joins meet top-k accepted.
- [Jan 2020] We will present our work on algebraic amplification and a few posters at NEDB Day’20.
- [Jan. 2020] Miller to give Keynote at SIGMOD aiDM’20 Workshop on research challenges in data lake management.
- [Dec. 2019] SIGMOD 2020 paper on factorized graph representations for semi-supervised learning from sparse data accepted.
- [Nov. 2019] PODS 2020 paper with new complexity results for resilience accepted.
- [Nov. 2019] Wolfgang to give keynote at SUM on algebraic approximations for weighted model counting.
- [Apr. 2019] Come to our VLDB 2019 Tutorial on Data Lakes!
- [Apr. 2019] VLDB 2019 demonstration on image search using low-resolution traffic cameras accepted.
- [Mar. 2019] SIGMOD 2019 paper on anytime approximations for probabilistic inference accepted.
- [Jan. 2019] The DATA Lab will attend NEDB Day’19 at MIT and present a talk and multiple posters.
- [Jan. 2019] Welcome to our new postdoc Laura! Sorry for the cold weather 🙁