Data Lab @ Northeastern

The Data Lab @ Northeastern University is a team of faculty and students who explore a range of research problems in scalable data management and analysis. Our work ranges from fundamental questions on the complexity of data management problems to practical applications with domain scientists and covers areas such as large-scale and parallel data analysis algorithms, graph data management, and uncertain data. We participate in a number of interdisciplinary research projects and collaborate with other faculty at Northeastern and database groups across the world. And we are growing!

Open Positions

Our College is growing, with more than four open positions in all areas, including data management and data science, at all levels (assistant, associate, or full). For faculty positions, see the College ads.

We are actively looking for new PhD students with a strong background in data management, algorithms, theory, or systems. For details, please see our page on research opportunities.

Collaborations with Sciences and Industry

For more than 15 years, Prof. Mirek Riedewald has been collaborating with scientists from various domains. This work includes summarization techniques for digital libraries, data mining and exploratory analysis in collaboration with the Cornell Lab of Ornithology, speeding up high-dimensional simulations (for combustion), data and provenance management for astronomy and high-energy physics, and reconstruction, tracing, and connection analysis of massive collections of high-resolution brain images. We have also developed new technology for pattern analysis with industrial partners.

If your research team or company has reached a point where data management and analysis have become a bottleneck, please contact us. We are excited to learn about real-world applications that will lead to opportunities for novel research, joint proposals for funding, or consulting. Example areas include scientific applications, graph analysis, medical data, and cloud computing.

Recent or Upcoming Courses

Spring 2020: cs7240: Principles of scalable data management: theory, algorithms, and database systems
Spring 2019: cs7280: Special interests: Principles of scalable data management
Spring 2018/Fall 2017: cs6240: Parallel Data Processing in MapReduce
Spring 2018/Fall 2017: cs3200: Database design
Fall 2017: cs7290: Special topics: Foundations in scalable data management


  • [Feb. 2020] Our tutorial proposal on worst-case optimal joins meet top-k was accepted to SIGMOD’20
  • [Dec. 2019] Our paper on factorized graph representations for semi-supervised learning from sparse data was accepted to SIGMOD’20
  • [Nov. 2019] Our paper with new complexity results for resilience was accepted to PODS’20
  • [Mar. 2019] Our paper on anytime approximations for probabilistic inference was accepted to SIGMOD’19
  • [Jan. 2019] The DATA Lab will attend NEDB Day’19 at MIT and present a talk and multiple posters
  • [Jan. 2019] Welcome to our new postdoc Laura! Sorry for the cold weather 🙁
  • [Dec. 2018] Our paper on extracting information from web tables was accepted to ICDE’19
  • [Nov. 2018] Our paper on finding joinable tables was accepted to SIGMOD’19
  • [Sept. 2018] Our paper on parameter learning in probabilistic databases was chosen among the best of SIGMOD’17 and invited to TODS
  • [Sept. 2018] Welcome to our new PhD students: Peter Ivanov, Aristotelis Leventidis, and Nikos Tziavelis
  • [July 2018] Our paper on automatic ranking of anonymous participants was chosen among the best of WALCOM’17 and invited to TCS (Theoretical Computer Science)
  • [June 2018] We are welcoming Prof. Renee Miller to Northeastern University! Congratulations on your VLDB’18 keynote and your VLDB Women in Database Research Award for ground-breaking contributions to data integration and data curation