Recent Publications

The latest publications and extended material are also available on the faculty and project web pages.

2025

2024

2023

  • Why Not Yet: Fixing a Top-k Ranking that Is Not Fair to Individuals
    Zixuan Chen, Panagiotis Manolios, Mirek Riedewald.
    VLDB 2023 (to appear)
    Preprint version

  • Efficient Computation of Quantiles over Joins
    Nikolaos Tziavelis, Nofar Carmeli, Wolfgang Gatterbauer, Benny Kimelfeld, Mirek Riedewald

  • SANTOS: Relationship-based Semantic Table Union Search
    Aamod Khatiwada, Grace Fan, Zixuan Chen, Roee Shraga, Wolfgang Gatterbauer, Renée J. Miller, Mirek Riedewald

  • Data Lake Organization.
    Fatemeh Nargesian, Ken Q. Pu, Bahar Ghadiri Bashardoost, Erkang Zhu, Renée J. Miller.
    IEEE Trans. Knowl. Data Eng. 35(1): 237-250 (2023).
    pdf (arXiv:1812.07024)

  • Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning.
    Grace Fan, Jin Wang, Yuliang Li, Dan Zhang, Renée J. Miller.
    PVLDB 2023.
    pdf | GitHub

  • Explaining Dataset Changes for Semantic Data Versioning with Explain-Da-V.
    Roee Shraga, Renée J. Miller.
    PVLDB 2023 (to appear).
    pdf | GitHub

  • FlexER: Flexible Entity Resolution for Multiple Intents.
    Bar Genossar, Roee Shraga, Avigdor Gal.
    SIGMOD 2023 (to appear).
    pdf (arXiv:2209.07569) | GitHub

  • Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries
    Nofar Carmeli, Nikolaos Tziavelis, Wolfgang Gatterbauer, Benny Kimelfeld, Mirek Riedewald
    Invited version from best of PODS 2021

  • Understanding Search Behavior Bias in Wikipedia.
    Bruno Scarone, Ricardo Baeza-Yates, Erik Bernhardson.
    Bias@ECIR 2023 (to appear).

  • Human-Centered Responsible Artificial Intelligence: Current & Future Trends.
    Mohammad Tahaei, Marios Constantinides, Daniele Quercia, Sean Kennedy, Michael J. Muller, Simone Stumpf, Q. Vera Liao, Ricardo Baeza-Yates, Lora Aroyo, Jess Holbrook, Ewa Luger, Michael Madaio, Ilana Golbin Blumenfeld, Maria De-Arteaga, Jessica Vitak, Alexandra Olteanu.
    pdf (arXiv:2302.08157)

  • Distance and Time Sensitive Filters for Similarity Search in Trajectory Datasets
    Madhav Narayan Bhat, Paul Cesaretti, Mayank Goswami, Prashant Pandey
    APOCS 2023
    pdf

  • Communication Optimization for Distributed Execution of Graph Neural Networks
    Süreyya Emre Kurt, Jinghua Yan, Aravind Sukumaran-Rajam, Prashant Pandey, P. Sadayappan
    IPDPS 2023
    pdf

  • High-Performance Filters for GPUs
    Hunter McCoy, Steven Hofmeyr, Katherine Yelick, Prashant Pandey
    PPoPP 2023
    pdf

  • Singleton Sieving: Overcoming the Memory/Speed Trade-Off in Exascale k-mer Analysis
    Hunter McCoy, Steven Hofmeyr, Katherine Yelick, Prashant Pandey
    ACDA 2023
    pdf

  • IcebergHT: High Performance Hash Tables Through Stability and Low Associativity
    Prashant Pandey, Michael Bender, Alex Conway, Martin Farach-Colton, William Kuszmaul, Guido Tagliavini, Rob Johnson
    SIGMOD 2023
    pdf

  • BP-tree: Overcoming the Point-Range Operation Tradeoff for In-Memory B-trees
    Helen Xu, Amanda Li, Brian Wheatman, Manoj Marneni, Prashant Pandey
    VLDB 2023
    pdf

  • DomainNet: Homograph Detection and Understanding in Data Lake Disambiguation
    Aristotelis Leventidis, Laura Di Rocco, Wolfgang Gatterbauer, Renée J. Miller, Mirek Riedewald
    Invited version from EDBT 2021 best paper award

  • A Tutorial on Visual Representations of Relational Queries
    Wolfgang Gatterbauer

  • Unsupervised learning of 3-colorings using simplicial higher-order neural networks
    Lucas Laird, Robin Walters, Wolfgang Gatterbauer

2022

  • Integrating Data Lake Tables
    Aamod Khatiwada, Roee Shraga, Wolfgang Gatterbauer, Renée J. Miller

  • Knowledge-Based News Event Analysis Toolkit.
    Oktie Hassanzadeh, Parul Awasthy, Ken Barker, Onkar Bhardwaj, Debarun Bhattacharjya, Mark Feblowitz, Aamod Khatiwada, Lee Martie, Steve Fonin Mbouadeu, Jian Ni, Anik Saha, Sola Shirai, Kavitha Srinivas, Lucy Yip.
    ISWC 2022.
    pdf | Homepage

  • Knowledge Graph Embeddings for Causal Relation Prediction.
    Aamod Khatiwada, Sola Shirai, Kavitha Srinivas, Oktie Hassanzadeh.
    DL4KG@ISWC 2022.
    pdf | datasets

  • Rule-Based Link Prediction over Event-Related Causal Knowledge in Wikidata.
    Sola Shirai, Aamod Khatiwada, Debarun Bhattacharjya, Oktie Hassanzadeh.
    Wikidata@ISWC 2022.
    pdf

  • HumanAL: calibrating human matching beyond a single task.
    Roee Shraga.
    HILDA@SIGMOD 2022.
    pdf (ACM) | pdf (arXiv:2205.03209) | video (15 min)

  • BiaScope: Visual Unfairness Diagnosis for Graph Embeddings.
    Agapi Rissaki, Bruno Scarone, David Liu, Aditeya Pandey, Brennan Klein, Tina Eliassi-Rad, Michelle A. Borkin.
    Symposium on Visualization in Data Science at IEEE VIS 2022.
    pdf (arXiv:2210.06417) | GitHub

  • Principles of Query Visualization
    Wolfgang Gatterbauer, Cody Dunne, H.V. Jagadish, Mirek Riedewald

  • Interpreting and understanding relational database queries using diagrams.
    Wolfgang Gatterbauer.
    Tutorial @ DIAGRAMS, 2022.
    Project page: QueryVis

  • Toward Responsive DBMS: Optimal Join Algorithms, Enumeration, Factorization, Ranking, and Dynamic Programming
    Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald

  • Algebraic Approximations of the Probability of Monotone Boolean Functions.
    Wolfgang Gatterbauer
    ISAIM 2022.
    pdf

  • Fair Top-k Ranking with multiple protected groups.
    Meike Zehlike, Tom Sühr, Ricardo Baeza-Yates, Francesco Bonchi, Carlos Castillo, Sara Hajian.
    Inf. Process. Manag. 59(1): 102707 (2022)

  • The Relevance of Non-Human Errors in Machine Learning.
    Ricardo Baeza-Yates, Marina Estévez-Almenzar.
    EBeM@IJCAI 2022

  • Bots don’t Vote, but They Surely Bother!: A Study of Anomalous Accounts in a National Referendum.
    Eduardo Graells-Garrido, Ricardo Baeza-Yates.
    WebSci 2022: 302-306

  • The Attention Economy and the Impact of Artificial Intelligence.
    Ricardo Baeza-Yates, Usama M. Fayyad.
    Perspectives on Digital Humanism 2022: 123-134

  • An Incrementally Updatable and Scalable System for Large-Scale Sequence Search using the Bentley-Saxe Transformation
    Fatemeh Almodaresi, Jamshed Khan, Sergey Madaminov, Michael Ferdman, Rob Johnson, Prashant Pandey, Rob Patro
    BIOINFORMATICS 2022
    pdf

2021

2020

2019

  • JOSIE: Overlap set similarity search for finding joinable tables in data lakes
    Erkang Zhu, Dong Deng, Fatemeh Nargesian, Renée J. Miller
    SIGMOD, pp. 847-864, 2019.
    pdf

  • Anytime approximation in probabilistic databases via scaled dissociations
    Maarten Van den Heuvel, Peter Ivanov, Wolfgang Gatterbauer, Floris Geerts, Martin Theobald
    SIGMOD, pp. 1295-1312, 2019.
    pdf | preprint | bib

  • VISE: Vehicle Image Search Engine with traffic camera
    Hyewon Choi, Erkang Zhu, Arsala Bangash, Renée J. Miller
    PVLDB 12(12): 1842-1845 (2019).
    pdf

  • Data lake management: Challenges and opportunities (tutorial)
    Fatemeh Nargesian, Erkang Zhu, Renée J. Miller, Ken Q. Pu, Patricia C. Arocena
    PVLDB 12(12): 1986-1989 (2019).
    pdf

  • Bridging quantities in tables and text
    Yusra Ibrahim, Mirek Riedewald, Gerhard Weikum, Demetrios Zeinalipour-Yazti
    ICDE, pp. 1010-1021, 2019.
    pdf

  • A collective, probabilistic approach to schema mapping using diverse noisy evidence
    Angelika Kimmig, Alex Memory, Renée J. Miller, Lise Getoor
    IEEE Trans. Knowl. Data Eng. 31(8): 1426-1439 (2019).
    pdf

  • Abstract cost models for distributed data-intensive computations
    Rundong Li, Ningfang Mi, Mirek Riedewald, Yizhou Sun, Yi Yao
    DAPD Journal, 37(3): 411-439, 2019.
    pdf | bib

  • Algebraic approximations of the probability of Boolean functions
    Wolfgang Gatterbauer
    SUM (Invited Keynote), pp. 449-450, 2019.

  • An Efficient, Scalable and Exact Representation of High-Dimensional Color Information Enabled via de Bruijn Graph Search Problem
    Fatemeh Almodaresi, Prashant Pandey, Michael Ferdman, Rob Johnson, Rob Patro
    RECOMB 2019, JCB 2020
    pdf

  • Locality Sensitive Hashing for the Edit Distance
    Guillaume Marçais, Dan DeBlasio, Prashant Pandey, Carl Kingsford
    ISMB 2019, BIOINFORMATICS 2019
    pdf

  • Small Refinements to the DAM Can Have Big Consequences for Data-Structure Design
    Michael Bender, Alex Conway, Martin Farach-Colton, William Jannen, Yizheng Jiao, Rob Johnson, Eric Knorr, Sara McAllister, Nirjhar Mukherjee, Prashant Pandey, Donald E. Porter, Jun Yuan, Yang Zhan
    SPAA 2019
    pdf

2018

2017

  • Beta probabilistic databases: A scalable approach to belief updating and parameter learning
    Niccolò Meneghetti, Oliver Kennedy, Wolfgang Gatterbauer
    SIGMOD, pp. 573-586, 2017. (Invited to the Special Issue of TODS on “best of SIGMOD 2017”)
    pdf | preprint | bib

  • Automated template generation for question answering over knowledge graphs
    Abdalghani Abujabal, Mohamed Yahya, Mirek Riedewald, Gerhard Weikum
    WWW, pp. 1191-1200, 2017.
    pdf

  • Interactive navigation of open data linkages
    Erkang Zhu, Ken Q. Pu, Fatemeh Nargesian, Renée J. Miller
    PVLDB 10(12): 1837-1840 (2017).
    pdf

  • A collective, probabilistic approach to schema mapping
    Angelika Kimmig, Alex Memory, Renée J. Miller, Lise Getoor
    ICDE, pp. 921-932, 2017.
    pdf | arXiv:1702.03447 | preprint

  • The linearization of belief propagation on pairwise Markov random fields
    Wolfgang Gatterbauer
    AAAI, pp. 3747-3753, 2017.
    pdf | arXiv:1502.04956 (long) | bib

  • DeepSea: Progressive workload-aware partitioning of materialized views in scalable data analytics
    Jiang Du, Renée J. Miller, Boris Glavic, Wei Tan
    EDBT, pp. 198-209, 2017.
    pdf

  • Conflict of interest declaration and detection system in heterogeneous networks
    Siyuan Wu, Leong Hou U, Sourav S. Bhowmick, Wolfgang Gatterbauer
    CIKM, pp. 2383-2386, 2017 (Short paper).
    pdf | preprint | bib

  • A case for abstract cost models for distributed execution of analytics operators
    Rundong Li, Ningfang Mi, Mirek Riedewald, Yizhou Sun, Yi Yao
    In Proc. Int. Conf. on Big Data Analytics and Knowledge Discovery (DaWaK), pp. 149-163, 2017.
    pdf | preprint

  • Algorithms for automatic ranking of participants and tasks in an anonymized contest
    Yang Jiao, R. Ravi, Wolfgang Gatterbauer
    WALCOM, pp. 335-346, 2017. (Invited to the Special Issue of Elsevier TCS on “best of WALCOM 2017”)
    pdf | arXiv:1612.04794 | bib

  • VIQS: Visual interactive exploration of query semantics
    Christina Christodoulakis, Eser Kandogan, Ignacio G. Terrizzano, Renée J. Miller
    ESIDA@IUI, pp. 25-32, 2017.
    pdf

  • Dissociation and propagation for approximate lifted inference with standard relational database management systems
    Wolfgang Gatterbauer, Dan Suciu
    VLDBJ (Special Issue of VLDB Journal on “best of VLDB 2015”). pp. 5-30, 2016.
    pdf | arXiv:1310.6257 (long)

  • Data quality: The role of empiricism
    Shazia Wasim Sadiq, Tamraparni Dasu, Xin Luna Dong, Juliana Freire, Ihab F. Ilyas, Sebastian Link, Renée J. Miller, Felix Naumann, Xiaofang Zhou, Divesh Srivastava
    SIGMOD Record 46(4): 35-43 (2017)
    pdf

  • The future of data integration (Keynote Abstract)
    Renée J. Miller
    KDD, p. 3, 2017.
    pdf

  • A machine learning approach for result caching in web search engines
    Tayfun Kucukyilmaz, Berkant Barla Cambazoglu, Cevdet Aykanat, Ricardo Baeza-Yates
    Inf. Process. Manag. 53(4): 834-850 (2017)

  • Story-focused reading in online news and its potential for user engagement
    Janette Lehmann, Carlos Castillo, Mounia Lalmas, Ricardo Baeza-Yates
    J. Assoc. Inf. Sci. Technol. 68(4): 869-883 (2017)

  • Quality-efficiency trade-offs in machine learning for text processing
    Ricardo Baeza-Yates, Zeinab Liaghat
    BigData 2017: 897-904

  • FA*IR: A Fair Top-k Ranking Algorithm
    Meike Zehlike, Francesco Bonchi, Carlos Castillo, Sara Hajian, Mohamed Megahed, Ricardo Baeza-Yates
    CIKM 2017: 1569-1578

  • Detection of Trending Topic Communities: Bridging Content Creators and Distributors
    Lorena Recalde, David F. Nettleton, Ricardo Baeza-Yates, Ludovico Boratto
    HT 2017: 205-213

  • Exploring Query Auto-Completion and Click Logs for Contextual-Aware Web Search and Query Suggestion
    Liangda Li, Hongbo Deng, Anlei Dong, Yi Chang, Ricardo Baeza-Yates, Hongyuan Zha
    WWW 2017: 539-548

  • A General-Purpose Counting Filter: Making Every Bit Count
    Prashant Pandey, Michael A. Bender, Rob Johnson, and Rob Patro
    SIGMOD 2017
    pdf

  • deBGR: An Efficient and Near-Exact Representation of the Weighted de Bruijn Graph
    Prashant Pandey, Michael A. Bender, Rob Johnson, and Rob Patro
    ISMB/BIOINFORMATICS 2017
    pdf

  • Rainbowfish: A Succinct Colored de Bruijn Graph Representation
    Fatemeh Almodaresi, Prashant Pandey, and Rob Patro
    WABI 2017
    pdf

2016

  • Merlin: Exploratory Analysis with Imprecise Queries
    Bahar Qarabaqi, Mirek Riedewald
    IEEE TKDE, 28(2): 342-355, 2016. (TKDE special issue on “best of ICDE 2014”)
    pdf

  • Making sense of entities and quantities in web tables
    Yusra Ibrahim, Mirek Riedewald, Gerhard Weikum
    CIKM, pp. 1703-1712, 2016.
    pdf | preprint

  • Visual congruent ads for image search
    Yannis Kalantidis, Ayman Farahat, Lyndon Kennedy, Ricardo Baeza-Yates, David A. Shamma
    ICPR 2016: 1496-1505

  • Encouraging Diversity- and Representation-Awareness in Geographically Centralized Content
    Eduardo Graells-Garrido, Mounia Lalmas, Ricardo Baeza-Yates
    IUI 2016: 7-18

  • Data Portraits and Intermediary Topics: Encouraging Exploration of Politically Diverse Profiles
    Eduardo Graells-Garrido, Mounia Lalmas, Ricardo Baeza-Yates
    IUI 2016: 228-240

  • Scalable Semantic Matching of Queries to Ads in Sponsored Search Advertising
    Mihajlo Grbovic, Nemanja Djuric, Vladan Radosavljevic, Fabrizio Silvestri, Ricardo Baeza-Yates, Andrew Feng, Erik Ordentlich, Lee Yang, Gavin Owens
    SIGIR 2016: 375-384

  • Towards Mobile Query Auto-Completion: An Efficient Mobile Application-Aware Approach
    Aston Zhang, Amit Goyal, Ricardo Baeza-Yates, Yi Chang, Jiawei Han, Carl A. Gunter, Hongbo Deng
    WWW 2016: 579-590

  • Optimizing Every Operation in a Write-Optimized File System
    Jun Yuan, Yang Zhan, William Jannen, Prashant Pandey, Amogh Akshintala, Kanchan Chandnani, Pooja Deo, Zardosht Kasheff, Michael Bender, Martin Farach-Colton, Rob Johnson, Bradley C. Kuszmaul, and Donald E. Porter
    FAST 2016
    pdf

2015

2014