Recent Publications

The latest publications, along with extended material, are also available on the faculty and project web pages.


  • Why Not Yet: Fixing a Top-k Ranking that Is Not Fair to Individuals.
    Zixuan Chen, Panagiotis Manolios, Mirek Riedewald.
    VLDB 2023 (to appear).
    Preprint version

  • Efficient Computation of Quantiles over Joins.
    Nikolaos Tziavelis, Nofar Carmeli, Wolfgang Gatterbauer, Benny Kimelfeld, Mirek Riedewald.
    PODS 2023 (to appear).

  • SANTOS: Relationship-based Semantic Table Union Search.
    Aamod Khatiwada, Grace Fan, Roee Shraga, Zixuan Chen, Wolfgang Gatterbauer, Renée J. Miller, Mirek Riedewald.
    SIGMOD 2023 (to appear).
    pdf (arXiv:2209.13589) | GitHub

  • Data Lake Organization.
    Fatemeh Nargesian, Ken Q. Pu, Bahar Ghadiri Bashardoost, Erkang Zhu, Renée J. Miller.
    IEEE Trans. Knowl. Data Eng. 35(1): 237-250 (2023).
    pdf (arXiv:1812.07024)

  • Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning.
    Grace Fan, Jin Wang, Yuliang Li, Dan Zhang, Renée J. Miller.
    PVLDB 2023.
    pdf | GitHub

  • Explaining Dataset Changes for Semantic Data Versioning with Explain-Da-V.
    Roee Shraga, Renée J. Miller.
    PVLDB 2023 (to appear).
    pdf | GitHub

  • FlexER: Flexible Entity Resolution for Multiple Intents.
    Bar Genossar, Roee Shraga, Avigdor Gal.
    SIGMOD 2023 (to appear).
    pdf (arXiv:2209.07569) | GitHub

  • Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries.
    Nofar Carmeli, Nikolaos Tziavelis, Wolfgang Gatterbauer, Benny Kimelfeld, Mirek Riedewald.
    TODS 2023 (to appear).
    pdf (ACM) | pdf (arXiv:2012.11965)
    Project page: Any-k

  • Understanding Search Behavior Bias in Wikipedia.
    Bruno Scarone, Ricardo Baeza-Yates, Erik Bernhardson.
    Bias@ECIR 2023 (to appear).

  • Human-Centered Responsible Artificial Intelligence: Current & Future Trends.
    Mohammad Tahaei, Marios Constantinides, Daniele Quercia, Sean Kennedy, Michael J. Muller, Simone Stumpf, Q. Vera Liao, Ricardo Baeza-Yates, Lora Aroyo, Jess Holbrook, Ewa Luger, Michael Madaio, Ilana Golbin Blumenfeld, Maria De-Arteaga, Jessica Vitak, Alexandra Olteanu.
    pdf (arXiv:2302.08157)


  • Integrating Data Lake Tables.
    Aamod Khatiwada, Roee Shraga, Wolfgang Gatterbauer, Renée J. Miller.
    PVLDB 16(4): 932-945 (2022).
    pdf | pdf (ACM) | GitHub

  • Knowledge-Based News Event Analysis Toolkit.
    Oktie Hassanzadeh, Parul Awasthy, Ken Barker, Onkar Bhardwaj, Debarun Bhattacharjya, Mark Feblowitz, Aamod Khatiwada, Lee Martie, Steve Fonin Mbouadeu, Jian Ni, Anik Saha, Sola Shirai, Kavitha Srinivas and Lucy Yip.
    ISWC 2022.
    pdf | Homepage

  • Knowledge Graph Embeddings for Causal Relation Prediction.
    Aamod Khatiwada, Sola Shirai, Kavitha Srinivas, Oktie Hassanzadeh.
    DL4KG@ISWC 2022.
    pdf | datasets

  • Rule-Based Link Prediction over Event-Related Causal Knowledge in Wikidata.
    Sola Shirai, Aamod Khatiwada, Debarun Bhattacharjya, Oktie Hassanzadeh.
    Wikidata@ISWC 2022.

  • HumanAL: calibrating human matching beyond a single task.
    Roee Shraga.
    HILDA@SIGMOD 2022.
    pdf (ACM) | pdf (arXiv:2205.03209) | video (15 min)

  • BiaScope: Visual Unfairness Diagnosis for Graph Embeddings.
    Agapi Rissaki, Bruno Scarone, David Liu, Aditeya Pandey, Brennan Klein, Tina Eliassi-Rad, Michelle A. Borkin.
    Symposium on Visualization in Data Science at IEEE VIS 2022.
    pdf (arXiv:2210.06417) | GitHub

  • A Unified Approach for Resilience and Causal Responsibility with Integer Linear Programming (ILP) and LP Relaxations.
    Neha Makhija, Wolfgang Gatterbauer.
    pdf (arXiv:2212.08898)

  • Principles of Query Visualization.
    Wolfgang Gatterbauer, Cody Dunne, H.V. Jagadish, Mirek Riedewald.
    IEEE Data Engineering Bulletin, Special Issue on Widening the Impact of Data Engineering through Innovations in Education, Interfaces, and Features, 45(3):47-67, 2022.
    pdf | Project page: QueryVis

  • Relational Diagrams: a pattern-preserving diagrammatic representation of non-disjunctive Relational Queries.
    Wolfgang Gatterbauer, Cody Dunne, Mirek Riedewald.
    pdf (arXiv:2203.07284)
    Project page: QueryVis

  • Interpreting and understanding relational database queries using diagrams.
    Wolfgang Gatterbauer.
    Tutorial @ DIAGRAMS, 2022.
    Project page: QueryVis

  • Any-k Algorithms for Enumerating Ranked Answers to Conjunctive Queries.
    Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald.
    pdf (arXiv:2205.05649)
    Project page: Any-k

  • Toward Responsive DBMS: Optimal Join Algorithms, Enumeration, Factorization, Ranking, and Dynamic Programming.
    Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald.
    ICDE 2022 (Tutorial).
    pdf | Tutorial Webpage

  • Algebraic Approximations of the Probability of Monotone Boolean Functions.
    Wolfgang Gatterbauer.
    ISAIM 2022.

  • Fair Top-k Ranking with multiple protected groups.
    Meike Zehlike, Tom Sühr, Ricardo Baeza-Yates, Francesco Bonchi, Carlos Castillo, Sara Hajian.
    Inf. Process. Manag. 59(1): 102707 (2022).

  • The Relevance of Non-Human Errors in Machine Learning.
    Ricardo Baeza-Yates, Marina Estévez-Almenzar.
    EBeM@IJCAI 2022.

  • Bots don’t Vote, but They Surely Bother!: A Study of Anomalous Accounts in a National Referendum.
    Eduardo Graells-Garrido, Ricardo Baeza-Yates.
    WebSci 2022: 302-306.

  • The Attention Economy and the Impact of Artificial Intelligence.
    Ricardo Baeza-Yates, Usama M. Fayyad.
    Perspectives on Digital Humanism 2022: 123-134.




  • JOSIE: Overlap set similarity search for finding joinable tables in data lakes
    Erkang Zhu, Dong Deng, Fatemeh Nargesian, Renée J. Miller
    SIGMOD, pp. 847-864, 2019.

  • Anytime approximation in probabilistic databases via scaled dissociations
    Maarten Van den Heuvel, Peter Ivanov, Wolfgang Gatterbauer, Floris Geerts, Martin Theobald
    SIGMOD, pp. 1295-1312, 2019.
    pdf | preprint | bib

  • VISE: Vehicle Image Search Engine with traffic camera
    Hyewon Choi, Erkang Zhu, Arsala Bangash, Renée J. Miller
    PVLDB 12(12): 1842-1845 (2019).

  • Data lake management: Challenges and opportunities (tutorial)
    Fatemeh Nargesian, Erkang Zhu, Renée J. Miller, Ken Q. Pu, Patricia C. Arocena
    PVLDB 12(12): 1986-1989 (2019).

  • Bridging quantities in tables and text
    Yusra Ibrahim, Mirek Riedewald, Gerhard Weikum, Demetrios Zeinalipour-Yazti
    ICDE, pp. 1010-1021, 2019.

  • A collective, probabilistic approach to schema mapping using diverse noisy evidence
    Angelika Kimmig, Alex Memory, Renée J. Miller, Lise Getoor
    IEEE Trans. Knowl. Data Eng. 31(8): 1426-1439 (2019).

  • Abstract cost models for distributed data-intensive computations
    Rundong Li, Ningfang Mi, Mirek Riedewald, Yizhou Sun, Yi Yao
    DAPD Journal, 37(3): 411-439, 2019.
    pdf | bib

  • Algebraic approximations of the probability of Boolean functions
    Wolfgang Gatterbauer
    SUM (Invited Keynote), pp. 449-450, 2019.



  • Beta probabilistic databases: A scalable approach to belief updating and parameter learning
    Niccolò Meneghetti, Oliver Kennedy, Wolfgang Gatterbauer
    SIGMOD, pp. 573-586, 2017. (Invited to the Special Issue of TODS on “best of SIGMOD 2017”)
    pdf | preprint | bib

  • Automated template generation for question answering over knowledge graphs
    Abdalghani Abujabal, Mohamed Yahya, Mirek Riedewald, Gerhard Weikum
    WWW, pp. 1191-1200, 2017.

  • Interactive navigation of open data linkages
    Erkang Zhu, Ken Q. Pu, Fatemeh Nargesian, Renée J. Miller
    PVLDB 10(12): 1837-1840 (2017).

  • A collective, probabilistic approach to schema mapping
    Angelika Kimmig, Alex Memory, Renée J. Miller, Lise Getoor
    ICDE, pp. 921-932, 2017.
    pdf | arXiv:1702.03447 | preprint

  • The linearization of belief propagation on pairwise Markov random fields
    Wolfgang Gatterbauer
    AAAI, pp. 3747-3753, 2017.
    pdf | arXiv:1502.04956 (long) | bib

  • DeepSea: Progressive workload-aware partitioning of materialized views in scalable data analytics
    Jiang Du, Renée J. Miller, Boris Glavic, Wei Tan
    EDBT, pp. 198-209, 2017.

  • Conflict of interest declaration and detection system in heterogeneous networks
    Siyuan Wu, Leong Hou U, Sourav S Bhowmick, Wolfgang Gatterbauer
    CIKM, pp. 2383-2386, 2017 (Short paper).
    pdf | preprint | bib

  • A case for abstract cost models for distributed execution of analytics operators
    Rundong Li, Ningfang Mi, Mirek Riedewald, Yizhou Sun, Yi Yao
    Int. Conf. on Big Data Analytics and Knowledge Discovery (DaWaK), pp. 149-163, 2017.
    pdf | preprint

  • Algorithms for automatic ranking of participants and tasks in an anonymized contest
    Yang Jiao, R. Ravi, Wolfgang Gatterbauer
    WALCOM, pp. 335-346, 2017. (Invited to the Special Issue of Elsevier TCS on “best of WALCOM 2017”)
    pdf | arXiv:1612.04794 | bib

  • VIQS: Visual interactive exploration of query semantics
    Christina Christodoulakis, Eser Kandogan, Ignacio G. Terrizzano, Renée J. Miller
    ESIDA@IUI, pp. 25-32, 2017.

  • Dissociation and propagation for approximate lifted inference with standard relational database management systems
    Wolfgang Gatterbauer, Dan Suciu
    VLDB Journal (Special Issue on “best of VLDB 2015”), pp. 5-30, 2016.
    pdf | arXiv:1310.6257 (long)

  • Data quality: The role of empiricism
    Shazia Wasim Sadiq, Tamraparni Dasu, Xin Luna Dong, Juliana Freire, Ihab F. Ilyas, Sebastian Link, Renée J. Miller, Felix Naumann, Xiaofang Zhou, Divesh Srivastava
    SIGMOD Record 46(4): 35-43 (2017)

  • The future of data integration (Keynote Abstract)
    Renée J. Miller
    KDD, p. 3, 2017.

  • A machine learning approach for result caching in web search engines
    Tayfun Kucukyilmaz, Berkant Barla Cambazoglu, Cevdet Aykanat, Ricardo Baeza-Yates
    Inf. Process. Manag. 53(4): 834-850 (2017)

  • Story-focused reading in online news and its potential for user engagement
    Janette Lehmann, Carlos Castillo, Mounia Lalmas, Ricardo Baeza-Yates
    J. Assoc. Inf. Sci. Technol. 68(4): 869-883 (2017)

  • Quality-efficiency trade-offs in machine learning for text processing
    Ricardo Baeza-Yates, Zeinab Liaghat
    BigData 2017: 897-904

  • FA*IR: A Fair Top-k Ranking Algorithm
    Meike Zehlike, Francesco Bonchi, Carlos Castillo, Sara Hajian, Mohamed Megahed, Ricardo Baeza-Yates
    CIKM 2017: 1569-1578

  • Detection of Trending Topic Communities: Bridging Content Creators and Distributors
    Lorena Recalde, David F. Nettleton, Ricardo Baeza-Yates, Ludovico Boratto
    HT 2017: 205-213

  • Exploring Query Auto-Completion and Click Logs for Contextual-Aware Web Search and Query Suggestion
    Liangda Li, Hongbo Deng, Anlei Dong, Yi Chang, Ricardo Baeza-Yates, Hongyuan Zha
    WWW 2017: 539-548


  • Merlin: Exploratory Analysis with Imprecise Queries
    Bahar Qarabaqi, Mirek Riedewald
    IEEE TKDE, 28(2): 342-355, 2016. (TKDE special issue on “best of ICDE 2014”)

  • Making sense of entities and quantities in web tables
    Yusra Ibrahim, Mirek Riedewald, Gerhard Weikum
    CIKM, pp. 1703-1712, 2016.
    pdf | preprint

  • Visual congruent ads for image search
    Yannis Kalantidis, Ayman Farahat, Lyndon Kennedy, Ricardo Baeza-Yates, David A. Shamma
    ICPR 2016: 1496-1505

  • Encouraging Diversity- and Representation-Awareness in Geographically Centralized Content
    Eduardo Graells-Garrido, Mounia Lalmas, Ricardo Baeza-Yates
    IUI 2016: 7-18

  • Data Portraits and Intermediary Topics: Encouraging Exploration of Politically Diverse Profiles
    Eduardo Graells-Garrido, Mounia Lalmas, Ricardo Baeza-Yates
    IUI 2016: 228-240

  • Scalable Semantic Matching of Queries to Ads in Sponsored Search Advertising
    Mihajlo Grbovic, Nemanja Djuric, Vladan Radosavljevic, Fabrizio Silvestri, Ricardo Baeza-Yates, Andrew Feng, Erik Ordentlich, Lee Yang, Gavin Owens
    SIGIR 2016: 375-384

  • Towards Mobile Query Auto-Completion: An Efficient Mobile Application-Aware Approach
    Aston Zhang, Amit Goyal, Ricardo Baeza-Yates, Yi Chang, Jiawei Han, Carl A. Gunter, Hongbo Deng
    WWW 2016: 579-590