Latest publications plus extended material are also available from the faculty and/or project web pages.
2025
Adaptive Quotient Filters
Richard Wen, Hunter McCoy, David Tench, Guido Tagliavini, Michael Bender, Alex Conway, Martin Farach-Colton, Rob Johnson, Prashant Pandey
SIGMOD 2025
pdf
Zombie Hashing: Reanimating Tombstones in a Graveyard
Yuvaraj Chesetti, Benwei Shi, Jeff M. Phillips, Prashant Pandey
SIGMOD 2025
pdf
Evaluating Learned Indexes for External-Memory Joins
Yuvaraj Chesetti, Prashant Pandey
ACDA 2025
pdf
A Unified and Practical Approach for Generalized Deletion Propagation
PVLDB 2025 (to appear)
Project page: Unified Reverse Data Management
Any-k Algorithms for Enumerating Ranked Answers to Conjunctive Queries
TODS (to appear)
Project page: Any-k
Resilience for Regular Path Queries: Towards a Complexity Classification
PODS (to appear)
Project page: Unified Reverse Data Management
Relational Diagrams and the Pattern Expressiveness of Relational Languages
Project page: Relational Diagrams
ACM SIGMOD Research Highlight
2024
Gallatin: A General-Purpose GPU Memory Manager
Hunter McCoy, Prashant Pandey
PPoPP 2024
pdf
IONIA: High-Performance Replication for Modern Disk-based KV Stores
Yi Xu, Henry Zhu, Prashant Pandey, Alex Conway, Rob Johnson, Ramnatthan Alagappan, Aishwarya Ganesan
FAST 2024
pdf
Beyond Bloom: A Tutorial on Future Feature-Rich Filters
Prashant Pandey, Martin Farach-Colton, Niv Dayan, Huanchen Zhang
SIGMOD 2024
pdf
BYO: A Unified Framework for Benchmarking Large-Scale Graph Containers
Brian Wheatman, Xiaojun Dong, Zheqi Shen, Laxman Dhulipala, Jakub Łącki, Prashant Pandey, Helen Xu
VLDB 2024
pdf
Towards Unbiased Exploration in Partial Label Learning
Towards Agentic Schema Refinement
Ranked Enumeration for Database Queries
Project page: Any-k
On the Reasonable Effectiveness of Relational Diagrams: Explaining Relational Query Patterns and the Pattern Expressiveness of Relational Languages
ACM | arXiv:2401.04758 (long version) | slides | video (18min) | reproducibility report | OSF supplements | OSF user study preregistration | Anonymous preregistration (what we submitted before we collected data) | Executed user analysis code | gs | bib
SIGMOD 2024 honorable mention (announcement)
A Comprehensive Tutorial on over 100 years of Diagrammatic Representations of Logical Statements and Relational Queries
Tutorial page: Diagrammatic representation tutorial, Project page: Relational Diagrams
A Unified Approach for Resilience and Causal Responsibility with Integer Linear Programming (ILP) and LP Relaxations
ACM | arXiv:2212.08898 (long version) | slides from Dagstuhl (Jan 2024) | Dagstuhl report (Jan 2024) | slides @ SIGMOD | talk @ SIGMOD (17min) | slides (Nov 2023) | talk @ Simons Institute (Nov 2023, 28min) | reproducibility report | Code on Github | gs | bib
Project page: Unified Reverse Data Management
Minimally Factorizing the Provenance of Self-Join Free Conjunctive Queries
Project page: Unified Reverse Data Management
HITSNDIFFS: From Truth Discovery to Ability Discovery by Recovering Matrices with the Consecutive Ones Property
IEEE | preprint | arXiv:2401.00013 (long version) | slides | video (15min) | Code on Github | gs | bib
2023
Why Not Yet: Fixing a Top-k Ranking that Is Not Fair to Individuals
Zixuan Chen, Panagiotis Manolios, Mirek Riedewald.
VLDB 2023 (to appear)
Preprint version
Efficient Computation of Quantiles over Joins
Project page: Any-k
SANTOS: Relationship-based Semantic Table Union Search
Project page: Table-as-query
Data Lake Organization.
Fatemeh Nargesian, Ken Q. Pu, Bahar Ghadiri Bashardoost, Erkang Zhu, Renée J. Miller.
IEEE Trans. Knowl. Data Eng. 35(1): 237-250 (2023).
pdf (arXiv:1812.07024)
Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning.
Grace Fan, Jin Wang, Yuliang Li, Dan Zhang, Renée J. Miller.
PVLDB 2023.
pdf | GitHub
Explaining Dataset Changes for Semantic Data Versioning with Explain-Da-V.
Roee Shraga, Renée J. Miller.
PVLDB 2023 (to appear).
pdf | GitHub
FlexER: Flexible Entity Resolution for Multiple Intents.
Bar Genossar, Roee Shraga, Avigdor Gal.
SIGMOD 2023 (to appear).
pdf (arXiv:2209.07569) | GitHub
Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries
Project page: Any-k
Invited version from best of PODS 2021
Understanding Search Behavior Bias in Wikipedia.
Bruno Scarone, Ricardo Baeza-Yates, Erik Bernhardson.
Bias@ECIR 2023 (to appear).
Human-Centered Responsible Artificial Intelligence: Current & Future Trends.
Mohammad Tahaei, Marios Constantinides, Daniele Quercia, Sean Kennedy, Michael J. Muller, Simone Stumpf, Q. Vera Liao, Ricardo Baeza-Yates, Lora Aroyo, Jess Holbrook, Ewa Luger, Michael Madaio, Ilana Golbin Blumenfeld, Maria De-Arteaga, Jessica Vitak, Alexandra Olteanu.
pdf (arXiv:2302.08157)
Distance and Time Sensitive Filters for Similarity Search in Trajectory Datasets
Madhav Narayan Bhat, Paul Cesaretti, Mayank Goswami, Prashant Pandey
APOCS 2023
pdf
Communication Optimization for Distributed Execution of Graph Neural Networks
Süreyya Emre Kurt, Jinghua Yan, Aravind Sukumaran-Rajam, Prashant Pandey, P. Sadayappan
IPDPS 2023
pdf
High-Performance Filters for GPUs
Hunter McCoy, Steven Hofmeyr, Katherine Yelick, Prashant Pandey
PPoPP 2023
pdf
Singleton Sieving: Overcoming the Memory/Speed Trade-Off in Exascale k-mer Analysis
Hunter McCoy, Steven Hofmeyr, Katherine Yelick, Prashant Pandey
ACDA 2023
pdf
IcebergHT: High Performance Hash Tables Through Stability and Low Associativity
Prashant Pandey, Michael Bender, Alex Conway, Martin Farach-Colton, William Kuszmaul, Guido Tagliavini, Rob Johnson
SIGMOD 2023
pdf
BP-tree: Overcoming the Point-Range Operation Tradeoff for In-Memory B-trees
Helen Xu, Amanda Li, Brian Wheatman, Manoj Marneni, Prashant Pandey
VLDB 2023
pdf
DomainNet: Homograph Detection and Understanding in Data Lake Disambiguation
Project page: Table-as-query
Invited version from EDBT 2021 best paper award
A Tutorial on Visual Representations of Relational Queries
Tutorial page: Visual query representations, Project page: Relational Diagrams
Unsupervised learning of 3-colorings using simplicial higher-order neural networks
2022
Integrating Data Lake Tables
Project page: Table-as-query
Knowledge-Based News Event Analysis Toolkit.
Oktie Hassanzadeh, Parul Awasthy, Ken Barker, Onkar Bhardwaj, Debarun Bhattacharjya, Mark Feblowitz, Aamod Khatiwada, Lee Martie, Steve Fonin Mbouadeu, Jian Ni, Anik Saha, Sola Shirai, Kavitha Srinivas and Lucy Yip.
ISWC 2022.
pdf | Homepage
Knowledge Graph Embeddings for Causal Relation Prediction.
Aamod Khatiwada, Sola Shirai, Kavitha Srinivas, Oktie Hassanzadeh.
DL4KG@ISWC 2022.
pdf | datasets
Rule-Based Link Prediction over Event-Related Causal Knowledge in Wikidata.
Sola Shirai, Aamod Khatiwada, Debarun Bhattacharjya, Oktie Hassanzadeh.
Wikidata@ISWC 2022.
pdf
HumanAL: calibrating human matching beyond a single task.
Roee Shraga.
HILDA@SIGMOD 2022.
pdf (ACM) | pdf (arXiv:2205.03209) | video (15 min)
BiaScope: Visual Unfairness Diagnosis for Graph Embeddings.
Agapi Rissaki, Bruno Scarone, David Liu, Aditeya Pandey, Brennan Klein, Tina Eliassi-Rad, Michelle A. Borkin.
Symposium on Visualization in Data Science at IEEE VIS 2022.
pdf (arXiv:2210.06417) | GitHub
Principles of Query Visualization
Project page: QueryVis
Interpreting and understanding relational database queries using diagrams.
Wolfgang Gatterbauer.
Tutorial @ DIAGRAMS, 2022.
Project page: QueryVis
Toward Responsive DBMS: Optimal Join Algorithms, Enumeration, Factorization, Ranking, and Dynamic Programming
Tutorial page: Towards Responsive DBMS, Project page: Any-k
Algebraic Approximations of the Probability of Monotone Boolean Functions.
Wolfgang Gatterbauer
ISAIM 2022.
pdf
Meike Zehlike, Tom Sühr, Ricardo Baeza-Yates, Francesco Bonchi, Carlos Castillo, Sara Hajian.
Fair Top-k Ranking with multiple protected groups.
Inf. Process. Manag. 59(1): 102707 (2022)
Ricardo Baeza-Yates, Marina Estévez-Almenzar:
The Relevance of Non-Human Errors in Machine Learning.
EBeM@IJCAI 2022
Eduardo Graells-Garrido, Ricardo Baeza-Yates.
Bots don’t Vote, but They Surely Bother!: A Study of Anomalous Accounts in a National Referendum.
WebSci 2022: 302-306
Ricardo Baeza-Yates, Usama M. Fayyad.
The Attention Economy and the Impact of Artificial Intelligence.
Perspectives on Digital Humanism 2022: 123-134
An Incrementally Updatable and Scalable System for Large-Scale Sequence Search using the Bentley-Saxe Transformation
Fatemeh Almodaresi, Jamshed Khan, Sergey Madaminov, Michael Ferdman, Rob Johnson, Prashant Pandey, Rob Patro
BIOINFORMATICS 2022
pdf
2021
RONIN: Data Lake Exploration.
Paul Ouellette, Aidan Sciortino, Fatemeh Nargesian, Bahar Ghadiri Bashardoost, Erkang Zhu, Ken Q. Pu, Renée J. Miller.
PVLDB Demonstration, 2021.
pdf (local)
Towards Knowledge Exchange: State-of-the-Art and Open Problems.
Bahar Ghadiri Bashardoost, Kelly A. Lyons, Renée J. Miller.
SOFSEM 2021: 13-27.
pdf (springer) | pdf (local)
Beyond Equi-joins: Ranking, Enumeration and Factorization
Project page: Any-k
Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries.
Nofar Carmeli, Nikolaos Tziavelis, Wolfgang Gatterbauer, Benny Kimelfeld, Mirek Riedewald.
PODS, pp. 325-341, 2021
pdf (ACM) | pdf (local) | pdf (arXiv:2012.11965) | video (20min) | video (12min) | Project page: Any-k
Selected among best of conference
DomainNet: Homograph Detection for Data Lake Disambiguation.
Aristotelis Leventidis, Laura Di Rocco, Wolfgang Gatterbauer, Renée J. Miller, Mirek Riedewald.
EDBT, pp. 13-24, 2021
pdf (DOI) | pdf (OP) | pdf (local) | pdf (arXiv:2103.09940) | video (10min) | Code on Github | datasets | Project page: Table-as-query
Best paper award (announcement)
Stratisfimal Layout: A modular optimization model for laying out layered node-link network visualizations
Project pages: Stratisfimal Layout, QueryVis
Distributed-Memory k-mer Counting on GPUs
Israt Nisa, Prashant Pandey, Marquita Ellis, Leonid Oliker, Aydin Buluc, Katherine Yelick
IPDPS 2021
pdf
Vector Quotient Filters: Overcoming the Time/Space Trade-Off in Filter Design
Prashant Pandey, Alex Conway, Joe Durie, Michael Bender, Martin Farach-Colton, Rob Johnson
SIGMOD 2021
pdf
Terrace: A Hierarchical Graph Container for Skewed Dynamic Graphs
Prashant Pandey, Brian Wheatman, Helen Xu, Aydin Buluc
SIGMOD 2021
pdf
VariantStore: an index for large-scale genomic variant search
Prashant Pandey, Yinjie Gao, Carl Kingsford
Genome Biology 2021
pdf
2020
Knowledge Translation.
Bahar Ghadiri Bashardoost, Renée J. Miller, Kelly Lyons, Fatemeh Nargesian.
PVLDB 13(11): 2018-2032 (2020).
pdf
Pytheas: Pattern-based Table Discovery in CSV Files.
Christina Christodoulakis, Eric Munsen, Moshe Gabel, Angela Demke Brown, Renée J. Miller.
PVLDB 13(11): 2075-2089 (2020).
pdf
Organizing data lakes for navigation.
Fatemeh Nargesian, Ken Q. Pu, Erkang Zhu, Bahar Ghadiri Bashardoost, Renée J. Miller.
SIGMOD, pp. 1939–1950, 2020.
pdf
Optimal algorithms for ranked enumeration of answers to full conjunctive queries.
Nikolaos Tziavelis, Deepak Ajwani, Wolfgang Gatterbauer, Mirek Riedewald, Xiaofeng Yang.
PVLDB 13(9):1582-1597, 2020.
pdf (VLDB) | pdf (local) | pdf (arXiv:1911.05582, long version) | video (10min) | PPTX (slides) | narrated PPTX (100MB) | pdf (slides) | Code on Github | reproducibility instructions | Project page: Any-k
QueryVis: Logic-based diagrams help users understand complicated SQL queries faster.
Aristotelis Leventidis, Jiahui Zhang, Cody Dunne, Wolfgang Gatterbauer, HV Jagadish, Mirek Riedewald.
SIGMOD, pp. 2303–2318, 2020 (reproducibility award, announcement)
pdf (ACM) | pdf (local) | pdf (arXiv:2004.11375, long version) | pdf (OSF preprint, long version) | OSF supplemental materials | video (12min) | related video (19min) | Project page: QueryVis
Near-optimal distributed band-joins through recursive partitioning.
Rundong Li, Wolfgang Gatterbauer, Mirek Riedewald.
SIGMOD, pp. 2375–2390, 2020.
pdf (ACM) | pdf (local) | pdf (arXiv:2004.06101) | video (12min) | Project page: Distributed computing
Factorized graph representations for semi-supervised learning from sparse data.
Krishna Kumar P., Paul Langon, Wolfgang Gatterbauer.
SIGMOD, pp. 1383–1398, 2020 (reproducibility award, announcement).
pdf (ACM) | pdf (local) | pdf (arXiv:2003.02829, long version) | video (11min) | narrated PPTX (20MB) | pdf (slides) | Code on Github | reproducibility instructions | Project page: Semi-supervised learning with heterophily (SSLH)
New results for the complexity of resilience for binary conjunctive queries with self-joins.
Cibele Freire, Wolfgang Gatterbauer, Neil Immerman, Alexandra Meliou.
PODS, pp. 271–284, 2020.
pdf (ACM) | pdf (local) | pdf (arXiv:1907.01129, long version) | PPTX (slides) | pdf (slides) | video (12min) | Project page: Causality
Optimal join algorithms meet top-k (tutorial).
Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald.
SIGMOD tutorials, pp. 2659–2665, 2020.
pdf (ACM) | pdf (local) | pdf (arXiv:2005.00448) | video part 1 (18min) | video part 2 (34min) | video part 3 (34min) | PPTX (slides 1) | PPTX (slides 2) | PPTX (slides 3) | pdf (slides 1) | pdf (slides 2) | pdf (slides 3) | Tutorial page: Optimal join algorithms meet top-k | Project page: Any-k
Loch Prospector: Metadata Visualization for Lakes of Open Data.
Neha Makhija, Mansi Jain, Nikolaos Tziavelis, Laura Di Rocco, Sara Di Bartolomeo, Cody Dunne.
IEEE VIS (Short papers), pp. 126-130, 2020.
pdf (IEEE) | pdf (OSF preprint) | video | teaser video | Project page: Loch prospector
Temporal betweenness centrality in dynamic graphs.
Ioanna Tsalouchidou, Ricardo Baeza-Yates, Francesco Bonchi, Kewen Liao, Timos Sellis.
Int. J. Data Sci. Anal. 9(3): 257-272 (2020)
pdf
Scalable Dynamic Graph Summarization.
Ioanna Tsalouchidou, Francesco Bonchi, Gianmarco De Francisci Morales, Ricardo Baeza-Yates.
IEEE Trans. Knowl. Data Eng. 32(2): 360-373 (2020)
pdf
Pre-indexing Pruning Strategies.
Soner Altin, Ricardo Baeza-Yates, Berkant Barla Cambazoglu.
SPIRE 2020: 177-193
Every Colour You Are: Stance Prediction and Turnaround in Controversial Issues.
Eduardo Graells-Garrido, Ricardo Baeza-Yates, Mounia Lalmas.
WebSci 2020: 174-183
arXiv:2005.100019
Timely Reporting of Heavy Hitters Using External Memory
Prashant Pandey, Shikha Singh, Michael A. Bender, Jonathan W. Berry, Martin Farach-Colton, Rob Johnson, Thomas M. Kroeger, Cynthia A. Phillips
SIGMOD 2020, TODS 2021
pdf
2019
JOSIE: Overlap set similarity search for finding joinable tables in data lakes
Erkang Zhu, Dong Deng, Fatemeh Nargesian, Renée J. Miller
SIGMOD, pp. 847-864, 2019.
pdf
Anytime approximation in probabilistic databases via scaled dissociations
Maarten Van den Heuvel, Peter Ivanov, Wolfgang Gatterbauer, Floris Geerts, Martin Theobald
SIGMOD, pp. 1295-1312, 2019.
pdf | preprint | bib
VISE: Vehicle Image Search Engine with traffic camera
Hyewon Choi, Erkang Zhu, Arsala Bangash, Renée J. Miller
PVLDB 12(12): 1842-1845 (2019).
pdf
Data lake management: Challenges and opportunities (tutorial)
Fatemeh Nargesian, Erkang Zhu, Renée J. Miller, Ken Q. Pu, Patricia C. Arocena
PVLDB 12(12): 1986-1989 (2019).
pdf
Bridging quantities in tables and text
Yusra Ibrahim, Mirek Riedewald, Gerhard Weikum, Demetrios Zeinalipour-Yatzi
ICDE, pp. 1010-1021, 2019.
pdf
A collective, probabilistic approach to schema mapping using diverse noisy evidence
Angelika Kimmig, Alex Memory, Renée J. Miller, Lise Getoor
IEEE Trans. Knowl. Data Eng. 31(8): 1426-1439 (2019).
pdf
Abstract cost models for distributed data-intensive computations
Rundong Li, Ningfang Mi, Mirek Riedewald, Yizhou Sun, Yi Yao
DAPD Journal, 37(3): 411-439, 2019.
pdf | bib
Algebraic approximations of the probability of Boolean functions
Wolfgang Gatterbauer
SUM (Invited Keynote), pp. 449-450, 2019.
An Efficient, Scalable and Exact Representation of High-Dimensional Color Information Enabled via de Bruijn Graph Search Problem
Fatemeh Almodaresi, Prashant Pandey, Michael Ferdman, Rob Johnson, Rob Patro
RECOMB 2019, JCB 2020
pdf
Locality Sensitive Hashing for the Edit Distance
Guillaume Marcais, Dan DeBlasio, Prashant Pandey, Carl Kingsford
ISMB 2019, BIOINFORMATICS 2019
pdf
Small Refinements to the DAM Can Have Big Consequences for Data-Structure Design
Michael Bender, Alex Conway, Martin Farach-Colton, William Jannen, Yizheng Jiao, Rob Johnson, Eric Knorr, Sara McAllister, Nirjhar Mukherjee, Prashant Pandey, Donald E. Porter, Jun Yuan, Yang Zhan
SPAA 2019
pdf
2018
Submodularity of distributed join computation
Rundong Li, Mirek Riedewald, Xinyan Deng
SIGMOD, pp. 1237-1252, 2018.
pdf | bib
PISTIS: A conflict of interest declaration and detection system for peer review management
Siyuan Wu, Leong Hou U, Sourav S. Bhowmick, Wolfgang Gatterbauer
SIGMOD, pp. 1713-1716, 2018 (System demonstration).
pdf | preprint | bib
Table union search on open data
Fatemeh Nargesian, Erkang Zhu, Ken Q. Pu, Renée J. Miller
PVLDB 11(7): 813-825 (2018).
pdf | pdf
Open data integration
Renée J. Miller
PVLDB 11(12): 2130-2139 (2018). Invited keynote.
pdf | slides
Any-k: Anytime top-k tree pattern retrieval in labeled graphs
Xiaofeng Yang, Deepak Ajwani, Wolfgang Gatterbauer, Patrick K Nicholson, Mirek Riedewald, Alessandra Sala
WWW, pp. 489-498, 2018.
pdf | preprint | arXiv:1802.06060 | bib
Any-k algorithms for exploratory analysis with conjunctive queries
Xiaofeng Yang, Mirek Riedewald, Rundong Li, Wolfgang Gatterbauer
ExploreDB, pp. 1-3, 2018.
pdf | preprint
Dissociation-based oblivious bounds for weighted model counting
Li Chou, Wolfgang Gatterbauer, Vibhav Gogate
UAI, 2018.
pdf | preprint
A General framework for anytime approximation in probabilistic databases
Maarten Van den Heuvel, Floris Geerts, Wolfgang Gatterbauer, Martin Theobald
StarAI (IJCAI workshop), 2018 (Short position paper).
pdf | arXiv:1806.10078
Algorithms for automatic ranking of participants and tasks in an anonymized contest
Yang Jiao, R. Ravi, Wolfgang Gatterbauer
Theoretical Computer Science, Elsevier, 2018 (Special Issue on “best of WALCOM 2017”).
pdf | preprint
Learning from query-answers: A scalable approach to belief updating and parameter learning
Niccolò Meneghetti, Oliver Kennedy, Wolfgang Gatterbauer
ACM TODS, 2018 (Special Issue on “best of SIGMOD 2017”).
pdf | preprint
Making open data transparent: Data discovery on open data
Renée J. Miller, Fatemeh Nargesian, Erkang Zhu, Christina Christodoulakis, Ken Q. Pu, Periklis Andritsos
IEEE Data Eng. Bull. 41(2): 59-70 (2018).
pdf
Bias on the web
Ricardo Baeza-Yates
Commun. ACM 61(6): 54-61 (2018)
pdf | pdf
Web harvesting
Wolfgang Gatterbauer
Encyclopedia of Database Systems, Second Edition, Springer, 2018.
pdf
Web data extraction system
Robert Baumgartner, Wolfgang Gatterbauer, Georg Gottlob
Encyclopedia of Database Systems, Second Edition, Springer, 2018.
pdf
Squeakr: An Exact and Approximate K-Mer Counting System
Prashant Pandey, Michael A. Bender, Rob Johnson, and Rob Patro
BIOINFORMATICS 2018
pdf
Mantis: A Fast, Small, and Exact Large-Scale Sequence-Search Index
Prashant Pandey, Fatemeh Almodaresi, Michael A. Bender, Michael Ferdman, Rob Johnson, and Rob Patro
RECOMB 2018, Cell Systems 2018
pdf
Buffered Count-Min Sketch on SSD: Theory and Experiments
Mayank Goswami, Dzejla Medjedovic, Emina Mekic, Prashant Pandey
ESA 2018
pdf
2017
Beta probabilistic databases: A scalable approach to belief updating and parameter learning
Niccolò Meneghetti, Oliver Kennedy, Wolfgang Gatterbauer
SIGMOD, pp. 573-586, 2017. (Invited to the Special Issue of TODS on “best of SIGMOD 2017”)
pdf | preprint | bib
Automated template generation for question answering over knowledge graphs
Abdalghani Abujabal, Mohamed Yahya, Mirek Riedewald, Gerhard Weikum
WWW, pages 1191-1200, 2017.
pdf
Interactive navigation of open data linkages
Erkang Zhu, Ken Q. Pu, Fatemeh Nargesian, Renée J. Miller
PVLDB 10(12): 1837-1840 (2017).
pdf
A collective, probabilistic approach to schema mapping
Angelika Kimmig, Alex Memory, Renée J. Miller, Lise Getoor
ICDE, pp. 921-932, 2017.
pdf | arXiv:1702.03447 | preprint
The linearization of belief propagation on pairwise Markov random fields
Wolfgang Gatterbauer
AAAI, pp. 3747-3753, 2017.
pdf | arXiv:1502.04956 (long) | bib
DeepSea: Progressive workload-aware partitioning of materialized views in scalable data analytics
Jiang Du, Renée J. Miller, Boris Glavic, Wei Tan
EDBT, pp. 198-209, 2017.
pdf
Conflict of interest declaration and detection system in heterogeneous networks
Siyuan Wu, Leong Hou U, Sourav S Bhowmick, Wolfgang Gatterbauer
CIKM, pp. 2383-2386, 2017 (Short paper).
pdf | preprint | bib
A case for abstract cost models for distributed execution of analytics operators
Rundong Li, Ningfang Mi, Mirek Riedewald, Yizhou Sun, Yi Yao
In Proc. Int. Conf. on Big Data Analytics and Knowledge Discovery (DaWaK), pages 149-163, 2017.
pdf | preprint
Algorithms for automatic ranking of participants and tasks in an anonymized contest
Yang Jiao, R. Ravi, Wolfgang Gatterbauer
WALCOM, pp. 335-346, 2017. (Invited to the Special Issue of Elsevier TCS on “best of WALCOM 2017”)
pdf | arXiv:1612.04794 | bib
VIQS: Visual interactive exploration of query semantics
Christina Christodoulakis, Eser Kandogan, Ignacio G. Terrizzano, Renée J. Miller
ESIDA@IUI, pp. 25-32, 2017.
pdf
Dissociation and propagation for approximate lifted inference with standard relational database management systems
Wolfgang Gatterbauer, Dan Suciu
VLDBJ (Special Issue of VLDB Journal on “best of VLDB 2015”), pp. 5-30, 2016.
pdf | arXiv:1310.6257 (long)
Data quality: The role of empiricism
Shazia Wasim Sadiq, Tamraparni Dasu, Xin Luna Dong, Juliana Freire, Ihab F. Ilyas, Sebastian Link, Renée J. Miller, Felix Naumann, Xiaofang Zhou, Divesh Srivastava
SIGMOD Record 46(4): 35-43 (2017)
pdf
The future of data integration (Keynote Abstract)
Renée J. Miller
KDD, p. 3, 2017.
pdf
A machine learning approach for result caching in web search engines
Tayfun Kucukyilmaz, Berkant Barla Cambazoglu, Cevdet Aykanat, Ricardo Baeza-Yates
Inf. Process. Manag. 53(4): 834-850 (2017)
Story-focused reading in online news and its potential for user engagement
Janette Lehmann, Carlos Castillo, Mounia Lalmas, Ricardo Baeza-Yates
J. Assoc. Inf. Sci. Technol. 68(4): 869-883 (2017)
Quality-efficiency trade-offs in machine learning for text processing
Ricardo Baeza-Yates, Zeinab Liaghat:
BigData 2017: 897-904
FA*IR: A Fair Top-k Ranking Algorithm
Meike Zehlike, Francesco Bonchi, Carlos Castillo, Sara Hajian, Mohamed Megahed, Ricardo Baeza-Yates:
CIKM 2017: 1569-1578
Detection of Trending Topic Communities: Bridging Content Creators and Distributors
Lorena Recalde, David F. Nettleton, Ricardo Baeza-Yates, Ludovico Boratto:
HT 2017: 205-213
Exploring Query Auto-Completion and Click Logs for Contextual-Aware Web Search and Query Suggestion
Liangda Li, Hongbo Deng, Anlei Dong, Yi Chang, Ricardo Baeza-Yates, Hongyuan Zha
WWW 2017: 539-548
A General-Purpose Counting Filter: Making Every Bit Count
Prashant Pandey, Michael A. Bender, Rob Johnson, and Rob Patro
SIGMOD 2017
pdf
deBGR: An Efficient and Near-Exact Representation of the Weighted de Bruijn Graph
Prashant Pandey, Michael A. Bender, Rob Johnson, and Rob Patro
ISMB/BIOINFORMATICS 2017
pdf
Rainbowfish: A Succinct Colored de Bruijn Graph Representation
Fatemeh Almodaresi, Prashant Pandey, and Rob Patro
WABI 2017
pdf
2016
Merlin: Exploratory Analysis with Imprecise Queries
Bahar Qarabaqi, Mirek Riedewald
IEEE TKDE, 28(2): 342-355, 2016. (TKDE special issue on “best of ICDE 2014”)
pdf
Making sense of entities and quantities in web tables
Yusra Ibrahim, Mirek Riedewald, Gerhard Weikum
CIKM, pp. 1703-1712, 2016.
pdf | preprint
Visual congruent ads for image search
Yannis Kalantidis, Ayman Farahat, Lyndon Kennedy, Ricardo Baeza-Yates, David A. Shamma
ICPR 2016: 1496-1505
Encouraging Diversity- and Representation-Awareness in Geographically Centralized Content
Eduardo Graells-Garrido, Mounia Lalmas, Ricardo Baeza-Yates:
IUI 2016: 7-18
Data Portraits and Intermediary Topics: Encouraging Exploration of Politically Diverse Profiles
Eduardo Graells-Garrido, Mounia Lalmas, Ricardo Baeza-Yates:
IUI 2016: 228-240
Scalable Semantic Matching of Queries to Ads in Sponsored Search Advertising
Mihajlo Grbovic, Nemanja Djuric, Vladan Radosavljevic, Fabrizio Silvestri, Ricardo Baeza-Yates, Andrew Feng, Erik Ordentlich, Lee Yang, Gavin Owens
SIGIR 2016: 375-384
Towards Mobile Query Auto-Completion: An Efficient Mobile Application-Aware Approach
Aston Zhang, Amit Goyal, Ricardo Baeza-Yates, Yi Chang, Jiawei Han, Carl A. Gunter, Hongbo Deng
WWW 2016: 579-590
Optimizing Every Operation in a Write-Optimized File System
Jun Yuan, Yang Zhan, William Jannen, Prashant Pandey, Amogh Akshintala, Kanchan Chandnani, Pooja Deo, Zardosht Kasheff, Michael Bender, Martin Farach-Colton, Rob Johnson, Bradley C. Kuszmaul, and Donald E. Porter
FAST 2016
pdf
2015
Approximate lifted inference with probabilistic databases
Wolfgang Gatterbauer, Dan Suciu
PVLDB 8(5):629-640, 2015. (Invited to the Special Issue of VLDB Journal on “best of VLDB 2015”)
pdf | arXiv:1412.1069 | PPTX slides (4MB) | bib
The complexity of resilience and responsibility for self-join-free conjunctive queries
Cibele Freire, Wolfgang Gatterbauer, Neil Immerman, Alexandra Meliou
PVLDB 9(3):180-191, 2015.
pdf | arXiv:1507.00674 (long) | bib
Linearized and single-pass belief propagation
Wolfgang Gatterbauer, Stephan Günnemann, Danai Koutra, Christos Faloutsos
PVLDB 8(5):581-592, 2015.
pdf | arXiv:1406.7288 (long) | PPTX slides (2MB) | Narrated PPTX slides (32MB) | Youtube video (21min) | Python code | SQL code | bib
BetrFS: A Right-Optimized Write-Optimized File System
William Jannen, Jun Yuan, Yang Zhan, Amogh Akshintala, John Esmet, Yizheng Jiao, Ankur Mittal, Prashant Pandey, Phaneendra Reddy, Leif Walsh, Michael A. Bender, Martin Farach-Colton, Rob Johnson, Bradley C. Kuszmaul, and Donald E. Porter
FAST 2015
pdf
2014
Anti-Combining for MapReduce
Alper Okcan and Mirek Riedewald
SIGMOD, pp. 839-850, 2014.
pdf
User-driven refinement of imprecise queries
Bahar Qarabaqi, Mirek Riedewald
ICDE, pp. 916-927, 2014. (Best Poster Award for poster presentation accompanying the full research paper; invited to the TKDE special edition on “best of ICDE 2014”)
pdf | preprint
ILP modulo data
Panagiotis Manolios, Vasilis Papavasileiou, Mirek Riedewald
In Proc. Conf. on Formal Methods in Computer-Aided Design (FMCAD), pages 171-178, 2014.
pdf | preprint
Oblivious bounds on the probability of Boolean functions
Wolfgang Gatterbauer, Dan Suciu
ACM TODS, 39(1):191-208, 2014.
pdf | preprint | arXiv:1409.6052 | SQL code | Java code for GT | bib