This page is updated infrequently. The latest publications and extended material are also available from the faculty and project web pages.
2021 and preprints
RONIN: Data Lake Exploration.
Paul Ouellette, Aidan Sciortino, Fatemeh Nargesian, Bahar Ghadiri Bashardoost, Erkang Zhu, Ken Q. Pu, Renée J. Miller.
PVLDB Demonstration (To Appear), 2021.
pdf (local)
Towards Knowledge Exchange: State-of-the-Art and Open Problems.
Bahar Ghadiri Bashardoost, Kelly A. Lyons, Renée J. Miller.
SOFSEM 2021: 13-27.
pdf (springer) | pdf (local)
Towards a Dichotomy for Minimally Factorizing the Provenance of Self-Join Free Conjunctive Queries.
Neha Makhija, Wolfgang Gatterbauer.
pdf (arXiv:2105.14307, long version)
Beyond Equi-joins: Ranking, Enumeration and Factorization.
Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald.
PVLDB 2021 (to appear)
pdf (arXiv:2101.12158, long version)
Project page: Any-k
Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries.
Nofar Carmeli, Nikolaos Tziavelis, Wolfgang Gatterbauer, Benny Kimelfeld, Mirek Riedewald.
PODS, pp. 325-341, 2021 (selected among best of conference)
pdf (ACM) | pdf (local) | pdf (arXiv:2012.11965) | video (20min) | video (12min)
Project page: Any-k
DomainNet: Homograph Detection for Data Lake Disambiguation.
Aristotelis Leventidis, Laura Di Rocco, Wolfgang Gatterbauer, Renée J. Miller, Mirek Riedewald.
EDBT, pp. 13-24, 2021 (best paper award, announcement)
pdf (DOI) | pdf (OP) | pdf (local) | pdf (arXiv:2103.09940) | video (10min) | Code on Github | datasets
Project page: Table-as-query
STRATISFIMAL LAYOUT: A modular optimization model for laying out layered node-link network visualizations
Sara Di Bartolomeo, Mirek Riedewald, Wolfgang Gatterbauer, and Cody Dunne
IEEE VIS 2021 (to appear)
pdf (OSF preprint, long version) | OSF supplements
Project page: Stratisfimal Layout
Project page: QueryVis
2020
Knowledge Translation.
Bahar Ghadiri Bashardoost, Renée J. Miller, Kelly Lyons, Fatemeh Nargesian. PVLDB 13(11): 2018-2032 (2020).
pdf
Pytheas: Pattern-based Table Discovery in CSV Files.
Christina Christodoulakis, Eric Munsen, Moshe Gabel, Angela Demke Brown, Renée J. Miller. PVLDB 13(11): 2075-2089 (2020).
pdf
Organizing data lakes for navigation.
Fatemeh Nargesian, Ken Q. Pu, Erkang Zhu, Bahar Ghadiri Bashardoost, Renée J. Miller.
SIGMOD, pp. 1939–1950, 2020.
pdf
Optimal algorithms for ranked enumeration of answers to full conjunctive queries
Nikolaos Tziavelis, Deepak Ajwani, Wolfgang Gatterbauer, Mirek Riedewald, Xiaofeng Yang
PVLDB 13(9):1582-1597, 2020.
pdf (VLDB) | pdf (local) | pdf (arXiv:1911.05582, long version) | video (10min) | PPTX (slides) | narrated PPTX (100MB) | pdf (slides) | Code on Github | reproducibility instructions
Project page: Any-k
QueryVis: Logic-based diagrams help users understand complicated SQL queries faster
Aristotelis Leventidis, Jiahui Zhang, Cody Dunne, Wolfgang Gatterbauer, HV Jagadish, Mirek Riedewald
SIGMOD, pp. 2303–2318, 2020 (reproducibility award, announcement)
pdf (ACM) | pdf (local) | pdf (arXiv:2004.11375, long version) | pdf (OSF preprint, long version) | OSF supplemental materials | video (12min) | related video (19min)
Project page: QueryVis
Near-optimal distributed band-joins through recursive partitioning
Rundong Li, Wolfgang Gatterbauer, Mirek Riedewald
SIGMOD, pp. 2375–2390, 2020.
pdf (ACM) | pdf (local) | pdf (arXiv:2004.06101) | video (12min)
Project page: Distributed computing
Factorized graph representations for semi-supervised learning from sparse data
Krishna Kumar P., Paul Langon, Wolfgang Gatterbauer
SIGMOD, pp. 1383–1398, 2020 (reproducibility award, announcement)
pdf (ACM) | pdf (local) | pdf (arXiv:2003.02829, long version) | video (11min) | narrated PPTX (20MB) | pdf (slides) | Code on Github | reproducibility instructions
Project page: Semi-supervised learning with heterophily (SSLH)
New results for the complexity of resilience for binary conjunctive queries with self-joins
Cibele Freire, Wolfgang Gatterbauer, Neil Immerman, Alexandra Meliou
PODS, pp. 271–284, 2020.
pdf (ACM) | pdf (local) | pdf (arXiv:1907.01129, long version) | PPTX (slides) | pdf (slides) | video (12min)
Project page: Causality
Optimal join algorithms meet top-k (tutorial)
Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald
SIGMOD tutorials, pp. 2659–2665, 2020.
pdf (ACM) | pdf (local) | pdf (arXiv:2005.00448) | video part 1 (18min) | video part 2 (34min) | video part 3 (34min) | PPTX (slides 1) | PPTX (slides 2) | PPTX (slides 3) | pdf (slides 1) | pdf (slides 2) | pdf (slides 3)
Tutorial page: Optimal join algorithms meet top-k
Project page: Any-k
Loch Prospector: Metadata Visualization for Lakes of Open Data
Neha Makhija, Mansi Jain, Nikolaos Tziavelis, Laura Di Rocco, Sara Di Bartolomeo, Cody Dunne.
IEEE VIS (Short papers), pp. 126-130, 2020.
pdf (IEEE) | pdf (OSF preprint) | video | teaser video
Project page: Loch prospector
Temporal betweenness centrality in dynamic graphs
Ioanna Tsalouchidou, Ricardo Baeza-Yates, Francesco Bonchi, Kewen Liao, Timos Sellis
Int. J. Data Sci. Anal. 9(3): 257-272 (2020)
pdf
Scalable Dynamic Graph Summarization
Ioanna Tsalouchidou, Francesco Bonchi, Gianmarco De Francisci Morales, Ricardo Baeza-Yates
IEEE Trans. Knowl. Data Eng. 32(2): 360-373 (2020)
pdf
Pre-indexing Pruning Strategies
Soner Altin, Ricardo Baeza-Yates, Berkant Barla Cambazoglu
SPIRE 2020: 177-193
Every Colour You Are: Stance Prediction and Turnaround in Controversial Issues
Eduardo Graells-Garrido, Ricardo Baeza-Yates, Mounia Lalmas
WebSci 2020: 174-183
arXiv:2005.100019
2019
JOSIE: Overlap set similarity search for finding joinable tables in data lakes
Erkang Zhu, Dong Deng, Fatemeh Nargesian, Renée J. Miller
SIGMOD, pp. 847-864, 2019.
pdf
Anytime approximation in probabilistic databases via scaled dissociations
Maarten Van den Heuvel, Peter Ivanov, Wolfgang Gatterbauer, Floris Geerts, Martin Theobald
SIGMOD, pp. 1295-1312, 2019.
pdf | preprint | bib
VISE: Vehicle Image Search Engine with traffic camera
Hyewon Choi, Erkang Zhu, Arsala Bangash, Renée J. Miller
PVLDB 12(12): 1842-1845 (2019).
pdf
Data lake management: Challenges and opportunities (tutorial)
Fatemeh Nargesian, Erkang Zhu, Renée J. Miller, Ken Q. Pu, Patricia C. Arocena
PVLDB 12(12): 1986-1989 (2019).
pdf
Bridging quantities in tables and text
Yusra Ibrahim, Mirek Riedewald, Gerhard Weikum, Demetrios Zeinalipour-Yazti
ICDE, pp. 1010-1021, 2019.
pdf
A collective, probabilistic approach to schema mapping using diverse noisy evidence
Angelika Kimmig, Alex Memory, Renée J. Miller, Lise Getoor
IEEE Trans. Knowl. Data Eng. 31(8): 1426-1439 (2019).
pdf
Abstract cost models for distributed data-intensive computations
Rundong Li, Ningfang Mi, Mirek Riedewald, Yizhou Sun, Yi Yao
DAPD Journal, 37(3): 411-439, 2019.
pdf | bib
Algebraic approximations of the probability of Boolean functions
Wolfgang Gatterbauer
SUM (Invited Keynote), pp. 449-450, 2019.
2018
Submodularity of distributed join computation
Rundong Li, Mirek Riedewald, Xinyan Deng
SIGMOD, pp. 1237-1252, 2018.
pdf | bib
PISTIS: A conflict of interest declaration and detection system for peer review management
Siyuan Wu, Leong Hou U, Sourav S. Bhowmick, Wolfgang Gatterbauer
SIGMOD, pp. 1713-1716, 2018 (System demonstration).
pdf | preprint | bib
Table union search on open data
Fatemeh Nargesian, Erkang Zhu, Ken Q. Pu, Renée J. Miller
PVLDB 11(7): 813-825 (2018).
pdf | pdf
Open data integration
Renée J. Miller
PVLDB 11(12): 2130-2139 (2018). Invited keynote.
pdf | slides
Any-k: Anytime top-k tree pattern retrieval in labeled graphs
Xiaofeng Yang, Deepak Ajwani, Wolfgang Gatterbauer, Patrick K Nicholson, Mirek Riedewald, Alessandra Sala
WWW, pp. 489-498, 2018.
pdf | preprint | arXiv:1802.06060 | bib
Any-k algorithms for exploratory analysis with conjunctive queries
Xiaofeng Yang, Mirek Riedewald, Rundong Li, Wolfgang Gatterbauer
ExploreDB, pp. 1-3, 2018.
pdf | preprint
Dissociation-based oblivious bounds for weighted model counting
Li Chou, Wolfgang Gatterbauer, Vibhav Gogate
UAI, 2018.
pdf | preprint
A General framework for anytime approximation in probabilistic databases
Maarten Van den Heuvel, Floris Geerts, Wolfgang Gatterbauer, Martin Theobald
StarAI (IJCAI workshop), 2018 (Short position paper).
pdf | arXiv:1806.10078
Algorithms for automatic ranking of participants and tasks in an anonymized contest
Yang Jiao, R. Ravi, Wolfgang Gatterbauer
Theoretical Computer Science, Elsevier, 2018 (Special Issue on “best of WALCOM 2017”).
pdf | preprint
Learning from query-answers: A scalable approach to belief updating and parameter learning
Niccolò Meneghetti, Oliver Kennedy, Wolfgang Gatterbauer
ACM TODS, 2018 (Special Issue on “best of SIGMOD 2017”).
pdf | preprint
Making open data transparent: Data discovery on open data
Renée J. Miller, Fatemeh Nargesian, Erkang Zhu, Christina Christodoulakis, Ken Q. Pu, Periklis Andritsos
IEEE Data Eng. Bull. 41(2): 59-70 (2018).
pdf
Bias on the web
Ricardo Baeza-Yates
Commun. ACM 61(6): 54-61 (2018)
pdf | pdf
Web harvesting
Wolfgang Gatterbauer
Encyclopedia of Database Systems, Second Edition, Springer, 2018.
pdf
Web data extraction system
Robert Baumgartner, Wolfgang Gatterbauer, Georg Gottlob
Encyclopedia of Database Systems, Second Edition, Springer, 2018.
pdf
2017
Beta probabilistic databases: A scalable approach to belief updating and parameter learning
Niccolò Meneghetti, Oliver Kennedy, Wolfgang Gatterbauer
SIGMOD, pp. 573-586, 2017. (Invited to the Special Issue of TODS on “best of SIGMOD 2017”)
pdf | preprint | bib
Automated template generation for question answering over knowledge graphs
Abdalghani Abujabal, Mohamed Yahya, Mirek Riedewald, Gerhard Weikum
WWW, pages 1191-1200, 2017.
pdf
Interactive navigation of open data linkages
Erkang Zhu, Ken Q. Pu, Fatemeh Nargesian, Renée J. Miller
PVLDB 10(12): 1837-1840 (2017).
pdf
A collective, probabilistic approach to schema mapping
Angelika Kimmig, Alex Memory, Renée J. Miller, Lise Getoor
ICDE, pp. 921-932, 2017.
pdf | arXiv:1702.03447 | preprint
The linearization of belief propagation on pairwise Markov random fields
Wolfgang Gatterbauer
AAAI, pp. 3747-3753, 2017.
pdf | arXiv:1502.04956 (long) | bib
DeepSea: Progressive workload-aware partitioning of materialized views in scalable data analytics
Jiang Du, Renée J. Miller, Boris Glavic, Wei Tan
EDBT, pp. 198-209, 2017.
pdf
Conflict of interest declaration and detection system in heterogeneous networks
Siyuan Wu, Leong Hou U, Sourav S Bhowmick, Wolfgang Gatterbauer
CIKM, pp. 2383-2386, 2017 (Short paper).
pdf | preprint | bib
A case for abstract cost models for distributed execution of analytics operators
Rundong Li, Ningfang Mi, Mirek Riedewald, Yizhou Sun, Yi Yao
In Proc. Int. Conf. on Big Data Analytics and Knowledge Discovery (DaWaK), pages 149-163, 2017.
pdf | preprint
Algorithms for automatic ranking of participants and tasks in an anonymized contest
Yang Jiao, R. Ravi, Wolfgang Gatterbauer
WALCOM, pp. 335-346, 2017. (Invited to the Special Issue of Elsevier TCS on “best of WALCOM 2017”)
pdf | arXiv:1612.04794 | bib
VIQS: Visual interactive exploration of query semantics
Christina Christodoulakis, Eser Kandogan, Ignacio G. Terrizzano, Renée J. Miller
ESIDA@IUI, pp. 25-32, 2017.
pdf
Dissociation and propagation for approximate lifted inference with standard relational database management systems
Wolfgang Gatterbauer, Dan Suciu
VLDBJ (Special Issue of VLDB Journal on “best of VLDB 2015”), pp. 5-30, 2016.
pdf | arXiv:1310.6257 (long)
Data quality: The role of empiricism
Shazia Wasim Sadiq, Tamraparni Dasu, Xin Luna Dong, Juliana Freire, Ihab F. Ilyas, Sebastian Link, Renée J. Miller, Felix Naumann, Xiaofang Zhou, Divesh Srivastava
SIGMOD Record 46(4): 35-43 (2017)
pdf
The future of data integration (Keynote Abstract)
Renée J. Miller
KDD, p. 3, 2017.
pdf
A machine learning approach for result caching in web search engines
Tayfun Kucukyilmaz, Berkant Barla Cambazoglu, Cevdet Aykanat, Ricardo Baeza-Yates
Inf. Process. Manag. 53(4): 834-850 (2017)
Story-focused reading in online news and its potential for user engagement
Janette Lehmann, Carlos Castillo, Mounia Lalmas, Ricardo Baeza-Yates
J. Assoc. Inf. Sci. Technol. 68(4): 869-883 (2017)
Quality-efficiency trade-offs in machine learning for text processing
Ricardo Baeza-Yates, Zeinab Liaghat
BigData 2017: 897-904
FA*IR: A Fair Top-k Ranking Algorithm
Meike Zehlike, Francesco Bonchi, Carlos Castillo, Sara Hajian, Mohamed Megahed, Ricardo Baeza-Yates
CIKM 2017: 1569-1578
Detection of Trending Topic Communities: Bridging Content Creators and Distributors
Lorena Recalde, David F. Nettleton, Ricardo Baeza-Yates, Ludovico Boratto
HT 2017: 205-213
Exploring Query Auto-Completion and Click Logs for Contextual-Aware Web Search and Query Suggestion
Liangda Li, Hongbo Deng, Anlei Dong, Yi Chang, Ricardo Baeza-Yates, Hongyuan Zha
WWW 2017: 539-548
2016
Merlin: Exploratory Analysis with Imprecise Queries
Bahar Qarabaqi, Mirek Riedewald
IEEE TKDE, 28(2): 342-355, 2016. (TKDE special issue on “best of ICDE 2014”)
pdf
Making sense of entities and quantities in web tables
Yusra Ibrahim, Mirek Riedewald, Gerhard Weikum
CIKM, pp. 1703-1712, 2016.
pdf | preprint
Visual congruent ads for image search
Yannis Kalantidis, Ayman Farahat, Lyndon Kennedy, Ricardo Baeza-Yates, David A. Shamma
ICPR 2016: 1496-1505
Encouraging Diversity- and Representation-Awareness in Geographically Centralized Content
Eduardo Graells-Garrido, Mounia Lalmas, Ricardo Baeza-Yates
IUI 2016: 7-18
Data Portraits and Intermediary Topics: Encouraging Exploration of Politically Diverse Profiles
Eduardo Graells-Garrido, Mounia Lalmas, Ricardo Baeza-Yates
IUI 2016: 228-240
Scalable Semantic Matching of Queries to Ads in Sponsored Search Advertising
Mihajlo Grbovic, Nemanja Djuric, Vladan Radosavljevic, Fabrizio Silvestri, Ricardo Baeza-Yates, Andrew Feng, Erik Ordentlich, Lee Yang, Gavin Owens
SIGIR 2016: 375-384
Towards Mobile Query Auto-Completion: An Efficient Mobile Application-Aware Approach
Aston Zhang, Amit Goyal, Ricardo Baeza-Yates, Yi Chang, Jiawei Han, Carl A. Gunter, Hongbo Deng
WWW 2016: 579-590
2015
Approximate lifted inference with probabilistic databases
Wolfgang Gatterbauer, Dan Suciu
PVLDB 8(5):629-640, 2015. (Invited to the Special Issue of VLDB Journal on “best of VLDB 2015”)
pdf | arXiv:1412.1069 | PPTX slides (4MB) | bib
The complexity of resilience and responsibility for self-join-free conjunctive queries
Cibele Freire, Wolfgang Gatterbauer, Neil Immerman, Alexandra Meliou
PVLDB 9(3):180-191, 2015.
pdf | arXiv:1507.00674 (long) | bib
Linearized and single-pass belief propagation
Wolfgang Gatterbauer, Stephan Günnemann, Danai Koutra, Christos Faloutsos
PVLDB 8(5):581-592, 2015.
pdf | arXiv:1406.7288 (long) | PPTX slides (2MB) | Narrated PPTX slides (32MB) | Youtube video (21min) | Python code | SQL code | bib
2014
Anti-Combining for MapReduce
Alper Okcan and Mirek Riedewald
SIGMOD, pp. 839-850, 2014.
pdf
User-driven refinement of imprecise queries
Bahar Qarabaqi, Mirek Riedewald
ICDE, pp. 916-927, 2014. (Best Poster Award for poster presentation accompanying the full research paper; invited to the TKDE special edition on “best of ICDE 2014”)
pdf | preprint
ILP modulo data
Panagiotis Manolios, Vasilis Papavasileiou, Mirek Riedewald
In Proc. Conf. on Formal Methods in Computer-Aided Design (FMCAD), pages 171-178, 2014.
pdf | preprint
Oblivious bounds on the probability of Boolean functions
Wolfgang Gatterbauer, Dan Suciu
ACM TODS, 39(1):191-208, 2014.
pdf | preprint | arXiv:1409.6052 | SQL code | Java code for GT | bib