Research
Projects with this topic
QuestionMark: The Probabilistic Benchmark is a Python program for benchmarking probabilistic database management systems.
This project is written by Nikki Zandbergen as part of her M.Sc. Computer Science thesis at the University of Twente. This project was supervised by Maurice van Keulen, Tom van Dijk and Jan Flokstra.
Before running this benchmark, a dataset must first be generated with QuestionMark: The Dataset Generator.
QuestionMark: The Dataset Generator is a Python program to create a dataset for probabilistic product matching. This dataset is required to run the benchmark test with QuestionMark: The Probabilistic Benchmark.
This project is written by Nikki Zandbergen as part of her M.Sc. Computer Science thesis at the University of Twente. This project was supervised by Maurice van Keulen, Tom van Dijk and Jan Flokstra.
The dataset created by this program is an adaptation of the WDC Product Data Corpus for Large-Scale Product Matching. The clustering provided by the original dataset is removed and replaced with a new probabilistic clustering.