We submitted a revised version of our benchmarking paper, titled “Retrieve, Merge, Predict: Augmenting Tables with Data Lakes”, to VLDB 2024.
The preprint is available on arXiv; the code has its own website and is available on GitHub.
We are also releasing YADL, the semi-synthetic benchmarking data lake that we used to run our experiments. Its repository is also available on GitHub.
UPDATE June 2024: the paper was rejected. It was extremely disappointing, and I will write a post to detail what happened. It goes without saying that I am not happy about this outcome.