After digging into it for some time, I think it's safe to say that the choices are slim to none: there are some relatively cheap (sometimes free) data sources of very poor quality, and the good quality data is prohibitively expensive (House Canary charges $1 per API call - WOW).
So far, there seem to be a few types of data aggregators that make their datasets available:
*) Folks who are trying to promote "whitelabel clones" (e.g. Zillow, which comes with a host of restrictions expressly prohibiting "enriching other datasets" etc.)
*) Those who will process and sanitize MLS data for you (which is a huge task in itself), but you have to have MLS creds, meaning you have to deal with the zoo of MLS providers all by yourself - or pay something like $1200/mo for a nationwide (US + Canada) listing feed
*) Folks who charge per call (CoreLogic, House Canary etc.) and who have high quality sets, but they don't want you to mine them, so they impose all sorts of rate limits and an excessive price per call to ward off "gold diggers"
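
To put rough numbers on the last two options, here is a back-of-the-envelope sketch in Python. The two prices are the ones quoted above; the function names and the sample call volumes are made up for illustration, and the two products are not necessarily the same kind of data, so this only compares spend.

```python
import math

# Prices quoted in this post; treat them as approximate.
PER_CALL_PRICE_USD = 1.00           # "$1 per API call"
FLAT_FEED_USD_PER_MONTH = 1200.00   # "something like $1200/mo" listing feed


def monthly_per_call_cost(calls_per_month: int) -> float:
    """Monthly spend on a per-call API at the quoted price."""
    return calls_per_month * PER_CALL_PRICE_USD


def break_even_calls() -> int:
    """Calls per month at which per-call spend reaches the flat feed price."""
    return math.ceil(FLAT_FEED_USD_PER_MONTH / PER_CALL_PRICE_USD)


if __name__ == "__main__":
    # Hypothetical monthly call volumes, purely for illustration.
    for calls in (100, 1_200, 50_000):
        print(f"{calls} calls/mo -> ${monthly_per_call_cost(calls):,.2f}")
    print(f"break-even at {break_even_calls()} calls/mo")
```

In other words, at these prices anything past about 1200 lookups a month costs more per call than the flat feed does, which is why mining a per-call dataset at any real scale is out of reach for most people.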
It is a mistake to say that the data "is abundant" and "if it is available, it must be worthless". It is definitely NOT abundant (and I am not talking about "websites" like Zillow, I am talking about raw datasets) and it is definitely valuable - if you can afford it of course :(