For those that may still be looking into this... First American and Corelogic are going to be your best bet for any model-building data, that is, if you want 'accuracy'. They are pretty much the primary sources fueling the rest of the aggregators when 'quality' counts.
Corelogic probably has the edge on anything pertaining to 'valuations' and AVMs... but First American is right there chin-to-chin with them. Both have been collecting property data since time immemorial. There is also a history of the two merging into one company, swapping data (swapping spit), and then First American spinning off on its own again. This has happened a couple of times.
Corelogic's tool 'Realist' is baked into 80% of the MLSs nationwide, providing the tax/sales/people data fueling the comps/CMAs many realtors use for a quick look.
Attom bought RealtyTrac. Attom's data is largely the 'welfare' data it receives from Corelogic https://www.ftc.gov/news-event... . Attom sued Corelogic again within the last two years or so because the welfare data they were getting was largely crap, if that tells you anything about what was being sent to them. There was a distinct difference between what Corelogic sent Attom under the court-ordered welfare agreement and what Corelogic supplied directly to consumers. Buying from Corelogic direct is the way to go.
I will say Attom has upped its game in the last few years concerning ML and AI.... so who knows, they might actually have cleaner data now, and overall it might be cheaper to buy from them as a reseller.
First American are quality people, and they are scrappy... they have a huge database of scanned document images that they have run ML against for the juicy tidbits of data a county/city does not collect or load onto public-facing .gov sites.
Most of the data you are looking for, like accurate monthly parcel/assessor/people etc. data, can be had just by googling your county name with 'open data GIS' in the search string. A modern county GIS dept will make a variety of datasets available as .csv, plus some manner of GIS file like a shapefile, KML, etc. The bonus of that? It's a staggering amount of data, countywide, that is simple to load into a PostgreSQL-PostGIS install. It's stupid simple to do, and most realtors are oblivious that they can look at data on that scale, let alone outside of their MLS system. Go to your local county site, look for the nifty 'search for a property in the county' map, and look for the download button. If they do not have it online, simply put in an Open Records request to the county.
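For the .csv side of that load, here's a minimal sketch. The file name, table name, and column names are all hypothetical (match them to your county's actual export), and the sqlite3 connection is just a stand-in so you can dry-run it; for the real thing, pass a psycopg2 connection to your PostGIS database and use `"%s"` as the placeholder.

```python
import csv
import sqlite3  # stand-in; use psycopg2.connect(...) for a real PostGIS database

def load_parcels(csv_path, conn, placeholder="?"):
    """Bulk-load a county parcel CSV into a 'parcels' table.

    `placeholder` is "?" for sqlite3 and "%s" for psycopg2.
    The column names below are hypothetical -- check your county's export.
    """
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS parcels ("
        "parcel_id TEXT PRIMARY KEY, owner TEXT, assessed_value REAL)"
    )
    with open(csv_path, newline="") as f:
        rows = [
            (r["parcel_id"], r["owner"], float(r["assessed_value"]))
            for r in csv.DictReader(f)
        ]
    cur.executemany(
        f"INSERT INTO parcels VALUES ({placeholder}, {placeholder}, {placeholder})",
        rows,
    )
    conn.commit()
    return len(rows)
```

For the geometry files (shapefile/KML), the usual route is the `ogr2ogr` or `shp2pgsql` command-line tools, which load straight into a PostGIS table in one shot.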
For anyone looking to do this in Florida, MN, AZ, VA, or AL, https://propertykey.com/compan... is probably your best bet... they're likely THE BEST dataset in use by realtors for maps/parcel/assessor/sales/people data... it just keeps going on and on, what they have. I could arrange an introduction with the team behind the data. I know them well, and they are just terrific people. Total 'makers' when it comes to property/GIS data for 25 years... and most likely far less expensive as a reseller than some of the options out there.
MLS data? Scrape it. Scrape ALL of it. NAR and its outdated approach to MLS data is over https://www.eff.org/deeplinks/... . Just make sure to rotate your IPs on the regular, or set up some sort of proxy...
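The IP-rotation piece can be sketched with nothing but the standard library. The proxy URLs here are hypothetical placeholders; plug in your own pool.

```python
import itertools
import urllib.request

# Hypothetical proxy pool -- replace with your own rotating proxies.
PROXY_POOL = [
    "http://proxy-a.example:8080",
    "http://proxy-b.example:8080",
    "http://proxy-c.example:8080",
]
_rotation = itertools.cycle(PROXY_POOL)

def next_proxy():
    """Return the next proxy URL in round-robin order."""
    return next(_rotation)

def fetch(url):
    """Fetch a page, routing the request through the next proxy in the pool."""
    proxy = next_proxy()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    with opener.open(url, timeout=10) as resp:
        return resp.read()
```

Each `fetch()` call goes out through a different proxy. In practice you'd also want to vary user agents and throttle your request rate so you don't hammer any one endpoint.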
The county parcel datasets are the easy thing to obtain... It's the fresh sales data (what sold today/yesterday, to whom, from whom) that's critical for any ML models you are creating. That's where a vendor API comes into play for fresh sales.
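A daily "sales since yesterday" poll against a vendor API can be sketched as below. The endpoint and parameter names (`sold_after`, `fips`, `sale_id`) are hypothetical, as every vendor names these differently; check the actual API docs. The dedupe step keeps the poll idempotent so re-runs don't double-insert sales.

```python
import urllib.parse

def sales_since_url(base_url, since_iso, county_fips=None):
    """Build a 'sales since this timestamp' query URL.

    Parameter names here are hypothetical -- swap in the vendor's real ones.
    """
    params = {"sold_after": since_iso}
    if county_fips:
        params["fips"] = county_fips
    return base_url + "?" + urllib.parse.urlencode(params)

def new_sales(fetch_page, seen_ids, since_iso):
    """Yield only sales not seen before, so a daily poll stays idempotent.

    `fetch_page` is any callable returning a list of sale dicts;
    `seen_ids` is a set of sale IDs already loaded into your database.
    """
    for sale in fetch_page(since_iso):
        if sale["sale_id"] not in seen_ids:
            seen_ids.add(sale["sale_id"])
            yield sale
```

In a real pipeline `seen_ids` would be backed by your PostgreSQL table (a primary-key constraint on the sale ID does the same job), but the shape of the loop is the same.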
Hope this helps~