I have pieces of what you want because a great deal of my data is normalized and I’ve added a lot to it. For example, I know the geolocation of virtually every property for sale, and that allows me to do cool things like check whether a home has been relisted, because the new listing will have the same lat/lon as the old one.
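To make the relisting check concrete, here is a rough Python sketch. The field names (listing_id, lat, lon), the rounding precision, and the sample coordinates are all made up for illustration; this is not the actual pipeline, just the idea of grouping listings by location and flagging any location that shows up more than once.

```python
from collections import defaultdict

def find_relisted(listings, precision=5):
    """Group listings by rounded (lat, lon) and return locations with 2+ listings.

    Rounding to a fixed number of decimals is an assumption made here so that
    tiny floating-point differences do not hide a match.
    """
    by_location = defaultdict(list)
    for listing in listings:
        key = (round(listing["lat"], precision), round(listing["lon"], precision))
        by_location[key].append(listing["listing_id"])
    # Keep only locations that appear under more than one listing id.
    return {key: ids for key, ids in by_location.items() if len(ids) > 1}

# Invented sample data: the first two rows are the same point under two listing ids.
listings = [
    {"listing_id": "A100", "lat": 32.715736, "lon": -117.161087},
    {"listing_id": "A245", "lat": 32.715736, "lon": -117.161087},
    {"listing_id": "B300", "lat": 32.832800, "lon": -117.271300},
]
relisted = find_relisted(listings)
print(relisted)
```

Units in the same condo building will probably share a lat/lon too, so a real check would likely need the unit number as part of the key.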
I have often considered blending my data with my tax roll and foreclosure data (which I have left in separate silos).
Suffice it to say, creating a highly normalized dataset is not an easy task. People don’t enter data consistently, so errors are common (wrong zip codes, misspelled street names, wrong house numbers). You can’t just key on something like the tax id, because while that works for homes on separate lots, it won’t work in condo situations. I’ve been working at this for some time, and it’s pretty complicated (not that anyone on this forum couldn’t create a solid set if they had the years and drive to do it).
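As a small illustration of what the cleanup involves, here is a minimal Python sketch. The suffix table, the function names, and the sample tax id are hypothetical, and it only handles the easy part (case, punctuation, street suffixes); it does nothing about misspellings or wrong house numbers, which is where the real work is.

```python
import re

# Hypothetical, deliberately tiny suffix table; a real one would be much longer.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD", "DRIVE": "DR", "ROAD": "RD"}

def normalize_address(raw):
    """Uppercase, replace punctuation with spaces, collapse whitespace, abbreviate suffixes."""
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", raw.upper())
    words = [SUFFIXES.get(word, word) for word in cleaned.split()]
    return " ".join(words)

def property_key(tax_id, unit=None):
    """Build a matching key: tax id alone for separate lots, tax id plus unit for condos."""
    if unit:
        return f"{tax_id}#{unit.strip().upper()}"
    return tax_id

print(normalize_address("123 Main Street, "))
print(property_key("533-120-07", "4b"))  # made-up tax id
```

Two differently typed versions of the same address ("123 Main Street," and "123 main st.") come out as the same string, which is the whole point of the exercise.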
You must also be careful about licensing issues. You’re going to need to start with someone else’s dataset that cost them a lot of money to create in the first place. You can’t just expose that data to the world; someone who holds a license will need to be its keeper and can advise on how it is used. If you skip that, you will surely trigger a justifiable lawsuit, because these are million-dollar data sets, even if they are flawed and/or crappy.
You would think that county recorders’ offices would be more high-tech, but the Internet is really only about 10 years old, and many of them just haven’t been able to make their data sets public. Some, like San Diego County, have privacy concerns. Other counties in the US have managed to deal with these issues; in Orange County, FL, you can even go online, type in someone’s name, and pull up their signed deed!
One note on your post. I think everyone who has access to Sandicor MLS has equal access. It may take some doing to extract the data, but it can be done. They don’t expose their geolocation data, so I have paid thousands (maybe tens of thousands) of dollars to geolocate my data.
I think the closest thing to what you want is the County data set. That won’t track stuff like homes for sale, including days on market, but it has the core housing data. Dataquick also has stuff on home mortgages attached to each record. Sandicor is exclusively a listing info tool.
On the Sandicor data: the good news is that smart folks who know about some of these errors can leverage them when buying a home. The use of statistics cuts both ways, and we do it all the time for our clients. If you really know what you’re doing and can present rock-solid data with an offer, it can get serious attention. No agent, especially a rookie (and there are many), wants to look like a fool in front of their clients and an esteemed colleague. If you can show that the REAL market time is 89 days and the Months of Inventory is 27, you can probably get your lowball offer taken seriously.
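For anyone who wants to see the arithmetic, here is a minimal Python sketch of the two numbers. Months of inventory is the usual active listings divided by sales per month. For "real" market time I am assuming it means adding up the days on market across every time the same home was listed, rather than only the latest listing. The function names and the input figures are invented, picked only so the results line up with the 89 and 27 above.

```python
def months_of_inventory(active_listings, sales_in_period, period_months=1):
    """Active listings divided by the average number of sales per month."""
    monthly_sales = sales_in_period / period_months
    if monthly_sales == 0:
        return float("inf")  # nothing is selling, so inventory never clears
    return active_listings / monthly_sales

def cumulative_days_on_market(listing_periods):
    """Sum days on market over every listing period for one property (assumed definition)."""
    return sum(listing_periods)

# Invented inputs: 54 active listings, 6 sales over the last 3 months.
print(months_of_inventory(54, 6, period_months=3))
# Invented inputs: listed for 61 days, pulled, then relisted for 28 more.
print(cumulative_days_on_market([61, 28]))
```

The second number is where the relisting check pays off: the MLS would show 28 days for that home, while the cumulative figure is 89.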
Anyway, if anyone has an interesting question they’d like me to try crunching against my data set, please let me know.