Yes, I agree that it is a good project, and I have no doubt that you could come up with a more comprehensive list of data fields than anyone has implemented. However, I think it would be desirable to have an attorney on board who can provide some recommendations re: the licensing issues, because any data extracted from the Sandicor database or its derivatives (e.g., Realtor.com, ZipRealty) is probably a derivative work. I don’t think they will allow anyone to publish derivatives without appropriate licensing measures, and the MLSes are very protective of their data. Though it would probably be in the public good to have the MLS data as open source, it isn’t public and the MLSes are not likely to make it so. Ditto, I would guess, for other folks like forsalebyowner.com. Their data is their business.
I’m not saying that you couldn’t bootleg a portion of their data from sources like the public pages of Realtor.com, but that is not likely to provide you with the information that you need. For example, off-market date is not included in the listings published on the Internet; it’s available, more or less, only to realtors using their system. But even if the data is bootlegged and stored in a secure database, that data had better not be made public or it will surely draw the attention of the MLS lawyers.
My model for real estate services is centered around providing objective data and analysis. I would have implemented many more features and ideas, but I cannot do so because I must follow the licensing constraints on the MLS data. You can see the San Diego MLS rules at http://www.Sandicor.com. I don’t know if Riverside publishes their rules.
Though I’m a lawyer (former litigator for the US govt.), I don’t practice anymore and am not a member of the Ca. bar, and so I won’t be of use in obtaining a workable legal opinion on the matter. I would not be surprised if a lawyer who reviews the matter renders an unfavorable opinion on leveraging any of the MLS data. The Dataquick data (i.e., tax data) is good data, but it won’t give you info like off-market date, and my guess is that the Dataquick lawyers will similarly protect their data.
Anyway, these are the reasons why I can’t expose my various datasets to others in their raw form. I’m happy to run analyses and provide summaries of the results, as secondary summaries are permitted. They’re not my rules; I just have to follow them.