July 21, 2006 at 4:01 PM #6962

July 22, 2006 at 8:44 AM #29271 · barnaby33 (Participant)
Thank you, John. Did you have to do this analysis by hand? If you have a spreadsheet, or another electronic format, I could perhaps help you automate it.
Josh
July 22, 2006 at 9:13 AM #29273 · JohnHokkanen (Participant)
Thanks for the offer. It is now an automated process, but I had to design the algorithms and write the program to do it. I should now be able to run the numbers for any particular zip code(s). I used San Marcos because it was a relatively small set (400 listings), so it was a manageable process to hand-check some of the calculated results.
John Hokkanen
Hokkanen Real Estate Team
http://www.SurfTheTurf.com

July 22, 2006 at 9:54 AM #29274 · rankandfile (Participant)
I propose building a new database that objectively tracks homes and is maintained by an unbiased party, or even the open-source community. As it stands right now, the MLS is, in my opinion, a far from perfect resource for objective, reliable data.
I propose creating a database that gives each home one and only one unique ID, whether it is relisted or not, and that stores a standard set of measurable home attributes that can be compared from one home to the next. I am not sure how the MLS is set up, but I believe it is a database of LISTINGS, not HOMES. So when a home is listed as ID #1 and doesn't sell, it is pulled off and relisted as ID #5, with a new time clock. This is tantamount to fraud, IMHO, because it distorts the quality of the data, which home sellers then use to gain a more favorable price/position in the transaction.
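Something like the following rough sketch (SQLite here; every table and column name is a placeholder I just made up) captures the HOMES-vs-LISTINGS split I mean:

```python
import sqlite3

# A minimal two-table sketch: one permanent row per HOME, many rows per LISTING.
# All names here are hypothetical placeholders, not any actual MLS schema.
conn = sqlite3.connect("housing.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS homes (
    home_id     INTEGER PRIMARY KEY,   -- one and only one ID per physical home
    street      TEXT NOT NULL,
    city        TEXT NOT NULL,
    zip         TEXT NOT NULL,
    latitude    REAL,
    longitude   REAL,
    sqft        INTEGER,
    bedrooms    INTEGER,
    bathrooms   REAL,
    year_built  INTEGER
);

CREATE TABLE IF NOT EXISTS listings (
    listing_id  INTEGER PRIMARY KEY,
    home_id     INTEGER NOT NULL REFERENCES homes(home_id),
    list_date   TEXT NOT NULL,         -- a relist adds a row; home_id never changes
    off_date    TEXT,
    list_price  INTEGER
);
""")
conn.commit()
```

With that split, true days on market becomes a sum over all the listings rows for a given home_id, so pulling and relisting can never reset the clock.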
Another issue with the MLS is that it is not uniform or universal from one realtor to the next. One person or group might have info on one set of homes in an area, while another might not. Since comparable home prices are a factor in determining price, why use a hand-picked sample set of data rather than one that is truly open to the market?
I am going to start a new thread devoted to replacing the MLS with a new database.
July 22, 2006 at 10:29 AM #29275 · JohnHokkanen (Participant)
I have pieces of what you want because a great deal of my data is normalized and I've added a lot to it. For example, I know the geolocation of virtually every property for sale, and that allows me to do cool things like check whether a home has been relisted, because a relist will have an identical lat/lon.
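In stripped-down form, the relist check amounts to something like this (a sketch with made-up field names, not my actual program):

```python
from collections import defaultdict

def find_relistings(listings, precision=5):
    """Group listings by rounded lat/lon; any group with more than one
    listing ID is a candidate relist of the same physical property.
    `listings` is assumed to be dicts with 'mls_id', 'lat', 'lon' keys."""
    by_location = defaultdict(list)
    for lst in listings:
        # Round to 5 decimal places (~1 meter) so tiny geocoding
        # jitter doesn't split what is really the same property.
        key = (round(lst["lat"], precision), round(lst["lon"], precision))
        by_location[key].append(lst["mls_id"])
    return {loc: ids for loc, ids in by_location.items() if len(ids) > 1}
```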
I have often considered blending my data with my tax roll and foreclosure data (which I have left in separate silos).
Suffice it to say, creating a highly normalized dataset is not an easy task. Humans don’t enter their data consistently, and so many errors occur (wrong zip, misspelled street names, wrong house numbers). You can’t just use something like the tax id, because while that works for homes with separate lots, it won’t work in condo situations. I’ve been working at this for some time, and it’s pretty complicated (not that anyone on this forum couldn’t create a solid set if they had the years and drive to do it).
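To give a flavor of the kind of cleanup involved, here is a toy sketch (real normalization needs far more rules than this, and the abbreviation table is just illustrative):

```python
import re

# A tiny, illustrative suffix map; a real one has hundreds of entries.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "dr": "drive",
                 "ln": "lane", "rd": "road", "ct": "court"}

def normalize_address(raw):
    """Collapse case, punctuation, and common suffix abbreviations so
    '123 Main St.' and '123 MAIN STREET' compare equal."""
    tokens = re.sub(r"[^\w\s]", "", raw.lower()).split()
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    return " ".join(tokens)

assert normalize_address("123 Main St.") == normalize_address("123 MAIN STREET")
```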
You must also be careful about licensing issues. You're going to need to start with someone else's dataset that cost them a lot of money to create in the first place. You can't just expose that data to the world; whoever licenses it will need to be its keeper and can assist in how it is utilized. If you don't, you will surely trigger a justifiable lawsuit, because these are million-dollar data sets that they have created, even if they are flawed and/or crappy.
You would think that county recorder's offices would be more high-tech, but the Internet is really only about 10 years old, and many of them just haven't been able to make their data sets public; some, like San Diego County, have privacy concerns. Some counties in the US have managed to deal with these issues; in Orange County, FL, you can even go online, type in someone's name, and pull up their signed deed!
One note on your post. I think everyone who has access to the Sandicor MLS has equal access. It may take some doing to extract the data, but it can be done. They don't expose geolocation data, so I have paid thousands (tens of thousands?) of dollars to geolocate my own.
I think the closest thing to what you want is the County data set. That won’t track stuff like homes for sale, including days on market, but it has the core housing data. Dataquick also has stuff on home mortgages attached to each record. Sandicor is exclusively a listing info tool.
On the Sandicor data…. The good news is that smart folks who know about some of these errors can leverage them when buying a home. The use of statistics cuts both ways, and we do it all the time for our clients. If you really know what you're doing and can present rock-solid data with an offer, it can get serious attention. No agent, especially the many rookies out there, wants to look like a fool in front of clients and an esteemed colleague. If you can show that the REAL market time is 89 days and the Months of Inventory is 27 months, you can probably get your lowball offer taken seriously.
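For the curious, those two headline numbers reduce to simple arithmetic once the relists are chained together; here's a sketch (the sample numbers just echo the example above):

```python
def real_market_time(chain):
    """`chain` is the list of active-day counts for every linked listing
    of one home (e.g. [45, 44] for a home pulled and relisted once).
    Summing them keeps a relist from resetting the clock."""
    return sum(chain)

def months_of_inventory(active_listings, sales_per_month):
    """At the current sales pace, months needed to absorb all inventory."""
    return active_listings / sales_per_month

print(real_market_time([45, 44]))      # 89 days of REAL market time
print(months_of_inventory(540, 20))    # 27.0 months of inventory
```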
Anyway, if anyone has an interesting question that they want me to try to crunch the numbers on my data set to get an answer, please let me know.
John Hokkanen
Hokkanen Real Estate Team

July 22, 2006 at 11:58 AM #29277 · powayseller (Participant)
John, I am still interested in whether the perception of better schools in Poway creates a price premium or a time-on-market premium (lower DOM). Could you run the Poway DOM and compare it to cities that are perceived to have a bad school district, like San Marcos or Vista or Ramona? They are all inland cities, with approximately similar pricing, age, and house sizes, right? Is there any effect of the perception that Poway schools are better?
July 22, 2006 at 12:34 PM #29279 · JohnHokkanen (Participant)
The more positive unique attributes a market has, the more it is insulated from commodity market situations. Consequently, my gut instinct is that, yes, Poway will be insulated to some extent from market conditions that affect other areas where schools have worse scores.
I think you raise an interesting question, namely: what is the relationship of school scores to market time? What if we were to graph the data points using elementary school scores as the X axis and Days on Market or Months of Inventory as the Y axis?
As to which segment of the market to use, I think we might get the clearest picture of a relationship (if there is one) if we looked at the bottom 20% of the market for each area examined. This segment is the most resilient to other market effects and could give good comparison numbers.
Since it is easiest to sort the data by zip code, it would be best to take average school scores for all elementary schools in a zip.
Given how weird the market is right now, I don't know what we might find. It could be that the least expensive sector in Vista is getting so much pressure due to risk aversion that it performs better than Poway's least expensive sector. I have no gut instinct as to how the numbers would play out. At least we would be controlling for all the sellers who think they can sell their homes for top dollar. If the numbers showed that the best-priced homes in a better school district sell, on average, faster than the best-priced homes in a worse district, I think you'd have your answer. What do you think?
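If anyone wants to eyeball it themselves, the computation is simple once each sold listing carries its zip's average elementary school score (a sketch, not my code; the CSV and column names are invented):

```python
import pandas as pd

# Hypothetical input: one row per sold listing, with zip, sale price,
# days on market, and the zip's average elementary school score attached.
df = pd.read_csv("listings_with_scores.csv")

# Keep only the bottom 20% of each zip's market by price -- the segment
# most resilient to other market effects.
cutoff = df.groupby("zip")["price"].transform(lambda p: p.quantile(0.20))
bottom = df[df["price"] <= cutoff]

# Average DOM per zip against that zip's average school score.
by_zip = bottom.groupby("zip").agg(score=("school_score", "mean"),
                                   dom=("days_on_market", "mean"))
print(by_zip.corr())  # correlation between school score and days on market
```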
If you have an idea about which exact correlation might make the most sense, feel free to email me directly at [email protected].
JH
July 22, 2006 at 10:18 PM #29308 · CardiffBaseball (Participant)
Since it is a small market, I generally check all of the new listings in ZipRealty for previous listings by googling the address. The only way it works is to hit the page that Google has cached, instead of the direct link.
Your algorithm sounds like the way to go, as my way is too labor-intensive for a bigger area. It's fine for Cardiff, but probably not so good for San Marcos.
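For what it's worth, even the manual method can be semi-automated by generating the search URL for each new address (a sketch; you'd still click through to the cached copy yourself, and Google may block scripted queries):

```python
from urllib.parse import quote_plus

def google_search_url(address):
    """Build the Google search URL for an exact-phrase address query,
    mirroring the manual quoted-address search; the cached-page link
    is then followed from the results by hand."""
    return "http://www.google.com/search?q=" + quote_plus(f'"{address}"')

# Made-up example address, just to show the output format.
print(google_search_url("123 Birmingham Dr, Cardiff CA"))
```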
July 22, 2006 at 11:36 PM #29320 · rankandfile (Participant)
John,
I think an effort like creating a new housing database begins at the grassroots level, like what we are doing now. I also have GIS experience and could help in getting something started. We'd need a set of standard attributes that will be consistently measured (and comparable) over time. This venture would have to be quite involved and have support and input from a multitude of people from varying disciplines. At any rate, if the open-source software community can come up with a working model for creating and updating applications (via CVS), there's no reason why it can't be done for this application.
July 23, 2006 at 1:23 AM #29325 · JohnHokkanen (Participant)
Yes, I agree that it is a good project, and I have no doubt that you could come up with a more comprehensive list of data fields than anyone has implemented. However, I think it would be desirable to have an attorney on board who can provide some recommendations re: the licensing issues, because any data extracted from the Sandicor database or its derivatives (e.g., Realtor.com, ZipRealty) is probably a derivative work. I don't think they will allow anyone to publish derivatives without appropriate licensing measures, and the MLSes are very protective of their data. Though it would probably be in the public good to have the MLS data as open source, it isn't public, and the MLSes are not likely to make it so. Ditto even for other folks like forsalebyowner.com (my guess). Their data is their business.
I'm not saying that you couldn't bootleg a portion of their data from sources like the public pages of Realtor.com, but that is not likely to provide you with the information that you need. For example, the off-market date is not published on Internet listings; it's available, more or less, only to realtors using their system. But even if the data is bootlegged and stored in a secure database, that data had better not be made public, or it will surely draw the attention of the MLS lawyers.
My model for real estate services is centered around providing objective data and analysis. I would have implemented many more features and ideas, but I cannot do so because I must follow the licensing constraints on their data. You can see the San Diego MLS rules at http://www.Sandicor.com. I don’t know if Riverside publishes their rules.
Though I’m a lawyer (former litigator for the US govt.), I don’t practice anymore and am not a member of the Ca. bar, and so I won’t be of use in obtaining a workable legal opinion on the matter. I would not be surprised if a lawyer who reviews the matter renders an unfavorable opinion on leveraging any of the MLS data. The Dataquick data (i.e., tax data) is good data, but it won’t give you info like off-market date, and my guess is that the Dataquick lawyers will similarly protect their data.
Anyway, these are the reasons why I can't expose my various datasets to others in their raw form. I'm happy to run analyses and share the results, since providing secondary summaries is permitted. They're not my rules; I just have to follow them.
John H.