Jim the Realtor put up a post on Zillow over at his excellent blog, and I took the opportunity to ask about something that’s always stuck in my craw. To wit:
Here’s something that always bugged me about the zestimates. If you look at any SD property’s price history (or that of SD as a whole), it basically shows the price going pretty steadily up through 2003-4 and then skyrocketing in the first 6-9 months of 2005.
This isn’t how it happened at all, though… the Case-Shiller indexes (and pretty much all other data sources too) indicate that the parabolic blowoff phase happened in early 2004, not early 2005. And yet, Zillow has always shown this obviously incorrect pricing history. Anyone have an idea as to what’s going on there? Since they seem to be talking up their accuracy, why have they made no attempt to fix this obvious and quite substantial inaccuracy?
Just for kicks I put together a chart to show what I was talking about:
Jim thinks that the distortion was caused by a bigger drop in low-end than high-end sales. It’s hard to believe that effect could account for such a glaring error, especially considering that the volume disparity between low and high priced homes became much wider later on. But I don’t have a better explanation for what’s caused the discrepancy — or why they haven’t fixed it. Any ideas?
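Jim’s explanation is a composition effect. If a price measure is computed from whichever homes happen to sell, a shift in that mix can move the number even when no individual home changes value. Here is a toy sketch of that mechanism, using made-up prices purely for illustration. It is not Zillow’s actual methodology:

```python
from statistics import median

# Hypothetical, illustrative prices only (not real San Diego data).
# Each segment's home values are held flat between the two periods.
LOW_END = [300_000, 320_000, 340_000, 360_000]
HIGH_END = [700_000, 750_000, 800_000]


def sales_median(low_count: int, high_count: int) -> float:
    """Median price of one period's sales, drawing the first `low_count`
    low-end homes and the first `high_count` high-end homes."""
    sales = LOW_END[:low_count] + HIGH_END[:high_count]
    return median(sales)


# Period A: mostly low-end sales. Period B: low-end sales dry up.
before = sales_median(low_count=4, high_count=1)
after = sales_median(low_count=1, high_count=3)
print(before, after)
```

In this toy case the median jumps from 340,000 to 725,000 even though no home’s price changed. Repeat-sales indexes such as Case-Shiller are designed to limit this kind of mix effect by comparing sales of the same property over time.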
The general observation about the blowoff phase of the SD RE market seems to be correct, but it appears that both Case-Shiller and Zillow show the same date for the market peak and subsequent downturn.
IMNSHO, the absolute value of an index isn’t as important as the trend. Both estimates show an upward trend going into 2006 and then a subsequent drop. I never put much weight on Zillow’s Zestimates for any individual property, but I think their charts are useful for determining the trend of a market in any area.
Asterix
I have a theory, too. The national market and many California sub-markets (but not San Diego) did indeed spike in 2005. So I’m thinking that Zillow’s algorithm somehow has a “spillover effect” from the larger category (USA or CA) to the smaller one (San Diego). The effect might not be very large, but a few percentage points added to 2005 and subtracted from 2004 is all it would take to do the trick.
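To see how a small spillover could move the apparent spike, here is a minimal sketch that assumes the local estimate is a weighted blend of local and regional growth. The blending rule, the 0.3 weight, and all growth rates are hypothetical illustrations, not Zillow’s algorithm or real market data:

```python
# Hypothetical annual growth rates, illustrative only (not real data).
# Local market peaks in 2004; the larger region peaks in 2005.
local_growth = {2003: 0.20, 2004: 0.25, 2005: 0.05}
regional_growth = {2003: 0.12, 2004: 0.15, 2005: 0.22}


def blended_growth(year: int, weight: float) -> float:
    """Local growth estimate mixed with the regional series.
    `weight` is the hypothetical share given to the regional signal."""
    return (1 - weight) * local_growth[year] + weight * regional_growth[year]


for year in sorted(local_growth):
    print(year, round(blended_growth(year, weight=0.3), 3))
```

With a 0.3 weight, the local 25% in 2004 becomes 22% and the local 5% in 2005 becomes about 10%, which shifts the apparent run-up toward 2005.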
What are the units for the Y-axis in the Piggington graph?
If the unit of measurement of the Y-axis in both graphs is dollars, why are the values different?
First one is not dollars. It’s an index arbitrarily set to 100 at some point in the past (Jan 2000, I think).
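Because an index only carries ratios, a dollar series has to be rebased before the two charts can be compared. This is a minimal sketch, assuming (as the commenter guesses) a January 2000 base; the dollar figures are hypothetical:

```python
def rebase(series: dict[str, float], base_key: str) -> dict[str, float]:
    """Rescale a series so that its value at `base_key` equals 100."""
    base = series[base_key]
    return {k: 100 * v / base for k, v in series.items()}


# Hypothetical dollar medians, illustrative only (not real data).
prices = {"2000-01": 250_000, "2004-01": 450_000, "2006-01": 500_000}
print(rebase(prices, "2000-01"))
```

After rebasing, a reading of 180 simply means prices are 1.8 times their base-month level. That is the same kind of statement the index chart makes.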
Median prices are in agreement with Case Shiller. Median prices blew up mid-2003 and almost plateaued by mid-2004.
The lag makes perfect sense. Zillow bases their information on comparables. Comparables for sales are going to lag 3-6 months in the database.
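A comps-based estimate can only see sales that have closed and been recorded, so it trails the market and turns late. This toy model illustrates that claim. The 3-month recording delay, the 3-month averaging window, and the price path are all hypothetical parameters, not Zillow’s actual settings:

```python
# Hypothetical monthly "true" price index that peaks at month 6 (illustrative only).
true_prices = [100, 110, 120, 130, 140, 150, 160, 150, 140, 130, 120, 110, 100, 90]

DELAY = 3   # hypothetical months before a closed sale shows up as a usable comp
WINDOW = 3  # hypothetical number of recent months of comps that get averaged


def comps_estimate(month: int) -> float:
    """Average of the comps visible at `month`: sales recorded at least
    DELAY months earlier, using only the most recent WINDOW months of them."""
    end = month - DELAY  # latest month whose sales have reached the database
    start = max(0, end - WINDOW + 1)
    comps = true_prices[start:end + 1]
    return sum(comps) / len(comps)


true_peak = max(range(len(true_prices)), key=lambda m: true_prices[m])
# Only months >= DELAY have at least one visible comp.
est_peak = max(range(DELAY, len(true_prices)), key=comps_estimate)
print("true peak at month", true_peak, "| comps-based peak at month", est_peak)
```

The recording delay alone shifts the estimate by three months, and averaging over the window adds roughly one more. In this toy case, the estimated peak therefore arrives about four months after the real one.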
It’s actually the same problem much of Southern California is having on the down side. Sales have slowed, and now people set prices by looking at comparables, but the most recent comparable is three months old, and to get three or more comparables they’re often looking back as far as the spring.
The lag and the fact that Zestimates are grossly inaccurate make sense to me, although I can’t tell you how many people on the discussion boards cry that their home is “undervalued.” Wait a few years…
Which gets back to the lag. Zillow is also WAY behind in posting drops in property values. So any drop they do show today is probably up to a year old.
A reader mailed in these thoughts, which I thought were interesting and which I am posting with his permission — Rich
With respect to your post on "Zillow’s Lost Year", part of the issue is a data presentation issue with the two graphs you included. I’ve attached an Excel spreadsheet with the ZIndex values for San Diego and the corresponding Case-Shiller HPI values. Similar to the CS HPI, I’ve normalized the ZIndex values to Jan ’00. Zillow does not provide a means to download index data, so these values are eyeballed from Zillow charts.
You can see that Zillow properly accounts for the price run up as shown by the CS HPI through ’05. However, the ’05-’06 price blowup in the ZIndex that you mention in your post is actually an overestimation of price increases with respect to the CS HPI (not a lag in the ZIndex as implied by your post).
Without knowing more details about how ZEstimates are calculated I would guess that a likely cause is that they are partially based on historic home price appreciation trends. Notice that in the data set the ZIndex values (and CS HPI) had increased by roughly 20% for the previous 3 years. If ZEstimates are partially based on recent price trends it would follow that they would tend to miss inflection points.
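The reader’s hypothesis is that an estimate which leans on recent appreciation will keep projecting the old trend past a turning point. Here is a toy trend-following rule that illustrates this. The rule, the 3-year lookback, and the index values are hypothetical, not Zillow’s model:

```python
def trend_estimate(history: list[float], years: int = 3) -> float:
    """Project next year's value from the average (compound) annual growth
    over the trailing `years` years; a hypothetical trend-following rule."""
    start, end = history[-years - 1], history[-1]
    annual_growth = (end / start) ** (1 / years) - 1
    return end * (1 + annual_growth)


# Hypothetical index: 20% a year for three years, then flat (illustrative only).
actual = [100.0, 120.0, 144.0, 172.8, 172.8]
projected = trend_estimate(actual[:-1])  # estimate made before the flat year
print(round(projected, 2), actual[-1])
```

Because the rule only sees past growth, it projects roughly 20% more appreciation (to about 207) in the year the toy market goes flat at 172.8. That is the inflection-point miss the reader describes.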
Note that this does not entirely account for the degree of the ZIndex increase in Jan ’06, which was a 35% increase over Jan ’05.
Also, if ZEstimates heavily weight recorded sales values, which would make sense, a decrease in sales volume would slow the index’s correction, since it would still be referencing older data points. Considering that SD County sales peaked in 8/05 (according to http://bubbletracking.blogspot.com/search/label/SD%20Inventory), and that low-end sales dried up substantially while high-end sales and prices held up reasonably well, this would cause further lag in the index correcting to actual home prices.
As for why Zillow hasn’t fixed the discrepancy, it seems better for everyone if they don’t fudge past index data; otherwise they could just retroactively replace their historic index with the CS HPI. If they leave their data as is, their customers have a means of comparing past ZIndex values against actual recorded prices at those points in time to verify their model’s accuracy.