Case-Shiller on a zip code basis

User Forum Topic
Submitted by Eugene on December 25, 2007 - 2:46pm

Yesterday I was bored enough to download home resale data from http://users.ixpres.com/~gtriphan/ and try to run the Case-Shiller algorithm on it.

Here's what I get for October

San Diego average: 17% off the peak

North coast (west of 5 and north of 8): 8% off the peak

Northeast SD (RB, RP, Poway, Scripps Ranch): 13% off the peak

Clairemont, Linda Vista, Serra Mesa: 17% off the peak

Southeast (Lemon Grove, Eastlake, Otay Ranch): 23% off the peak

Unless the decline stops or slows down, harder-hit areas will be down to 2002 prices before the end of 2008.

Submitted by zk on December 25, 2007 - 3:00pm.

Very interesting. What'd you do, exactly? Did you take the sales on identical addresses and compare them over time? Or something else?

Submitted by temeculaguy on December 25, 2007 - 3:09pm.

E, that's hot!! Get bored more often. So few media outlets break down the numbers finely enough to give a precise view. Thank you.

It will be interesting to see if the trend holds. If it does, values in the North will get even more out of whack with the South, and the North should start to feel the heat.

Submitted by Eugene on December 25, 2007 - 8:55pm.

> What'd you do, exactly? Did you take the sales on identical addresses and compare them over time?

Yeah, that's basically what Case-Shiller is about. You take consecutive sales of identical addresses, throw away suspicious pairs, and then use a mathematical procedure (a regression) to build the index function that best approximates the data.
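A minimal sketch of the repeat-sales idea described above, in Python. It assumes each resale pair is a tuple (purchase period, purchase price, sale period, sale price) and fits the basic Bailey-Muth-Nourse regression: the log of each price ratio is modeled as the log index at the sale period minus the log index at the purchase period, solved by ordinary least squares with period 0 as the base (index 100). S&P's Case-Shiller method adds weighting and screening on top of this, and `repeat_sales_index` and `solve_linear` are hypothetical names, not Eugene's C++ implementation.

```python
import math


def solve_linear(a, rhs):
    """Solve a @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular: some period is not linked by any resale pair")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def repeat_sales_index(pairs, n_periods):
    """Fit log(p_sell / p_buy) = b[t_sell] - b[t_buy] by least squares.

    pairs: iterable of (t_buy, p_buy, t_sell, p_sell), periods 0..n_periods-1.
    Period 0 is the base: b[0] = 0, index = 100.
    """
    k = n_periods - 1  # unknowns b[1..n_periods-1]
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for t_buy, p_buy, t_sell, p_sell in pairs:
        y = math.log(p_sell / p_buy)
        row = [0.0] * k
        if t_sell > 0:
            row[t_sell - 1] += 1.0  # +1 dummy for the sale period
        if t_buy > 0:
            row[t_buy - 1] -= 1.0   # -1 dummy for the purchase period
        # Accumulate the normal equations X^T X b = X^T y
        for i in range(k):
            if row[i]:
                xty[i] += row[i] * y
                for j in range(k):
                    xtx[i][j] += row[i] * row[j]
    b = solve_linear(xtx, xty)
    return [100.0] + [100.0 * math.exp(v) for v in b]


pairs = [(0, 100, 1, 110), (1, 200, 2, 220), (0, 300, 3, 345), (2, 121, 3, 115)]
print([round(v, 1) for v in repeat_sales_index(pairs, 4)])  # [100.0, 110.0, 121.0, 115.0]
```

Each house only tells you about the ratio between two periods; chaining overlapping pairs is what lets the regression recover a single index path.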

Submitted by Daniel on December 25, 2007 - 10:58pm.

Wow! I mean, wow! Hats off to you, my friend!

First, congratulations on finding the site with all the sales data in nice text format. I didn't even know such a thing existed outside the MLS (well, Zillow has it too, but it can't be accessed by a script because of those smarty-pants codes buried in images).

And, second, for putting together a program to analyze the data. I'm assuming that it's not THE Case-Shiller algorithm, as that must be proprietary stuff, but you should be pretty close if you follow their general prescription.

Submitted by SD Realtor on December 25, 2007 - 11:30pm.

esmith, your find is a gem. I had forgotten about this site, and I would classify it as the best source for solds, hands down. Great find; to me, data such as this is much more valuable than any Zestimate or other third-party presentation.

One thing that I am curious about is whether this site includes trustee sales, and also sales at private auctions after the lender has taken title to the home.

I sent them an email to check.

Very good work es-

SD Realtor

Submitted by XBoxBoy on December 26, 2007 - 11:34am.

So, since the Case-Shiller numbers are coming out this week, I'm wondering: does anyone know whether Case-Shiller takes home improvements into account?

For instance, if I buy a crummy little house in Bird Rock for a million, tear it down, build a much bigger house and sell it for two million, will Case-Shiller report a 100% price increase on this house? Or will Case-Shiller exclude this house because it is no longer the same property?

If Case-Shiller does not adjust for improvements, then it is going to overestimate price increases and underestimate price decreases. If they do adjust for improvements, how do they do it, given the many different cases and scenarios that would need adjusting?

XBoxBoy

Submitted by Eugene on December 26, 2007 - 12:37pm.

> I'm assuming that it's not THE Case-Shiller algorithm, as that must be proprietary stuff

Here's a good description of their algorithm. It's missing some implementation details but it's almost complete.

http://www2.standardandpoors.com/spf/pdf...

> if I buy a crummy little house in bird rock, for a million, tear it down and build a much bigger house and sell it for two million, will Case Shiller report a 100% price increase on this house?

They exclude houses with substantial physical changes if they can detect them in deed records. They also try to detect "unusual" price changes and either exclude them or assign lower weights to them.
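One way to flag "unusual" price changes, sketched under assumptions: compute each pair's annualized log return and drop pairs whose robust (median/MAD) modified z-score exceeds 3.5. The cutoff, the helper name `screen_pairs`, and the sample prices are hypothetical illustrations; the actual Case-Shiller screening and weighting rules are the ones in the S&P methodology document linked above. A weighted variant would shrink the weight of such pairs instead of removing them.

```python
import math
import statistics


def screen_pairs(pairs, max_z=3.5):
    """Drop resale pairs whose annualized log return is an outlier.

    pairs: list of (years_held, buy_price, sell_price).
    Uses the modified z-score 0.6745 * (r - median) / MAD; max_z=3.5 is a
    common rule of thumb, not an S&P parameter.
    """
    rates = [math.log(sell / buy) / years for years, buy, sell in pairs]
    med = statistics.median(rates)
    mad = statistics.median([abs(r - med) for r in rates])
    if mad == 0:
        return list(pairs)  # no spread to judge outliers against
    return [p for p, r in zip(pairs, rates)
            if abs(0.6745 * (r - med) / mad) <= max_z]


pairs = [
    (2.0, 400, 440),
    (3.0, 500, 560),
    (1.0, 300, 310),
    (2.0, 350, 385),
    (0.5, 1000, 2000),  # teardown and rebuild: $1M -> $2M in six months
]
print(screen_pairs(pairs))  # the teardown pair is dropped
```

A rule like this catches the Bird Rock teardown only because its return is far out of line with its neighbors; a modest remodel would slip through, which is why deed-record checks matter too.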

Submitted by Eugene on December 27, 2007 - 3:32pm.

Cleaned up my data and made some graphs.

First, a consistency check (making sure my numbers are roughly the same as official Case-Shiller).

[Image: Consistency Check]

Not a perfect match but good enough. There are too many implementation subtleties and my data is lower quality, so I can't hope for a perfect match.

Notice that I have two more points. Official Case-Shiller numbers for October are based on sales in August through October. I also have data for November.
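How an October index point can draw on August through October sales, as a small sketch: it assumes each sale is a dict tagged with `year` and `month`, and `window_months` and `pool_sales` are hypothetical helpers that collect the three months ending at the reporting month.

```python
def window_months(year, month, span=3):
    """(year, month) tuples for the `span` months ending at (year, month)."""
    end = year * 12 + (month - 1)  # month count since year 0
    return [((end - back) // 12, (end - back) % 12 + 1)
            for back in range(span - 1, -1, -1)]


def pool_sales(sales, year, month, span=3):
    """Sales (dicts with 'year' and 'month' keys) that fall inside the window."""
    wanted = set(window_months(year, month, span))
    return [s for s in sales if (s["year"], s["month"]) in wanted]


print(window_months(2007, 10))  # [(2007, 8), (2007, 9), (2007, 10)]
print(window_months(2007, 11))  # [(2007, 9), (2007, 10), (2007, 11)]
```

Pooling three months smooths the index but also makes it lag; a November point built the same way already includes September and October sales.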

Next, regional graphs.

[Image: Regional HPI - 07/11]

Coastal areas are still almost at peak levels. The November blip is probably seasonal. From Coronado to CV to Carlsbad, bubble denial is still strong.

Affluent north county inland neighborhoods are still holding. Middle-class and Mexican zip codes are in free fall.

Finally, a quick look at specific resales, to see what kinds of transactions contribute to the apparent 27% decline south of the 94.

http://www.sdlookup.com/Property-37101EF...
Bought in June '05 for 595k (2% off the peak), sold for 460k

http://www.sdlookup.com/Property-439233E...
Bought in 11/2003 for 442k (31% off the peak), sold for 440k

http://www.sdlookup.com/Property-27FBE78...
Bought in Sep '05 for 650k (1% off the peak), sold for 422k

http://www.sdlookup.com/Property-A586DFE...
Bought in Dec '03 for 351k (28% off the peak), sold for 307k
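A rough way to read these four pairs, assuming each house tracked the regional index between its two sales: if it was bought when the index was a fraction d below peak, the sale implies the index now sits at (1 - d) × sale/purchase of its peak. This is back-of-the-envelope arithmetic on the numbers above, not how the index itself is computed, and `implied_off_peak` is a hypothetical helper.

```python
def implied_off_peak(buy, sell, off_peak_at_buy):
    """Percent below peak implied at sale time, assuming the house tracked the index."""
    level = (1.0 - off_peak_at_buy) * sell / buy  # fraction of peak at sale
    return 100.0 * (1.0 - level)


# (purchase price in $k, sale price in $k, fraction below peak at purchase)
pairs = [(595, 460, 0.02), (442, 440, 0.31), (650, 422, 0.01), (351, 307, 0.28)]
for buy, sell, d in pairs:
    nominal = 100.0 * (sell - buy) / buy
    print(f"{buy}k -> {sell}k: {nominal:+.1f}% nominal, "
          f"{implied_off_peak(buy, sell, d):.1f}% below peak")
```

Read this way, even the 2003 purchase that resold at roughly its purchase price implies a level about 31% below the peak, and the four pairs span roughly 24% to 37%.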

Submitted by paranoid on December 27, 2007 - 7:07pm.

esmith,

this is great stuff. For local people watching the local market, this info is really useful.

Is there a way to see a bigger graph?

Also, please update the graph regularly so that we know what's going on in real time.

thanks again.

Submitted by Asher on December 27, 2007 - 7:08pm.

Interesting stuff. I wonder why the southern areas are being hit so hard. I've been playing around with the Case-Shiller stats as well and have a couple of questions; if you don't mind, I'd like to exchange emails. You can reach me at ashersd@gmail.com. Feel free to email me so you don't have to post yours publicly.

Submitted by zk on December 27, 2007 - 11:27pm.

esmith,

That is really great stuff. Did you happen to do one for 92130 by itself?

Submitted by Eugene on December 29, 2007 - 6:38am.

All right, more numbers and more food for thought.

[Image: all neighborhoods]

These are the averages for the 2nd half of 2007.

On one hand, a lot of denial and highly overpriced markets all along the coast. Poorer areas are getting hammered.

On the other hand, two areas with the least amount of appreciation since 2000 are Carmel Valley and Scripps Ranch. Chula Vista and National City are still extremely overpriced despite 20% declines.

> is there a way to see a bigger graph?

Click on the graph, there should be a link called "original".

Submitted by stansd on December 29, 2007 - 8:22am.

Thank you for this... fantastic data. It gives the best characterization of the local market I have seen.

Stan

Submitted by Rich Toscano on December 29, 2007 - 9:58am.

Wow, great work esmith. This must have been very time consuming -- did you do it all in Excel? I'd love to see how you did it.

Anyway, this verifies the anecdotal data that there are big disparities even within the CS price tiers. I will be extremely curious to see what happens to the stalwart "Coast" category in the months ahead.

One thought -- I am not so sure that the recent downturn in the Coast properties is all seasonal. Looking back at prior years, there is clearly a seasonal tendency to decline at this time, but there are also some other recent factors that could explain it too (credit crunch round 2, mainly). So given the steepness of the recent decline, my bet is that it's partly seasonal and partly genuine.

I guess we will see in the months ahead, if you are able to keep this updated. Thanks for sharing this great info.

Rich

Submitted by Eugene on December 29, 2007 - 4:18pm.

> This must have been very time consuming -- did you do it all in Excel? I'd love to see how you did it.

No, not in Excel, I'm not *THAT* bored :) Most of the work is done with C++, Excel is used to make final charts and tables.

> this verifies the anecdotal data that there are big disparities even within the CS price tiers

Look at my last table (just updated).

Submitted by sdrealtor on December 29, 2007 - 4:43pm.

The numbers don't seem to pass the sniff test in my area. They seem to overstate the appreciation a bit. Thinking about my own home (which is worth roughly 15 to 20% above the median), its current value is about 70% above what it was worth in the 2nd half of 2000, using very realistic values for both times. It's also down about 13% from its peak value.

Submitted by Eugene on December 29, 2007 - 9:39pm.

> The numbers don't seem to pass the sniff test in my area.

What area is that?

BTW, the official Case-Shiller change from 10/2000 to 10/2007 is +106% for the low tier, +91% for the middle tier, and +79% for the high tier.
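To put those cumulative tier changes on a per-year footing, the compound annual rate can be backed out over the seven years. This is plain arithmetic on the quoted numbers, not part of the Case-Shiller methodology; `annualized` is a hypothetical helper.

```python
def annualized(total_change_pct, years):
    """Convert a cumulative percent change over `years` into a compound annual rate (%)."""
    return 100.0 * ((1.0 + total_change_pct / 100.0) ** (1.0 / years) - 1.0)


# Official 10/2000 -> 10/2007 tier changes (7 years)
for tier, change in [("low", 106), ("middle", 91), ("high", 79)]:
    print(f"{tier} tier: +{change}% total = {annualized(change, 7):.1f}% per year")
```

That works out to roughly 10.9%, 9.7% and 8.7% per year for the low, middle and high tiers.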

Submitted by sdrealtor on December 29, 2007 - 10:07pm.

My area is North County Coastal, defined as Encinitas and South Carlsbad, which effectively function as one market. The zip codes are 92024, 92009 and 92011.

My personal home is very representative of the market as I know it. Homes like mine were around 500K in late 2000. They hit a peak of around 950K. Realistically, it is probably somewhere between 825K and 850K today, which puts it 70% above its 2000 price and down around 13% from the peak.

I believe most of the homes in this area would show similar stats. I don't trust any stats I have seen because I haven't seen any that accurately portray what has happened and is happening. The only way to really understand that is to look at a property or area through the eyes of a good realtor who truly understands values in that area. In my area I think I know them as well as anyone and can very accurately estimate what a specific house would have sold for in the past and what it would sell for now.

Submitted by drunkle on December 29, 2007 - 10:32pm.

Can you add graphs that chart the rate of change of the particular areas?

Also (maybe you did this already, or Case-Shiller did?), could you apply filters that weed out data beyond the 95th, 90th, and 75th percentiles?

Submitted by Blogstar on December 29, 2007 - 11:37pm.

"I dont trust any stats I have seen because I haven't seen any which accurately portray what has and is happening."

The stats show the trend for the big picture and can't do any better than that because the housing stock and location value of very few areas encompassed by a zip code, let alone two or three zips, is homogeneous. In the chart Normal Heights is lumped with Mission Valley. These two zips have very little in common. 92116 alone might as well be two or three worlds lumped together. It contains Kensington which resembles a higher tier and other areas that are almost on par with Logan Heights. You could have a house in that zip that is only down 3-5% and another 25%. The average for the two zips is 7%. From a buyer's or a seller's point of view that stat could be meaningless, beyond the obvious that almost all houses are falling to some degree. There are many applicable variations of the same theme for other areas.

Disclaimer: I am not bullish on any market, or even a particular house in the region.

Submitted by sdrealtor on December 29, 2007 - 11:48pm.

Bear in mind that my house did not sell in either time period. If it had, it could have sold above or below the values I used, depending on the skill of the agent in pricing the property, the seller's motivation and many other factors. However, the numbers I presented are very solid, typical cases of what price levels were and are in my submarket. As it is, esmith's data as well as the Case-Shiller figures dramatically overstate bubble appreciation and understate bust depreciation thus far, IMO.

People get too excited over some whiz-bang statistical analysis. There is just too much noise in the data. I don't trust any data points. Whenever I look at a comp, I always question the price. Sometimes people sold too cheap and sometimes they got lucky with a high price. Sometimes the data is entered improperly or there are undisclosed incentives. Sometimes agents get paid and sometimes there are FSBO sales. Nearly every data point has quirks of some kind. I prefer to dwell in the world of theoretical price levels that existed at a point in time and what they are currently. I find this to be a much more reliable indicator of what is really happening.

Submitted by Eugene on December 30, 2007 - 5:27am.

> can you add graphs that chart the rate of change of the particular areas?

Any specific areas you want?

> also, (maybe you did this already? or case-shiller did?) apply filters that weed out 95, 90, and 75 percentile data?

What do you mean?

> Homes like mine were around 500K in late 2000. They hit a peak of around 950K. Realistically, it is probably somewhere between 825 and 850K today which puts it 70% above its 2000 price and down around 13% from the peak

My model says that a house worth 500K in late 2000 hit a peak of around 1M in 2005-2006 and it's still worth around 950K. So the issue is really that your 13% decline from the peak is not reflected in statistical data for the area.

It's not reflected because of transactions like these:

http://www.sdlookup.com/Property-7798B5D...
$1.45m in 10/2005, $1.54m in 11/2007

http://www.sdlookup.com/Property-B8C4911...
$1.01m in 4/2005, $1.01m in 11/2007

http://www.sdlookup.com/Property-AAE222C...
$629k in 3/2005, $693k in 11/2007

Maybe your specific neighborhood is different, but it seems that some houses in 92024 and 92009 do sell for 2005 prices and above.
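The gap between the two views is easiest to see side by side. This sketch just recomputes the percentages from the round numbers in the two posts: sdrealtor's 500K in 2000, 950K peak and 825K to 850K today, versus the model's 500K, roughly 1M peak and roughly 950K today. `pct_change` is a hypothetical helper.

```python
def pct_change(old, new):
    """Percent change from old to new."""
    return 100.0 * (new - old) / old


# (late-2000 price, peak price, current price), all in $k
scenarios = {
    "street, low": (500, 950, 825),
    "street, high": (500, 950, 850),
    "model": (500, 1000, 950),
}
for name, (y2000, peak, now) in scenarios.items():
    print(f"{name}: {pct_change(y2000, now):+.1f}% since 2000, "
          f"{pct_change(peak, now):+.1f}% from peak")
```

On these numbers the street-level view is 65% to 70% up since 2000 and 10.5% to 13.2% off the peak, while the model view is 90% up and 5% off the peak.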

> There is just too much noise in the data. I don't trust any data points.

Every single data point is suspicious in the same way. If you have lots of points, underlying trends will start to show up behind the noise.

> In the chart Normal Heights is lumped with Mission Valley. These two zips have very little in common

Good observation. I'm actually aware of that. I was trying to cover all zip codes of Greater San Diego. For 92108 I only had a total of 11 resale pairs (it's mostly a condo area) and it didn't naturally fit with any of its neighbors. Coronado is in a similar situation. It's different from OB and Point Loma, and it's too small to estimate its rate of decline with reasonable precision.

Most areas in the chart have at least 40-50 resale pairs in each half-year period, enough to pin down the rate of decline to within a few percent.
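Why more pairs help can be made concrete with the usual standard-error argument: if each resale pair's price change carries independent noise with standard deviation σ, the average over n pairs has standard error σ/√n. The 10% per-pair noise below is a hypothetical figure chosen for illustration, not something measured from this data, and the helper names are hypothetical too.

```python
import math


def standard_error_pct(sigma_pct, n_pairs):
    """Standard error (in %) of the mean of n independent noisy pair returns."""
    return sigma_pct / math.sqrt(n_pairs)


def pairs_needed(sigma_pct, target_se_pct):
    """Smallest n with sigma / sqrt(n) <= target."""
    return math.ceil((sigma_pct / target_se_pct) ** 2)


SIGMA = 10.0  # hypothetical per-pair noise, in percent
for n in (11, 45):
    se = standard_error_pct(SIGMA, n)
    print(f"{n} pairs: SE {se:.1f}%, 95% interval about +/-{1.96 * se:.1f}%")
print("pairs for a 2% SE:", pairs_needed(SIGMA, 2.0))
```

Under that assumption, 45 pairs give an interval of about ±2.9%, while 11 pairs (as in 92108) give about ±5.9%. Real noise is not fully independent (for example, a batch of builder sales to friends and family), so these are best-case numbers.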

Submitted by sdrealtor on December 30, 2007 - 9:47am.

The difference is you are using a model and I am basing my stats on real street level market information. My house was never worth $1m and the 950K would have required a lucky sale at the absolute peak. My neighborhood is very representative of the overall market and if anything has been stronger in the decline.

The problem with using stats looking down from cyberspace is that you use examples like Circuola Sequoia and Swami's Lane, which were new purchases. Both required landscaping, window treatments and assorted other improvements to make them livable, which could (and did) easily add 10% to the purchase price you used. The Swami's house was in a new tract close to the beach that had over 1000 people trying to buy about 30 homes. More than half of them went to friends and family of the builder. If they had been sold on the open market, the prices would have been much higher. I was on the list to buy one myself. These are examples of the kind of noise present in the data.

I think what you tried to do is great. It was a valiant effort, but you are trying to do something that quite simply can't be done with any degree of accuracy. Sure, trends emerge (prices increased from 2000 to 2005/6 and now they are falling... DUH!), but they can be observed equally well with common sense. What has happened and what really is happening can't be accurately determined from cyberspace.

Submitted by drunkle on December 30, 2007 - 11:39am.

esmith:

> > can you add graphs that chart the rate of change of the particular areas?

> any specific areas you want?

> > also, (maybe you did this already? or case-shiller did?) apply filters that weed out 95, 90, and 75 percentile data?

> What do you mean?

No preference for area; I figure you could easily do it for all your existing classifications...
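Rate-of-change charts come from differencing the index series. A minimal sketch, using the three C-S predictions posted later in this thread (209, 202, 194) purely as sample input; `rate_of_change` is a hypothetical helper, where `lag=1` gives month-over-month changes and `lag=12` would give year-over-year changes on a monthly series.

```python
def rate_of_change(series, lag=1):
    """Percent change between each value and the value `lag` steps earlier."""
    return [100.0 * (series[i] - series[i - lag]) / series[i - lag]
            for i in range(lag, len(series))]


print([round(x, 2) for x in rate_of_change([209, 202, 194])])  # [-3.35, -3.96]
```

Plotting this series per area, rather than the index level, makes an acceleration or slowdown in the decline visible at a glance.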

Filter out the data beyond the 5th/95th percentiles, the 10th/90th, the 25th/75th... as in, get rid of the data that falls outside that percentile range of the distribution.

I don't recall the exact method of doing so, and I don't even recall the proper term. But essentially, lop off the data below the bottom 5% and above the top 95% of the distribution. So, for example:

dataset: 15, 15, 15, 45, 45, 55, 65, 65, 75, 95

median = 50, 10th percentile = 17, 90th percentile = 93. Eliminate values that fall outside the percentile range, then recalculate the median and plot it. For this dataset, the median becomes 60.
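The trimming described above can be sketched like this. `trim` keeps values between two cutoffs, and `decile_cutoffs` (a hypothetical helper) computes the 10th and 90th percentiles with Python's `statistics.quantiles`, whose default is the "exclusive" method. Percentile conventions differ: for this dataset that method gives 15 and 93 rather than 17 and 93, and with a lower cutoff of 15 none of the 15s are dropped, so the trimmed median depends on which definition is used.

```python
import statistics

data = [15, 15, 15, 45, 45, 55, 65, 65, 75, 95]


def trim(values, low, high):
    """Keep values inside [low, high]."""
    return [v for v in values if low <= v <= high]


def decile_cutoffs(values):
    """10th and 90th percentiles using the 'exclusive' method (Python's default)."""
    deciles = statistics.quantiles(values, n=10)
    return deciles[0], deciles[-1]


print(statistics.median(data))                # 50.0
print(statistics.median(trim(data, 17, 93)))  # 60.0, using the cutoffs from the post
low, high = decile_cutoffs(data)
print(low, high, statistics.median(trim(data, low, high)))  # 15.0 93.0 45
```

With so few points, a single convention change moves the trimmed median by 15, which is one reason repeat-sales indexes screen individual pairs rather than trimming a price distribution.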

I'm wondering if doing this would get rid of aberrant values (prices) and affect the median, showing a more accurate picture... or maybe it would really only be useful when calculating the mean... or maybe it's just a waste of time, as the values don't change much...

Submitted by Eugene on January 15, 2008 - 3:37am.

December numbers are in!

[Image: San Diego average - 07/12]

In the last three months, we've reversed the appreciation of February, March, and April of 2004.

C-S predictions:
November: 209 (-16.5%)
December: 202 (-19.5%)
January: 194 (-22.5%)

At this rate of decline, sometime in 2010 house prices will hit zero.
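The 2010 figure is a straight-line extrapolation. A quick check, fitting a least-squares line through the three predicted index values (one per month) and solving for where it crosses zero; this is tongue-in-cheek arithmetic, not a forecast, and `zero_crossing_months` is a hypothetical helper.

```python
def zero_crossing_months(values):
    """Fit y = a + b*t (t = 0, 1, 2, ...) by least squares; return t where y = 0."""
    n = len(values)
    t_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return -intercept / slope


# November, December, January predictions (t = 0 is November 2007)
months = zero_crossing_months([209, 202, 194])
print(f"zero about {months:.1f} months after November 2007")
```

The fitted line falls 7.5 index points a month and reaches zero about 28 months after November 2007, in early 2010.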

[Image: Regions - 07/12]

Submitted by raptorduck on January 15, 2008 - 6:46am.

esmith, great data. I hope my posts bore you enough to add Rancho Santa Fe (92067) and Santa Luz (92127) to your analysis.

Submitted by WaitingToExhale on January 15, 2008 - 8:05am.

Esmith, this is great data. I also really appreciate your earlier graph showing the overlay between your implementation and the official Case-Shiller data. I'm a scientist, so I appreciate a nice graph and confirmatory comparisons. This is really helpful!

Now, if your yellow line would just keep dropping....

Submitted by FormerSanDiegan on January 15, 2008 - 9:03am.

Excellent stuff! IMO, this is the best contributor-generated material on this site in over a year.

Submitted by XBoxBoy on January 15, 2008 - 9:44am.

> At this rate of decline, sometime in 2010 house prices will hit zero.

Actually, if that's correct, and we assume that houses will not go to zero, then you make a strong argument that the bottom will come sooner than 2010. Very good news IMHO.

Submitted by DWCAP on January 15, 2008 - 9:54am.

XBox,
That actually tells me that we will not be able to sustain this level of reduction, not that the bottom is near. Half that rate of reduction and you are still going to give the NAR a heart attack, but homes will bottom after 2010.
The bottom will be the exact opposite of the top that passed in 2005/6: a year or two where prices don't really do anything, but sales are picking up (seasonally smoothed; I don't mean any fake spring bottom).
