An accurate view of the Irvine housing market by Global Decision and IHB

Back in March, I wrote about the future of IHB news and real estate analysis. In that post, I made the following observation:

Data is important, isn’t it?

It’s a shame the NAR has gone down the path it has. Few reliable sources of real estate analysis and information exist, and there are few signs the NAR is going to become one of them. That leaves a void: uncharted waters buyers must navigate without a reliable guide. It’s a void we seek to fill here at the IHB.

We are in the process of assembling our own private database of housing and related economic statistics. Over the next several weeks as I have time to digest the new information, I plan on a number of new analysis posts to truly illuminate the activity in our local housing market.

I have no agenda to spin the data. Let’s see what is really going on. I want to be accurate. People can make their own decisions and draw their own conclusions from accurate data. If approached without the built-in bias of a realtor, data analysis can be revealing rather than deceiving.

I will still have a dog in this hunt. I do run a business that makes money from real estate transactions. I am subject to the same biases as any other human being. I sell real estate, but I am not a realtor. The truth needs no salesman. I will present data as accurately as I can. If reality motivates you to buy or rent, the IHB can help you. I have no desire to manipulate data in order to make a quick buck. This is a part-time hobby for me, not my livelihood.

After that post aired, I was contacted by Jaysen Gillespie of Global Decision, an analytics and consulting firm that has worked in the real estate industry. He shares my interest in determining what is really going on in the real estate market. As a professional data analyst, he is trained in special techniques I cannot perform.

In the weeks that followed, we met several times, and with the assistance of another data analyst, Brian Nadel, we performed an in-depth analysis of the Irvine housing market. Today’s post is the first in a series on our findings, and it lays the groundwork for the detail to follow. The basic model Jaysen developed is complex, and we felt it deserved a post of its own to ensure everyone understands what we did and why it is better than other measures of value currently available.

The following is the writing of Jaysen Gillespie. To make it easier to read, I have not set it off in a block quote.

A presentation by Jaysen Gillespie of Global Decision

Global Decision is an analytics consulting firm.  While our methods are not industry-specific, our engagements are skewed towards specific industries in Southern California, such as real estate (along with online gaming and restaurant chains).  We specialize in applying both foundational and advanced analytics to better understand business and economic issues.

Today’s post is an example of such an application, known as hedonic housing valuation.  The goal of a hedonic housing valuation model is to use all information about a sale, including both the sale price and the characteristics of the home (number of beds, number of baths, square footage, etc.) to understand how the home’s value is derived from its constituent parts.  Wikipedia offers a good overview of hedonic regression.

Unlike looking at comps, which relies on a small number of highly similar properties, a hedonic model incorporates as much data as possible from a vast number of properties.  The core mathematical construct behind a hedonic valuation model is a multiple regression, and for such regressions to produce statistically meaningful results, it’s helpful to have a large number of sales as inputs into the model.  In non-technical terms, the regression procedure figures out how best to fit the values of all the pieces of a home to build a formula for the value of a home based on its characteristics.  A simple linear hedonic valuation model might, for example, conclude that each bathroom adds $15,000 of value to a home – or that each square foot of living space adds $250 of value to a home.  Such values are calculated based on actual historical sales and represent the regression algorithm’s best estimate given the data.
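To make the idea concrete, here is a minimal Python sketch of a simple linear hedonic model. It is not the Global Decision model. The sales are synthetic, generated from an assumed $100,000 base value plus the $15,000 per bathroom and $250 per square foot figures used as examples above, and the helper names (solve, fit_ols) are made up for illustration. The only point is that least squares can recover the per-feature values from the sales alone.

```python
# Minimal hedonic regression sketch: price = base + per_bath*baths + per_sqft*sqft.
# All sales below are synthetic and generated from assumed "true" values.


def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_ols(rows, prices):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * p for r, p in zip(rows, prices)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical sales: (bathrooms, square feet).
homes = [(2, 1500), (2, 1800), (3, 2100), (3, 2600), (4, 2400), (2, 2000), (4, 3000)]
# Assumed values used to generate prices: $100,000 base, $15,000/bath, $250/sq ft.
prices = [100_000 + 15_000 * baths + 250 * sqft for baths, sqft in homes]

# Each design row is [1 (intercept), baths, sqft].
design = [[1.0, float(baths), float(sqft)] for baths, sqft in homes]
base, per_bath, per_sqft = fit_ols(design, prices)

print(f"base value:      {base:,.0f}")
print(f"per bathroom:    {per_bath:,.0f}")
print(f"per square foot: {per_sqft:,.2f}")
```

Because the synthetic prices contain no noise, the fitted values match the assumed ones. With real sales they would be estimates with some uncertainty, which is one reason a large sample helps.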

For more details on the mathematics behind hedonic regression, along with the pluses and minuses of using hedonic models for housing valuation, please see the real estate section of the Global Decision website.

All regression methods – and in fact all mathematical models – share a common drawback:  factors not in the model may impact the dependent variable under study.  In our example, housing values in Irvine, we might find that properties with an exceptional view sell for a premium.  Our housing data does not reflect whether a property has an exceptional view, so our model would likely undervalue such a home.

As it turns out, the Irvine dataset is the best of most possible worlds.  The city itself is extraordinarily homogeneous:  schools are uniformly good and crime is uniformly low.  There are no “bad” areas, by typical American metropolitan standards.  This homogeneity allows us to construct a model that has a very strong ability to understand how Irvine homes derive value from their constituent parts – and is not overly swayed by factors not available for modeling.  To further exclude data points that are not representative, we’ve excluded condos and “attached single family” properties.  We’ve also excluded properties with unusual characteristics, such as a 15,000 sq. ft. lot, or 7 bedrooms.  Unusual properties only represent about 2% of the Irvine sample, and excluding them doesn’t detract from the model’s ability to trend home values over time.
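As a rough illustration of this kind of winnowing, the Python sketch below filters a handful of hypothetical sale records. The field names and the exact cutoffs are assumptions made for the example (the text only gives a 15,000 sq. ft. lot and 7 bedrooms as examples of "unusual"), so treat it as a sketch rather than the actual rules used to build the dataset.

```python
# Illustrative filter only: field names and cutoffs are assumptions, not the
# actual rules used to build the Irvine dataset.
MAX_LOT_SQFT = 15_000  # assumed: lots this large or larger count as unusual
MAX_BEDROOMS = 6       # assumed: 7 or more bedrooms counts as unusual


def is_typical_detached(sale):
    """Keep detached single-family homes without unusual characteristics."""
    return (
        sale["type"] == "detached"
        and sale["lot_sqft"] < MAX_LOT_SQFT
        and sale["beds"] <= MAX_BEDROOMS
    )


# Hypothetical records.
sales = [
    {"id": 1, "type": "detached", "lot_sqft": 5_000, "beds": 4},
    {"id": 2, "type": "condo", "lot_sqft": 0, "beds": 2},
    {"id": 3, "type": "attached", "lot_sqft": 3_000, "beds": 3},
    {"id": 4, "type": "detached", "lot_sqft": 15_000, "beds": 5},
    {"id": 5, "type": "detached", "lot_sqft": 6_000, "beds": 7},
]

kept = [s for s in sales if is_typical_detached(s)]
kept_ids = [s["id"] for s in kept]
print(kept_ids)
```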

With the data winnowed down to true resale single family detached houses with no unusual characteristics, we can then run the regression to determine how the value of homes has moved over time.  Our Irvine dataset includes sales from 2000 through June 2011.  The regression model calculates each quarter’s price change, relative to the initial quarter (2000Q1).  Because there are only so many sales each quarter in Irvine, and because regression-based models need a certain number of data points to produce valid results, we are not able to generate a monthly price series of the same quality.  Regression models require more data than some other approaches – but they also provide a deeper understanding of the data in exchange.
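One common way to get a quarterly price series out of a regression is the "time-dummy" approach, sketched below in Python. This is an assumption about the general technique, not a description of the exact specification behind the Irvine model: the sketch regresses the log of price on the log of square footage plus one indicator per quarter, and reads each quarter's price level relative to the base quarter off the indicator. The function name, the single characteristic, and the data are all hypothetical.

```python
import math
from collections import defaultdict


def time_dummy_index(sales, base_quarter):
    """sales: list of (quarter, sqft, price). Returns {quarter: index}, base = 1.0.

    Fits log(price) = a + b*log(sqft) + d_quarter by least squares. With a
    single characteristic, the slope b can be computed from within-quarter
    deviations (the fixed-effects shortcut), which gives the same answer as
    running the full regression with one dummy variable per quarter.
    """
    by_quarter = defaultdict(list)
    for quarter, sqft, price in sales:
        by_quarter[quarter].append((math.log(sqft), math.log(price)))

    # Mean log size and mean log price within each quarter.
    means = {
        q: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
        for q, pts in by_quarter.items()
    }
    num = sum(
        (x - means[q][0]) * (y - means[q][1])
        for q, pts in by_quarter.items()
        for x, y in pts
    )
    den = sum(
        (x - means[q][0]) ** 2 for q, pts in by_quarter.items() for x, _ in pts
    )
    slope = num / den

    # Quarter effect = average log price once the size effect is removed.
    effect = {q: my - slope * mx for q, (mx, my) in means.items()}
    return {q: math.exp(effect[q] - effect[base_quarter]) for q in sorted(effect)}


# Synthetic example: assumed price levels relative to 2000Q1, with the homes
# sold getting larger each quarter.
levels = {"2000Q1": 1.00, "2000Q2": 1.04, "2000Q3": 1.10}
sizes = {
    "2000Q1": [1500, 1800, 2000, 2200],
    "2000Q2": [1600, 2000, 2400, 2600],
    "2000Q3": [2000, 2500, 2800, 3000],
}
sales = [(q, s, levels[q] * 250 * s) for q in levels for s in sizes[q]]

index = time_dummy_index(sales, "2000Q1")
for quarter, value in index.items():
    print(quarter, round(value, 4))
```

In this noise-free example the index recovers the assumed price levels even though the size of the homes sold changes from quarter to quarter. With only a few sales per period the estimates become noisy, which is why a quarterly series is more practical than a monthly one.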

The above chart shows the actual median price for resale SFRs from the underlying dataset, along with the hedonic model’s estimate of how prices for those same properties have moved over time.  The key insight is immediately clear:  during the years of rapid appreciation, both the median and the hedonic trend were similar.  However, between 2005 and 2008, the two series started to diverge and are presently at significantly different values.

A second observation is that the hedonic series is much smoother.  The median price can gyrate wildly from quarter to quarter, as evidenced by the 10% drop from 2010Q3 to 2010Q4.  The hedonic model, by contrast, dropped only 3.0% in the same period.  A core benefit of a hedonic approach, versus a median-value approach, is that a hedonic model is not skewed by changes in the mix of products that sell each quarter.  As sales move from larger to smaller homes and back again – or from one neighborhood to another – the median value is pushed and pulled by the changes in the mix of the underlying properties.  Such changes do not indicate the actual home value trend and serve only to obscure the true change in home values in the mid-term.


So what’s driving the gap between the median value and the value implied by the hedonic analysis?  As we mentioned earlier, changes in the mix of properties affect the median but not the true trend of real estate values.  Foremost among such changes is the size of the median home sold.  Clearly, with all else equal, if the homes that are selling increase (decrease) in size then the median value will rise (fall).  For this reason, some analysts prefer the “price per square foot” summary metric.  That metric also produces distortions, though in different ways.

You can see from chart 2 that the median size of homes sold in this particular dataset has risen over time.  The rise over time is a general trend, but it also exhibits a visible discontinuity upwards around 2006Q4.  Starting in that quarter, about 50% of the quarters have median home sizes that exceed 2,300 sq. ft. – a condition which did not occur between 2000Q1 and 2006Q3.  The median home value series is pushed higher by the fact that larger homes are selling.  The degree of the distortion is evident in the chart:  the gap between the undistorted hedonic index and the median-based index is clearly visible from 2007 to present.
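The size effect on the median is easy to reproduce with made-up numbers. In the Python sketch below, every square foot is assumed to be worth the same $250 in both periods, so the value of any given home does not change at all. Only the mix of homes sold shifts toward larger ones. The sizes are hypothetical and chosen only to illustrate the mechanism.

```python
from statistics import median

PRICE_PER_SQFT = 250  # assumed value of a square foot, identical in both periods


def median_price(sizes):
    """Median sale price when every home sells at the same price per square foot."""
    return median(PRICE_PER_SQFT * s for s in sizes)


# Hypothetical mixes of homes sold (square feet).
earlier_sizes = [1600, 1900, 2100, 2300, 2500]  # median size 2,100 sq. ft.
later_sizes = [1900, 2200, 2400, 2600, 3000]    # median size 2,400 sq. ft.

earlier = median_price(earlier_sizes)
later = median_price(later_sizes)
median_change = later / earlier - 1

print(f"median price, earlier period: {earlier:,.0f}")
print(f"median price, later period:   {later:,.0f}")
print(f"change in median price:       {median_change:.1%}")
```

The median price rises purely because larger homes sold, even though no individual home gained any value. A constant-quality measure would report no change here.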


The other major factor that drives home values is location (regression models don’t need it repeated three times).  Because Irvine experienced something of a building boom from 2000-2007, the percent of total sales represented by newer homes has also increased over time.  This change in mix is another reason why using the median home value over time to represent the change in the true value of any given Irvine home yields distorted results.

An admittedly-leading question to ponder in the astute observations:  if builders create housing that’s physically identical to the average existing housing stock, but those properties sell for a premium for being new, will using the median home value as a price index generally overstate the actual change in value?  What if builders create housing with more (or less) favorable characteristics over time?

Using the Hedonic Model to Predict Future Prices

An interesting offshoot of the hedonic model is that one can use its relatively-stable quarter-over-quarter values to better understand whether the current price trend is deviating from historical norms.  Home values have a seasonal component to them.  Most major indexes, such as the Case-Shiller, offer “seasonally adjusted (SA)” and “non-seasonally adjusted (NSA)” series for this reason.  The Irvine hedonic model is inherently not seasonally adjusted.  In fact, one can use the results generated by the model to help understand the seasonality of price changes.  The following graph shows the average change, quarter-over-quarter, in the Irvine hedonic price trend series.

The hedonic model provides solid evidence that prices are generally stronger in Q2, with weakness in Q4.  This finding simply confirms conventional wisdom.  However, if we couple this fact with the fact that 2011’s hedonic analysis shows a flat trend in Q2 2011, then the de-seasonalized trend of Irvine home values in Q2 2011 was negative.  Q3, on average, is about 1.8% worse than Q2 – and a typical Q4 is over 4% worse than a typical Q2.  Only time (and actual data) will tell if Q3/Q4 2011 will follow these trends, but the implication is that Irvine home values could easily fall 5% in the coming quarters due to seasonal factors alone.  If the underlying trend is actually negative, the drop will be exacerbated by seasonality.  If the underlying trend is actually positive, the gains will appear muted in Q3/Q4 for the same reason.  The Q3 and Q4 editions of the hedonic model will explain which scenario is the case.  Stay tuned!
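The de-seasonalizing step can be sketched in a few lines of Python. The index values below are invented for illustration and are not the output of the Irvine model, and the function names are hypothetical. The sketch averages the quarter-over-quarter change for each calendar quarter, then subtracts the typical change for a quarter from the change actually observed in it. This simple subtraction is an assumed method; published indexes use more elaborate seasonal adjustment.

```python
from collections import defaultdict


def seasonal_averages(index_by_quarter):
    """Average quarter-over-quarter change for each calendar quarter (Q1..Q4)."""
    quarters = sorted(index_by_quarter)  # "YYYYQn" labels sort chronologically
    changes = defaultdict(list)
    for prev, cur in zip(quarters, quarters[1:]):
        change = index_by_quarter[cur] / index_by_quarter[prev] - 1
        changes[cur[-2:]].append(change)
    return {q: sum(v) / len(v) for q, v in changes.items()}


def deseasonalized(observed_change, calendar_quarter, averages):
    """Observed change minus the typical change for that calendar quarter."""
    return observed_change - averages[calendar_quarter]


# Hypothetical history: build an index from assumed quarterly changes.
assumed_changes = [
    ("2009Q2", 0.03), ("2009Q3", 0.01), ("2009Q4", -0.02), ("2010Q1", 0.00),
    ("2010Q2", 0.02), ("2010Q3", 0.01), ("2010Q4", -0.01), ("2011Q1", 0.01),
]
index = {"2009Q1": 100.0}
level = 100.0
for quarter, change in assumed_changes:
    level *= 1 + change
    index[quarter] = level

averages = seasonal_averages(index)

# A flat Q2 (0% observed change) measured against a typically strong Q2.
flat_q2 = deseasonalized(0.0, "Q2", averages)
print(f"typical Q2 change:        {averages['Q2']:.2%}")
print(f"de-seasonalized flat Q2:  {flat_q2:.2%}")
```

With these invented numbers a typical Q2 gains 2.5%, so a flat Q2 corresponds to a negative de-seasonalized change, which is the same reasoning applied to Q2 2011 above.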

The Irvine hedonic housing model does not directly attempt to predict future home values.  It does, however, more clearly show the true underlying value trends based on actual sales.  Those true underlying trends can then be combined with other sets of data, such as default/foreclosure rates, loan-to-value ratios, job growth, and so forth to create model-based approaches to predict future home values.  In that sense, if others can use the hedonic approach to refine their forecasting models, it does have predictive value.

IrvineRenter’s commentary

When we first pored over the results, I was struck by how the model more accurately showed the decline in value since the peak without the distortion of product mix. I have noted on other occasions that the few transactions occurring in Irvine have been of the most desirable single-family detached homes, and in order to complete the transactions, buyers have put more money down.

The hedonic model shows both the increasing size of homes purchased and the fact that those homes are the newer ones. As we will show in future posts, many of these sales have been in Turtle Ridge and Quail Hill, where the premiums are astronomical. The flight to quality from cash-heavy buyers is apparent.

Armed with this new model, we dove into the details on various neighborhoods and even the components of the housing stock itself: beds, baths, square footage, age, garages. We also looked at condos and rentals. The results will be detailed in future posts.