Jul 06 2011

A detailed look at Irvine Village premiums by Global Decision and IHB

Which Irvine Village is the most desirable for single family homes?

Which Irvine Village is the best, and how could this be determined? Well, taking an opinion poll might be interesting, but it wouldn’t be backed by anything substantial. To determine what people really believe about desirability, we have to look at where people put their money. Money talks. The neighborhoods where people pay the most for real estate determine what is “best.”

Determining which neighborhood obtains the highest premiums is not easy. We couldn’t simply look to the MLS or to past sales and see where the prices are highest, because there can be many reasons people pay more in one neighborhood versus another. To accurately measure premium, we needed to normalize for those other factors and distill a premium value they do not explain.

In the first post in this series, An accurate view of the Irvine housing market by Global Decision and IHB, I introduced Jaysen Gillespie of Global Decision. “Jaysen Gillespie of Global Decision, an analytics and consulting firm that has worked in the real estate industry. He shares my interest in determining what is really going on in the real estate market. As a professional data analyst, he is trained in special techniques I cannot perform.” The first post was well received. Jaysen’s skills with data analysis are remarkable, and I am thrilled he is working with me and the IHB to bring this information to the readership.

The following is the writing of Jaysen Gillespie. I have not set it off in block quote to make it easier to read.

A presentation by Jaysen Gillespie of Global Decision

Global Decision is an analytics consulting firm.  While our methods are not industry-specific, our engagements are skewed towards specific industries in Southern California, such as real estate (along with online gaming and restaurant chains).  We specialize in applying both foundational and advanced analytics to better understand business and economic issues.

Today’s post is part two of our series on hedonic housing valuation in Irvine.  The goal of a hedonic housing valuation model is to use all information about a sale, including both the sale price and the characteristics of the home (number of beds, number of baths, square footage, etc.), to understand how the home’s value is derived from its constituent parts.  Wikipedia offers a good overview of hedonic regression, or see the Global Decision tutorial on how to build your own hedonic regression model.

What is a mathematical model?

A mathematical model is an abstraction of a real-world situation.  Models help us understand how complex systems work by distilling them down to a manageable number of inputs, and they provide the ability to tinker with the model – and see how the system responds.  In our case, the “system” under consideration is the Irvine, CA housing market.  The hedonic model helps us do two things:  first, it deepens our understanding of the drivers of housing demand.  Second, it allows us to play with hypothetical scenarios and see what the impact would be on the value of a property.

How would someone in the “real world” use a hedonic housing model?

A fun aspect of building and using mathematical models is that you can perform experiments that would be physically or financially impossible in the real world.   Because “neighborhood” or “area” is an input into the hedonic housing model, we could theoretically pick up a house from Woodbury and move it to Woodbridge (keeping all else equal) and see how the value of the property changes.  For those engaged in building new homes, or developing a plan for a new neighborhood, an accurate hedonic housing model can be used to optimize revenue.  If you know the incremental revenue you can obtain from an extra bedroom, an extra bathroom, or an extra 1000 sq. ft. of lot size, you can compare the costs of each option with the resulting expected increase in valuation and find the best bang-for-the-buck.  It’s not an exact science, and I wouldn’t execute blindly based on just the results of a model.  But a well-structured model can provide valuable insight and an unbiased view into the marketplace’s preferences.
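To make the bang-for-the-buck comparison concrete, here is a minimal sketch. Every cost and value figure in it is a made-up placeholder rather than output from the Irvine model, and the names `options` and `rank_options` are invented for this illustration.

```python
# Hypothetical build options. All dollar figures are placeholders, not model output.
options = {
    "extra bedroom": {"cost": 30_000, "value_added": 45_000},
    "extra bathroom": {"cost": 20_000, "value_added": 26_000},
    "extra 1000 sq. ft. of lot": {"cost": 50_000, "value_added": 40_000},
}


def net_gain(option):
    """Expected increase in valuation minus the cost of building the option."""
    return option["value_added"] - option["cost"]


def rank_options(options):
    """Return option names sorted by expected net gain, best first."""
    return sorted(options, key=lambda name: net_gain(options[name]), reverse=True)


for name in rank_options(options):
    print(f"{name}: net gain ${net_gain(options[name]):,}")
```

In practice the `value_added` numbers would come from the model's coefficients, and the ranking is only as good as those estimates.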

How are neighborhoods modeled in the Irvine Hedonic Housing Model?

With Irvine it’s really a case of “the hits just keep on coming.”  Not only do we have a background of glasslike consistency, especially in terms of education and safety, but we also have large, well-defined neighborhoods.  Some locals refer to them as villages, though they have no governmental or political authority.   Each village is similar, in that it was constructed in the same range of years, has access to many shared facilities, and generally sports a consistent look-and-feel throughout.

In regression models, there are generally two types of explanatory factors – continuous and categorical (discrete).  Continuous factors can take on any value, or perhaps any integer value.  Examples include the age of a property, the square footage, and the lot size.  Categorical factors typically have either well-defined and finite possibilities or have a practical limit on their range.  In theory the number of rooms in a home is continuous.  You could construct a home with 700 of them.  Unless you have the resources of Louis XIV, it’s probably not going to happen.  So we’d consider the number of rooms in a home to be categorical for practical reasons.

If a categorical variable is numeric in nature, we can – at our option – use that number directly as an input into the underlying regression model.  Treating a categorical numeric variable as a continuous variable makes sense when incrementing the factor by one has about the same impact at every level (going from two bedrooms to three adds roughly as much value as going from three to four).  In our model, we treat the number of bedrooms and bathrooms as continuous, even though there is a finite range for these values.

An area designation, by contrast, can’t be directly fed into a regression model.  It’s not numeric, and the model has no conceptual understanding of “Woodbridge” vs. “Oak Creek.”  Fortunately, there are well-developed methods for handling these types of variables.  The crux of the solution requires us to do two things:  first, we must pick a baseline level for each category.  In our example, we use “Northwood-Old” as our baseline area.  After the regression is run, the baseline becomes the reference level against which others are measured.  A good baseline contains a lot of data points and is preferably an area of average value.

Once we have a baseline level (Northwood-Old), we can model all other levels as extra variables in the regression.  Continuing our example, we’d set up a variable called “Woodbridge.”  Homes in Woodbridge get a “1” for that variable; others get a “0.”  We benefit here from Irvine’s village system.  Because villages are quite large, we need introduce only a small number of extra variables (16 in our case) into the regression model.
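The 0/1 encoding described above can be sketched in a few lines of Python. This is an illustration only: the helper name `make_dummies` and the four sample sales are invented for the example, and the real model would have 16 such columns.

```python
BASELINE = "Northwood-Old"  # the reference area named in the post


def make_dummies(areas, baseline=BASELINE):
    """Build one 0/1 column per non-baseline area.

    Returns (columns, rows): `columns` lists the area names that became
    variables, and each row holds the 0/1 flags for one sale.
    Baseline homes get 0 in every column.
    """
    columns = sorted(set(areas) - {baseline})
    rows = [[1 if area == col else 0 for col in columns] for area in areas]
    return columns, rows


# Made-up sample: the area of each of four sales.
sales_areas = ["Northwood-Old", "Woodbridge", "Oak Creek", "Woodbridge"]
columns, rows = make_dummies(sales_areas)

print(columns)
for area, row in zip(sales_areas, rows):
    print(area, row)
```

Leaving the baseline without its own column is what makes every other coefficient read as "relative to Northwood-Old."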

So what will the hedonic housing model tell me about neighborhoods?

The output of the hedonic model will tell the analyst how much more (or less) a home would be worth if that home were moved from the baseline area (Northwood-Old) to the area designated by each area variable.  The virtual move from Northwood-Old to another neighborhood assumes all else is held constant.  So if you start with a 3/2, 1600 sq.ft. home from 1977 in Northwood-Old and move it to Woodbridge, you’d gain (or lose) the dollar amount stipulated by the Woodbridge variable’s coefficient.
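As a rough sketch of the virtual move, assume the area effect is expressed as a percentage premium over the baseline, which is how the premiums are quoted later in this post. The premiums and the starting home value below are made-up placeholders, not the model's results, and `move_home` is a name invented for this example.

```python
# Hypothetical premiums relative to Northwood-Old. NOT the values from the chart.
AREA_PREMIUM = {
    "Northwood-Old": 0.00,    # baseline, zero by construction
    "Woodbridge": 0.08,       # made-up number
    "Columbus Grove": -0.10,  # made-up number
}


def move_home(value_in_baseline, to_area, premiums=AREA_PREMIUM):
    """Value of the same home after a virtual move from the baseline area.

    Everything else about the home (beds, baths, sq. ft., age) is held
    constant; only the area premium changes.
    """
    return value_in_baseline * (1.0 + premiums[to_area])


baseline_value = 500_000  # hypothetical 3/2, 1600 sq. ft. home in Northwood-Old
for area in AREA_PREMIUM:
    print(f"{area}: ${move_home(baseline_value, area):,.0f}")
```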

The model assumes all else holds constant.  In reality, the model can only hold constant the factors that are directly input into the model.  So if you use the model to move a home from Turtle Rock down to University Park, and you lose a city lights view in doing so, the model will revalue the home lower – but miss the fact that the view disappeared.  Views, backing up to Culver, having Metrolink as your neighbor, or owning a strangely shaped lot are all examples of factors not in the model.  In the future, it may be possible to add this type of data into the Irvine Hedonic Housing Model.  A parcel map and GIS system would be able to determine if the property is located next to a major negative (I-405, for example).

The above chart represents the results of the Irvine Hedonic Housing Model, run on 2007-2011 data (so that all neighborhoods could have sales in that time frame).

Important but statistical side note:  regression methods provide a best estimate of the impact of each area on home values.  For the above analysis, our margin of error for each estimate varies by neighborhood.  This “standard error” ranges from 0.7% to 1.9%.  Areas with standard error greater than 1.4% are shown in lighter blue.  There is a 68% chance that the true impact of the area on market value falls within 1 standard error of the quoted best estimate, rising to 95% when the band is expanded to 2 standard errors.  For this reason, we might say that Portola Springs and Northpark are statistically similar in their impact on value, but we are more sure that Columbus Grove (CG) has the lowest incremental market value.  Even if the true CG value was 3.8% greater (2 standard errors), CG would still create a decline of 6.1% in market value.  The 6.1% might put it in contention only with Orangetree and West Irvine in a statistical analysis.  The standard error decreases as more data is accrued, so new neighborhoods are more subject to statistical swing.
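The arithmetic behind that side note can be sketched as follows. The Columbus Grove inputs are inferred from the paragraph above rather than read off the chart: 2 standard errors is 3.8%, so 1 standard error is 1.9%, and the best estimate is -6.1% - 3.8% = -9.9%. The 68% and 95% figures assume a normal distribution of the estimate.

```python
from statistics import NormalDist


def band(estimate_pct, std_err_pct, n_se):
    """Interval of +/- n_se standard errors around the best estimate."""
    half_width = n_se * std_err_pct
    return estimate_pct - half_width, estimate_pct + half_width


def coverage(n_se):
    """Probability that a normal variable falls within n_se standard errors."""
    dist = NormalDist()
    return dist.cdf(n_se) - dist.cdf(-n_se)


cg_estimate = -9.9  # inferred from the text: -6.1 - 3.8
cg_std_err = 1.9    # inferred from the text: 3.8 / 2

for n_se in (1, 2):
    low, high = band(cg_estimate, cg_std_err, n_se)
    print(f"{n_se} SE: {low:.1f}% to {high:.1f}% (coverage {coverage(n_se):.1%})")
```

The upper end of the 2-standard-error band is the -6.1% quoted above.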

The premiums for each area, relative to Northwood-Old, are listed on the chart.  These values are not a human judgment of any type, and are derived directly from the relationship between the physical area of the home, the other factors in the regression model (beds, baths, sqft, lotsize, etc.) and the selling price of the home.  The above result indicates how the market perceives each neighborhood in terms of valuation.

The general ordering of the areas should be of no surprise to area residents.  Turtle Ridge, Turtle Rock, and Quail Hill stand out as having high incremental market value.  Woodbridge is also a very solid performer and leads the pack of the older vintage flatland areas.  The El Camino Real / Walnut complex lacks an association in some areas and is of an older design, so one might speculate that those factors lead to its lower valuation.  It’s important to note that a property-based hedonic model does not tell you *why* each neighborhood is valued how it is – unless you have factors in the model that theorize the “why.”

It’s up to the analyst to consider what might be the underlying root causes.  Models are a tool; some domain-specific knowledge is helpful in leveraging and interpreting their results.  Some theories are testable by adding additional variables to the underlying regression.  If, for example, we theorize that El Camino Real loses value because there is not enough park space, we could add in a variable with the number of square feet of parkland per housing unit.

Columbus Grove is a particularly interesting example.  It’s located on the fringe of Irvine, but within the bounds of the Irvine school district, and enjoys all the other benefits of being in Irvine (e.g. safe, near jobs, climate, etc.).  However, the properties appear to sell for quite a discount to even an average area.  Such a result shows that being newer is not sufficient to generate enhanced market value.

In the above chart, properties sold in Columbus Grove are in blue, with Woodbury in red.

It’s easy to see that properties in Columbus Grove (CGR), at the same size as those in Woodbury (WDB), sell for considerably less.  These 2-dimensional scatter plots are quick-and-easy tools to help verify that model results are sensible.  Given that CGR and WDB are both newer neighborhoods – and that the regression takes lot size and bed/bath configuration into account, it’s interesting that the market has valued Columbus Grove so much lower.

From the Case-Shiller Tiered indexes, we know that area (for which price is a proxy) plays a large role in determining how home values have performed after the housing bubble’s peak.  We’ve overlaid the Global Decision Irvine Hedonic Home Price Index on top of the Case-Shiller LAOC Tiered Value Indexes in the above graph.  While Case-Shiller’s Aggregate value is down almost 40%, there is a clear distinction between lower-end and higher-end results.  Case-Shiller’s High Tier is down only 30% from peak pricing.  Irvine SFRs are performing even better, with 15-20% declines since the Irvine peak in early 2006.

For the areas of Irvine that have more data, we can take a stab at computing a hedonic price index for just those specific villages.  We gain a finer level of granularity from doing so, but we lose statistical accuracy.  In the overall Irvine Hedonic Housing Model, a typical standard error for the price trend numbers is near 1%.  When we go area-by-area, those errors range from 1.5% to near 5%.

While all areas exhibit the same rapid rise, descent, and flattening trends, a few edge cases are worth a mention.  First, 5 of the 6 areas have the same overall increase from Jan 2000 to early 2006, about 140-150%.  Turtle Rock, however, is different.  Its peak value hits “only” 120% above the Jan 2000 value.  Even more interesting is that Turtle Rock did not experience nearly as much of a decline-from-peak as the other areas.  We don’t have enough data to do a meaningful hedonic price trend model for the other top-3 value-add areas (Quail Hill and Turtle Ridge), but we know from Case-Shiller’s Tiered metrics that higher-end properties have held value better post-bubble.  Irvine is, itself, the higher-end of Case-Shiller’s Top Tier.  The decline in Irvine home values has averaged 15-20% vs. 30% for the LAOC Case-Shiller Top Tier.  Within Irvine, a high value area such as Turtle Rock appears to be experiencing even smaller declines.

Conversely, the area that rose the most in value (as a percentage of Jan 2000 values) is El Camino Real.  Interestingly, El Camino Real’s values have now dropped the most of any neighborhood in Irvine (in this analysis) after the bubble popped.  Again, the model cannot explain why – it could be a higher percentage of subprime loans, lower down payments, a change in consumer preferences, etc.

Most areas, including Irvine as a whole, are now about 100% above the Jan 2000 prices.  Over 11.5 years, that’s a CAGR of 5.9%.  That’s a useful number to have, as it can help inform the debate about the future direction of home prices.  We can compare that 5.9% growth rate to other drivers of home value – average wage, job counts, new supply, persons per household, total households – to discuss whether current values represent a post bubble bottom or a landing on a stairway where another drop is forthcoming.
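For readers who want to check the growth arithmetic, here is a small sketch. The function names are invented for this example; the inputs (11.5 years, roughly 100% above Jan 2000 today, 140-150% above at the peak) come from the text. An exact doubling over 11.5 years works out to about 6.2% a year, while 5.9% a year compounds to roughly 93% growth, so the quoted CAGR fits "about 100%" when the gain is slightly under a full doubling.

```python
def cagr(growth_multiple, years):
    """Compound annual growth rate implied by a total growth multiple."""
    return growth_multiple ** (1.0 / years) - 1.0


def multiple_from_cagr(rate, years):
    """Total growth multiple implied by compounding `rate` for `years`."""
    return (1.0 + rate) ** years


def decline_from_peak(peak_gain, current_gain):
    """Fractional drop from peak, with both gains measured against Jan 2000."""
    return 1.0 - (1.0 + current_gain) / (1.0 + peak_gain)


years = 11.5  # Jan 2000 to mid-2011, per the text

print(f"CAGR for an exact doubling: {cagr(2.0, years):.2%}")
print(f"Total growth at 5.9% a year: {multiple_from_cagr(0.059, years) - 1.0:.1%}")

# Peak of 140-150% above Jan 2000, about 100% above today.
for peak_gain in (1.40, 1.50):
    print(f"Peak +{peak_gain:.0%}: decline {decline_from_peak(peak_gain, 1.00):.1%}")
```

The last two lines show that a 140-150% peak followed by a level about 100% above Jan 2000 is consistent with the 15-20% decline quoted for Irvine.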

IrvineRenter’s Commentary

I was not surprised to see Turtle Ridge and Turtle Rock at the top of the list, but the size of the premium was shocking to me. The same sticks and bricks are worth 40% more in these villages. Perhaps the premium views account for some of this (which also explains Quail Hill), but there are many non-view homes in these neighborhoods also obtaining substantial premiums.

I was also surprised to see Northwood Pointe and surrounding areas did not receive a higher premium. I would have guessed that Northwood Pointe was on par with Turtle Rock and Turtle Ridge, but it isn’t. I was also surprised that Woodbury did not obtain a higher premium, that Woodbridge is more desirable than Westpark, and that University Park is more desirable than the old Northwood.

In my opinion, Columbus Grove represents the best value in Irvine. It feeds to the Irvine school district, the houses are nearly new, and yet it trades at a discount to the least desirable communities in Irvine. I imagine the Irvine Company would like to have everyone believe it is due to their superior land planning and community marketing. IMO, it’s largely due to the fact that Lennar finished off the community and sold houses at a discount while the Irvine Company stopped construction to keep prices up. I believe Columbus Grove will rise in value relative to the less desirable Irvine communities of Walnut, El Camino Real, Orangetree and West Irvine.

The hedonic model showed that many of the undesirable areas exhibited the most volatile house prices. As mentioned above, Turtle Rock didn’t go up as much as other neighborhoods and didn’t crash as hard either. On the other extreme is El Camino Real, which went up a great deal and crashed more than other areas. This same phenomenon shows up in the general price tiers of Case-Shiller, with the lowest price tier being the most volatile.

IMO, this volatility was largely the result of subprime lending and Option ARM financing. As lending standards were lowered during the bubble, more and more people qualified to obtain loans. The fringes of the market (i.e. the lowest tier) should be the biggest beneficiary of an influx of new buyers. Turtle Rock wasn’t being bid up by the 580 FICO score mob; El Camino Real was. Couple the influx of new buyers with the extreme leverage of Option ARMs, and the low end of the market gets pushed up substantially. The rest of the market is impacted by the move-ups, with diffusion lessening the impact as you go up the housing ladder.

One of the factors that can never be modeled is human emotion and the variability of negotiation. For this reason, I don’t believe it’s possible to construct a model that can vary less than 5% from what the market actually does. Sometimes either the buyer or the seller is represented by a good agent who helps their client keep their emotions under control to make reasonable decisions. Sometimes not. Often either the buyer or the seller has motivations to complete the sale that have nothing to do with the real estate. Buyers can fall in love with a property and overbid, and sellers may need to move and lower their asking price aggressively to sell. People’s emotions and negotiating skills will always represent a variable that can never be accurately modeled.

I want to thank Jaysen for this post. Next week, he will be back with a look at square footage, beds, baths, lot size, and other factors that strongly influence the prices of homes. Stay tuned.