@Robert Matelski and @Jad Boudiab
I think there is a lot to unpack here. For starters, really appreciate the kind words Robert!
I have to say though, I think Jad brought up a great point:
"If we were to shift some of the neighborhood borders slightly, we would suddenly come up with different grades because one or two comps would have dropped or were added, making a big difference in a compact market."
The quote above essentially describes one of the fundamental problems that has plagued geostatistics for decades. It's called the modifiable areal unit problem (MAUP).
In short, when you sample a geography and then resample it after redrawing the boundaries, the results tend to change, even though the underlying data has not.
The Wikipedia page has more information and a better explanation: Modifiable areal unit problem
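To make Jad's point concrete, here is a toy Python sketch. The homes, prices, and boundary positions are all made up for illustration (they are not real comps), and the `median_by_zone` helper is hypothetical. It only shows that moving one boundary a short distance changes the medians while the sales themselves stay the same.

```python
from statistics import median

# Hypothetical homes along one street: (position in miles, sale price in $1000s).
homes = [(0.1, 100), (0.3, 110), (0.5, 120), (0.7, 300),
         (0.9, 310), (1.1, 320), (1.3, 130), (1.5, 140)]

def median_by_zone(homes, boundaries):
    """Split homes into zones at the given boundaries; return each zone's median price."""
    zones = [[] for _ in range(len(boundaries) + 1)]
    for position, price in homes:
        # A home's zone index is the number of boundaries at or before its position.
        index = sum(position >= b for b in boundaries)
        zones[index].append(price)
    return [median(z) for z in zones]

# Same homes, same prices; only the neighborhood line moves by 0.2 miles.
scheme_a = median_by_zone(homes, boundaries=[0.8])
scheme_b = median_by_zone(homes, boundaries=[0.6])

print("Boundary at 0.8 mi:", scheme_a)
print("Boundary at 0.6 mi:", scheme_b)
```

With the line at 0.8 miles the second zone's median is 225; with the line at 0.6 miles it jumps to 300, because one expensive comp switched sides.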
Robert, on your website you also touched on this phenomenon on your methodologies page under the header "Special Considerations for B & C Grade Neighborhoods".
You wrote:
"It is important to note that even within a census tract there can be considerable variation that could skew the grade for the specific location of a property. For instance, if a census tract contains blocks of single family houses, but also a large number of senior citizen rental housing apartment communities, the parameters related to income and owner occupancy rate might score far lower than they would have if the senior citizen housing complexes were excluded, thus bringing down the overall grade for the area."
Unfortunately, this issue will occur no matter what you do. Two ideas I had to lessen its effect are to use block-level data, or to use the untabulated records about individual people or housing units that the American Community Survey (ACS) collects.
Both approaches have problems. Block data is only published by the decennial census, which reduces your temporal resolution from 1 year to 10. You can interpolate ACS data down to the block level using geospatial algorithms, but I have found this method to be rather ineffective. Untabulated records that contain lat/long coordinates are only available to accredited research institutions. Individuals and companies can only use what is called the Public Use Microdata Sample (PUMS). That data is anonymized by assigning each record to a Public Use Microdata Area (PUMA), which is larger than a census tract, so this method defeats its own purpose.
That being said, we also have to define what we mean by "location grade".
The way I see it, and how I wrote my system, the location grade is really a kind of risk score.
The analogy I like to use is a FICO score to a lender, or a bond rating to a bond investor.
The metrics that I have incorporated into my system are as follows:
- Median Home Value
- Median Rent
- Median Household Income
- Poverty Rate
- SNAP (Food Stamps) Rate
- College Graduation Rate
- Vacancy Rate
I chose these because I believe they give the best indication of the "risks" an investor will face when selecting a location to invest in: non-payment, general vacancy, theft, drugs, vandalism, and lease-up duration, as well as market risks such as price reductions or falling rents.
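As a rough illustration of how metrics like these could be rolled into a single grade, here is a minimal Python sketch. To be clear, this is not my actual algorithm: the equal weighting, the "at least as good as" ranking, the letter cutoffs, the function names, and the three example tracts are all assumptions made up for the example.

```python
# Metrics from the list above. True means "higher is better" (an assumption).
HIGHER_IS_BETTER = {
    "median_home_value": True,
    "median_rent": True,
    "median_household_income": True,
    "poverty_rate": False,
    "snap_rate": False,
    "college_graduation_rate": True,
    "vacancy_rate": False,
}

def location_scores(areas):
    """Return {area: score in (0, 1]}: the average share of areas it matches or beats."""
    scores = {}
    for name, metrics in areas.items():
        total = 0.0
        for metric, higher_is_better in HIGHER_IS_BETTER.items():
            value = metrics[metric]
            others = [a[metric] for a in areas.values()]
            if higher_is_better:
                matched_or_beaten = sum(v <= value for v in others)
            else:
                matched_or_beaten = sum(v >= value for v in others)
            total += matched_or_beaten / len(others)
        scores[name] = total / len(HIGHER_IS_BETTER)  # equal weights (assumption)
    return scores

def letter_grade(score):
    """Map a 0-1 score to a letter using hypothetical cutoffs."""
    for cutoff, letter in [(0.75, "A"), (0.5, "B"), (0.25, "C")]:
        if score >= cutoff:
            return letter
    return "D"

# Made-up tracts, for illustration only.
areas = {
    "tract_1": {"median_home_value": 450000, "median_rent": 2100,
                "median_household_income": 98000, "poverty_rate": 4.0,
                "snap_rate": 3.0, "college_graduation_rate": 55.0, "vacancy_rate": 3.0},
    "tract_2": {"median_home_value": 280000, "median_rent": 1500,
                "median_household_income": 62000, "poverty_rate": 11.0,
                "snap_rate": 10.0, "college_graduation_rate": 30.0, "vacancy_rate": 7.0},
    "tract_3": {"median_home_value": 140000, "median_rent": 950,
                "median_household_income": 35000, "poverty_rate": 27.0,
                "snap_rate": 25.0, "college_graduation_rate": 12.0, "vacancy_rate": 14.0},
}

grades = {name: letter_grade(s) for name, s in location_scores(areas).items()}
print(grades)
```

Because each area is ranked against the others, the grade is relative: the same tract could grade differently depending on which areas it is compared with, which is one more reason to be clear about what the grade is supposed to represent.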
If we only use comps to assess locations, then what do the grades represent?
Admittedly, my scores are not always 100% on the dot either. I've been tinkering with the algorithm over the past few months and have found better results when I also feed it the delta values for the metrics I am currently using.
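For anyone wondering what that might look like, here is a small Python sketch. It assumes "delta" means the percent change in a metric between two data releases, which is only one way to define it. The `add_deltas` helper and the numbers are hypothetical.

```python
def add_deltas(current, previous):
    """Return the current metrics plus a '<metric>_delta' percent change for each one."""
    enriched = dict(current)
    for metric, now in current.items():
        before = previous[metric]  # assumed non-zero; a real pipeline would guard this
        enriched[metric + "_delta"] = (now - before) * 100.0 / before
    return enriched

# Made-up values for one area in two consecutive releases.
previous = {"median_rent": 1000, "vacancy_rate": 8.0}
current = {"median_rent": 1100, "vacancy_rate": 6.0}

features = add_deltas(current, previous)
print(features)
```

The idea is that two areas with the same rent today can carry different risk if one is rising and the other is falling, and the delta columns let a model see that.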
As with building any model, iterating on the design and adding more data into the mix to better paint a picture of the world is key.