Updated over 3 years ago

Machine learning and Real Estate Investing
Big question - why hasn't anyone applied data science and machine learning in the real estate domain?
With the recent (7 years?) advances in natural language processing and image recognition, and the validation of various machine learning models, why haven't savvy investors started a mad race toward developing the ultimate valuation tool?
I've been dabbling in tackling this problem - and so far, it's not that bad.
The only big problems I see are that the data sources for national markets are not uniform, that the given data might be incomplete or even contain errors (which will affect your model), and that the sample set might be orders of magnitude smaller than your feature space (fixable by removing or combining linearly correlated features to yield an orthogonal feature set).
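As a rough illustration of that last fix, here is a minimal Python sketch (using NumPy) that drops features which are almost perfectly linearly correlated with one already kept. The function name `drop_correlated`, the 0.95 threshold, and the toy columns are all made up for the example. Note that this only prunes redundant columns; to get a truly orthogonal feature set you would combine features instead, for example with PCA.

```python
import numpy as np

def drop_correlated(X, names, threshold=0.95):
    """Greedily keep columns whose |Pearson r| with every kept column is below threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # columns are features
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return X[:, kept], [names[j] for j in kept]

# Hypothetical toy data: the same living area recorded in two units, plus year built.
rng = np.random.default_rng(0)
area_m2 = rng.uniform(20, 120, size=50)
area_sqft = area_m2 * 10.7639  # carries no new information
year_built = rng.integers(1900, 2020, size=50).astype(float)

X = np.column_stack([area_m2, area_sqft, year_built])
X_reduced, kept_names = drop_correlated(X, ["area_m2", "area_sqft", "year_built"])
print(kept_names)
```

The greedy order matters: the first column of a correlated pair is the one that survives, so put the features you trust most first.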
...So why hasn't anyone done this yet?
Most Popular Reply

Hi everybody!
I have used machine learning (ML) to build models that estimate values/prices for apartments and predict the expected gross monthly rental income. I did this for my local market (Ålesund) here in Western Norway, but also for Oslo. I focused only on apartments because that was the most relevant to me. (And no one rents houses in Norway, everybody buys.)
There is a ton to talk about here, so I'll just dive right into it and try to give a somewhat concise overview. I hope this will spark an interesting conversation ;)
So, the biggest challenge has been data volume. It took me a long time to collect over 200 samples in my local market, which is part of the reason I resorted to Oslo (Norway's capital), where I have almost 2,500 samples. Still not a ton by ML standards, but useful.
I used a linear regression model (LR) and a neural network (NN) to model both values/prices and gross rental income. For Oslo, I am now down to a 6% median error rate (with the NN), and 90% of the test samples have an error of 18-19% or less. And this is using only four features (input variables) for the model: living area, year built, and location (longitude and latitude). I am personally quite happy with that. The reason I only use those four is that the others (I have collected 22 features in total, many of them very sparse) either didn't improve my models or even made their performance worse.
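For anyone who wants to see what those two numbers mean in practice, here is a minimal sketch of the linear regression variant. It is not the actual model or data described above: the listings are synthetic, the pricing rule and coordinate ranges are invented, and the helper names `percentage_errors` and `design` are just for illustration. It fits ordinary least squares on the four features with NumPy and reports the median error and the error bound that 90% of the test samples stay within.

```python
import numpy as np

def percentage_errors(actual, predicted):
    """Absolute percentage error of each prediction."""
    return np.abs(predicted - actual) / actual

# Hypothetical synthetic listings (NOT real Oslo data).
rng = np.random.default_rng(42)
n = 400
area = rng.uniform(25, 120, n)      # living area in m^2
year = rng.uniform(1900, 2020, n)   # year built
lat = rng.uniform(59.85, 59.98, n)  # latitude
lon = rng.uniform(10.65, 10.85, n)  # longitude
# Made-up pricing rule with roughly 10% multiplicative noise.
price = (60_000 * area + 3_000 * (year - 1900)) * np.exp(rng.normal(0, 0.1, n))

X = np.column_stack([area, year, lat, lon])
X_train, X_test = X[:300], X[300:]
y_train, y_test = price[:300], price[300:]

# Standardize with training statistics only, then prepend an intercept column.
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)

def design(X):
    Z = (X - mu) / sigma
    return np.column_stack([np.ones(len(Z)), Z])

# Ordinary least squares fit on the training split.
coef, *_ = np.linalg.lstsq(design(X_train), y_train, rcond=None)
pred = design(X_test) @ coef

ape = percentage_errors(y_test, pred)
median_ape = float(np.median(ape))
p90_ape = float(np.percentile(ape, 90))
print(f"median error: {median_ape:.1%}, 90% of test samples within: {p90_ape:.1%}")
```

Evaluating on a held-out test split, as here, is what makes the error figures meaningful; errors measured on the training data would look better than they really are.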
I also use principal component analysis (PCA) as a very simple anomaly detection algorithm on the data set. I do this to help me automatically identify properties on the market which may be undervalued.
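One common way to do this kind of PCA-based anomaly detection is via reconstruction error, sketched below. This is an assumption about the general technique, not a description of the exact procedure used here: the data is synthetic, the function name `pca_reconstruction_error` is made up, and working with log area and log price is a choice made for the example. Listings that the main price/area trend explains poorly get a large error and are worth a closer look.

```python
import numpy as np

def pca_reconstruction_error(X, n_components):
    """Squared error left after projecting standardized rows onto the top principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:n_components].T
    Z_hat = Z @ V @ V.T  # project onto the components and map back
    return np.sum((Z - Z_hat) ** 2, axis=1)

# Hypothetical synthetic listings: price roughly proportional to living area.
rng = np.random.default_rng(1)
area = rng.uniform(25, 120, 200)
price = 60_000 * area * np.exp(rng.normal(0, 0.05, 200))
price[17] *= 0.5  # plant one listing at half its "expected" price

# Logs make the multiplicative price noise additive and evenly sized.
X = np.column_stack([np.log(area), np.log(price)])
errors = pca_reconstruction_error(X, n_components=1)
suspect = int(np.argmax(errors))
print(suspect, errors[suspect])
```

The reconstruction error alone does not say whether a listing is cheap or expensive for its size, only that it is unusual, so the direction still has to be checked (and the listing inspected) before calling it undervalued.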
I know that Zillow's Zestimate gets butchered left, right and center. But truth be told, a 6% median error rate (also Zillow's national average) is quite good, and probably better than a lot of humans. On average, that is. Sure, it can be off by 20%. But humans can be too sometimes. At least I can. I would also like to point out that a 6% median error rate means that, for the typical listing, all the stuff that is NOT captured by the model inputs (living area, location, year built) only accounts for about 6% of the price.
Still, this means that I can't blindly rely on just the models. I use them to help me identify and decide on potential deals. And I can happily report that I just bought my first investment property (a studio apartment right in the city center) using these methods. Personally, I think this is pretty remarkable, because before I started with any of this in March I had NO clue about the local real estate market.
So, what's on the horizon? I have a lot of things I would love to try out. I would love to use image recognition methods to look at pictures of listings and detect things in them that influence the price (a nice kitchen, a pool, a pet elephant, etc.). It would also be great to use text recognition to do something similar. I also want to collect data on more markets to extend my investment horizon (Oslo is useless to me because it is way too hot and expensive, but Trondheim could be good).
Anyway, I would also definitely be up for a dedicated BP ML/AI group and to work together to improve our efforts and models! And I am more than willing to share more details. I'd like to point out, though, that I found ML and data science back in February, so I am far from an expert ;)