All Forum Posts by: Severin Sadjina

@Jason Ling - This is definitely a sweet spot "I would then turn my efforts on trying to predict when a homeowner is willing to sell their property in an effort to try to seek out off-market deals before anyone else is aware of them."

It reminds me of the case where Target knew someone was pregnant based on seemingly unrelated purchases...

That's a very interesting idea! It would be hard to get hold of the necessary data, though.

0 Votes

@Christian Wathne

>> Data which exists today does not provide the full story of price; for example

nowhere on MLS data for a house does it show
- how the neighbors maintain their home or yard
- whether the layout works for what buyers are looking for
- cost to rehab (this is massive)
- how tall the ceilings are
- current condition

As a Norwegian investor, I have no clue about the MLS, granted. But most of the points you list could, in principle, be captured through image recognition. I am not saying we are there yet, but by using deep learning to also look at images (the ones coming with the listing, Google Street View, satellite images, ...), you could capture how the neighborhood maintains its properties and yards, how tall the ceilings are, what the condition is like, whether the toilet is golden and covered in sapphires, etc.

Assuming you have that data, of course. What I am saying is that these things are in principle possible with ML/AI. I am not saying they are easy to implement. But I'd say the possibilities are clearly there!

>> I think you're underestimating the power of the human brain. When we look at a house we're not just attempting math on "a small number of data points"; we're looking at millions when you factor in the design / feel / smell / emotions produced / etc / etc.

I think there's a misunderstanding here: what I meant is that the human brain sucks compared to algorithms when it comes to looking at thousands or maybe millions of numbers and trying to identify the underlying mathematical patterns. Anything beyond simple constant or linear relationships between a very small number of numeric samples is too much for the human brain.

Take, for example, a typical CMA: selecting a few comps, same neighborhood, very similar houses, very similar layouts and conditions etc. We can deal with that. But a (good) algorithm that has captured the effects of how location, number of bedrooms and baths, built year, ... influence the price could draw from a much larger number of samples (properties) in order to come up with an estimate. And again, I'm talking about numbers and mathematical functions here.

I am well aware of the awesomeness of the human brain in almost all other areas, and that it can do things that are fully out of the reach of (today's) algorithms, at least given today's availability of data. At least for now, the best results will surely be when we can use ML/AI together with our human strengths.

One last point: AI has surpassed human performance in some notable areas. In some cases it has surpassed human experts, for example at playing games (Go, chess, video games, ...). In other areas it has at least surpassed average human performance, such as handwritten digit recognition and image categorization tasks. Or probably even car accident avoidance.

2 Votes

@Elbert Dockery - What can happen, of course, is that the "deals" are the ones that Zillow gets most wrong. Yet those are probably the ones we'll look at. But this is actually how I use ML methods to identify deals! For example: the asking price is 129.000, yet model 1 says 148.000 and model 2 says 154.000. Now I go check out the listing in more detail. If it still looks good, I'll go and have a look at it. These are actual numbers for a flat I ended up bidding on. Didn't get it, though.
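A minimal sketch of that screening step, using the numbers quoted above. The function name, the dictionary keys, and the 10% threshold are made up for illustration, and the second listing is invented. The sketch only flags a listing when every model estimate is above the asking price, then ranks by the discount relative to the average estimate.

```python
from statistics import mean

def flag_potential_deals(listings, min_discount=0.10):
    """Return (id, discount) for listings priced below every model estimate.

    `listings` is a list of dicts with the hypothetical keys
    'id', 'asking', and 'estimates' (one predicted price per model).
    """
    flagged = []
    for item in listings:
        estimates = item["estimates"]
        # Require all models to agree, so a single outlier model cannot trigger a flag.
        if min(estimates) <= item["asking"]:
            continue
        discount = 1.0 - item["asking"] / mean(estimates)
        if discount >= min_discount:
            flagged.append((item["id"], round(discount, 3)))
    # Largest apparent discount first.
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)

listings = [
    {"id": "flat-A", "asking": 129_000, "estimates": [148_000, 154_000]},  # numbers from the post
    {"id": "flat-B", "asking": 200_000, "estimates": [195_000, 210_000]},  # invented
]
deals = flag_potential_deals(listings)
print(deals)
```

A flag here is only a reason to read the listing more closely, since a large gap can just as well mean the models are missing something about the property.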

0 Votes

@Elbert D.

>> I think there is a ton of potential for software to disrupt some things in real estate. Even the smart home concepts are still in it's very early stages. I would think whoever can solve some major problems within real estate via software will end up with a ton of money.

I agree 100%!

>> Zilllow estimates are actually 95% off either overestimation or underestimations.

I am not sure I follow. I mean, let's say your model predicts a price 1 Cent over the actual price. For a 500.000$ house. Would you still count that as an overestimate?

Zillow cites a 5% median error rate (I believe nationwide), which means that half of all estimates are off by 5% or less (which is very good!), and the other half are off by more than 5%. Zillow's estimate seems pretty good for Phoenix, where two thirds of all properties are predicted to within 5% (see https://www.zillow.com/zestimate/#acc). I honestly doubt that the average appraiser or agent can do better!
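To make the difference between "over or under" and "off by more than 5%" concrete, here is a small sketch with made-up prices. The helper names are hypothetical. It computes the median relative error and the share of estimates within 5%, and also counts plain overestimates, which says very little on its own.

```python
from statistics import median

def relative_errors(predicted, actual):
    """Absolute relative error |prediction - price| / price per property."""
    return [abs(p - a) / a for p, a in zip(predicted, actual)]

def share_within(errors, tolerance):
    """Fraction of properties whose error is at most `tolerance`."""
    return sum(e <= tolerance for e in errors) / len(errors)

# Made-up sale prices and estimates, for illustration only.
actual    = [200_000, 310_000, 150_000, 420_000, 275_000, 500_000]
predicted = [204_000, 300_000, 171_000, 425_000, 260_000, 500_005]

errors = relative_errors(predicted, actual)
median_error = median(errors)                  # half of the estimates are at least this close
within_5_percent = share_within(errors, 0.05)  # share of estimates off by 5% or less
overestimates = sum(p > a for p, a in zip(predicted, actual))

print(f"median error: {median_error:.1%}")
print(f"within 5%:    {within_5_percent:.0%}")
print(f"overestimates: {overestimates} of {len(actual)}")
```

The last property is "overestimated" by 5 on a price of 500.000, which counts as a miss if you only ask over or under, but is essentially a perfect estimate by the relative-error measure.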

1 Vote

@Jason Ling

>> My hope is to replace human intuition and bias with something more concrete. Right now we classify different residential assets based on a criteria of use and size but perhaps clustering will reveal classifications that people have not realized!

Exactly, that is one of the very cool things about ML! But I think for starters you can assume that, for example, condos will need different modelling than multifamily units. I have also started thinking about having different models for the low-end, mid-range, and high-end parts of any given market. Especially the high-end part often shows totally skewed prices! It seems to me that people basically pay double for the extra standard, views, etc.

>> I interpret this to mean that roughly half of my estimates over estimate the price by more than 5% and the other half do better than 5%. How much I over-estimate by 5% is hinted at by the standard deviation of 20%.

...I do not think you can do it that way, but I fail to tell you why exactly right now. But let's say the median value of your relative error (as you compute it) is 5%. Now, what if the standard deviation on the list of errors gives you 0%? What would that mean?

Rather, look at the residuals and normalized residuals and make sure there is no systematic bias (there definitely will be with that simple model).
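A rough sketch of such a residual check, with invented numbers and hypothetical helper names. It computes signed normalized residuals and compares their mean below and above a price threshold. The threshold of 300.000 is arbitrary.

```python
from statistics import mean

def normalized_residuals(predicted, actual):
    """Signed relative residual (prediction - price) / price per property."""
    return [(p - a) / a for p, a in zip(predicted, actual)]

def bias_by_segment(actual, residuals, threshold):
    """Mean signed residual for prices below and at/above `threshold`."""
    low = [r for a, r in zip(actual, residuals) if a < threshold]
    high = [r for a, r in zip(actual, residuals) if a >= threshold]
    return mean(low), mean(high)

# Invented data: the model overshoots cheap homes and undershoots expensive ones.
actual    = [100_000, 120_000, 150_000, 400_000, 450_000, 500_000]
predicted = [110_000, 126_000, 156_000, 380_000, 418_500, 450_000]

res = normalized_residuals(predicted, actual)
overall_bias = mean(res)
low_bias, high_bias = bias_by_segment(actual, res, threshold=300_000)

print(f"overall bias: {overall_bias:+.1%}")
print(f"low segment:  {low_bias:+.1%}")
print(f"high segment: {high_bias:+.1%}")
```

In this toy data the overall bias is close to zero while the two segments are clearly biased in opposite directions, which is the kind of systematic pattern that a single median or standard deviation of the errors does not reveal.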

>> As I understand it ensemble is to use multiple weak learners in a network to yield a strong model. I myself do not know much about it and I have some studying to do.

Same here. I would guess that this is a great approach for our problem!

>> Also did you write your own algorithms or did you end up using pre-canned ones? e.g sci-learn? I'm probably going to be using Octave to develop my approach, rewrite in Python for better performance.

I am actually also using Octave (because of the Coursera ML course I started with). I feel that, as a theoretical physicist, I need to really understand what's going on. Hence the low-level algorithm design with Octave. But switching to Python and TensorFlow has been on my todo list for quite a while now. Especially with deep learning, it'll make things so much easier and faster!

0 Votes

For anyone wanting to get some insight into what neural networks do and how feature designing/constructing works, definitely have a look at this: http://playground.tensorflow.org

Very well done and fun to play around with ;)

2 Votes

@Jason Ling I'll try to answer your questions:

1.) 10k samples sound pretty good! Is that only one type of real estate (SFH...?), or several? I have only looked at apartments, but I am pretty certain that other types of housing are sufficiently different to require using different models (or, at least, a more complicated one). But I'd maybe start with one area and one type anyhow. Better to go for the lowest hanging fruit first ;)

2.) What do you mean by "the median error was 5% but the standard deviation was 20%"?

3.) I didn't stop using linear regression, I still use both (LR and NN). I use both because sometimes one is better, and because I can double check and/or average.

4.) I don't know much about ensemble methods, and that's the only reason I haven't tried them yet. ...although I may already be doing something similar: I typically average the results of 30 or so NNs (same architecture but initialized with different weights) to give me a prediction. I also played around with averaging over different architectures and different parameters. But I haven't done enough testing really, and my implementations are probably a bit sloppy.

5.) Linear regression does imply that there is a linear relation between a feature and the output. However, you can construct your own features from the ones you have available (such as living area, location, ...), create nonlinear combinations, etc. For example, I use the logarithm of the living area (because it is approximately log-normally distributed), I may use the square root of the built year, and I use a few combinations of longitude and latitude including cosine, sine, and the Euclidean distance from the city center. As of today, I use a total of 10 constructed features from the four "original" ones (living area, year built, and the two coordinates). I found these simply by trial and error and/or by doing some statistical analysis on the raw sample data. I also always use the logarithm of the price as the label/output, because it too is approximately log-normally distributed. If you haven't tried that yet, definitely do!

6.) The nice thing about NNs is that they are "automatic feature constructors" (they were in fact invented for that purpose). This means that they themselves learn which features and feature combinations are the most important, and how to use them. This doesn't always work flawlessly of course and you still need to know what you're doing (to choose the right dimensions and parameters etc.), but it's really pretty awesome! The NN architecture I use now has four direct inputs (log of living area, year built, coordinates), one hidden layer with 11 neurons, and a second hidden layer with 5 neurons. And a bit of regularization to make it generalize better. Again, I found this simply by playing around.

7.) And yes, I think some automatic clustering would be great to implement. It would probably reveal some cool insights, it would help decide which model to use for prediction, and it could also help with anomaly detection (for data cleaning and/or finding potential deals).
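To illustrate point 5.) above, here is a minimal sketch of constructed features feeding a linear regression on the log of the price. It is not the actual feature set from the post. The city-center coordinates, the use of sine and cosine on the bearing from the center, and all the data are assumptions made up for the example, and it uses NumPy rather than Octave.

```python
import numpy as np

CITY_CENTER = (59.91, 10.75)  # hypothetical (latitude, longitude) of the city center

def build_features(area, year_built, lat, lon):
    """Turn the four raw inputs into a matrix of constructed features."""
    dlat, dlon = lat - CITY_CENTER[0], lon - CITY_CENTER[1]
    bearing = np.arctan2(dlat, dlon)   # direction from the center (assumed use of sin/cos)
    return np.column_stack([
        np.ones_like(area),            # intercept
        np.log(area),                  # log of the living area
        np.sqrt(year_built),           # square root of the year built
        np.hypot(dlat, dlon),          # Euclidean distance from the center, in degrees
        np.sin(bearing),
        np.cos(bearing),
    ])

def fit_log_price(features, price):
    """Ordinary least squares with log(price) as the label."""
    weights, *_ = np.linalg.lstsq(features, np.log(price), rcond=None)
    return weights

def predict_price(features, weights):
    """Map the log-space prediction back to a price."""
    return np.exp(features @ weights)

# Synthetic data for illustration only; these are not real listings.
rng = np.random.default_rng(0)
n = 500
area = rng.uniform(30.0, 150.0, n)
year_built = rng.uniform(1950.0, 2020.0, n)
lat = rng.uniform(59.85, 59.97, n)
lon = rng.uniform(10.65, 10.85, n)
distance = np.hypot(lat - CITY_CENTER[0], lon - CITY_CENTER[1])
price = np.exp(10.0 + 0.8 * np.log(area) - 2.0 * distance + rng.normal(0.0, 0.05, n))

X = build_features(area, year_built, lat, lon)
w = fit_log_price(X, price)
median_rel_error = np.median(np.abs(predict_price(X, w) - price) / price)
print(f"weight on log(area): {w[1]:.2f}")
print(f"median relative error (in-sample): {median_rel_error:.1%}")
```

The model is still linear in its weights, so plain least squares works. All the nonlinearity sits in the constructed features and in the log transform of the price. The error printed here is in-sample on synthetic data, so it says nothing about real-world accuracy.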

1 Vote

Originally posted by :

The catch-22 is that those are easy neighborhood to comp out in the first place so any algorithm is going to be less useful.

I'll have to disagree: a human can a.) only deal with a small number of data points and b.) only fathom very simple relationships between them (at least when it comes to the numbers).

An algorithm, on the other hand, can learn complicated mathematical relationships (also nonlinear ones that are especially hard for humans to deal with) between all the variables and the price, can do so without bias (theoretically), and can take into account an enormous number of data points. For example, if your algorithm has learned exactly how prices change with location, there is no need to be restricted to only comps in the neighborhood. The same goes for built year etc.

I do fully agree with everything else you've written, though!

2 Votes

Oh, I forgot one thing I find very interesting personally:

As I mentioned, I have very little data on my local market (about 250 samples). That is really very little data by ML standards. I have experimented a bit with transfer learning, where I would train an NN on the much larger Oslo market (2.500 samples) and use part of the "pre-trained" network to look at the data from the local market. The reasoning is that, while prices and geography and construction years are obviously totally different, the NN may still learn interesting features and correlations which should also be useful elsewhere. So far, the testing has not been very conclusive, but I also didn't manage to implement everything 100% correctly.

So the point is: one could maybe pre-train on, say, ALL SFH in the entire US market, giving a huge data set, and then use that knowledge to further "specialize" on the local markets. I wouldn't be surprised if that is already part of what Zillow is doing by the way...
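A toy sketch of that idea in NumPy, under several assumptions. The layer sizes (4 inputs, 11 and 5 hidden neurons) follow the architecture mentioned earlier in the thread. The data is synthetic. "Using part of the pre-trained network" is interpreted here as keeping the hidden layers fixed and refitting only the output layer on the small data set, which is one common variant and not necessarily what was tried in the experiments described above.

```python
import numpy as np

def init_params(rng, sizes=(4, 11, 5, 1)):
    """Small random weights and zero biases for each layer."""
    return [(rng.normal(0.0, 0.5, (m, k)), np.zeros(k))
            for m, k in zip(sizes[:-1], sizes[1:])]

def hidden_features(params, X):
    """Activations of the last hidden layer, i.e. the reusable part of the network."""
    h = X
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)
    return h

def predict(params, X):
    W_out, b_out = params[-1]
    return (hidden_features(params, X) @ W_out + b_out).ravel()

def train(params, X, y, lr=0.02, epochs=3000, l2=1e-4):
    """Full-batch gradient descent on the mean squared error with L2 regularization."""
    for _ in range(epochs):
        acts = [X]                                  # forward pass, keeping activations
        for W, b in params[:-1]:
            acts.append(np.tanh(acts[-1] @ W + b))
        W_out, b_out = params[-1]
        delta = 2.0 * (acts[-1] @ W_out + b_out - y[:, None]) / len(y)
        updated = []
        for i in range(len(params) - 1, -1, -1):    # backward pass, last layer first
            W, b = params[i]
            grad_W, grad_b = acts[i].T @ delta + l2 * W, delta.sum(axis=0)
            if i > 0:                               # propagate through the previous tanh
                delta = (delta @ W.T) * (1.0 - acts[i] ** 2)
            updated.append((W - lr * grad_W, b - lr * grad_b))
        params = updated[::-1]
    return params

def transfer_head(params, X, y, ridge=1e-2):
    """Keep the pre-trained hidden layers and refit only the output layer (ridge regression)."""
    H = np.column_stack([hidden_features(params, X), np.ones(len(X))])
    coef = np.linalg.solve(H.T @ H + ridge * np.eye(H.shape[1]), H.T @ y)
    return params[:-1] + [(coef[:-1, None], coef[-1:])]

def mse(params, X, y):
    return float(np.mean((predict(params, X) - y) ** 2))

def target(X):
    """Made-up nonlinear relation between standardized inputs and log price."""
    return 0.8 * X[:, 0] - 0.5 * X[:, 1] ** 2 + 0.3 * np.sin(X[:, 2] + X[:, 3])

rng = np.random.default_rng(0)
X_big = rng.normal(size=(2500, 4))                  # stand-in for the large market
y_big = target(X_big) + rng.normal(0.0, 0.1, 2500)
X_loc = rng.normal(size=(250, 4))                   # stand-in for the small local market
y_loc = 0.6 * target(X_loc) - 1.0 + rng.normal(0.0, 0.1, 250)

pretrained = train(init_params(rng), X_big, y_big)
local = transfer_head(pretrained, X_loc[:200], y_loc[:200])

mse_pretrained = mse(pretrained, X_loc[200:], y_loc[200:])
mse_transfer = mse(local, X_loc[200:], y_loc[200:])
print(f"pre-trained network, unchanged: {mse_pretrained:.3f}")
print(f"output layer refit locally:     {mse_transfer:.3f}")
```

The synthetic local market is deliberately built as a shifted and rescaled version of the large one, which is the friendliest possible case for this kind of transfer. Real markets will differ in less convenient ways, so this only shows the mechanics.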

2 Votes

Severin Sadjina has started 3 posts and replied 46 times.

Post: Machine learning and Real Estate Investing
