@Christian Wathne
>>There are so many problems with relying only on an algorithm to make investing decisions; some of them include
Yes, there are problems. If there were none then I would assume that the problem would be solved by data scientists far better than I.
What I want to discover, for myself, is why the problem hasn't been solved. If the answer is that you need to bake localized assumptions and knowledge into your model, and there are many, many localities where that knowledge and those assumptions change over time, then I could see why Zillow/Redfin haven't cracked this problem. It's an incredibly difficult problem to solve on a national level - but it is solvable at the city or zip-code level.
>>All of these things, and many more, provide huge impact into market value, and today that data does not exist.
Yes, although I'm not sure about "huge". I've seen many notebooks that show a very strong correlation between sale price and features like square footage, basement size, etc. That said, there is also a strong correlation between "house condition" and "sale price".
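For anyone who wants to check a correlation like this themselves, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The numbers are made-up toy values for illustration only, not real sales, and the names `pearson`, `square_feet` and `sale_price` are my own placeholders.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy values invented for this example; swap in a real dataset.
square_feet = [1100, 1500, 1800, 2400, 3000]
sale_price = [150_000, 195_000, 240_000, 310_000, 400_000]

r = pearson(square_feet, sale_price)
print(f"correlation between square footage and price: {r:.3f}")
```

A value near 1 only says the two columns move together in this sample; it says nothing about what is missing from the data, such as condition.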
>>Do you really think zillow/redfin/etc aren't working on this problem and working hard at making their estimates better?
I know they are working on this problem. They've posted a Kaggle competition with a $1.2 million bounty for those who can help them minimize the log(error) of their estimates. I believe they're providing a 2k sample dataset for the L.A. area.
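To make the log(error) idea concrete, here is a rough sketch. I'm assuming the error is defined as log(estimate) - log(actual sale price) and that the score is the mean of its absolute value; that is my reading rather than something I've verified against the competition rules, and the function names are my own.

```python
import math

def log_error(estimate, sale_price):
    """Assumed definition: log(estimate) - log(actual sale price)."""
    return math.log(estimate) - math.log(sale_price)

def mean_abs_log_error(estimates, sale_prices):
    """Average absolute log error over paired estimates and sale prices."""
    errors = [abs(log_error(e, s)) for e, s in zip(estimates, sale_prices)]
    return sum(errors) / len(errors)

# Toy numbers invented for illustration, not real estimates or sales.
estimates = [210_000, 330_000, 480_000]
sale_prices = [200_000, 350_000, 480_000]
print(mean_abs_log_error(estimates, sale_prices))
```

Working in logs means a 5% miss on a cheap house counts about the same as a 5% miss on an expensive one, which is presumably why they'd use it.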
>>ML was not invented yesterday.
I know. ML is a bit of a misnomer: it's mostly regression techniques, and a lot of those techniques have been around since the '70s.
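As an illustration of how plain these regression techniques can be, here is a minimal sketch of one-variable ordinary least squares in Python. It is a toy under my own assumptions (a single feature, hypothetical names `fit_line` and `predict`), not the model anyone in this thread is actually using.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Predicted y for a new x using the fitted line."""
    return intercept + slope * x

# Toy values invented for this example (square feet, sale price).
square_feet = [1100, 1500, 1800, 2400, 3000]
sale_price = [150_000, 195_000, 240_000, 310_000, 400_000]

slope, intercept = fit_line(square_feet, sale_price)
print(predict(slope, intercept, 2000))
```

A real attempt would need many more features and a holdout set to check the predictions against, but the core fit is just this arithmetic.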
>>You're all disillusion if you think you can just plug some off the shelf public data into some off the shelf software and think you're going to change the world or at the very least gain a competitive advantage.
No need to start making it personal. Let's try to keep the tone friendly and conversational - nothing will get done if the conversation degrades into name calling and tearing each other down.
But no, I don't think you can solve the problem completely by plugging it into some off-the-shelf software. (To clarify, we're writing the analysis code in R/Python and porting it to C if the idea proves out.)
As for thinking I'm going to change the world: that's beyond the scope of the conversation, and now we're getting into hyperbole. My original goal is to answer the question:
"Can I write software to predict the price of a home given freely available data?"
You might say, "Well, Zillow and Redfin fail and so will you," and my response to that is that I'm completely comfortable with failing because:
1.) I will gain insight into the application of data science techniques towards real estate investment
2.) I will make myself more marketable as a software engineer regardless of whether I succeed or fail
>>Try creating an automated system that will fly a drone into a house, analyse it and come up with rehab budgets; you could sell that for billions if you could make it work, but good luck
I'm not prepared to make the capital investment needed to make that work. ML towards REI using available data is a side project I can do on nights and weekends - and I'll likely know whether the problem is worth pursuing in < 18 months.
Involving hardware (drones) and image recognition would require capital expenditure and would make the task far too large for a single person to tackle.