
Updated over 8 years ago

Creating a County Web Scraper

Aleks Petrov
  • Belmont, CA
Posted

Hi BiggerPockets community!

My name is Aleksei, and I'm planning :) to invest in RE.

I live in an expensive area, so my steps in RE have to be very careful. I'm an active listener of the BP video podcasts, and they are an excellent source of information.

While I'm learning the theory, I asked myself: what can I do with my knowledge as a software developer?

I know that all property data is publicly accessible, but you cannot search it with particular filters. So I decided to write my own web scraper and build my own database.

I picked one county for a pilot project, and in this topic I'll post updates about the challenges of web scraping.

Please let me know if this topic is interesting to you, and I'll keep posting updates.

Thanks, BiggerPockets, for being such an awesome resource!

Best regards, Aleksei.

Most Popular Reply

Aleks Petrov
  • Belmont, CA
Replied

@Trevor Ewen thanks for the reply!
It is a good question: 'what then'...
I'm planning to create a full copy of the DB, so I can run any query on demand.

Goal #1: get all available APNs for the given county.

Goal #2: get all data for those APNs and save it in a local DB.

Part 1:

The web site has a search form with 1 billion possible inputs, so I need to iterate over them to find the valid numbers.

I wrote a script that uses a web browser to enter the numbers and submit the search. I decided to store the results in a CSV file.

One browser runs a search in 2 seconds, and I was able to run 16 browsers in total (2 machines x 8 browsers).

2 seconds * 1 billion / 16 browsers = 125,000,000 seconds ≈ 1,447 days, which is not acceptable.

The next solution is to use API requests and skip the browsers.

In this case one request/response takes 0.2 seconds, and I can run ~10 requests in parallel:

0.2 seconds * 1 billion / 10 = 20,000,000 seconds ≈ 231 days. Much better than the previous result, but still slow.
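For anyone who wants to play with the numbers, the two estimates above can be reproduced with a small Python sketch. The function name `estimated_days` is made up for illustration; the inputs are the figures from this post (1 billion candidate numbers, 2 s per browser search on 16 browsers, 0.2 s per request with about 10 in parallel).

```python
SECONDS_PER_DAY = 86_400


def estimated_days(total_inputs, seconds_per_lookup, parallel_workers):
    """Rough wall-clock estimate, assuming the workers share the load evenly."""
    total_seconds = total_inputs * seconds_per_lookup / parallel_workers
    return total_seconds / SECONDS_PER_DAY


# Figures taken from the post above.
browser_days = estimated_days(1_000_000_000, 2.0, 16)
api_days = estimated_days(1_000_000_000, 0.2, 10)

print(f"browsers: {browser_days:.1f} days, API requests: {api_days:.1f} days")
```

The estimate ignores retries, rate limits, and network hiccups, so real runs would likely take longer.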

Right now I don't have a better solution.

I've noticed that I can search APNs on at least 2 different web sites. My next step is to check whether these sites use different DBs (or copies), so that I can double my speed by hitting both of them.
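A rough sketch of how the work could be split between two sites, assuming each site can be wrapped in a function that takes a candidate number and returns the parcel data or `None`. Everything here is hypothetical: `scan`, `fake_site_a`, and `fake_site_b` are illustration names, and the fake lookups stand in for real HTTP calls so the example runs without touching any county site.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def scan(candidates, lookups, out_file, workers=10):
    """Check every candidate with one of the lookup callables.

    `lookups` is a list of hypothetical functions, one per site. Each takes
    a candidate number and returns a dict of parcel data, or None when the
    number is not a valid APN. Returns the number of valid APNs found.
    """
    writer = csv.writer(out_file)
    writer.writerow(["apn", "source"])

    def check(number, site):
        return number, site, lookups[site](number)

    found = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Candidates are paired with sites round-robin, so each site gets an
        # equal share. pool.map yields results in input order, and only this
        # thread writes to the CSV file.
        site_ids = cycle(range(len(lookups)))
        for number, site, record in pool.map(check, candidates, site_ids):
            if record is not None:
                writer.writerow([number, site])
                found += 1
    return found


# Stand-ins for the two real sites: every third number counts as "valid".
def fake_site_a(number):
    return {"apn": number} if number % 3 == 0 else None


def fake_site_b(number):
    return {"apn": number} if number % 3 == 0 else None


buffer = io.StringIO()
hits = scan(range(12), [fake_site_a, fake_site_b], buffer)
print(hits)
print(buffer.getvalue())
```

Note that `pool.map` submits every candidate up front, so a real run over 1 billion numbers would need to feed the range in smaller chunks. It would also only double the speed if the two sites really are independent and return the same data.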

Will keep you posted.
