Why are you paying so much for that Boston Airbnb? A look at the data

Ricardo Rosas
4 min read · Aug 13, 2019

You already paid for your ticket to Boston for the long weekend. You nervously check your bank account… 876$ left. And you still need to pay bills. Oh gosh… you still need to book that Airbnb! Where should you stay? Is it really so much more expensive to have the flat all to yourself?

Do not despair. I crunched the numbers for you. I followed the CRISP-DM process of data science to extract insights from the data.

While doing the Udacity Data Science Nanodegree, I stumbled upon data gold. Kaggle had a bunch, a big big bunch, of data on the Boston listings. Here’s what I found.

What’s the price for that neighborhood?

Sorry for this geeky image. Don’t despair, you don’t need to review your stats 101 books to understand this box plot. Huh? Box what? Here’s your Wikipedia. So how do you read this?

Look at the left. Find the neighborhood you wanted to stay in. Is that whole strange figure to the right of the red line? That means the neighborhood is more expensive than your typical Airbnb in the area. Remember your bank account? Good luck getting much bang for your 876$.

So you will need to part with roughly 180–350$ a night if you want to stay in the gay village. Not during pride, of course.
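If you'd rather see the box plot as numbers, here is a minimal sketch of the same reading. The prices and neighborhood names are made up for illustration (they are not the Kaggle data), and I'm assuming the red line is the median price over all listings.

```python
from statistics import median, quantiles

# Made-up nightly prices in USD, purely illustrative (not the Kaggle data).
prices = {
    "Neighborhood A": [60, 75, 80, 95, 110],
    "Neighborhood B": [150, 180, 220, 260, 350],
}

# The red reference line: assumed here to be the median over all listings.
overall_median = median(p for ps in prices.values() for p in ps)


def box_summary(values):
    """Return (Q1, median, Q3), the three lines that make up the box."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q2, q3


for name, values in prices.items():
    q1, q2, q3 = box_summary(values)
    # If even the left edge of the box (Q1) sits right of the red line,
    # most listings there cost more than the typical listing.
    pricier = q1 > overall_median
    verdict = "pricier than typical" if pricier else "not clearly pricier"
    print(f"{name}: box {q1:.0f}-{q3:.0f}, median {q2:.0f} -> {verdict}")
```

The box spans the middle half of the prices, so a box entirely to the right of the line means at least three quarters of the listings there are above the typical price.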

What’s moving the price?

You don’t really need me to tell you, right? Short answer: if you want to find something cheaper, go for a poorer neighborhood and a smaller flat (extra points if you don’t have the flat to yourself). But I promised you data, so here we go.

The element most correlated with price is how many people can be accommodated. No surprise there. Perhaps more surprising is that the number of reviews is negatively correlated with price, and so is the host acceptance rate. In human-speak, that means listings with tons of reviews tend to have lower prices, and hosts that accept a larger share of booking requests also usually have lower prices. Buuuut… remember! Correlation does not imply causation.
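To make "positively" and "negatively correlated" concrete, here is a small sketch that computes the Pearson correlation by hand. The five listings are invented for illustration, and the `pearson` helper is my own, not something from the original analysis.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented listings, purely illustrative (not the Kaggle data).
accommodates = [1, 2, 2, 4, 6]
number_of_reviews = [120, 80, 60, 20, 5]
price = [50, 80, 90, 160, 250]

# Bigger places cost more: the coefficient comes out positive.
print("accommodates vs price:", round(pearson(accommodates, price), 2))
# Heavily reviewed places are cheaper here: the coefficient is negative.
print("reviews vs price:", round(pearson(number_of_reviews, price), 2))
```

A value near +1 or -1 only says the two columns move together or in opposite directions. It says nothing about which one, if either, drives the other.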

Let’s look at driving factors

Oh wooow… that’s a ton of numbers! Don’t worry, I won’t go into the details… What this regression tells you is which of the factors considered could have a causal influence on price. Take it all with a grain of salt, as the regression only explains 50% of the variance in price.

What does this mean? Some examples:

  • For every additional person a flat accommodates, you can expect an increase of 23 USD in the price per night
  • If the review rating goes up 1 percentage point, you can expect an increase of 0.89 USD in the price per night. Is this much?
    Let’s imagine you’re renting out a place 100 nights in a year. You increase your score by 11 points (say, from 0.8 to 0.91). This means that you can expect to gain about 980 USD (0.89 × 11 × 100), so roughly 1000 USD more, in that same year!
  • Don’t want to share an apartment with others? If you have the property all to yourself, you can expect the price per night to increase by 50 USD
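The arithmetic behind these examples fits in a few lines. This sketch only reuses the three coefficients quoted above; the function name and its parameters are my own invention, and it assumes the effects simply add up, as they do in a linear regression.

```python
# Coefficients quoted in the post, in USD per night.
PER_EXTRA_GUEST = 23.0
PER_RATING_POINT = 0.89
ENTIRE_PLACE = 50.0


def nightly_price_change(extra_guests=0, rating_points=0, entire_place=False):
    """Expected change in nightly price, all other factors held fixed."""
    change = PER_EXTRA_GUEST * extra_guests
    change += PER_RATING_POINT * rating_points
    if entire_place:
        change += ENTIRE_PLACE
    return change


# The host example: 11 more rating points, rented 100 nights a year.
per_night = nightly_price_change(rating_points=11)
per_year = per_night * 100
print(f"{per_night:.2f} USD per night, {per_year:.0f} USD per year")

# The guest view: two extra people plus the whole place to yourself.
print(f"{nightly_price_change(extra_guests=2, entire_place=True):.0f} USD more per night")
```

Remember that the model explains only about half of the variance in price, so treat these as averages, not as a quote for any particular listing.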

So it seems that with 876$ in your bank account you won’t go far… but hey… why not couch-surf?

How did I get to these results? I followed the CRISP-DM process

  • Business Understanding: I started here. Having stayed multiple times at an Airbnb, I knew what to look for
  • Data Understanding: I spent a lot of time looking at the data to understand it better
  • Data Preparation: I cleaned the data, removed missing values, one-hot encoded the categorical columns, etc.
  • Modeling & Evaluation: I used a linear regression and evaluated the result using the OLS summary table
  • Deploying: I interpreted the model
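For the curious, here is a rough sketch of what the preparation and evaluation steps can look like. It is not the code from my repository: the rows and column names are made up, it uses only the Python standard library, and to stay short it fits a one-feature least-squares line instead of the full multi-factor regression behind the OLS table.

```python
# Made-up raw listings; None marks a missing value (not the Kaggle data).
raw = [
    {"accommodates": 2, "room_type": "Private room", "price": 70.0},
    {"accommodates": 4, "room_type": "Entire home/apt", "price": 180.0},
    {"accommodates": None, "room_type": "Entire home/apt", "price": 150.0},
    {"accommodates": 1, "room_type": "Shared room", "price": 40.0},
    {"accommodates": 6, "room_type": "Entire home/apt", "price": 260.0},
]


def prepare(rows, categorical="room_type"):
    """Drop rows with missing values, then one-hot encode one column."""
    complete = [r for r in rows if all(v is not None for v in r.values())]
    levels = sorted({r[categorical] for r in complete})
    encoded = []
    for r in complete:
        out = {k: v for k, v in r.items() if k != categorical}
        for level in levels:
            out[f"{categorical}_{level}"] = 1 if r[categorical] == level else 0
        encoded.append(out)
    return encoded


def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


rows = prepare(raw)
xs = [r["accommodates"] for r in rows]
ys = [r["price"] for r in rows]

# One-feature least squares: slope = covariance / variance.
mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x
predictions = [intercept + slope * x for x in xs]

print(f"{len(raw) - len(rows)} row(s) dropped, R^2 = {r_squared(ys, predictions):.2f}")
```

The R² printed at the end is the same kind of number as the 50% mentioned earlier: the share of the price variance the fitted line accounts for. On a tiny toy dataset like this one it comes out much higher than on real listings.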

Check out my GitHub repository.
