Predict House Prices with Linear Regression with Python
Data science and data analytics can help real estate professionals understand the real estate market and make more informed decisions. Data science can identify trends in the market, analyze consumer behaviour, and predict future market movements. Data analytics gives insight into pricing trends, potential investments, and risk.
Both can also be used to optimize marketing strategies, improve customer service, and manage customer relationships, as well as to find patterns and new opportunities in the real estate industry and improve operational efficiency.
In this article, we work with a dataset containing information about each house’s location, price, and other aspects of buying the house. We will learn how to build a model that predicts the price of a house from different variables. We use Linear Regression on this dataset and check how well it fits the data.
Data Analysis
Guiding Steps for House Data Analysis
Step 1. Import the required functions and libraries.
Step 2. Load the Boston data from sklearn datasets and print the Boston house data keys.
Step 3. Add a price column to the data set.
Now, add a “PRICE” column to the data so that price can be analysed against the other columns. Here we also find the minimum, maximum, and median values of the house price.
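Steps 2 and 3 can be sketched as follows. Note that `load_boston` was deprecated in scikit-learn 1.0 and removed in version 1.2, so this sketch does not call it. Instead it assumes a hypothetical `boston` dictionary with the keys `data`, `feature_names` and `target`, filled with synthetic numbers rather than the real Boston values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the loaded Boston data: the keys mirror the
# loader's "data", "feature_names" and "target", the values are synthetic.
rng = np.random.default_rng(0)
boston = {
    "data": rng.uniform(0, 10, size=(50, 3)),
    "feature_names": np.array(["CRIM", "RM", "AGE"]),
    "target": rng.uniform(5, 50, size=50),
}
print(boston.keys())

# Build a DataFrame from the features and add the target as "PRICE".
df = pd.DataFrame(boston["data"], columns=boston["feature_names"])
df["PRICE"] = boston["target"]

# Minimum, maximum and median house price.
print("min:", df["PRICE"].min())
print("max:", df["PRICE"].max())
print("median:", df["PRICE"].median())
```

With the real data loaded, only the `boston` object changes; the DataFrame construction and the three summary calls stay the same.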
Step 4. To analyse data, we find out basic information like null values and number of columns.
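A minimal sketch of this check with pandas, assuming the data is held in a DataFrame named `df`. The frame here is synthetic and only borrows the article's column names.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; only the column names follow the article.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "RM": rng.uniform(4, 9, 50),
    "AGE": rng.uniform(0, 100, 50),
    "PRICE": rng.uniform(5, 50, 50),
})

df.info()                         # column names, dtypes and non-null counts
null_counts = df.isnull().sum()   # number of null values per column
print(null_counts)
print("rows, columns:", df.shape)
```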
Note: The data set does not have any null values.
Step 5. Replace Null Values
If the data has null values, replace them with the median value of the corresponding feature.
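As a small illustration, the sketch below fills the nulls in a hypothetical four-row frame with each column's median. The values are made up for the example.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame with one missing value in each column.
df = pd.DataFrame({
    "RM": [6.0, np.nan, 7.0, 5.0],
    "AGE": [60.0, 80.0, np.nan, 40.0],
})

# Replace each null with the median of its own column.
df_filled = df.fillna(df.median(numeric_only=True))
print(df_filled)
```

The median is less sensitive to extreme values than the mean, which is one reason it is a common choice for filling gaps.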
Step 6. Find summary values.
Get a statistical description of the price column to see how prices are distributed, for example the minimum price and the range in which most prices fall.
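One way to get this description is the pandas `describe` method, sketched here on a synthetic “PRICE” column.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the PRICE column.
rng = np.random.default_rng(0)
df = pd.DataFrame({"PRICE": rng.uniform(5, 50, 50)})

# count, mean, std, min, quartiles (25%, 50%, 75%) and max of the price
summary = df["PRICE"].describe()
print(summary)
```

`describe` does not report the most frequent value; if that is needed, `df["PRICE"].mode()` returns it.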
Step 7. Correlation with price.
To choose which columns to use for predicting the price of the house, find the columns most strongly correlated with the price.
A. Correlation Matrix: It is a table showing correlation coefficients between pairs of features. Values close to 1 or -1 indicate a strong positive or negative correlation, and values close to 0 indicate little linear relationship.
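A minimal sketch of a correlation matrix with pandas follows. The data is synthetic and constructed so that “RM” drives “PRICE” more strongly than “CRIM”; the coefficients are assumptions for illustration, not values from the Boston data.

```python
import numpy as np
import pandas as pd

# Synthetic data: price rises with rooms and falls with crime rate.
rng = np.random.default_rng(0)
n = 200
rm = rng.uniform(4, 9, n)
crim = rng.uniform(0, 20, n)
price = 9.0 * rm - 1.0 * crim - 30.0 + rng.normal(0, 3, n)
df = pd.DataFrame({"RM": rm, "CRIM": crim, "PRICE": price})

# Pairwise Pearson correlation coefficients between all columns.
corr = df.corr()
print(corr.round(2))

# Rank the features by the strength of their correlation with PRICE.
ranking = corr["PRICE"].drop("PRICE").abs().sort_values(ascending=False)
print(ranking)
```

Ranking by absolute value keeps strongly negative correlations in view, since they are just as useful for prediction as positive ones.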
B. Relationship between “PRICE” and “RM” (average number of rooms per dwelling).
C. Relationship between age of house and price.
D. Relationship between price and per capita crime rate by town.
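The three relationships in B to D can be drawn as scatter plots. The sketch below uses matplotlib on synthetic data; the column names “RM”, “AGE” and “CRIM” follow the Boston feature names, while the values and coefficients are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data with the article's column names.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "RM": rng.uniform(4, 9, n),
    "AGE": rng.uniform(0, 100, n),
    "CRIM": rng.uniform(0, 20, n),
})
df["PRICE"] = (9.0 * df["RM"] - 0.05 * df["AGE"] - 1.0 * df["CRIM"]
               + rng.normal(0, 3, n))

# One scatter plot of PRICE against each feature.
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, feature in zip(axes, ["RM", "AGE", "CRIM"]):
    ax.scatter(df[feature], df["PRICE"], s=10)
    ax.set_xlabel(feature)
    ax.set_ylabel("PRICE")
    ax.set_title(f"PRICE vs {feature}")
fig.tight_layout()
```

In an interactive session, `plt.show()` displays the figure.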
Note: According to the previous data analysis, rooms per dwelling (“RM”) shows the strongest positive correlation with price. So, to predict the price of the house, use rooms per dwelling as the independent variable for linear regression, with price as the dependent variable.
Linear Regression Model
Guiding Steps for Predicting the Price of the House
Step 1. Import the required functions and libraries.
Step 2. Split the data set into test and training data sets.
According to the previous data visualization, the “RM” feature shows a strong correlation with the price. So, we build the test and train data sets from the RM and PRICE features.
Note: The whole data set is divided into a training set and a test set using the train_test_split function from the sklearn library. Set the training data set to 80% of the data.
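A minimal sketch of the 80/20 split follows. The arrays `X` and `Y` are synthetic stand-ins for the RM and PRICE columns, and `random_state=42` is an arbitrary choice to make the split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the RM feature and the PRICE target.
rng = np.random.default_rng(0)
X = rng.uniform(4, 9, 100)
Y = 9.0 * X - 34.0 + rng.normal(0, 3, 100)

# test_size=0.2 leaves 80% of the rows for training.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (80,) (20,)
```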
So, here we divide the data into two sets: training data and testing data.
Step 3. Reshape the arrays with numpy.
Establish a linear regression model and use the training set to train it. X_train and Y_train are both one-dimensional arrays, which need to be converted into two-dimensional matrices. Here, we use the reshape function of the numpy library to transform them into two-dimensional matrices.
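The reshape can be sketched like this, using short made-up arrays in place of the real training data.

```python
import numpy as np

# Hypothetical one-dimensional training arrays.
X_train = np.array([6.5, 5.9, 7.2, 6.1])
Y_train = np.array([24.0, 19.5, 34.7, 21.6])

# -1 lets numpy infer the number of rows; 1 gives a single column.
X_train_2d = X_train.reshape(-1, 1)
Y_train_2d = Y_train.reshape(-1, 1)
print(X_train.shape, "->", X_train_2d.shape)  # (4,) -> (4, 1)
```

scikit-learn estimators require the feature array to be two-dimensional (samples by features). The target may stay one-dimensional, so reshaping it is optional.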
Step 4. Define the linear regression model and train it on our training data so that it can produce predicted values.
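A minimal sketch of this step with scikit-learn's `LinearRegression`, trained on synthetic data generated from an assumed line with slope 9 and intercept -34 plus noise:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic two-dimensional training data (80 samples, 1 feature).
rng = np.random.default_rng(0)
X_train = rng.uniform(4, 9, 80).reshape(-1, 1)
Y_train = 9.0 * X_train - 34.0 + rng.normal(0, 3, (80, 1))

# Fit the model: it learns the slope and intercept of the best-fit line.
model = LinearRegression()
model.fit(X_train, Y_train)

slope = model.coef_.ravel()[0]
intercept = np.ravel(model.intercept_)[0]
print("slope:", slope)
print("intercept:", intercept)
```

The learned slope should land close to the value used to generate the data, which is a quick sanity check that the fit worked.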
Step 5. Test model with test data set.
Use the test set data to test the model, get the prediction results, and draw a line chart of the predicted results against the real results, which directly shows the quality of the fit.
Note: “pred” holds the predicted prices for the “RM” test data set.
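The prediction and the line chart can be sketched as follows. The data is synthetic, and the names `model`, `X_test` and `Y_test` are assumed to match the earlier steps.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for RM (X) and PRICE (Y).
rng = np.random.default_rng(0)
X = rng.uniform(4, 9, 100).reshape(-1, 1)
Y = 9.0 * X - 34.0 + rng.normal(0, 3, (100, 1))
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, Y_train)
pred = model.predict(X_test)  # predicted prices for the test set

# Line chart: real and predicted price for each test sample.
fig, ax = plt.subplots()
ax.plot(Y_test.ravel(), label="real price")
ax.plot(pred.ravel(), label="predicted price")
ax.set_xlabel("test sample")
ax.set_ylabel("PRICE")
ax.legend()
```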
Step 6. Plot regression line.
Also, plot the fitted function together with the test set data to see directly how well the regression line fits. The model predicts with the function fitted on the training set data, so the fitted line can be obtained by feeding the training set inputs into the model.
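A sketch of the regression line plot, again on synthetic data. The fitted line is drawn from the model's predictions on the training inputs, and the test points are overlaid as a scatter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for RM (X) and PRICE (Y).
rng = np.random.default_rng(0)
X = rng.uniform(4, 9, 100).reshape(-1, 1)
Y = 9.0 * X - 34.0 + rng.normal(0, 3, (100, 1))
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, Y_train)

# Sort the training inputs so the fitted line is drawn left to right.
order = np.argsort(X_train.ravel())
line_x = X_train.ravel()[order]
line_y = model.predict(X_train).ravel()[order]

fig, ax = plt.subplots()
ax.scatter(X_test.ravel(), Y_test.ravel(), s=15, label="test data")
ax.plot(line_x, line_y, color="red", label="fitted line")
ax.set_xlabel("RM")
ax.set_ylabel("PRICE")
ax.legend()
```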
Step 7. Find out the standard deviation of error.
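One way to compute this with numpy is sketched below, using made-up real and predicted prices. The root mean squared error (RMSE) is included as a related measure, on the assumption that it is also of interest; it equals the standard deviation of the error only when the mean error is zero.

```python
import numpy as np

# Hypothetical real and predicted prices for five test samples.
Y_test = np.array([24.0, 19.5, 34.7, 21.6, 28.0])
pred = np.array([25.1, 18.0, 31.9, 23.2, 27.4])

errors = Y_test - pred                 # prediction error per sample
error_std = np.std(errors)             # spread of the errors around their mean
rmse = np.sqrt(np.mean(errors ** 2))   # root mean squared error
print("standard deviation of error:", error_std)
print("RMSE:", rmse)
```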
Conclusion:
This data science and data analytics tutorial for real estate has shown how to explore a house price data set, find the feature most strongly correlated with price, and fit a linear regression model that predicts price from that feature. With these steps, you have a starting point for using data science and analytics to identify trends and make better decisions in the real estate industry.