In this article, we will learn how to perform Machine Learning through Predictive Analysis using Multiple Linear Regression in R, with an example.
Introduction
In the previous article, we saw how to perform Machine Learning through Predictive Analysis using Simple Linear Regression in R with an example. This article extends that approach to Multiple Linear Regression.
Multiple Linear Regression can be defined as follows:
Multiple linear regression attempts to model the relationship between two or more explanatory variables and a response variable by fitting a linear equation to observed data. Every value of the independent variable x is associated with a value of the dependent variable y.
Bear in mind that multiple linear regression has more than one independent variable. The example below clarifies what this means.
We will use RStudio for this purpose.
Let's start with an example
Suppose we have a set of car data in the file "Car Sample Data.csv", including the columns Make, Price, CylinderVolume, Year, and MileagePerKM.
Let's say Price is the response variable (Y), and CylinderVolume (x1), Year (x2), and MileagePerKM (x3) are the predictor variables. This is a linear relationship between one response variable and multiple predictor variables, of the form
y = a + b1x1 + b2x2 + b3x3 + ... + bnxn
where,
y - dependent/response variable
x1, x2, x3 ... xn - independent/predictor variable(s)
a - intercept; b1, b2, b3 ... bn - coefficients of the predictor variables.
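To make the equation concrete, here is a minimal sketch in R. The data frame `toy` and the values a = 5, b1 = 2, b2 = -3 are invented for illustration and have nothing to do with the car data. Because y is generated exactly from the equation, lm should recover those same values.

```r
# Hypothetical data: y is built exactly from y = a + b1*x1 + b2*x2
toy <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(2, 1, 4, 3, 6, 5)
)
toy$y <- 5 + 2 * toy$x1 - 3 * toy$x2

# Fit the multiple linear regression
toyModel <- lm(y ~ x1 + x2, data = toy)

# Named vector holding the intercept (a) and the slopes (b1, b2)
toyCoef <- coef(toyModel)
print(round(toyCoef, 6))
```

With real data the points never lie exactly on the fitted equation, so the estimated coefficients are the best fit rather than an exact recovery.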
What are we going to solve?
The data presented above is our training (historical) data. Using it, we train a predictive model with the Multiple Linear Regression algorithm. Once the model is trained, i.e. the machine has learnt the relationship, we can predict Y for new values of the predictor variables.
Straight to the experiment
Open RStudio. First we will establish a relationship model between the predictor variables (x1, x2, x3) and the response variable (Y) and obtain the coefficient values. For this we will use R's lm function, which fits a linear model relating the predictors to the response variable.
# Load data from csv
input <- read.csv('d:/Car Sample Data.csv', sep = ',', quote = "\"", check.names = FALSE)
# Create the relationship model: Price as a function of the three predictors
model <- lm(Price ~ CylinderVolume + Year + MileagePerKM, data = input)
# Print the intercept and coefficients
print(model)
From the intercept and coefficient values that print(model) reports, we obtain the equation of the Multiple Linear Regression model:
y = -4.683e+07 + x1 * 2.534e+02 + x2 * 2.339e+04 + x3 * -2.854e-01
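The printed values are rounded, so it can help to read the coefficients straight from the model object. The sketch below assumes a stand-in data frame `carData` with invented values (the real CSV is not reproduced here) and shows that evaluating a + b1x1 + b2x2 + b3x3 with the output of coef gives the same numbers as the model's fitted values.

```r
# Hypothetical stand-in for the CSV; all values are invented for illustration
set.seed(42)
n <- 30
carData <- data.frame(
  CylinderVolume = sample(c(1300, 1500, 1800, 2000, 2500), n, replace = TRUE),
  Year           = sample(2005:2018, n, replace = TRUE),
  MileagePerKM   = round(runif(n, 10000, 200000))
)
carData$Price <- -4.0e7 + 250 * carData$CylinderVolume +
  20000 * carData$Year - 0.3 * carData$MileagePerKM + rnorm(n, sd = 5000)

fit <- lm(Price ~ CylinderVolume + Year + MileagePerKM, data = carData)

# Full-precision intercept and slopes as a named vector
b <- coef(fit)

# Evaluate the regression equation by hand for every training row
byHand <- b[["(Intercept)"]] +
  b[["CylinderVolume"]] * carData$CylinderVolume +
  b[["Year"]]           * carData$Year +
  b[["MileagePerKM"]]   * carData$MileagePerKM

# Compare with the fitted values stored in the model
print(head(cbind(byHand = byHand, fitted = unname(fitted(fit))), 3))
```

Using coef avoids retyping rounded numbers from the console, which matters here because the intercept and the Year term are both large and nearly cancel each other.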
So far we have trained our model using the training dataset, which means the machine has learnt the relationship. The next step is to predict. Say we would like to predict the price of a TOYOTA car with CylinderVolume = 2000, Year = 2020, and MileagePerKM = 90000.
Applying the equation above directly would be misleading, because it was fitted on cars of every make. To get a model suited to this prediction, we first filter the records by Make, fit the model on the TOYOTA records only, and then apply the predict function, as shown below.
# Load all data from csv
input <- read.csv('d:/Car Sample Data.csv', sep = ',', quote = "\"", check.names = FALSE)
# Keep only the TOYOTA records
filterRecord <- input[input$Make == 'TOYOTA', ]
# Print the filtered result
print(filterRecord)
# Create the relationship model on the filtered records
RelationModel <- lm(Price ~ CylinderVolume + Year + MileagePerKM, data = filterRecord)
# Print the intercept and coefficients
print(RelationModel)
Thus we obtain the equation for TOYOTA cars, which is
y = -4.838e+07 + x1 * 3.630e+02 + x2 * 2.415e+04 + x3 * -1.432e+00
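If a model is needed for every make rather than just one, the filter-then-fit step can be repeated with split and lapply. This is a sketch under assumptions: the data frame `carData`, the second make "HONDA", and all numbers are invented for illustration.

```r
# Hypothetical stand-in for the CSV with two makes; values are invented
set.seed(1)
n <- 40
carData <- data.frame(
  Make           = rep(c("TOYOTA", "HONDA"), each = n / 2),
  CylinderVolume = sample(c(1300, 1500, 1800, 2000), n, replace = TRUE),
  Year           = sample(2005:2018, n, replace = TRUE),
  MileagePerKM   = round(runif(n, 10000, 200000))
)
# Give each make its own (invented) base price
basePrice <- ifelse(carData$Make == "TOYOTA", 300000, 200000)
carData$Price <- basePrice + 200 * carData$CylinderVolume +
  15000 * (carData$Year - 2005) - 1 * carData$MileagePerKM +
  rnorm(n, sd = 5000)

# Split the rows by Make and fit one model per group
models <- lapply(split(carData, carData$Make), function(d) {
  lm(Price ~ CylinderVolume + Year + MileagePerKM, data = d)
})

# Coefficients side by side, one column per make
print(sapply(models, coef))
```

Each entry of `models` is the same model one would get by filtering on that make by hand, so `models$TOYOTA` can be passed to predict in the usual way.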
To obtain the predicted value, run the program below.
# Load all data from csv
input <- read.csv('d:/Car Sample Data.csv', sep = ',', quote = "\"", check.names = FALSE)
# Keep only the TOYOTA records
filterRecord <- input[input$Make == 'TOYOTA', ]
# Create the relationship model on the filtered records
RelationModel <- lm(Price ~ CylinderVolume + Year + MileagePerKM, data = filterRecord)
# Predictor values of the TOYOTA car whose price we want to predict
newCar <- data.frame(CylinderVolume = 2000, Year = 2020, MileagePerKM = 90000)
# Display the predicted price
print(predict(RelationModel, newCar))
The predicted price is approximately 989,693.7.
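As a sanity check, the same prediction can be worked out by hand from the TOYOTA equation. The sketch below plugs in the rounded coefficients exactly as printed earlier; the variable names are chosen here for illustration.

```r
# Rounded coefficients as printed for the TOYOTA model
a  <- -4.838e+07   # intercept
b1 <-  3.630e+02   # CylinderVolume
b2 <-  2.415e+04   # Year
b3 <- -1.432e+00   # MileagePerKM

# y = a + b1*x1 + b2*x2 + b3*x3 for the car we want to price
approxPrice <- a + b1 * 2000 + b2 * 2020 + b3 * 90000
print(approxPrice)
```

This gives about 1,000,120, roughly 1% above the value returned by predict. A gap of that size is consistent with the printed coefficients being rounded to four significant digits: the intercept and the Year term are each close to 48 million in magnitude, so small rounding errors in them shift the result by thousands. The predict function uses the full-precision coefficients and is the value to rely on.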
Conclusion
In this article, we have learnt Machine Learning through Predictive Analysis using Multiple Linear Regression in R, with a simple example. Hope this helps. Thanks for reading.