Machine Learning through Predictive Analysis using Simple Linear Regression in R with an example

Rajnilari2015
Posted in the R Language category for Beginner level

In this article we will learn, step by step, Machine Learning through Predictive Analysis using the Simple Linear Regression method in the R language, with an example.

Introduction

We will start our topic with Tom M. Mitchell's definition of Machine Learning:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E

Machine learning is broadly classified into Supervised Learning and Unsupervised Learning. In Supervised Learning, the output depends on the data set provided, which means a direct relation exists between the input and the output. In other words, we predict the result for a future element based on the analysis of past dataset(s).

Supervised Learning is further categorized into Regression and Classification problems.

A Regression Model predicts a continuous output, while a Classification Model predicts a discrete output.

Linear Regression can be defined as follows:

In statistics, linear regression is a linear approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables) denoted X.

We must bear in mind that simple linear regression has only one independent variable. What this means is clarified below.

We will use RStudio for this purpose.

Let's start with an example

We will start with a simple example. Say we have been given the following set of data:

Salary data of Employees
X (Years of Exp.)    Y (Salary in INR)
3                    30
8                    57
9                    64
12                   72
3                    36
6                    43
11                   59
21                   90
1                    12
16                   83

In the dataset presented, X denotes the number of years an employee has worked, while Y denotes his/her salary. We model this as a linear relationship between the two variables X and Y of the form

 Y = a + X * b.

where,

	Y - dependent/response variable
	X - independent/predictor variable
	a - intercept
	b - slope of the line
	a and b are constants, called the coefficients.
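
As a minimal sketch of where a and b come from, the block below applies the standard least-squares formulas for a single predictor to the salary data from the table above. The names x_mean and y_mean are chosen here for illustration. This is the textbook closed form, and not necessarily the way lm computes the result internally.

```r
# Salary data from the table above
x <- c(3, 8, 9, 12, 3, 6, 11, 21, 1, 16)        # years of experience
y <- c(30, 57, 64, 72, 36, 43, 59, 90, 12, 83)  # salary

# Illustrative helper names (not from the article)
x_mean <- mean(x)
y_mean <- mean(y)

# Slope: how x and y vary together, divided by how much x varies
b <- sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)

# Intercept: the fitted line passes through the point (x_mean, y_mean)
a <- y_mean - b * x_mean

print(round(c(a = a, b = b), 3))
```

Rounded to three decimals this gives a = 20.927 and b = 3.741, the same values the article obtains from lm further down.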

What are we going to solve?

The data presented above is our training (historical) data. Using it, we train our Predictive Model with the Simple Linear Regression algorithm. Once the model is trained, i.e. the machine has learnt what to do, we will predict Y for a new value of X.

Straight to experiment

Open RStudio. First we will establish a Relationship Model between X (Predictor Variable) and Y (Response Variable) and obtain the Coefficients (a and b). For this we will use the lm function of R, which fits a linear model relating the response variable to the predictor.

#Training data for predictor variable
x <- c(3,8,9,12,3,6,11,21,1,16)

#Training data for response variable
y <- c(30,57,64,72,36,43,59,90,12,83)

# Using lm() function to establish the Relationship between predictor and response variable.
relation <- lm(y~x)

# Print the fitted model (intercept and slope)
print(relation)

The printed output reports an intercept of 20.927 and a coefficient of 3.741 for x. From these values we obtain the mathematical equation of the Simple Linear Regression Model, which is

Y = 20.927 + X * 3.741 [ Y = a + X * b ] 
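
Rather than copying the numbers from the printed output by hand, the coefficients can be read from the fitted model with coef(), which is part of base R. A short sketch, where fit_coef, a and b are illustrative variable names:

```r
# Same training data as above
x <- c(3, 8, 9, 12, 3, 6, 11, 21, 1, 16)
y <- c(30, 57, 64, 72, 36, 43, 59, 90, 12, 83)
relation <- lm(y ~ x)

# coef() returns a named numeric vector with entries "(Intercept)" and "x"
fit_coef <- coef(relation)
a <- unname(fit_coef["(Intercept)"])
b <- unname(fit_coef["x"])

# Write out the equation with the coefficients rounded to three decimals
cat(sprintf("Y = %.3f + X * %.3f\n", a, b))
```

Keeping a and b as variables avoids retyping rounded values when the training data changes.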

Let us make a Scatter Plot of our data set as follows:

#Training data for predictor variable
x <- c(3,8,9,12,3,6,11,21,1,16)

#Training data for response variable
y <- c(30,57,64,72,36,43,59,90,12,83)

# Using lm() function to establish the Relationship between predictor and response variable.
relation <- lm(y~x)

# Give the chart file a name.
png(file = "D:\\SalaryDataSimpleLinearRegression.png")

# Plot the chart: years of experience on the horizontal axis, salary on the vertical axis.
plot(x,y,col="blue",main="Salary and Year Of Experience Of Employee Records",
cex = 1.8,pch=20,xlab="Years of Exp",ylab="(Salary in INR)")

# Draw the fitted regression line over the points.
abline(relation)

# Save the file.
dev.off()

This produces a scatter plot with the fitted line drawn through it.

The plot shows that although the data points do not fall exactly on a straight line, the pattern suggests that a linear relationship does exist between X (Years of Exp.) and Y (Salary in INR).
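
The visual impression can be backed by a number. As a hedged sketch: cor() gives the Pearson correlation between x and y, and summary() of the fitted model exposes R-squared, the share of the variation in salary that the line explains. The names r and r_squared are illustrative.

```r
# Same training data as above
x <- c(3, 8, 9, 12, 3, 6, 11, 21, 1, 16)
y <- c(30, 57, 64, 72, 36, 43, 59, 90, 12, 83)
relation <- lm(y ~ x)

# Pearson correlation coefficient between experience and salary
r <- cor(x, y)

# Proportion of the variance in y explained by the fitted line
r_squared <- summary(relation)$r.squared

print(round(c(correlation = r, r_squared = r_squared), 3))
```

For this data set the correlation is roughly 0.96 and R-squared roughly 0.92. Values this close to 1 support the linear reading of the plot, though ten points are a very small sample.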

So far we have trained our Model using the training dataset, which means our machine has learnt the relationship. The next step is to predict. Say we would like to predict the salary of an employee having 17 years of experience. In this case we have to use the predict function as shown below.

#Training data for predictor variable
x <- c(3,8,9,12,3,6,11,21,1,16)

#Training data for response variable
y <- c(30,57,64,72,36,43,59,90,12,83)

# Using lm() function to establish the Relationship between predictor and response variable.
relation <- lm(y~x)

# New input: an employee with 17 years of experience.
# The column must be named x so that it matches the predictor used in lm(y~x).
SalarY <- data.frame(x=17)

# Display the predicted salary
print( predict(relation,SalarY) )

So the predicted salary of an employee having 17 years of experience is around 84.5K. And that's machine learning.
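
predict() also accepts several new values at once and can report an interval around each estimate. The sketch below is an illustration under assumptions: the extra experience values 5 and 10 and the name new_employees are made up for the example, and the 95% prediction interval relies on the usual linear-model assumptions, which ten data points can hardly confirm.

```r
# Same training data as above
x <- c(3, 8, 9, 12, 3, 6, 11, 21, 1, 16)
y <- c(30, 57, 64, 72, 36, 43, 59, 90, 12, 83)
relation <- lm(y ~ x)

# Hypothetical new employees; the column name must match the predictor x
new_employees <- data.frame(x = c(5, 10, 17))

# interval = "prediction" adds lower (lwr) and upper (upr) bounds to each fit
predictions <- predict(relation, newdata = new_employees,
                       interval = "prediction", level = 0.95)

print(round(predictions, 2))
```

The result is a matrix with the columns fit, lwr and upr. The fit for 17 years matches the single prediction above, and the width of the interval shows how much uncertainty surrounds one predicted salary.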

We can cross-verify our result by substituting X = 17 into the earlier equation of the Simple Linear Regression Model

Y = 20.927 + X * 3.741

When X = 17, then

Y = 20.927 + 17 * 3.741 = 84.524
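
The hand calculation uses coefficients rounded to three decimals, while predict() works with the unrounded ones, so the two results can differ slightly in the last digits. A small sketch of the same cross-check done in R, where by_hand_rounded, by_hand_full and by_predict are illustrative names:

```r
# Same training data as above
x <- c(3, 8, 9, 12, 3, 6, 11, 21, 1, 16)
y <- c(30, 57, 64, 72, 36, 43, 59, 90, 12, 83)
relation <- lm(y ~ x)

# Full-precision coefficients from the fitted model
a <- coef(relation)[["(Intercept)"]]
b <- coef(relation)[["x"]]

by_hand_rounded <- 20.927 + 17 * 3.741   # coefficients as printed, 3 decimals
by_hand_full    <- a + 17 * b            # same formula, unrounded coefficients
by_predict      <- unname(predict(relation, data.frame(x = 17)))

print(c(rounded = by_hand_rounded, full = by_hand_full, predict = by_predict))
```

The unrounded formula and predict() agree with each other, and the rounded hand calculation differs from them only at the second decimal place.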

Conclusion

In this article we have learnt Machine Learning through Predictive Analysis using the Simple Linear Regression method in the R language, with a simple example. The article, at the bare minimum, taught us

  1. What is Machine Learning
  2. What is Predictive Analysis
  3. How to do Machine Learning through Predictive Analysis
  4. How to perform machine learning through Simple Linear Regression - a Supervised Modeling technique.
  5. How to use the R language for performing Machine Learning through Predictive Analysis via RStudio.
  6. Data visualization (Scatter Plot) using the R language via RStudio.
  7. Some R functions (lm, plot, abline, predict).

Hope this helps. Thanks for reading.


About the Author

Rajnilari2015
Full Name: Niladri Biswas (RNA Team)

