
Linear Regression - Details

 

Linear Regression:

  • Linear regression is a statistical regression method used for predictive analysis.
  • It is one of the simplest regression algorithms and models the relationship between continuous variables.
  • It is used to solve regression problems in machine learning.
  • Linear regression models a linear relationship between the independent variable (X-axis) and the dependent variable (Y-axis), hence the name.
  • If there is only one input variable (x), the model is called simple linear regression. If there is more than one input variable, it is called multiple linear regression.
  • The relationship between the variables in a linear regression model can be illustrated with the image below, which predicts an employee's salary from years of experience.
[Figure: employee salary versus years of experience]
  • Below is the mathematical equation for linear regression:
  1. Y = aX + b

Here, Y = dependent variable (target variable),
X = independent variable (predictor variable),
a and b are the linear coefficients (a is the slope and b is the intercept).
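
As a rough illustration of the equation Y = aX + b, the sketch below estimates a and b with ordinary least squares for a single input variable. The function name `fit_simple_linear_regression` and the experience/salary numbers are made up for this example; they are assumptions, not data from the article.

```python
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Return (a, b) that minimize the squared error of Y = a*X + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sxy / sxx          # slope
    b = y_bar - a * x_bar  # intercept
    return a, b


# Hypothetical data: years of experience and salary (in thousands).
years = [1, 2, 3, 4, 5]
salary = [35, 40, 45, 50, 55]

a, b = fit_simple_linear_regression(years, salary)
predicted_salary = a * 6 + b  # prediction for 6 years of experience
print(a, b, predicted_salary)
```

Because the made-up data lie exactly on a line, the fitted coefficients reproduce that line; real data would leave some residual error.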

Some popular applications of linear regression are:

  • Analyzing trends and sales estimates
  • Salary forecasting
  • Real estate prediction
  • Estimating arrival times (ETAs) in traffic

Logistic Regression:

  • Logistic regression is another supervised learning algorithm, used to solve classification problems. In classification problems, the dependent variable is binary or discrete, such as 0 or 1.
  • The logistic regression algorithm works with categorical target variables such as 0 or 1, Yes or No, True or False, Spam or Not spam.
  • It is a predictive analysis algorithm based on the concept of probability.
  • Logistic regression is a type of regression, but it differs from linear regression in how it is used.
  • Logistic regression uses the sigmoid function (also called the logistic function), which maps any real-valued input to a value between 0 and 1. This sigmoid function is used to model the data in logistic regression. The function can be represented as:
$f(x) = \frac{1}{1 + e^{-x}}$
  • f(x) = output, a value between 0 and 1
  • x = input to the function
  • e = base of the natural logarithm

When we provide the input values (data) to the function, it gives the S-curve as follows:

[Figure: S-shaped curve of the sigmoid function]
  • It uses the concept of a threshold: values above the threshold are rounded up to 1, and values below the threshold are rounded down to 0.
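
The sigmoid function and the threshold rule can be sketched as follows. The threshold of 0.5 and the choice to map a value exactly at the threshold to 1 are assumptions made for this example; the text does not fix either.

```python
import math


def sigmoid(x):
    """Logistic function: maps any real x to a value between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Equivalent form that avoids overflow for large negative x.
    e = math.exp(x)
    return e / (1.0 + e)


def classify(x, threshold=0.5):
    """Return 1 if sigmoid(x) reaches the threshold, otherwise 0.

    threshold=0.5 is an assumed default, not a value from the article.
    """
    return 1 if sigmoid(x) >= threshold else 0


for value in (-4, -1, 0, 1, 4):
    print(value, round(sigmoid(value), 4), classify(value))
```

Plotting `sigmoid` over a range of inputs gives the S-curve described above.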

There are three types of logistic regression:

  • Binary (0/1, pass/fail)
  • Multinomial (cats, dogs, lions)
  • Ordinal (low, medium, high)

Polynomial Regression:

  • Polynomial regression is a type of regression that models a non-linear dataset using a linear model.
  • It is similar to multiple linear regression, but it fits a non-linear curve between the values of x and the corresponding conditional values of y.
  • Suppose a dataset consists of data points arranged in a non-linear fashion. Linear regression will not fit those data points well, so we need polynomial regression.
  • In polynomial regression, the original features are transformed into polynomial features of a given degree and then modeled using a linear model, which means the data points are fitted with a polynomial curve.
[Figure: polynomial curve fitted to non-linear data points]
  • The equation for polynomial regression is derived from the linear regression equation: the linear regression equation $Y = b_0 + b_1 x$ is extended to the polynomial regression equation $Y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \dots + b_n x^n$.
  • Here Y is the predicted/target output, $b_0, b_1, \dots, b_n$ are the regression coefficients, and x is the independent/input variable.
  • The model is still linear because it is linear in the coefficients, even though it contains quadratic and higher-degree terms of x.

Note: This differs from multiple linear regression in that polynomial regression uses a single variable raised to different degrees, instead of multiple variables of the same degree.
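
The idea of transforming x into polynomial features and then fitting a linear model can be sketched as below. This is a minimal illustration that solves the least-squares normal equations with a small Gaussian elimination routine; the helper names and the sample data are assumptions for this example, and practical code would normally rely on a numerical library instead.

```python
def polynomial_features(x, degree):
    """Return [1, x, x^2, ..., x^degree] for a single input value."""
    return [x ** k for k in range(degree + 1)]


def solve_linear_system(a, b):
    """Solve a * coeffs = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (m[r][n] - tail) / m[r][r]
    return coeffs


def fit_polynomial(xs, ys, degree):
    """Least-squares fit of b_0..b_n via the normal equations."""
    rows = [polynomial_features(x, degree) for x in xs]
    n = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from Y = 1 + 2x + 3x^2.
xs = [-2, -1, 0, 1, 2, 3]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
coeffs = fit_polynomial(xs, ys, degree=2)
print(coeffs)  # approximately b_0, b_1, b_2
```

The fit is an ordinary linear regression on the columns 1, x, x^2, which is why the model counts as linear even though the fitted curve is not a straight line.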

Support Vector Regression:

Support Vector Machine is a supervised learning algorithm that can be used for regression as well as classification problems. When it is used for regression problems, it is called Support Vector Regression.

Support Vector Regression is a regression algorithm that works for continuous variables. Below are some key terms used in Support Vector Regression:

  • Kernel: A function used to map lower-dimensional data into a higher-dimensional space.
  • Hyperplane: In a general SVM, it is the separating line between two classes, but in SVR, it is the line that helps to predict the continuous variable and covers most of the data points.
  • Boundary lines: The two lines on either side of the hyperplane, which create a margin for the data points.
  • Support vectors: The data points that lie on or outside the boundary lines. These points determine the position of the hyperplane.

In SVR, we try to determine a hyperplane with a maximum margin, so that the maximum number of data points are covered within that margin. The main goal of SVR is to fit as many data points as possible within the boundary lines, with the hyperplane (best-fit line) passing as close as possible to the maximum number of data points. Consider the image below:

[Figure: SVR hyperplane with its two boundary lines]

Here, the blue line is called hyperplane, and the other two lines are known as boundary lines.
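
To make the role of the boundary lines concrete, the sketch below checks which data points fall inside the margin around a given line. It assumes the common epsilon-insensitive formulation of SVR, in which the boundary lines sit a distance epsilon above and below the hyperplane; the article does not name epsilon, and the line parameters `w`, `b` and the data are hypothetical. The code only evaluates a fixed line; it does not train an SVR model.

```python
def predict(x, w, b):
    """Value of the hyperplane (here a line) at x."""
    return w * x + b


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the margin, linear in the distance outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Hypothetical data and a hypothetical, already chosen line.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.8, 8.0]
w, b, epsilon = 2.0, 0.0, 0.5

losses = [epsilon_insensitive_loss(y, predict(x, w, b), epsilon)
          for x, y in zip(xs, ys)]
inside_margin = [loss == 0.0 for loss in losses]

print(losses)
print(inside_margin)
```

Points with zero loss lie between the boundary lines and do not pull on the fit; points with positive loss lie outside the margin.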

Decision Tree Regression:

  • Decision Tree is a supervised learning algorithm that can be used to solve both classification and regression problems.
  • It can handle both categorical and numerical data.
  • Decision Tree regression builds a tree-like structure in which each internal node represents a "test" on an attribute, each branch represents the result of the test, and each leaf node represents the final decision or result.
  • A decision tree is constructed starting from the root node/parent node (dataset), which splits into left and right child nodes (subsets of the dataset). These child nodes are further divided into their own child nodes and so become the parent nodes of those nodes. Consider the image below:
[Figure: example decision tree]

The image above shows an example of Decision Tree regression, in which the model tries to predict a person's choice between a sports car and a luxury car.
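
The splitting step can be sketched for the simplest possible regression tree: a root node with a single test and two leaves. The sketch assumes one numeric input, candidate thresholds at the midpoints between neighbouring values, and squared error as the split criterion; these choices and the data are assumptions for illustration, not details from the article.

```python
from statistics import mean


def squared_error(values):
    """Sum of squared deviations from the mean of the values."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) with the lowest squared error."""
    unique = sorted(set(xs))
    if len(unique) < 2:
        raise ValueError("need at least two distinct x values to split")
    best = None
    for lo, hi in zip(unique, unique[1:]):
        threshold = (lo + hi) / 2  # midpoint between neighbouring values
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        error = squared_error(left) + squared_error(right)
        if best is None or error < best[0]:
            best = (error, threshold, mean(left), mean(right))
    return best[1:]


def predict(x, threshold, left_mean, right_mean):
    """Follow the single test at the root and return the leaf value."""
    return left_mean if x <= threshold else right_mean


# Hypothetical data with two clearly separated groups.
xs = [1, 2, 3, 10, 11, 12]
ys = [5, 6, 7, 20, 21, 22]
threshold, left_mean, right_mean = best_split(xs, ys)
print(threshold, left_mean, right_mean)
```

A full tree repeats the same search inside each child node until a stopping rule is met.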

Random Forest Regression:

  • Random forest is one of the most powerful supervised learning algorithms, capable of performing both regression and classification tasks.
  • Random Forest regression is an ensemble learning method that combines multiple decision trees and predicts the final output as the average of the individual tree outputs. The combined decision trees are called base models, and the prediction can be represented more formally as:
$g(x) = \frac{1}{N}\left(f_0(x) + f_1(x) + f_2(x) + \dots + f_{N-1}(x)\right)$, where N is the number of trees.
  • Random forest uses the Bagging (Bootstrap Aggregation) technique of ensemble learning, in which the aggregated decision trees run in parallel and do not interact with each other.
  • Random Forest regression helps reduce overfitting by training each tree on a random subset of the dataset.
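
Bagging can be sketched with very small trees. In the example below each base model is a one-split regression tree trained on a bootstrap sample (rows drawn with replacement), and the forest prediction is the average of the tree outputs. This is a simplified illustration under stated assumptions: a real random forest grows deeper trees and usually also samples features at each split, and the helper names, seed, and data are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree and return it as a function of x."""
    unique = sorted(set(xs))
    if len(unique) < 2:
        # All sampled inputs are identical: fall back to the mean.
        constant = mean(ys)
        return lambda x: constant
    best = None
    for lo, hi in zip(unique, unique[1:]):
        cut = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= cut]
        right = [y for x, y in zip(xs, ys) if x > cut]
        l_mean, r_mean = mean(left), mean(right)
        error = (sum((y - l_mean) ** 2 for y in left)
                 + sum((y - r_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, cut, l_mean, r_mean)
    _, cut, l_mean, r_mean = best
    return lambda x: l_mean if x <= cut else r_mean


def fit_forest(xs, ys, n_trees, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict_forest(trees, x):
    """Average the outputs of all trees."""
    return mean([tree(x) for tree in trees])


# Hypothetical data.
xs = [1, 2, 3, 10, 11, 12]
ys = [5, 6, 7, 20, 21, 22]
trees = fit_forest(xs, ys, n_trees=25)
low, high = predict_forest(trees, 2), predict_forest(trees, 11)
print(low, high)
```

Each tree is trained independently of the others, which is why the trees can run in parallel.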

Ridge Regression:

  • Ridge regression is one of the most robust versions of linear regression, in which a small amount of bias is introduced so that we can get better long-term predictions.
  • The amount of bias added to the model is known as the Ridge Regression penalty. We compute this penalty term by multiplying lambda by the sum of the squared weights of the individual features.
  • The equation for ridge regression will be:
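$L(w) = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} w_j^2$, where $\hat{y}_i$ is the prediction for sample i and $w_j$ is the weight of feature j.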
  • A general linear or polynomial regression becomes unstable if there is high collinearity between the independent variables. Ridge regression can be used to solve such problems.
  • Ridge regression is a regularization technique used to reduce the complexity of the model. It is also called L2 regularization.
  • It also helps when we have more parameters than samples.
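
The effect of the penalty can be seen in the simplest case: one feature and no intercept. Minimizing the sum of (y - w*x)^2 plus lambda * w^2 gives the closed form w = sum(x*y) / (sum(x^2) + lambda). The sketch below uses that formula; the single-feature, no-intercept setting and the data are assumptions chosen to keep the example short.

```python
def ridge_weight(xs, ys, lam):
    """Closed-form ridge solution for one feature and no intercept.

    Minimizes sum((y - w*x)^2) + lam * w^2.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Hypothetical data lying on the line y = 2x.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]

for lam in (0, 10, 30, 1000):
    print(lam, ridge_weight(xs, ys, lam))
```

With lambda = 0 the result is the ordinary least-squares slope. Larger values of lambda shrink the weight toward 0, but for any finite lambda it does not become exactly 0.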

Lasso Regression:

  • Lasso regression is another regularization technique used to reduce the complexity of the model.
  • It is similar to Ridge regression, except that the penalty term contains the absolute values of the weights instead of their squares.
  • Because it uses absolute values, it can shrink a slope exactly to 0, whereas Ridge regression can only shrink it close to 0.
  • It is also called L1 regularization. The equation for Lasso regression will be:
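$L(w) = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} |w_j|$, where $\hat{y}_i$ is the prediction for sample i and $w_j$ is the weight of feature j.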
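
The way an absolute-value penalty can drive a weight exactly to 0 is easiest to see with one feature and no intercept. Minimizing the sum of (y - w*x)^2 plus lambda * |w| gives a soft-thresholding rule: with rho = sum(x*y), the weight is sign(rho) * max(|rho| - lambda/2, 0) / sum(x^2). The sketch below applies that rule; the single-feature setting and the data are assumptions for illustration.

```python
def lasso_weight(xs, ys, lam):
    """Soft-thresholding solution for one feature and no intercept.

    Minimizes sum((y - w*x)^2) + lam * |w|.
    """
    rho = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    shrunk = max(abs(rho) - lam / 2, 0.0)  # amount left after the penalty
    sign = 1.0 if rho >= 0 else -1.0
    return sign * shrunk / sxx


# Hypothetical data lying on the line y = 2x.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]

for lam in (0, 60, 120, 200):
    print(lam, lasso_weight(xs, ys, lam))
```

Once lambda is large enough that lambda/2 reaches |rho|, the weight is exactly 0 and stays there, which is how Lasso can remove a feature from the model.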
