Hello guys, in my previous article we learned about the linear regression model. In this article we will learn the basics of logistic regression and some tricks for finding the relationship between variables.
Do you know what types of variables are used in logistic regression? Don't worry if you don't; let me explain the variables first:
In simple linear regression there is one dependent variable and one independent variable; in multiple linear regression there is more than one independent variable.
Understand one thing: if your dependent variable is continuous, use a linear regression model; if it is categorical (e.g. positive and negative) and binary (0, 1), use logistic regression. In this model the outcome is coded in binary form, for example 1 for positive and 0 for negative (this choice is just a convention).
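As a small illustration of that coding step, here is a Python sketch. The label names and the `encode_outcome` helper are hypothetical, and which category gets 1 is only a convention.

```python
# Hypothetical raw outcomes for a binary categorical variable
raw_labels = ["positive", "negative", "negative", "positive"]

# Assumed convention: 1 = positive, 0 = negative
CODES = {"positive": 1, "negative": 0}

def encode_outcome(labels):
    """Map each categorical label to its 0/1 code."""
    return [CODES[label] for label in labels]

y = encode_outcome(raw_labels)
print(y)
```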
In statistics, logistic regression, or logit regression, or logit model is a regression model where the dependent variable (DV) is categorical.
Logistic regression measures the relationship between the categorical dependent variable and one or more independent variables by estimating probabilities using a logistic function, which is the cumulative logistic distribution. Thus, it treats the same set of problems as probit regression using similar techniques, with the latter using a cumulative normal distribution curve instead.
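To make the comparison with probit regression concrete, here is a minimal Python sketch of the two S-shaped curves: the logistic function (the CDF of the standard logistic distribution) and the standard normal CDF used by probit, computed here with `math.erf`. Both map any real number into (0, 1) and equal 0.5 at zero; the logistic curve approaches 0 and 1 more slowly (heavier tails).

```python
import math

def logistic_cdf(z):
    """Standard logistic CDF (the 'logistic function'): 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Standard normal CDF, the curve used by probit regression."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Print both curves side by side at a few points
for z in (-3, -1, 0, 1, 3):
    print(f"z={z:+d}  logistic={logistic_cdf(z):.3f}  probit={normal_cdf(z):.3f}")
```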
Logistic regression can be seen as a special case of the generalized linear model and is thus similar to linear regression. The model of logistic regression, however, is based on quite different assumptions (about the relationship between dependent and independent variables) from those of linear regression. In particular, the key differences between these two models can be seen in the following two features of logistic regression.
- First, the conditional distribution y|x is a Bernoulli distribution rather than a Gaussian distribution, because the dependent variable is binary.
- Second, the predicted values are probabilities and are therefore restricted to (0,1) through the logistic distribution function because logistic regression predicts the probability of particular outcomes.
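These two points fit together: the model outputs a probability p in (0, 1), and the observed 0/1 outcome is treated as a Bernoulli draw with that p, so P(y | x) = p^y (1 - p)^(1 - y). A minimal Python sketch of that log-likelihood follows; the probabilities and outcomes are made up for illustration.

```python
import math

def bernoulli_log_likelihood(y, p):
    """Log of P(y | x) = p^y * (1 - p)^(1 - y) for a single 0/1 outcome y."""
    return y * math.log(p) + (1 - y) * math.log(1 - p)

# Hypothetical predicted probabilities and observed outcomes
probs = [0.9, 0.2, 0.6]
outcomes = [1, 0, 0]

# Log-likelihood of the whole sample = sum over independent observations
total = sum(bernoulli_log_likelihood(y, p) for y, p in zip(outcomes, probs))
print(round(total, 4))
```

Fitting a logistic regression means choosing the coefficients that make this total as large as possible.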
Logistic regression is widely used in many fields, such as medicine and social media.
For example, in medicine it can be used to predict whether a patient has a disease (such as HIV) based on observed characteristics of the patient (age, sex, various blood tests and urine tests).
Another example is predicting an election result for a national party, or predicting whether a voter will vote for the Congress or the Democratic party, based on age, sex, income, caste and many other characteristics.
A group of 20 students spend between 0 and 6 hours studying for an exam. How does the number of hours spent studying affect the probability that the student will pass the exam?
The table shows the number of hours each student spent studying, and whether they passed (1) or failed (0).
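Logistic equation:
$\operatorname{logit}(p) = \beta + \beta_1 \cdot \text{Hours} \qquad (1)$
where p is the probability that the characteristic of interest is present (here, that the student passes the exam).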
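How do we find the values of β and β1? Unlike the linear regression model, whose coefficients come from ordinary least squares, logistic regression estimates them by maximum likelihood: we choose the β and β1 that make the observed pass/fail outcomes most probable. There is no closed-form formula for this, so statistical software solves it iteratively.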
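As a sketch of what that maximum-likelihood fit does, here is a small Python implementation of Newton's method for β and β1. The data below are made up for illustration (they are not the table from this example), and `fit_logistic` is a hypothetical helper name. The code keeps updating β and β1 until the step size becomes negligible.

```python
import math

# Hypothetical data (NOT the table from this example): hours studied
# and whether each student passed (1) or failed (0).
hours  = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0,   0,   0,   1,   0,   1,   0,   1,   1,   1]

def sigmoid(f):
    return 1.0 / (1.0 + math.exp(-f))

def fit_logistic(x, y, iterations=50, tol=1e-10):
    """Maximum-likelihood fit of logit(p) = b0 + b1*x using Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient of the log-likelihood (g0, g1) and the entries of the
        # negated Hessian [[a, b], [b, c]]
        g0 = g1 = a = b = c = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            a += w
            b += w * xi
            c += w * xi * xi
        # Newton step: solve [[a, b], [b, c]] @ step = [g0, g1]
        det = a * c - b * b
        step0 = (c * g0 - b * g1) / det
        step1 = (a * g1 - b * g0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1

beta, beta1 = fit_logistic(hours, passed)
print(f"beta = {beta:.4f}, beta1 = {beta1:.4f}")
```

R's `glm()` with `family = binomial` fits the same model; for the logit link, its iteratively reweighted least squares algorithm is equivalent to these Newton steps.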
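The logit transformation is defined as the logged odds:
$\text{odds} = \dfrac{p}{1-p} \quad \text{and} \quad \operatorname{logit}(p) = \ln\left(\dfrac{p}{1-p}\right) \qquad (2)$
Once β and β1 are known, we can combine equations (1) and (2) to get p itself. Write
$f = \beta + \beta_1 \cdot \text{Hours}$
so that, by (1), $\operatorname{logit}(p) = f$. Substituting this into (2) gives $\ln\left(\frac{p}{1-p}\right) = f$, that is, $\frac{p}{1-p} = e^{f}$. Solving for p, we get the equation below:
$p = \dfrac{1}{1 + e^{-f}}$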
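A quick numerical check of this derivation in Python: `logit` computes the logged odds and `inverse_logit` applies p = 1/(1 + e^(-f)), so applying one after the other should return the original probability. The helper names are chosen here for illustration.

```python
import math

def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)

def logit(p):
    """Logit transformation: the natural log of the odds."""
    return math.log(odds(p))

def inverse_logit(f):
    """Solve logit(p) = f for p, giving p = 1 / (1 + exp(-f))."""
    return 1.0 / (1.0 + math.exp(-f))

# Round trip: probability -> logit -> probability
for p in (0.1, 0.5, 0.8):
    f = logit(p)
    print(p, round(odds(p), 4), round(f, 4), round(inverse_logit(f), 4))
```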
Now plug a student's number of study hours (together with the fitted β and β1) into this equation. The result is the probability that a student who studies for that many hours passes the exam.
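For example, the following Python sketch plugs a few study times into p = 1/(1 + e^(-f)) with f = β + β1·Hours. The coefficient values are hypothetical placeholders, not estimates from this article's table; substitute the values from your own fit.

```python
import math

# Hypothetical coefficients; replace with your own fitted values
BETA = -3.0   # intercept (β)
BETA1 = 1.2   # change in log-odds per extra hour of study (β1)

def pass_probability(hours):
    """Probability of passing after studying `hours`, from the logistic model."""
    f = BETA + BETA1 * hours
    return 1.0 / (1.0 + math.exp(-f))

for h in range(1, 6):
    print(f"{h} h -> P(pass) = {pass_probability(h):.2f}")
```

The point where f = 0 (here Hours = -β/β1 = 2.5) is where the predicted probability crosses 0.5.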
Note: In the next article I will teach you how to build a logistic regression model using R.
Until then, learn the basics of logistic regression, and if you have any doubts, please write them in the comment box. It's free 😛
Example source: you will find more about the logistic regression model there.
A graphical representation of both linear and logistic regression will be posted very soon. Until then, stay tuned, and thank you very much.