26th Jun 2017
Linear regression
Linear regression is a statistical method that businesses widely use to forecast financial statements. It relies on the average relationship between a dependent variable and one or more independent variables. A simple regression model takes into account one independent variable, such as the sales price or the advertising expense, while a multiple regression model considers two or more independent variables together, such as the sales price and the advertising expense. Regression analysis is popular for sales forecasting because it helps the manager or the organization find the line of best fit over a range of observations. For instance, the following observations can be plotted on a scatter graph to find the best fit.
Advertising Expense | Sales Dollars |
$ 100 | $ 1,500 |
150 | 1,560 |
180 | 1,610 |
220 | 1,655 |
270 | 1,685 |
The graph above shows that linear regression will be the most suitable way for finding the relationship between the sales and the advertising expenses.
A non-linear relationship
Here the graph indicates that a linear model would not be suitable for describing the relationship between sales and advertising, because the shape of the graph is non-linear.
The main purpose of a regression model is to help the business or the manager understand a particular situation, explain the reasons for it, and then analyze it. When constructing a linear regression model, it is prudent to take certain assumptions into account in order to arrive at a manageable model.
Analysis of linear regression model
This part of the paper will look at the various crucial steps which ought to be taken into consideration in the analysis of a simple linear regression. One sample of data will be taken for the purposes of analysis.
Simple linear regression
Here we are interested in whether any relationship exists between two variables. For instance, we may be interested in the relationship between the price and the quantity of a product sold, employees' age and salary, weekly departmental costs and hours, a chicken's age and weight, or the distance travelled and the time taken.
For instance, suppose a poultry farmer wishes to predict the weight of the chickens he or she is rearing. Weight is the variable we wish to predict, so weight is the dependent variable and is plotted on the y-axis. The weight of a chicken depends on its age, so age is the independent variable and is plotted on the x-axis. By establishing the relationship between weight and age statistically, we can predict the weight of a chicken simply from its age.
Let us assume that we are running a special delivery service in a city and wish to find the cost of the service using linear regression. To predict the cost effectively, we need to estimate the delivery time for any given distance. Other factors also affect the delivery time and should be kept in mind: traffic congestion, the weather, the state of the road and the driver.
We then measure the time and distance for every tenth journey, starting from a randomly selected day and a randomly selected hour of the next week. Let us assume that the firm works six days a week and the random number selected is 2, so the chosen day is next Tuesday. If the service runs from 8am to 6pm, a random number from 0 to 9 is taken from random number tables to select the starting time. The random number chosen is six, so the first journey sampled will be the first one after 1pm, and we then take every tenth delivery after that.
Let us now assume that the sample data for the first ten deliveries are as follows.
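The sampling scheme described above (a random starting point, then every tenth journey) can be sketched in Python. This is only a minimal illustration, not the firm's actual procedure: the journey log `journeys`, the helper name `every_nth` and the fixed seed are assumptions introduced here.

```python
import random

def every_nth(journeys, start_index, step=10, sample_size=10):
    """Systematic sample: take the journey at start_index, then every step-th one."""
    return journeys[start_index::step][:sample_size]

# Hypothetical log of 200 journey IDs, in the order the journeys were made.
journeys = list(range(1, 201))

# The random start stands in for "the first journey after the randomly chosen time".
rng = random.Random(42)  # fixed seed only so that the sketch is repeatable
start_index = rng.randrange(len(journeys) - 10 * 9)  # leave room for ten picks

sample = every_nth(journeys, start_index)
print(sample)
```

Drawing the start at random and then stepping through the log at a fixed interval keeps the sample spread across the working period instead of clustering at one time of day.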
Table 1. Sample data for delivery distances and time
Distance in miles | Time in minutes |
3.5 | 16 |
2.4 | 13 |
4.9 | 19 |
4.2 | 18 |
3.0 | 12 |
1.3 | 11 |
1.0 | 8 |
3.0 | 14 |
1.5 | 9 |
4.1 | 16 |
The variation in delivery time with distance is used here to show how regression analysis works. Time taken is the dependent variable (y) and distance is the independent variable (x). From the data above, linear regression analysis gives the line of best fit, which in turn describes the relationship between the two variables. The following formulas are used.
Slope:
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^{2} - \left(\sum x\right)^{2}}$$
where n is the sample size.
Intercept:
$$a = \frac{\sum y - b\sum x}{n}$$
Here the sample size is n = 10, and the linear regression model is
y = a + bx. The calculation table is therefore as follows.
Table 2. Calculation of the regression line
x (miles) | y (minutes) | xy | x^{2} | y^{2} |
3.5 | 16 | 56.0 | 12.25 | 256 |
2.4 | 13 | 31.2 | 5.76 | 169 |
4.9 | 19 | 93.1 | 24.01 | 361 |
4.2 | 18 | 75.6 | 17.64 | 324 |
3.0 | 12 | 36.0 | 9.0 | 144 |
1.3 | 11 | 14.3 | 1.69 | 121 |
1.0 | 8 | 8.0 | 1.0 | 64 |
3.0 | 14 | 42.0 | 9.0 | 196 |
1.5 | 9 | 13.5 | 2.25 | 81 |
4.1 | 16 | 65.6 | 16.81 | 256 |
Totals 28.9 | 136 | 435.3 | 99.41 | 1972 |
The slope is
$$b = \frac{10 \times 435.3 - 28.9 \times 136}{10 \times 99.41 - 28.9^{2}} = \frac{422.6}{158.9} = 2.66$$
The intercept follows from the same totals:
$$a = \frac{136 - 2.66 \times 28.9}{10} = 5.91$$
Inserting these values into the linear regression model gives
y = 5.91 + 2.66x
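The hand calculation can be checked with a short Python sketch that applies the same sum-based formulas to the ten deliveries in Table 2. The function name `fit_line` is an illustrative choice, not something from the original text.

```python
# Delivery data from Table 2: distance in miles (x), time in minutes (y).
x = [3.5, 2.4, 4.9, 4.2, 3.0, 1.3, 1.0, 3.0, 1.5, 4.1]
y = [16, 13, 19, 18, 12, 11, 8, 14, 9, 16]

def fit_line(x, y):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

a, b = fit_line(x, y)
print(f"y = {a:.2f} + {b:.2f}x")
```

Rounded to two decimal places, the coefficients agree with the values 5.91 and 2.66 obtained by hand.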
Hence the delivery time is estimated as follows.
Delivery time (minutes) = 5.91 + 2.66 × delivery distance (miles). The slope of the regression line, 2.66 minutes per mile, is the estimated number of minutes required for each mile of a delivery, while the intercept, 5.91 minutes, is the estimated time needed to prepare for the journey and deliver the goods. However, it is important to determine whether the forecast delivery time is reliable by assessing how well the model fits the given data.
For simple linear regression analysis to be effective, it is prudent to test the strength of the linear relationship. This is measured by the ratio of the variation explained by the regression to the total variation in y. The stronger the relationship, the closer this ratio is to one. The ratio is called the coefficient of determination and is represented by the symbol r^{2}, where
$$r^{2} = \frac{\sum(\hat{y} - \bar{y})^{2}}{\sum(y - \bar{y})^{2}}$$
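r^{2} is usually expressed as a percentage to show how much of the variation in y is explained by introducing x into the model. The Pearson product-moment correlation coefficient, r, is also important here. Its magnitude is the square root of the coefficient of determination, and it is given by the following formula.
$$r = \frac{\sum(x - \bar{x})(y - \bar{y})}{\sqrt{\sum(x - \bar{x})^{2}\,\sum(y - \bar{y})^{2}}}$$
The equation above can be rearranged algebraically as follows to make the calculations simpler.
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^{2} - \left(\sum x\right)^{2}\right)\left(n\sum y^{2} - \left(\sum y\right)^{2}\right)}}$$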
The value of r always lies between -1 and +1. The sign of r is the same as that of the slope b, so if b is positive then r is also positive, and vice versa. As the strength of the relationship increases, the plotted points lie more closely along a straight line and the magnitude of r approaches 1. As the strength decreases, the value of r approaches zero. Where r = 0, there is no linear relationship between the variables. Consider the following diagrams.
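Using the sum-based form of the formula with the totals from Table 2, we can compute the value of r for the model that was set up to predict delivery times for journeys of a given distance within a city. Therefore
$$r = \frac{10 \times 435.3 - 28.9 \times 136}{\sqrt{\left(10 \times 99.41 - 28.9^{2}\right)\left(10 \times 1972 - 136^{2}\right)}} = \frac{422.6}{\sqrt{158.9 \times 1224}} = 0.958$$
From this result, we can say that there is a very strong linear relationship between the delivery distance and the time taken, because the value of r is very close to 1. The coefficient of determination can therefore be calculated as follows.
r^{2} = 0.958^{2} × 100 = 91.8%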
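As a cross-check on the correlation figures, the following Python sketch computes r and r^{2} from the Table 2 data using the sum-based form of the Pearson formula. The function name `pearson_r` is an illustrative choice.

```python
import math

# Delivery data from Table 2: distance in miles (x), time in minutes (y).
x = [3.5, 2.4, 4.9, 4.2, 3.0, 1.3, 1.0, 3.0, 1.5, 4.1]
y = [16, 13, 19, 18, 12, 11, 8, 14, 9, 16]

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient from raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(x, y)
r_squared_pct = 100 * r ** 2
print(f"r = {r:.3f}, r^2 = {r_squared_pct:.1f}%")
```

The result agrees with the figures in the text to the precision shown there: about 92% of the variation in delivery time is explained by distance.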
Prediction within the range of sample data
We can use the model for prediction. For instance, if the distance to be covered is 4 miles, the estimated mean journey time would be
y = 5.91 + 2.66 × 4.0 = 16.6 minutes.
A prediction from this model may still be unreliable, so it is crucial to carry out tests to ascertain the reliability of our predictions.
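A prediction of this kind can be wrapped in a small helper that also refuses to extrapolate beyond the sampled distances. This is a sketch under assumptions: the function name `predict_time` and the range check are illustrative additions, with the limits 1.0 and 4.9 miles taken from the smallest and largest distances in Table 2.

```python
def predict_time(distance_miles, a=5.91, b=2.66, x_min=1.0, x_max=4.9):
    """Estimated mean delivery time in minutes for a distance inside the sample range."""
    if not x_min <= distance_miles <= x_max:
        raise ValueError("distance lies outside the range of the sample data")
    return a + b * distance_miles

estimate = predict_time(4.0)
print(f"{estimate:.2f} minutes")
```

The range check reflects the heading of this section: the fitted line is only supported by deliveries between 1.0 and 4.9 miles, so estimates outside that interval are not backed by the data.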
Estimation of errors and residuals
The main reason for calculating the errors (residuals) is to assess the reliability of the predictions by means of the differences between the observed value of the dependent variable, y, and the predicted value, ŷ, for each value of the independent variable x.
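The residuals for the delivery data can be listed with a few lines of Python. The sketch below uses the rounded coefficients 5.91 and 2.66 from the text, so the residuals sum to a value close to, but not exactly, zero.

```python
# Delivery data from Table 2: distance in miles (x), time in minutes (y).
x = [3.5, 2.4, 4.9, 4.2, 3.0, 1.3, 1.0, 3.0, 1.5, 4.1]
y = [16, 13, 19, 18, 12, 11, 8, 14, 9, 16]
a, b = 5.91, 2.66  # fitted intercept and slope, rounded as in the text

predicted = [a + b * xi for xi in x]
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]

print("  x   y  y_hat  residual")
for xi, yi, y_hat, e in zip(x, y, predicted, residuals):
    print(f"{xi:4.1f} {yi:3d} {y_hat:6.2f} {e:8.2f}")
print(f"sum of residuals = {sum(residuals):.2f}")
```

Large residuals, or residuals that show a pattern when plotted against x, would suggest that the straight-line model is not describing the data well.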
Importance of errors
Hypothesis to test the overall linearity of the relationship (correlation coefficient)
The test is done to determine whether a linear relationship exists. The correlation coefficient is assessed using a t-test. For instance:
H_{0}: r = 0. There is no linear relationship between the x and y variables, so the independent variable does not help in predicting the values of y.
H_{1}: r ≠ 0. There is a linear relationship between the variables x and y, so x helps to predict the y values. This is a two-sided test, and the test statistic is as follows.
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}$$
The number of degrees of freedom is (n-2), because two quantities, the means of x and y, have been estimated from the sample. Therefore, to test linearity at the 5% significance level using a two-tailed test, the test statistic is compared with t_{0.025,(n-2)} found from the tables. For the estimation of journey times from journey distances, the test statistic is as follows.
$$t = \frac{0.958\sqrt{10-2}}{\sqrt{1-0.918}} = 9.46$$
The number of degrees of freedom is (10-2) = 8.
From the tables, t_{0.025,8} = 2.306. Since the test statistic (9.46) is greater than 2.306, the null hypothesis H_{0} is rejected at the 5% significance level in favour of H_{1}. We therefore conclude that the correlation coefficient is not zero, hence there is a linear relationship between journey time and distance.
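The decision rule above can be sketched in Python. The critical value 2.306 is hard-coded from the tables quoted in the text, and the statistic is computed with the standard formula t = r√(n-2)/√(1-r^{2}). With r rounded to 0.958 it comes out between 9.4 and 9.5, slightly different from the quoted 9.46 because of rounding in the intermediate values.

```python
import math

n = 10              # sample size
r = 0.958           # correlation coefficient from the text (rounded)
t_critical = 2.306  # t_{0.025,8} from tables, as quoted in the text

# Test statistic for H0: no linear relationship (correlation of zero).
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
reject_h0 = abs(t_stat) > t_critical

print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```

Comparing the absolute value of the statistic with the critical value is what makes the test two-tailed: a strongly negative correlation would lead to rejection just as a strongly positive one does.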
Hypothesis test on the slope of the simple regression line
Here we can apply the following formula.
$$t = \frac{\text{sample statistic} - \text{parameter assumed in } H_{0}}{\text{best estimate of the standard error}}$$
Therefore, the test statistic for the linear regression coefficient, b, is:
$$t = \frac{b - 0}{\text{estimated standard error of } b}$$
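The estimated standard error of b is
$$s_{b} = \frac{s_{e}}{\sqrt{\sum(x - \bar{x})^{2}}}, \qquad s_{e} = \sqrt{\frac{\sum(y - \hat{y})^{2}}{n-2}}$$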
All these calculations are done to test the validity and reliability of the fitted linear regression, so that we know how far its predictions can be trusted.
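The slope test can be sketched in Python for the delivery data. The standard error used here is the usual textbook form for simple linear regression, s_b = s_e / √Σ(x - x̄)^{2} with s_e = √(Σ(y - ŷ)^{2} / (n-2)). Treat it as an assumption about the formula intended above. The slope and intercept are computed in mean-centred form, which is algebraically equivalent to the sum-based formulas.

```python
import math

# Delivery data from Table 2: distance in miles (x), time in minutes (y).
x = [3.5, 2.4, 4.9, 4.2, 3.0, 1.3, 1.0, 3.0, 1.5, 4.1]
y = [16, 13, 19, 18, 12, 11, 8, 14, 9, 16]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b = s_xy / s_xx          # slope
a = mean_y - b * mean_x  # intercept

# Residual standard error, then the standard error of the slope.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(sse / (n - 2))
s_b = s_e / math.sqrt(s_xx)

t_b = (b - 0) / s_b  # test statistic for H0: slope = 0
print(f"b = {b:.2f}, s_b = {s_b:.3f}, t = {t_b:.2f}")
```

In simple linear regression this statistic has n-2 degrees of freedom and matches the statistic for the correlation coefficient, so it is compared with the same critical value, 2.306, at the 5% level.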
Just as the regression method has been used here to predict the mean journey time for any given distance, business organizations apply it to financial forecasting. The method can be used in different areas of the business, for example to forecast production, sales and cash flow.