Scatterplot showing a regression line created from known weather station readings.
The graphic above depicts data from two different weather stations. This data was used to create a regression line, or the best-fit line between two known variables. This best-fit line is then used to predict values, such as rainfall totals.
For the purposes of our lab, we used the regression line to estimate values for Station A, which was missing data for an 18-year period. By determining the slope and intercept values (based on the known input from our two weather stations), we were able to predict what the rainfall totals for Station A could have been, based on the data from Station B for the same year. The formula used was: Y' = bX + a. Or written another way: the predicted value for Station A = (slope * Station B input) + intercept.
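To make the formula concrete, here is a minimal Python sketch of the same idea: fit the slope (b) and intercept (a) by ordinary least squares from years where both stations have readings, then plug a Station B value into Y' = bX + a. The rainfall numbers and the function names (`fit_line`, `predict`) are made up for illustration; they are not the lab data or the tool we actually used.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x_new):
    """Apply Y' = bX + a to a single input value."""
    return slope * x_new + intercept


# Hypothetical annual rainfall totals for years where both stations reported
station_b = [30.0, 35.0, 40.0, 45.0, 50.0]
station_a = [28.0, 33.5, 37.0, 43.5, 47.0]

b, a = fit_line(station_b, station_a)

# Estimate Station A for a year where only Station B has a reading (42.0 is invented)
estimate = predict(b, a, 42.0)
print(f"slope={b:.2f}, intercept={a:.2f}, predicted Station A={estimate:.2f}")
```

In practice a spreadsheet or statistics package will give you the slope and intercept directly; writing it out by hand just shows where the two numbers come from.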
This type of analysis is very useful if you wish to examine the relationship between two (or more) variables, or to make value predictions based on known information. However, there are some caveats. First, the relationship between the variables must be roughly linear and the data roughly normally distributed - wacky outliers can skew the results. Also, not all data types can be used to run a bivariate regression analysis: if the data to be compared consists of percentages or arbitrary values (such as names), the data must either be transformed or an alternative analysis method must be used. Lastly, just because two variables can be compared doesn't mean they should be, since there may not be a statistical or logical relationship between them. Essentially, you need to know your datasets, and run additional tests (such as a t-test) to determine whether the results are statistically significant.
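As a rough illustration of the kind of follow-up test mentioned above, the sketch below computes the Pearson correlation r and the t statistic for the null hypothesis that the slope is zero, using t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The data is the same invented set as before, the function names are hypothetical, and the critical value is an assumption taken from a standard t table (two-tailed, 5% level, 3 degrees of freedom); look up the value that matches your own sample size.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def slope_t_statistic(x, y):
    """t statistic for H0: slope = 0, with len(x) - 2 degrees of freedom.

    Note: a perfect correlation (r = 1 or -1) would divide by zero here.
    """
    n = len(x)
    r = pearson_r(x, y)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical rainfall totals (not the lab data)
station_b = [30.0, 35.0, 40.0, 45.0, 50.0]
station_a = [28.0, 33.5, 37.0, 43.5, 47.0]

r = pearson_r(station_b, station_a)
t_stat = slope_t_statistic(station_b, station_a)

# Assumed critical value: two-tailed, alpha = 0.05, df = 3, from a t table
T_CRITICAL = 3.182
significant = abs(t_stat) > T_CRITICAL
print(f"r={r:.3f}, t={t_stat:.2f}, significant at 5%: {significant}")
```

A significant t statistic only says the linear relationship is unlikely to be chance; it does not tell you whether the relationship makes logical sense, and it won't catch outliers for you, so it is still worth eyeballing the scatterplot.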
*Originally published on November 5, 2015. Updated on 2/27/2017 to repair image links.