# Understanding Simple Linear Regression in Nursing Practice

QUESTION

Exercise 14

Understanding Simple Linear Regression

Statistical Technique in Review

In nursing practice, the ability to predict future events or outcomes is crucial, and researchers calculate and report linear regression results as a basis for making these predictions. Linear regression provides a means to estimate or predict the value of a dependent variable based on the value of one or more independent variables. The regression equation is a mathematical expression of a causal proposition emerging from a theoretical framework. The linkage between the theoretical statement and the equation is made prior to data collection and analysis. Linear regression is a statistical method of estimating the expected value of one variable, y, given the value of another variable, x. The focus of this exercise is simple linear regression, which involves the use of one independent variable, x, to predict one dependent variable, y.

The regression line developed from simple linear regression is usually plotted on a graph, with the horizontal axis representing x (the independent or predictor variable) and the vertical axis representing y (the dependent or predicted variable; see Figure 14-1). The value represented by the letter a is referred to as the y intercept, or the point where the regression line crosses or intercepts the y-axis. At this point on the regression line, x = 0. The value represented by the letter b is referred to as the slope, or the coefficient of x. The slope determines the direction and angle of the regression line within the graph. The slope expresses the extent to which y changes for every one-unit change in x. The score on variable y (dependent variable) is predicted from the subject’s known score on variable x (independent variable). The predicted score or estimate is referred to as Ŷ (expressed as y-hat) (Cohen, 1988; Grove, Burns, & Gray, 2013; Zar, 2010).

FIGURE 14-1  GRAPH OF A SIMPLE LINEAR REGRESSION LINE

Simple linear regression is an effort to explain the dynamics within a scatterplot (see Exercise 11) by drawing a straight line through the plotted scores. No single regression line can be used to predict, with complete accuracy, every y value from every x value. However, the purpose of the regression equation is to develop the line to allow the highest degree of prediction possible, the line of best fit. The procedure for developing the line of best fit is the method of least squares. If the data were perfectly correlated, all data points would fall along the straight line or line of best fit. In actual studies, not all data points fall on the line of best fit, but the line still provides the best equation for predicting y: for any given value of x, the predicted value of y is read from the corresponding point on the line.

The algebraic equation for the regression line of best fit is y = bx + a, where:

y = dependent variable (outcome)

x = independent variable (predictor)

b = slope of the line (beta, or the change in the y value for every one-unit increase along the x-axis), also called the regression coefficient

a = y-intercept (the point where the regression line intersects the y-axis), also called the regression constant (Zar, 2010)

In Figure 14-2, the x-axis represents Gestational Age in weeks and the y-axis represents Birth Weight in grams. As gestational age increases from 20 weeks to 34 weeks, birth weight also increases. In other words, the slope of the line is positive. This line of best fit can be used to predict the birth weight (dependent variable) for an infant based on his or her gestational age in weeks (independent variable). Figure 14-2 is an example of a line of best fit that was not developed from research data. In addition, the x-axis was started at 22 weeks rather than 0, which is the usual start in a regression figure. Using the formula y = bx + a, the birth weight of a baby born at 28 weeks of gestation is calculated below.

Formula: y = bx + a

In this example, a = 500, b = 20, and x = 28 weeks

y = 20(28) + 500 = 560 + 500 = 1,060 grams
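The calculation above can also be written as a short Python sketch. The function name `predict` and its parameter names are illustrative choices, not part of the text; the values a = 500 and b = 20 come from the Figure 14-2 example.

```python
def predict(x, slope, intercept):
    """Return the predicted y (y-hat) on the line y = bx + a."""
    return slope * x + intercept


# Figure 14-2 example: a = 500 grams, b = 20 grams per week of gestation.
birth_weight = predict(28, slope=20, intercept=500)
print(birth_weight)  # 1060

# At x = 0 the prediction equals the y-intercept a.
print(predict(0, slope=20, intercept=500))  # 500
```

Setting x = 0 returns the intercept, which matches the definition of a as the point where the line crosses the y-axis.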

FIGURE 14-2  EXAMPLE LINE OF BEST FIT FOR GESTATIONAL AGE AND BIRTH WEIGHT

The regression line represents y for any given value of x. As you can see, some data points fall above the line, and some fall below the line. If we substitute any x value in the regression equation and solve for y, we will obtain a ŷ that will be somewhat different from the actual values. The distance between the ŷ and the actual value of y is called the residual, and this represents the degree of error in the regression line. The regression line or the line of best fit for the data points is the unique line that will minimize error and yield the smallest residual (Zar, 2010). The step-by-step process for calculating simple linear regression in a study is presented in Exercise 29.
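The idea of residuals and a line of best fit can be illustrated with a minimal Python sketch. It uses the standard least-squares formulas for the slope and intercept, which may be laid out differently from the step-by-step presentation in Exercise 29. The data set is hypothetical and invented for illustration only, and all function names are assumptions.

```python
def least_squares(xs, ys):
    """Return (b, a): the slope and y-intercept that minimize squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point (mean_x, mean_y).
    a = mean_y - b * mean_x
    return b, a


def residuals(xs, ys, b, a):
    """Return actual y minus predicted y (y-hat) for each data point."""
    return [y - (b * x + a) for x, y in zip(xs, ys)]


def sum_squared_residuals(xs, ys, b, a):
    """Return the total squared error of the line y = bx + a."""
    return sum(r ** 2 for r in residuals(xs, ys, b, a))


# Hypothetical data, invented for illustration only (not research data).
weeks = [22, 24, 26, 28, 30, 32, 34]
grams = [950, 960, 1040, 1050, 1110, 1120, 1190]

b, a = least_squares(weeks, grams)
fitted_error = sum_squared_residuals(weeks, grams, b, a)
other_error = sum_squared_residuals(weeks, grams, 20, 500)  # a different line

print(f"slope b = {b:.3f}, intercept a = {a:.3f}")
print(f"squared error: fitted line = {fitted_error:.1f}, other line = {other_error:.1f}")
```

The fitted line should never have a larger sum of squared residuals than any other straight line through the same points, which is what "line of best fit" means under the method of least squares.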

Research Article

Source

Flannigan, C., Bourke, T. W., Sproule, A., Stevenson, M., & Terris, M. (2014). Are APLS formulae for estimating weight appropriate for use in children admitted to PICU? Resuscitation, 85(7), 927–931.

Introduction

Medications and other therapies often necessitate knowing a patient’s weight. However, a child may be admitted to a pediatric intensive care unit (PICU) without a known weight, and instability and on-going resuscitation may prevent obtaining this needed weight. Clinicians would benefit from a tool that could accurately estimate a patient’s weight when such information is unavailable. Thus Flannigan et al. (2014) conducted a retrospective observational study for the purpose of determining “if the revised APLS UK [Advanced Paediatric Life Support United Kingdom] formulae for estimating weight are appropriate for use in the paediatric care population in the United Kingdom” (Flannigan et al., 2014, p. 927). The sample included 10,081 children (5,622 males and 4,459 females), who ranged from term-corrected age to 15 years of age, admitted to the PICU during a 5-year period. Because this was a retrospective study, no geographic location, race, or ethnicity data were collected for the sample. A paired samples t-test was used to compare mean sample weights with the APLS UK formula weight. The “APLS UK formula ‘weight = (0.5 × age in months) + 4’ significantly overestimates the mean weight of children under 1 year admitted to PICU by between 10% [and] 25.4%” (Flannigan et al., 2014, p. 928). Therefore, the researchers concluded that the APLS UK formulas were not appropriate for estimating the weight of children admitted to the PICU.

Relevant Study Results

“Simple linear regression was used to produce novel formulae for the prediction of the mean weight specifically for the PICU population” (Flannigan et al., 2014, p. 927). The three novel formulas are presented in Figures 1, 2, and 3, respectively. The new formulas’ calculations are more complex than the APLS UK formulas. “Although a good estimate of mean weight can be obtained by our newly derived formula, reliance on mean weight alone will still result in significant error as the weights of children admitted to PICU in each age and sex [gender] group have a large standard deviation . . . Therefore as soon as possible after admission a weight should be obtained, e.g., using a weight bed” (Flannigan et al., 2014, p. 929).

FIGURE 1  Comparison of actual weight with weight calculated using APLS formula “Weight in kg = (0.5 × age in months) + 4” and novel formula “Weight in kg = (0.502 × age in months) + 3.161” Flannigan, C., Bourke, T. W., Sproule, A., Stevenson, M., & Terris, M. (2014). Are APLS formulae for estimating weight appropriate for use in children admitted to PICU? Resuscitation, 85(7), p. 928.

FIGURE 2  Comparison of actual weight with weight calculated using APLS formula “Weight in kg = (2 × age in years) + 8” and novel formula “Weight in kg = (0.176 × age in months) + 7.241” Flannigan, C., Bourke, T. W., Sproule, A., Stevenson, M., & Terris, M. (2014). Are APLS formulae for estimating weight appropriate for use in children admitted to PICU? Resuscitation, 85(7), p. 928.

FIGURE 3  Comparison of actual weight with weight calculated using APLS formula “Weight in kg = (3 × age in years) + 7” and novel formula “Weight in kg = (0.331 × age in months) − 6.868” Flannigan, C., Bourke, T. W., Sproule, A., Stevenson, M., & Terris, M. (2014). Are APLS formulae for estimating weight appropriate for use in children admitted to PICU? Resuscitation, 85(7), p. 929.
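As a rough illustration, the two formulas quoted in the Figure 1 caption can be compared side by side in Python. The function names are hypothetical, and the ages chosen are arbitrary examples within the first year of life; this is a sketch of the arithmetic, not a clinical tool.

```python
def apls_weight_fig1(age_months):
    """APLS UK formula from the Figure 1 caption: (0.5 x age in months) + 4."""
    return 0.5 * age_months + 4


def novel_weight_fig1(age_months):
    """Novel formula from the Figure 1 caption: (0.502 x age in months) + 3.161."""
    return 0.502 * age_months + 3.161


# Arbitrary example ages in months, all under 1 year.
for age in (1, 5, 11):
    apls = apls_weight_fig1(age)
    novel = novel_weight_fig1(age)
    print(f"{age:>2} months: APLS = {apls:.3f} kg, novel = {novel:.3f} kg, "
          f"difference = {apls - novel:.3f} kg")
```

Both lines have nearly the same slope, so the gap between them comes almost entirely from the difference in intercepts (4 versus 3.161).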

Study Questions

1. What are the variables on the x- and y-axes in Figure 1 from the Flannigan et al. (2014) study?

2. What is the name of the type of variable represented by x and y in Figure 1? Is x or y the score to be predicted?

3. What is the purpose of simple linear regression analysis and the regression equation?

4. What is the point where the regression line meets the y-axis called? Is there more than one term for this point and what is the value of x at that point?

5. In the formula y = bx + a, is a or b the slope? What does the slope represent in regression analysis?

6. Using the values a = 3.161 and b = 0.502 with the novel formula in Figure 1, what is the predicted weight in kilograms for a child at 5 months of age? Show your calculations.

7. What are the variables on the x-axis and the y-axis in Figures 2 and 3? Describe these variables and how they might be entered into the regression novel formulas identified in Figures 2 and 3.

8. Using the values a = 7.241 and b = 0.176 with the novel formula in Figure 2, what is the predicted weight in kilograms for a child at 4 years of age? Show your calculations.

9. Does Figure 1 have a positive or negative slope? Provide a rationale for your answer. Discuss the meaning of the slope of Figure 1.

10. According to the study narrative, why are estimated child weights important in a pediatric intensive care (PICU) setting? What are the implications of these findings for practice?

1. The x variable is age in months, and the y variable is weight in kilograms in Figure 1.

2. x is the independent or predictor variable. y is the dependent variable or the variable that is to be predicted by the independent variable, x.

3. Simple linear regression is conducted to estimate or predict the values of one dependent variable based on the values of one independent variable. Regression analysis is used to calculate a line of best fit based on the relationship between the independent variable x and the dependent variable y. The formula developed with regression analysis can be used to predict the dependent variable (y) values based on values of the independent variable x.

4. The point where the regression line meets the y-axis is called the y intercept and is also represented by a (see Figure 14-1). a is also called the regression constant. At the y intercept, x = 0.

5. b is the slope of the line of best fit (see Figure 14-1). The slope of the line indicates the amount of change in y for each one unit of change in x. b is also called the regression coefficient.

6. Use the following formula to calculate your answer: y = bx + a

y = 0.502 (5) + 3.161 = 2.51 + 3.161 = 5.671 kilograms

Note: Flannigan et al. (2014) expressed the novel formula of weight in kilograms = (0.502 × age in months) + 3.161 in the title of Figure 1.

7. Age in years is displayed on the x-axis and is used for the APLS UK formulas in Figures 2 and 3. Figure 2 includes children 1 to 5 years of age, and Figure 3 includes children 6 to 12 years of age. However, the novel formulas developed by simple linear regression are calculated with age in months. Therefore, the age in years must be converted to age in months before calculating the y values with the novel formulas provided for Figures 2 and 3. For example, a child who is 2 years old would be converted to 24 months (2 × 12 mos./year = 24 mos.). Then the formulas in Figures 2 and 3 could be used to predict y (weight in kilograms) for the different aged children. The y-axis on both Figures 2 and 3 is weight in kilograms (kg).

8. First calculate the child’s age in months, which is 4 × 12 months/year = 48 months.

y = bx + a = 0.176 (48) + 7.241 = 8.448 + 7.241 = 15.689 kilograms

Note that the x value needs to be age in months; Flannigan et al. (2014) expressed the novel formula as weight in kilograms = (0.176 × age in months) + 7.241.

9. Figure 1 has a positive slope since the line extends from the lower left corner to the upper right corner and shows a positive relationship. This line shows that the increase in x (independent variable) is associated with an increase in y (dependent variable). In the Flannigan et al. (2014) study, the independent variable age in months is used to predict the dependent variable of weight in kilograms. As the age in months increases, the weight in kilograms also increases, which is the positive relationship illustrated in Figure 1.
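The unit conversion described in the answers to Questions 7 and 8 can be sketched in Python as follows. The helper names are hypothetical; the coefficients are those of the novel formula in the Figure 2 caption, and the 4-year-old example repeats the worked answer to Question 8.

```python
MONTHS_PER_YEAR = 12


def years_to_months(age_years):
    """Convert age in years to age in months, the unit the novel formulas expect."""
    return age_years * MONTHS_PER_YEAR


def novel_weight_fig2(age_months):
    """Novel formula from the Figure 2 caption: (0.176 x age in months) + 7.241."""
    return 0.176 * age_months + 7.241


age_months = years_to_months(4)          # 4 years -> 48 months
weight_kg = novel_weight_fig2(age_months)
print(age_months, round(weight_kg, 3))   # 48 15.689
```

Passing the age in years directly (x = 4) would give 7.945 kg instead, which shows why the conversion step cannot be skipped.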

EXERCISE 14 Questions to Be Graded

Name: _______________________________________________________ Class: _____________________

Date: ___________________________________________________________________________________

1. According to the study narrative and Figure 1 in the Flannigan et al. (2014) study, does the APLS UK formula under- or overestimate the weight of children younger than 1 year of age? Provide a rationale for your answer.

2. Using the values a = 3.161 and b = 0.502 with the novel formula in Figure 1, what is the predicted weight in kilograms (kg) for a child at 9 months of age? Show your calculations.

3. Using the values a = 3.161 and b = 0.502 with the novel formula in Figure 1, what is the predicted weight in kilograms for a child at 2 months of age? Show your calculations.

4. In Figure 2, the formula for calculating y (weight in kg) is Weight in kg = (0.176 × Age in months) + 7.241. Identify the y intercept and the slope in this formula.

5. Using the values a = 7.241 and b = 0.176 with the novel formula in Figure 2, what is the predicted weight in kilograms for a child 3 years of age? Show your calculations.

6. Using the values a = 7.241 and b = 0.176 with the novel formula in Figure 2, what is the predicted weight in kilograms for a child 5 years of age? Show your calculations.

7. In Figure 3, some of the actual mean weights represented by blue line with squares are above the dotted straight line for the novel formula, but others are below the straight line. Is this an expected finding? Provide a rationale for your answer.

8. In Figure 3, the novel formula is weight in kilograms = (0.331 × Age in months) − 6.868. What is the predicted weight in kilograms for a child 10 years old? Show your calculations.

9. Was the sample size of this study adequate for conducting simple linear regression? Provide a rationale for your answer.

10. Describe one potential clinical advantage and one potential clinical problem with using the three novel formulas presented in Figures 1, 2, and 3 in a PICU setting.

(Grove 139-150)

Grove, Susan K., Daisha Cipher. Statistics for Nursing Research: A Workbook for Evidence-Based Practice, 2nd Edition. Saunders, 022016. VitalBook file.

Exercise 19

Understanding Pearson Chi-Square

Statistical Technique in Review

The Pearson Chi-square (χ2) is an inferential statistical test calculated to examine differences among groups with variables measured at the nominal level. There are different types of χ2 tests and the Pearson chi-square is commonly reported in nursing studies. The Pearson χ2 test compares the frequencies that are observed with the frequencies that were expected. The assumptions for the χ2 test are as follows:

1. The data are nominal-level or frequency data.

2. The sample size is adequate.

3. The measures are independent of each other; that is, a subject’s data fit into only one category (Plichta & Kelvin, 2013).

The χ2 values calculated are compared with the critical values in the χ2 table (see Appendix D Critical Values of the χ2 Distribution at the back of this text). If the result is greater than or equal to the value in the table, significant differences exist. If the values are statistically significant, the null hypothesis is rejected (Grove, Burns, & Gray, 2013). These results indicate that the differences are probably an actual reflection of reality and not just due to random sampling error or chance.
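The comparison against a critical value can be sketched in Python. Appendix D is not reproduced here, so the dictionary below holds the commonly tabulated critical values for α = 0.05 at 1 to 3 degrees of freedom; the function name and the example χ2 value of 4.2 are hypothetical.

```python
# Commonly tabulated critical values of chi-square at alpha = 0.05,
# keyed by degrees of freedom (df).
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def is_significant(chi_square, df, critical_values=CRITICAL_VALUES_05):
    """Return True when the calculated value meets or exceeds the critical value."""
    return chi_square >= critical_values[df]


# Hypothetical calculated value of 4.2, checked at two different df.
print(is_significant(4.2, df=1))  # True:  4.2 >= 3.841
print(is_significant(4.2, df=2))  # False: 4.2 <  5.991
```

The same calculated value can be significant at one df and nonsignificant at another, which is why the df must be known before consulting the table.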

In addition to the χ2 value, researchers often report the degrees of freedom (df). This mathematically complex statistical concept is important for calculating and determining levels of significance. The standard formula for df is sample size (N) minus 1, or df = N − 1; however, this formula is adjusted based on the analysis technique performed (Plichta & Kelvin, 2013). The df formula for the χ2 test varies based on the number of categories examined in the analysis. The formula for df for the two-way χ2 test is df = (R − 1) (C − 1), where R is number of rows and C is the number of columns in a χ2 table. For example, in a 2 × 2 χ2 table, df = (2 − 1) (2 − 1) = 1. Therefore, the df is equal to 1. Table 19-1 includes a 2 × 2 chi-square contingency table based on the findings of the An et al. (2014) study. In Table 19-1, the rows represent the two nominal categories of alcohol use and alcohol nonuse and the two columns represent the two nominal categories of smokers and nonsmokers. The df = (2 − 1) (2 − 1) = (1) (1) = 1, and the study results were as follows: χ2 (1, N = 799) = 63.1; p < 0.0001. It is important to note that the df can also be reported without the sample size, as in χ2(1) = 63.1, p < 0.0001.

TABLE 19-1

CONTINGENCY TABLE BASED ON THE RESULTS OF AN ET AL. (2014) STUDY

| | Nonsmokers n = 742 | Smokers n = 57* |
|---|---|---|
| No alcohol use | 551 | 14 |
| Alcohol use† | 191 | 43 |

*Smokers defined as “smoking at least 1 cigarette daily during the past month.”

†Alcohol use “defined as at least 1 alcoholic beverage per month during the past year.”

An, F. R., Xiang, Y. T., Yu., L., Ding, Y. M., Ungvari, G. S., Chan, S. W. C., et al. (2014). Prevalence of nurses’ smoking habits in psychiatric and general hospitals in China. Archives of Psychiatric Nursing, 28(2), 120.
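The χ2 value reported for Table 19-1 can be checked with a short Python sketch that compares observed with expected frequencies. The function name is hypothetical, and the sketch assumes the plain Pearson statistic with no continuity correction, which is the assumption under which the table counts reproduce the reported value.

```python
def pearson_chi_square(observed):
    """Return (chi_square, df) for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if row and column categories were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)  # (R - 1)(C - 1)
    return chi_square, df


# Table 19-1: rows = no alcohol use, alcohol use; columns = nonsmokers, smokers.
table_19_1 = [[551, 14],
              [191, 43]]

chi_square, df = pearson_chi_square(table_19_1)
print(round(chi_square, 1), df)  # 63.1 1
```

Because the function derives df from the table's shape, it applies unchanged to tables with more than two rows or columns.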

If more than two groups are being examined, χ2 does not determine where the differences lie; it only determines that a statistically significant difference exists. A post hoc analysis will determine the location of the difference. χ2 is one of the weaker statistical tests used, and results are usually only reported if statistically significant values are found. The step-by-step process for calculating the Pearson chi-square test is presented in Exercise 35.

Research Article

Source

Darling-Fisher, C. S., Salerno, J., Dahlem, C. H. Y., & Martyn, K. K. (2014). The Rapid Assessment for Adolescent Preventive Services (RAAPS): Providers’ assessment of its usefulness in their clinical practice settings. Journal of Pediatric Health Care, 28(3), 217–226.

Introduction

Darling-Fisher and colleagues (2014, p. 219) conducted a mixed-methods descriptive study to evaluate the clinical usefulness of the Rapid Assessment for Adolescent Preventive Services (RAAPS) screening tool “by surveying healthcare providers from a wide variety of clinical settings and geographic locations.” The study participants were recruited from the RAAPS website to complete an online survey. The RAAPS risk-screening tool “was developed to identify the risk behaviors contributing most to adolescent morbidity, mortality, and social problems, and to provide a more streamlined assessment to help providers address key adolescent risk behaviors in a time-efficient and user-friendly format” (Darling-Fisher et al., 2014, p. 218). The RAAPS is an established 21-item questionnaire with evidence of reliability and validity that can be completed by adolescents in 5–7 minutes.

“Quantitative and qualitative analyses indicated the RAAPS facilitated identification of risk behaviors and risk discussions and provided efficient and consistent assessments; 86% of providers believed that the RAAPS positively influenced their practice” (Darling-Fisher et al., 2014, p. 217). The researchers concluded the use of RAAPS by healthcare providers could improve the assessment and identification of adolescents at risk and lead to the delivery of more effective adolescent preventive services.

Relevant Study Results

In the Darling-Fisher et al. (2014, p. 220) mixed-methods study, the participants (N = 201) were “providers from 26 U.S. states and three foreign countries (Canada, Korea, and Ireland).” More than half of the participants (n = 111; 55%) reported they were using the RAAPS in their clinical practices. “When asked if they would recommend the RAAPS to other providers, 86 responded, and 98% (n = 84) stated they would recommend RAAPS. The two most common reasons cited for their recommendation were for screening (n = 76, 92%) and identification of risk behaviors (n = 75, 90%). Improved communication (n = 52, 63%) and improved documentation (n = 46, 55%) and increased patient understanding of their risk behaviors (n = 48, 58%) were also cited by respondents as reasons to recommend the RAAPS” (Darling-Fisher et al., 2014, p. 222).

“Respondents who were not using the RAAPS (n = 90; 45%), had a variety of reasons for not using it. Most reasons were related to constraints of their health system or practice site; other reasons were satisfaction with their current method of assessment . . . and that they were interested in the RAAPS for academic or research purposes rather than clinical use” (Darling-Fisher et al., 2014, p. 220).

Chi-square analysis was calculated to determine if any statistically significant differences existed between the characteristics of the RAAPS users and nonusers. Darling-Fisher et al. (2014) did not provide a level of significance or α for their study, but the standard for nursing studies is α = 0.05. “Statistically significant differences were noted between RAAPS users and nonusers with respect to provider types, practice setting, percent of adolescent patients, years in practice, and practice region. No statistically significant demographic differences were found between RAAPS users and nonusers with respect to race, age” (Darling-Fisher et al., 2014, p. 221). The χ2 results are presented in Table 2.

TABLE 2

DEMOGRAPHIC COMPARISONS BETWEEN RAPID ASSESSMENT FOR ADOLESCENT PREVENTIVE SERVICE USERS AND NONUSERS

| | Current user: Yes (%) | Current user: No (%) | χ2 | p |
|---|---|---|---|---|
| Provider type (n = 161) | | | 12.7652, df = 2 | < .00 |
| Health care provider | 64 (75.3) | 55 (72.4) | | |
| Mental health provider | 13 (15.3) | 2 (2.6) | | |
| Other | 8 (9.4) | 19 (25.0) | | |
| Practice setting (n = 152) | | | 12.7652, df = 1 | < .00 |
| Outpatient health clinic | 20 (24.1) | 36 (52.2) | | |
| School-based health clinic | 63 (75.9) | 33 (47.8) | | |
| % Adolescent patients (n = 154) | | | 7.3780, df = 1 | .01 |
| ≤50% | 26 (30.6) | 36 (52.2) | | |
| >50% | 59 (69.4) | 33 (47.8) | | |
| Years in practice (n = 157) | | | 6.2597, df = 1 | .01 |
| ≤5 years | 44 (51.8) | 23 (31.9) | | |
| >5 years | 41 (48.2) | 49 (68.1) | | |
| U.S. practice region (n = 151) | | | 29.68, df = 3 | < .00 |
| Northeastern United States | 13 (15.3) | 15 (22.7) | | |
| Southern United States | 11 (12.9) | 22 (33.3) | | |
| Midwestern United States | 57 (67.1) | 16 (24.2) | | |
| Western United States | 4 (4.7) | 13 (19.7) | | |
| Race (n = 201) | | | 1.2865, df = 2 | .53 |
| Black/African American | 11 (9.9) | 5 (5.6) | | |
| White/Caucasian | 66 (59.5) | 56 (62.2) | | |
| Other | 34 (30.6) | 29 (32.2) | | |
| Provider age in years (n = 145) | | | 4.00, df = 2 | .14 |
| 20–39 years | 21 (25.6) | 8 (12.7) | | |
| 40–49 years | 24 (29.3) | 19 (30.2) | | |
| 50+ years | 37 (45.1) | 36 (57.1) | | |

χ2, Chi-square statistic.

df, degrees of freedom.

Darling-Fisher, C. S., Salerno, J., Dahlem, C. H. Y., & Martyn, K. K. (2014). The Rapid Assessment for Adolescent Preventive Services (RAAPS): Providers’ assessment of its usefulness in their clinical practice settings. Journal of Pediatric Health Care, 28(3), p. 221.
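The link between the χ2 values and the p values in Table 2 can be illustrated with a small Python sketch. It relies on the closed-form tail probabilities of the chi-square distribution that exist for df = 1 and df = 2, so it deliberately does not handle other df; the function name is hypothetical, and the study authors presumably obtained their p values from statistical software.

```python
import math


def chi_square_p_value(chi_square, df):
    """Return the upper-tail p value for a chi-square statistic (df = 1 or 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(chi_square / 2))
    if df == 2:
        return math.exp(-chi_square / 2)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# Values taken from Table 2.
print(round(chi_square_p_value(1.2865, df=2), 2))   # race: 0.53
print(round(chi_square_p_value(4.00, df=2), 2))     # provider age: 0.14
print(round(chi_square_p_value(6.2597, df=1), 2))   # years in practice: 0.01
print(round(chi_square_p_value(12.7652, df=2), 4))  # provider type: 0.0017
```

The last line shows that a p value printed as "< .00" is small but not zero once more decimal places are carried.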

Study Questions

1. What is the sample size for the Darling-Fisher et al. (2014) study? How many study participants (percentage) are RAAPS users and how many are RAAPS nonusers?

2. What is the chi-square (χ2) value and degrees of freedom (df) for provider type?

3. What is the p value for provider type? Is the χ2 value for provider type statistically significant? Provide a rationale for your answer.

4. Does a statistically significant χ2 value provide evidence of causation between the variables? Provide a rationale for your answer.

5. What is the χ2 value for race? Is the χ2 value statistically significant? Provide a rationale for your answer.

6. Is there a statistically significant difference between RAAPS users and RAAPS nonusers with regard to percentage adolescent patients? In your own opinion is this an expected finding? Document your answer.

7. What is the df for U.S. practice region? Complete the df formula for U.S. practice region to visualize how Darling-Fisher et al. (2014) determined the appropriate df for that region.

8. State the null hypothesis for the years in practice variable for RAAPS users and RAAPS nonusers.

9. Should the null hypothesis for years in practice developed for Question 8 be accepted or rejected? Provide a rationale for your answer.

10. How many null hypotheses were accepted by Darling-Fisher et al. (2014) in Table 2? Provide a rationale for your answer.

1. The sample size is N = 201 with n = 111 (55%) RAAPS users and n = 90 (45%) RAAPS nonusers as indicated in the narrative results.

2. The χ2 = 12.7652 and df = 2 for provider type as presented in Table 2.

3. The p value for provider type is reported as < .00. Yes, the χ2 = 12.7652 for provider type is statistically significant as indicated by the p value presented in Table 2. The specific χ2 value obtained could be compared against the critical value in a χ2 table (see Appendix D Critical Values of the χ2 Distribution at the back of this text) to determine the significance for the specific degrees of freedom (df), but readers of research reports usually rely on the p value provided by the researcher(s) to determine significance. Most nurse researchers set the level of significance or alpha (α) = 0.05. Since the p value is less than alpha, the result is statistically significant. Note that p values never equal zero, even though they appear that way in this study. The p values would not be zero if carried out to more decimal places.

4. No, a statistically significant χ2 value does not provide evidence of causation. A statistically significant χ2 value indicates a significant difference between groups exists but does not provide a causal link (Grove et al., 2013; Plichta & Kelvin, 2013).

5. The χ2 = 1.2865 for race. Since p = .53 for race, the χ2 value is not statistically significant. The level of significance is set at α = 0.05 and the p value is larger than alpha, so the result is nonsignificant.

6. Yes, there is a statistically significant difference between RAAPS users and RAAPS nonusers with regard to percent of adolescent patients. The chi-square value = 7.3780 with a p = .01. You might expect that nurses caring for more adolescents might have higher RAAPS use as indicated in Table 2. However, nurses need to be knowledgeable of assessment and care needs of populations and subpopulations in their practice even if not frequently encountered. Two valuable sources for adolescent care include the Centers for Disease Control and Prevention (CDC) Adolescent and School Health at http://www.cdc.gov/HealthyYouth/idex.htm and the World Health Organization (WHO) adolescent health at http://www.who.int/topics/adolescent_health/en/.

7. The df = 3 for U.S. practice region is provided in Table 2. The df formula, df = (R − 1) (C − 1) is used. There are four “R” rows, Northeastern United States, Southern United States, Midwestern United States, and Western United States. There are two “C” columns, RAAPS users and RAAPS nonusers. df = (4 − 1)(2 − 1) = (3)(1) = 3.

8. The null hypothesis: There is no difference between RAAPS users and RAAPS nonusers for providers with ≤5 years of practice and those with >5 years of practice.

9. The null hypothesis for years in practice stated in Question 8 should be rejected. The χ2 = 6.2597 for years in practice is statistically significant, p = .01. A statistically significant χ2 indicates a significant difference exists between the users and nonusers of RAAPS for years in practice; therefore, the null hypothesis should be rejected.

10. Two null hypotheses were accepted since two χ2 values (race and provider age) were not statistically significant (p > 0.05), as indicated in Table 2. Nonsignificant results indicate that the null hypotheses are supported or accepted as an accurate reflection of the results of the study.

EXERCISE 19 Questions to Be Graded

Name: _______________________________________________________ Class: _____________________

Date: ___________________________________________________________________________________

1. According to the relevant study results section of the Darling-Fisher et al. (2014) study, what categories are reported to be statistically significant?

2. What level of measurement is appropriate for calculating the χ2 statistic? Give two examples from Table 2 of demographic variables measured at the level appropriate for χ2.

3. What is the χ2 for U.S. practice region? Is the χ2 value statistically significant? Provide a rationale for your answer.

4. What is the df for provider type? Provide a rationale for why the df for provider type presented in Table 2 is correct.

5. Is there a statistically significant difference for practice setting between the Rapid Assessment for Adolescent Preventive Services (RAAPS) users and nonusers? Provide a rationale for your answer.

6. State the null hypothesis for provider age in years for RAAPS users and RAAPS nonusers.

7. Should the null hypothesis for provider age in years developed for Question 6 be accepted or rejected? Provide a rationale for your answer.

8. Describe at least one clinical advantage and one clinical challenge of using RAAPS as described by Darling-Fisher et al. (2014).

9. How many null hypotheses are rejected in the Darling-Fisher et al. (2014) study for the results presented in Table 2? Provide a rationale for your answer.

10. A statistically significant difference is present between RAAPS users and RAAPS nonusers for U.S. practice region, χ2 = 29.68. Does the χ2 result provide the location of the difference? Provide a rationale for your answer.

(Grove 191-200)

Exercise 29

Calculating Simple Linear Regression

Simple linear regression is a procedure that provides an estimate of the value of a dependent variable (outcome) based on the value of an independent variable (predictor). Knowing that estimate with some degree of accuracy, we can use regression analysis to predict the value of one variable if we know the value of the other variable (Cohen & Cohen, 1983). The regression equation is a mathematical expression of the influence that a predictor has on a dependent variable, based on some theoretical framework. For example, in Exercise 14, Figure 14-2 illustrates the linear relationship between gestational age and birth weight. As shown in that figure, there is a strong positive relationship between the two variables. Advanced gestational ages predict higher birth weights.

A regression equation can be generated with a data set containing subjects’ x and y values. Once this equation is generated, it can be used to predict future subjects’ y values, given only their x values. In simple or bivariate regression, predictions are made in cases with two variables. The score on variable y (dependent variable, or outcome) is predicted from the same subject’s known score on variable x (independent variable, or predictor).

Research Designs Appropriate for Simple Linear Regression

Research designs that may utilize simple linear regression include any associational design (Gliner et al., 2009). The variables involved in the design are attributional, meaning the variables are characteristics of the participant, such as health status, blood pressure, gender, diagnosis, or ethnicity. Regardless of the nature of variables, the dependent variable submitted to simple linear regression must be measured as continuous, at the interval or ratio level.

Statistical Formula and Assumptions

Use of simple linear regression involves the following assumptions (Zar, 2010):

1. Normal distribution of the dependent (y) variable

2. Linear relationship between x and y

3. Independent observations

4. No (or little) multicollinearity

5. Homoscedasticity

Data that are homoscedastic are evenly dispersed both above and below the regression line, which indicates a linear relationship on a scatterplot. Homoscedasticity reflects equal variance of both variables. In other words, for every value of x, the distribution of y values should have equal variability. If the data for the predictor and dependent variable are not homoscedastic, inferences made during significance testing could be invalid (Cohen & Cohen, 1983; Zar, 2010). Visual examples of homoscedasticity and heteroscedasticity are presented in Exercise 30.

In simple linear regression, the dependent variable is continuous, and the predictor can be any scale of measurement; however, if the predictor is nominal, it must be correctly coded. Once the data are ready, the parameters a and b are computed to obtain a regression equation. To understand the mathematical process, recall the algebraic equation for a straight line:

$$y = bx + a$$

where

y = the dependent variable (outcome)

x = the independent variable (predictor)

b = the slope of the line

a = the y-intercept (the point where the regression line intersects the y-axis)

No single regression line can be used to predict with complete accuracy every y value from every x value. In fact, you could draw an infinite number of lines through the scattered paired values (Zar, 2010). However, the purpose of the regression equation is to develop the line that allows the highest degree of prediction possible: the line of best fit. The procedure for developing the line of best fit is the method of least squares. The formulas for the slope (b) and y-intercept (a) of the regression equation are computed as follows. Note that once b is calculated, that value is inserted into the formula for a.

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^{2} - \left(\sum x\right)^{2}}$$

$$a = \frac{\sum y - b\sum x}{n}$$
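The two least-squares formulas can be sketched in Python as follows. This is a minimal illustration rather than part of the original exercise: the function name `least_squares_line` and the three-point data set are hypothetical, chosen only so the result is easy to verify by eye.

```python
def least_squares_line(xs, ys):
    """Return (b, a): slope and y-intercept of the least-squares line."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        # All x values are identical, so no slope can be estimated.
        raise ValueError("x values must not all be identical")

    b = (n * sum_xy - sum_x * sum_y) / denominator  # slope
    a = (sum_y - b * sum_x) / n                     # intercept, computed from b
    return b, a


# Hypothetical points that lie exactly on the line y = 2x + 1.
b, a = least_squares_line([0, 1, 2], [1, 3, 5])
print(b, a)
```

Because the intercept is computed from the slope, the function mirrors the order of the hand calculation: b first, then a.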

Hand Calculations

This example uses data collected from a study of students enrolled in a registered nurse to bachelor of science in nursing (RN to BSN) program (Mancini, Ashwill, & Cipher, 2014). The predictor in this example is the number of academic degrees obtained by the student prior to enrollment, and the dependent variable is the number of months it took for the student to complete the RN to BSN program. The null hypothesis is “Number of degrees does not predict the number of months until completion of an RN to BSN program.”

The data are presented in Table 29-1. A simulated subset of 20 students was selected for this example so that the computations would be small and manageable. In actuality, studies involving linear regression need to be adequately powered (Aberson, 2010; Cohen, 1988). Observe that the data in Table 29-1 are arranged in columns that correspond to the elements of the formula. The summed values in the last row of Table 29-1 are inserted into the appropriate place in the formula for b.

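TABLE 29-1

NUMBER OF DEGREES AND MONTHS TO COMPLETION IN AN RN TO BSN PROGRAM

| Student ID | x (Number of Degrees) | y (Months to Completion) | x² | xy |
|---|---|---|---|---|
| 1 | 1 | 17 | 1 | 17 |
| 2 | 2 | 9 | 4 | 18 |
| 3 | 0 | 17 | 0 | 0 |
| 4 | 1 | 9 | 1 | 9 |
| 5 | 0 | 16 | 0 | 0 |
| 6 | 1 | 11 | 1 | 11 |
| 7 | 0 | 15 | 0 | 0 |
| 8 | 0 | 12 | 0 | 0 |
| 9 | 1 | 15 | 1 | 15 |
| 10 | 1 | 12 | 1 | 12 |
| 11 | 1 | 14 | 1 | 14 |
| 12 | 1 | 10 | 1 | 10 |
| 13 | 1 | 17 | 1 | 17 |
| 14 | 0 | 20 | 0 | 0 |
| 15 | 2 | 9 | 4 | 18 |
| 16 | 2 | 12 | 4 | 24 |
| 17 | 1 | 14 | 1 | 14 |
| 18 | 2 | 10 | 4 | 20 |
| 19 | 1 | 17 | 1 | 17 |
| 20 | 2 | 11 | 4 | 22 |
| Sum (Σ) | 20 | 267 | 30 | 238 |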

The computations for b and a are as follows:

Step 1: Calculate b. From the values in Table 29-1, we know that n = 20, Σx = 20, Σy = 267, Σx² = 30, and Σxy = 238. These values are inserted into the formula for b, as follows:

$$b = \frac{20(238) - (20)(267)}{20(30) - 20^{2}} = \frac{4760 - 5340}{600 - 400} = \frac{-580}{200}$$

$$b = -2.9$$

Step 2: Calculate a. From Step 1, we now know that b = −2.9, and we plug this value into the formula for a.

$$a = \frac{267 - (-2.9)(20)}{20} = \frac{325}{20}$$

$$a = 16.25$$

Step 3: Write the new regression equation:

$$y = -2.9x + 16.25$$
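Steps 1 through 3 can be checked with a short Python sketch. The x and y lists are transcribed from Table 29-1 in Student ID order; the variable names are illustrative, and rounding is applied only for display.

```python
# Number of degrees (x) and months to completion (y) for the 20 students
# in Table 29-1, listed in Student ID order.
x = [1, 2, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 2, 2, 1, 2, 1, 2]
y = [17, 9, 17, 9, 16, 11, 15, 12, 15, 12,
     14, 10, 17, 20, 9, 12, 14, 10, 17, 11]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(xi ** 2 for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Step 1: slope. Step 2: intercept, which reuses the slope.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

# Step 3: the regression equation.
print(n, sum_x, sum_y, sum_x2, sum_xy)
print(f"y = {b:.2f}x + {a:.2f}")
```

The first printed line should match the last row of Table 29-1, and the second should match the equation in Step 3.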

Step 4: Calculate R. The multiple R is defined as the correlation between the actual y values and the predicted y values using the new regression equation. The predicted y value using the new equation is represented by the symbol ŷ to differentiate from y, which represents the actual y values in the data set. We can use our new regression equation from Step 3 to compute predicted program completion time in months for each student, using their number of academic degrees prior to enrollment in the RN to BSN Program. For example, Student #1 had earned 1 academic degree prior to enrollment, and the predicted months to completion for Student 1 is calculated as:

$$\hat{y} = -2.9(1) + 16.25$$

$$\hat{y} = 13.35$$

Thus, the predicted ŷ is 13.35 months. This procedure would be continued for the rest of the students, and the Pearson correlation between the actual months to completion (y) and the predicted months to completion (ŷ) would yield the multiple R value. In this example, the R = 0.638. The higher the R, the more likely that the new regression equation accurately predicts y, because the higher the correlation, the closer the actual y values are to the predicted ŷ values. Figure 29-1 displays the regression line where the x axis represents possible numbers of degrees, and the y axis represents the predicted months to program completion (ŷ values).
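The procedure described in Step 4 can be sketched in Python: predict ŷ for every student, then correlate the actual and predicted values. The data are transcribed from Table 29-1; the helper `pearson_r` is a hypothetical name for a plain implementation of the Pearson correlation, not a function from the text.

```python
import math

x = [1, 2, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 2, 2, 1, 2, 1, 2]
y = [17, 9, 17, 9, 16, 11, 15, 12, 15, 12,
     14, 10, 17, 20, 9, 12, 14, 10, 17, 11]


def pearson_r(u, v):
    """Pearson correlation between two equal-length lists."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cross = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    ss_u = sum((ui - mean_u) ** 2 for ui in u)
    ss_v = sum((vi - mean_v) ** 2 for vi in v)
    return cross / math.sqrt(ss_u * ss_v)


# Predicted months to completion from the Step 3 equation.
y_hat = [-2.9 * xi + 16.25 for xi in x]

multiple_r = pearson_r(y, y_hat)
print(round(y_hat[0], 2))    # predicted value for Student 1
print(round(multiple_r, 3))  # correlation between y and y-hat
```

Note that the multiple R is positive even though the slope is negative, because it correlates y with ŷ rather than y with x.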

FIGURE 29-1  REGRESSION LINE REPRESENTED BY NEW REGRESSION EQUATION.

Step 5: Determine whether the predictor significantly predicts y.

To know whether the predictor significantly predicts y, the beta must be tested against zero. In simple regression, this is most easily accomplished by using the R value from Step 4:

$$t = R\sqrt{\frac{n-2}{1-R^{2}}}$$

$$t = .638\sqrt{\frac{20-2}{1-.407}}$$

$$t = 3.52$$

The t value is then compared to the t probability distribution table (see Appendix A). The df for this t statistic is n − 2. The critical t value at alpha (α) = 0.05, df = 18 is 2.10 for a two-tailed test. Our obtained t was 3.52, which exceeds the critical value in the table, thereby indicating a significant association between the predictor (x) and outcome (y).
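The Step 5 test can be sketched in Python using only the values reported in the text. The function name `t_from_r` is hypothetical, and the critical value 2.10 is taken from the paragraph above rather than computed.

```python
import math


def t_from_r(r, n):
    """t statistic for a correlation r based on n paired observations."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


R = 0.638          # multiple R reported in Step 4
n = 20             # number of students in Table 29-1
critical_t = 2.10  # two-tailed, alpha = 0.05, df = n - 2 = 18 (from the text)

t = t_from_r(R, n)
print(round(t, 2), abs(t) > critical_t)
```

Comparing the absolute value of t with the critical value keeps the check valid for a two-tailed test regardless of the sign of the association.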

Step 6: Calculate R². After establishing the statistical significance of the R value, it must subsequently be examined for clinical importance. This is accomplished by obtaining the coefficient of determination for regression, which simply involves squaring the R value. The R² represents the percentage of variance explained in y by the predictor. Cohen describes R² values of 0.02 as small, 0.15 as moderate, and 0.26 or higher as large effect sizes (Cohen, 1988). In our example, the R was 0.638, and, therefore, the R² was 0.407. Multiplying 0.407 × 100% indicates that 40.7% of the variance in months to program completion can be explained by knowing the student’s number of earned academic degrees at admission (Cohen & Cohen, 1983).

The R² can be very helpful in testing more than one predictor in a regression model. Unlike R, the R² for one regression model can be compared with another regression model that contains additional predictors (Cohen & Cohen, 1983). The R² is discussed further in Exercise 30.

The standardized beta (β) is another statistic that represents the magnitude of the association between x and y. β has limits just like a Pearson r, meaning that the standardized β cannot be lower than −1.00 or higher than 1.00. This value can be calculated by hand but is best computed with statistical software. The standardized beta (β) is calculated by converting the x and y values to z scores and then correlating the x and y values using the Pearson r formula. The standardized beta (β) is often reported in literature instead of the unstandardized b, because b does not have lower or upper limits and therefore the magnitude of b cannot be judged. β, on the other hand, is interpreted as a Pearson r and the descriptions of the magnitude of β can be applied, as recommended by Cohen (1988). In this example, the standardized beta (β) is −0.638. Thus, the magnitude of the association between x and y in this example is considered a large predictive association (Cohen, 1988).
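The z-score route to the standardized beta, and the R² obtained by squaring it, can be sketched in Python as follows. The data are transcribed from Table 29-1; the helper name `z_scores` is hypothetical, and the sketch assumes the sample standard deviation (n − 1 denominator) is used for standardizing.

```python
import statistics

x = [1, 2, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 2, 2, 1, 2, 1, 2]
y = [17, 9, 17, 9, 16, 11, 15, 12, 15, 12,
     14, 10, 17, 20, 9, 12, 14, 10, 17, 11]


def z_scores(values):
    """Convert raw scores to z scores using the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # n - 1 denominator
    return [(v - mean) / sd for v in values]


zx = z_scores(x)
zy = z_scores(y)

# Pearson r of the z scores: sum of cross-products divided by n - 1.
std_beta = sum(zxi * zyi for zxi, zyi in zip(zx, zy)) / (len(x) - 1)
r_squared = std_beta ** 2

print(round(std_beta, 3), round(r_squared, 3))
```

In simple regression with one predictor, the standardized beta equals the Pearson r between x and y, which is why squaring it gives the same R² as squaring the multiple R.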

SPSS Computations

This is how our data set looks in SPSS.

Step 1: From the “Analyze” menu, choose “Regression” and “Linear.”

Step 2: Move the predictor, Number of Degrees, to the space labeled “Independent(s).” Move the dependent variable, Number of Months to Completion, to the space labeled “Dependent.” Click “OK.”

Interpretation of SPSS Output

The following tables are generated from SPSS. The first table contains the multiple R and the R² values. The multiple R is 0.638, indicating that the correlation between the actual y values and the predicted y values using the new regression equation is 0.638. The R² is 0.407, indicating that 40.7% of the variance in months to program completion can be explained by knowing the student’s number of earned academic degrees at enrollment.

Regression

The second table contains the ANOVA table. As presented in Exercises 18 and 33, the ANOVA is usually performed to test for differences between group means. However, ANOVA can also be performed for regression, where the null hypothesis is that “knowing the value of x explains no information about y”. This table indicates that knowing the value of x explains a significant amount of variance in y. The contents of the ANOVA table are rarely reported in published manuscripts, because the significance of each predictor is presented in the last SPSS table titled “Coefficients” (see below).
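The quantities behind a regression ANOVA table can be sketched in Python from the Table 29-1 data. The text does not reproduce the SPSS ANOVA values, so the numbers printed here come from the standard sums-of-squares definitions applied to the example data, not from the SPSS output; the variable names are illustrative.

```python
x = [1, 2, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 2, 2, 1, 2, 1, 2]
y = [17, 9, 17, 9, 16, 11, 15, 12, 15, 12,
     14, 10, 17, 20, 9, 12, 14, 10, 17, 11]

n = len(x)
mean_y = sum(y) / n
b, a = -2.9, 16.25                  # regression equation from Step 3
y_hat = [b * xi + a for xi in x]    # predicted values

# Partition the total variability in y.
ss_regression = sum((yh - mean_y) ** 2 for yh in y_hat)
ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ss_total = sum((yi - mean_y) ** 2 for yi in y)

df_regression = 1        # one predictor
df_residual = n - 2

f_ratio = (ss_regression / df_regression) / (ss_residual / df_residual)

print(round(ss_regression, 2), round(ss_residual, 2), round(ss_total, 2))
print(round(f_ratio, 2))
```

With a single predictor, the F ratio equals the square of the t statistic for that predictor, which is one reason the ANOVA table adds little beyond the coefficients table in simple regression.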

The third table contains the b and a values, standardized beta (β), t, and exact p value. The a is listed in the first row, next to the label “Constant.” The β is listed in the second row, next to the name of the predictor. The remaining information that is important to extract when interpreting regression results can be found in the second row. The standardized beta (β) is −0.638. This value has limits just like a Pearson r, meaning that the standardized β cannot be lower than −1.00 or higher than 1.00. The t value is −3.516, and the exact p value is 0.002.
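The t value in the coefficients table can be reproduced by dividing b by its standard error, which also shows why the sign is negative here while the R-based hand calculation gave a positive value. The sketch below is an assumption-labeled illustration using the Table 29-1 data and the usual simple-regression formula for the standard error of b; it is not a description of SPSS internals.

```python
import math

x = [1, 2, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 2, 2, 1, 2, 1, 2]
y = [17, 9, 17, 9, 16, 11, 15, 12, 15, 12,
     14, 10, 17, 20, 9, 12, 14, 10, 17, 11]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Deviation-score form of the least-squares formulas (algebraically
# equivalent to the raw-sum formulas used in the hand calculation).
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b = s_xy / s_xx           # unstandardized slope
a = mean_y - b * mean_x   # constant (y-intercept)

# Standard error of b: residual mean square divided by s_xx, square-rooted.
residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
se_b = math.sqrt(residual_ss / (n - 2) / s_xx)

t = b / se_b
print(round(b, 2), round(a, 2), round(t, 3))
```

The t statistic carries the sign of b, so a negative slope yields a negative t of the same magnitude as the value obtained from R in Step 5.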

Final Interpretation in American Psychological Association (APA) Format

The following interpretation is written as it might appear in a research article, formatted according to APA guidelines (APA, 2010). Simple linear regression was performed with number of earned academic degrees as the predictor and months to program completion as the dependent variable. The student’s number of degrees significantly predicted months to completion among students in an RN to BSN program, β = −0.638, p = 0.002, and R² = 40.7%. Higher numbers of earned academic degrees significantly predicted shor