Assignment 2 Economics 31 Fall 1999

Regression: Calculation and Analysis, Simple, Multiple, Dummies

Reading Assignment: Mirer Chpt. 5, pp.105-111, pp.132-151.

The objective of the first part of this assignment is to have you calculate
directly the coefficients of some least-squares regressions and to
interpret their meaning. You should be able to calculate the coefficients
asked for with the information you are given, using the formulae presented
in the text, lectures, and the Notes.

Table 1 presents some data on aggregate features of the U.S. economy from
1959 through 1997. Table 2, which you will use
to estimate the least-squares regressions, presents the sums of squares and
cross products of these data as well as, in the final row, the sums of the
variables over the 20 years.

Link to Tables

1. Using the information provided and the method of least squares, estimate:

a. A linear equation with consumption as the dependent variable and GDP as
the independent variable.

b. A linear equation with consumption as the dependent variable and the
interest rate as the independent variable.

c. A linear regression with GDP as the dependent variable and the interest
rate as the independent variable.

For all parts present the formulae you used and the numerical estimates for
each coefficient.

d. Give an economic interpretation of the coefficients you find for each of
the linear regressions.
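
As a reminder of the arithmetic involved, the simple least-squares coefficients can be computed from the kinds of quantities Table 2 reports. The expression below is the standard textbook form, assuming the table gives raw (uncentered) sums of squares and cross products. Here Y is the dependent variable, X the independent variable, and n the number of observations; the notation in Mirer and the Notes may differ slightly.

```latex
\hat{b} = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2},
\qquad
\hat{a} = \frac{\sum Y_i}{n} - \hat{b}\,\frac{\sum X_i}{n}
```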

2. In light of your results from 1a, 1b, and 1c, indicate what sort of
result you think you would get if you estimated a multiple regression with
consumption as the dependent variable and GDP as one independent variable
and the interest rate as a second independent variable. (You could estimate
the multiple regression coefficients with the information given in the
table though it would be a bit tedious. Do it if you wish, but you can just
make some qualitative guesses as to how the multiple regression
coefficients would differ from the simple regression coefficients.) Give an
economic interpretation of the multiple regression results.
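
One way to organize the qualitative guess is the exact algebraic link between simple and multiple least-squares slopes computed on the same observations. In the sketch below the symbols are assumptions introduced here, not the text's notation. The b terms are the simple regression slopes of consumption (C) on GDP (Y) and on the interest rate (r) from 1a and 1b. The beta-hat terms are the multiple regression coefficients. The term d_{Y.r} is the slope from 1c. The term d_{r.Y} is the slope from the reverse regression of the interest rate on GDP, which has the same sign as the slope in 1c.

```latex
b_{C \cdot Y} = \hat{\beta}_Y + \hat{\beta}_r \, d_{r \cdot Y},
\qquad
b_{C \cdot r} = \hat{\beta}_r + \hat{\beta}_Y \, d_{Y \cdot r}
```

Each simple slope therefore equals the corresponding multiple regression coefficient plus a term that depends on how the two independent variables move together.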

In table 3, the data from table 1 have been converted into real terms by
dividing consumption and GDP by the price deflator and calculating a real
interest rate by subtracting the rate of increase of the deflator from the
interest rate. Table 4, which again you will use for estimating the
least-squares regressions, shows the sums of squares and cross-products for the
real variables as well as their sums.

3. Calculate the simple regressions relating real consumption first to
real GDP and second to the real interest rate. Give your numerical results
for these coefficients. Comment on any differences between the results using
the nominal values of Tables 1 and 2 and those using the real values of Tables
3 and 4.

4. What dummy variable, if any, would be a logical addition for the
equation in problem 3 relating real consumption to real GDP? Supposing
there were an applicable dummy variable, how would you calculate its
coefficient?

5. For extra credit, derive the expression for the simple regression
coefficient, beginning with the normal equations. Explain each mathematical
step in your own words.
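
For reference, the normal equations for the simple regression of Y on X are shown below in their usual textbook form. The intercept is written a-hat and the slope b-hat; these symbols are an assumption and may not match the Notes. The derivation asked for amounts to solving this pair of equations for the two coefficients.

```latex
\sum Y_i = n\hat{a} + \hat{b}\sum X_i,
\qquad
\sum X_i Y_i = \hat{a}\sum X_i + \hat{b}\sum X_i^2
```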

Part 2 Economics Fall 1999

Regressions: Simple, Multiple, Dummies

This part of the assignment will require the use of a statistical software
program. The accompanying handout explains how to use the STATA statistical
analysis program which is available through the college network. For an
additional resource with in-depth instructions, see
http://www.princeton.edu/~data/datalib/datalib.html or
http://www.stata.com/info/session/ .

The data file you will be using is titled cps22an. This is an extract of
574 cases taken from the Current Population Survey, which is conducted every
month in order to estimate the unemployment rate. This extract includes
only individuals between the ages of 25 and 99 who worked at least once
during 1997.

The data file will be available as a STATA file entitled cps22an. Here are
the variables in this file.

FILENAME: cps22an

Source: Extract of March 1998 CPS

1. ED = years of education
2. SOUTH = 1 if lives in south
3. FE = 1 if female
4. MARR = 1 if married with spouse present (in household)
5. WID = 1 if widowed
6. DIV = 1 if divorced and no spouse present
7. EX = years of labor market experience (= AGE-ED-6)
8. HOURWG = average hourly wage in 1997
9. EXSQ = years of labor market experience squared
10. AGE = age in years
11. AMIND = 1 if worker is of Native American ancestry
12. MANUF = 1 if working in manufacturing industry for longest job in 1997
13. CONST = 1 if working in construction industry for longest job in 1997
14. RETAIL = 1 if working in retail/wholesale trade for longest job in 1997
15. AGFOR = 1 if working in agriculture/forestry/fishing for longest job in
1997
16. FINANCE = 1 if working in finance, insurance, or real estate for
longest job in 1997
17. VET = 1 if worker is a veteran
18. SERV = 1 if worker is working in personal, entertainment, or professional
services for longest job in 1997
19. BLACK = 1 if race of person is Black
20. ASIAN = 1 if race of person is Asian
21. HISP = 1 if person is of Hispanic origin
22. TOTALWGS = total wages in 1997 (including self employment and farm income)

Number of Observations: 574
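
Before running any regressions it can help to confirm that the file loaded as described. The commands below are a minimal sketch, assuming the file is saved as cps22an.dta in the current working directory; the handout may give a different path on the college network.

```stata
* Load the extract (assumes cps22an.dta is in the current working directory)
use cps22an, clear

* List variable names, storage types, and labels
describe

* Observation counts, means, and ranges for every variable
summarize

* Should report 574 if the file matches the description above
count
```

STATA variable names are case-sensitive, so use the names exactly as `describe` lists them.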

1. Simple Regression

a. Using the data in the file, estimate the linear relationship with the
hourly wage as the dependent variable and years of education as the
independent variable. Show your results.

b. Give an economic interpretation of each of the coefficients.

c. Estimate the linear relationship with hourly wage as the dependent variable and
age as the independent variable. Give an economic interpretation of the coefficients.

d. Estimate the linear relationship between education and age.
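
The simple regressions in this section can be sketched with the regress command, which takes the dependent variable first and the independent variable after it. The block assumes the file is in the working directory and that the variable names are stored in upper case as listed above.

```stata
use cps22an, clear

* a. Hourly wage (dependent) on years of education (independent)
regress HOURWG ED

* c. Hourly wage (dependent) on age (independent)
regress HOURWG AGE

* d. The problem does not say which variable is dependent;
*    education on age is shown here as one possibility.
*    Swap the two names to run the reverse regression.
regress ED AGE
```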

2. Multiple Regression

a. Estimate the linear relationship with hourly wage as the dependent
variable and education and age as independent variables. Show your results.

b. Write down the formula for each of the coefficients.

c. Give an economic interpretation of each of the coefficients.

d. Compare the coefficient on education which you obtained in the simple
regression with the coefficient on education which you obtained in the multiple
regression, and give both a statistical and an economic or social explanation of why the
value of the coefficient changed the way it did (you can use the results of part 1.d. to
help you with this).
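
The statistical side of the comparison can be checked numerically. The sketch below relies on the standard least-squares identity that the simple slope on education equals the multiple regression slope on education plus the age coefficient times the slope from a regression of age on education. The scalar names are hypothetical, and the identity holds only when all three regressions use the same observations.

```stata
use cps22an, clear

* Multiple regression: store both slope coefficients
regress HOURWG ED AGE
scalar b_ed = _b[ED]
scalar b_age = _b[AGE]

* Auxiliary regression of the other regressor (AGE) on ED
regress AGE ED
scalar d_age_ed = _b[ED]

* Simple regression of wage on education
regress HOURWG ED
display "simple slope on ED:      " _b[ED]

* Multiple-regression slope plus age coefficient times auxiliary slope
display "b_ed + b_age*d_age_ed:   " b_ed + b_age*d_age_ed
```

The two displayed numbers should agree up to rounding, which shows how the change in the education coefficient depends on the relationship between age and education.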

3. Dummy Variable Regressions

a. Estimate the linear relationship between the hourly wage and MARR. Show your results.

b. Give an economic interpretation of each of the coefficients, including the constant.

c. Estimate the linear relationship with the hourly wage as the dependent
variable and MARR and FE (female) as independent variables. Show your results.

d. Give an economic interpretation of each coefficient. Explain the change
in the coefficient for MARR from the first to the second regression.

e. Create a new dummy variable for married females and run the linear
regression with hourly wage as the dependent variable and MARR, FE, and
the product of FE and MARR as independent variables. STATA variable names
cannot contain *, so give the product a name such as FEMARR; to make the new
variable in STATA:

. generate FEMARR=FE*MARR

Discuss the meaning of the coefficients you get. Do these coefficients make
sense to you?
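
One way to judge whether the coefficients make sense is to compare them with the average wage in each of the four groups defined by sex and marital status. With both dummies and their product included, the fitted value for each group equals that group's mean wage. The sketch below assumes the file is in the working directory; FEMARR is an illustrative name for the product variable.

```stata
use cps22an, clear

* Interaction dummy: equals 1 only for married women
generate FEMARR = FE*MARR

* Wage on both dummies and their product
regress HOURWG MARR FE FEMARR

* Mean hourly wage in each FE-by-MARR cell, for comparison
tabulate FE MARR, summarize(HOURWG) nostandard nofreq
```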

4. Mixed Continuous and Dummy Variables

a. Estimate the linear relationship with hourly wage as the dependent
variable and education and female as independent variables. Interpret the
coefficients you get.

b. Create a new variable, call it FMED, which is the product of the variable
for female and the variable for education and rerun the regression you just
did in part a but add this new variable as an independent variable. Discuss
the meaning of all the coefficients you get, including the constant term.
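
The steps in part b can be sketched as follows, assuming the file is in the working directory. The lincom command at the end is an optional addition, not part of the assignment: it reports the sum of the ED and FMED coefficients, which is the estimated slope of the wage with respect to education for women.

```stata
use cps22an, clear

* Product of the female dummy and years of education
generate FMED = FE*ED

* Regression from part a with the interaction term added
regress HOURWG ED FE FMED

* Estimated education slope for women: coefficient on ED plus coefficient on FMED
lincom ED + FMED
```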

5. Create 3 new dummy variables for education: one for less than 12 years of
education, one for 12 to 15 years of education, and one for 16 or more years
of education. Here's the command for creating the variable for education
less than 12 years:

. generate EDLT12=ED<12

for 12 to 15 years:

. generate EDHSG=ED>11 & ED<16

for 16 or more:

. generate EDCOL=ED>15

a. Estimate the relationship between the hourly wage and these levels of
education. Note that when you run the regression you must leave out the
variable for one of the categories of education (see the Notes on dummy
variables).

b. Interpret the coefficients you get.
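
A minimal sketch of the full sequence for this problem follows, assuming the file is in the working directory. Two details go beyond the commands given above. The `if !missing(ED)` qualifier is added because STATA treats a missing value as larger than any number, so without it an observation with missing ED would be coded as EDCOL = 1. EDSUM is a hypothetical check variable, and omitting EDLT12 as the base category is one arbitrary choice among three.

```stata
use cps22an, clear

* Three education dummies, left missing when ED is missing
generate EDLT12 = ED < 12 if !missing(ED)
generate EDHSG = ED > 11 & ED < 16 if !missing(ED)
generate EDCOL = ED > 15 if !missing(ED)

* Check: every observation should fall in exactly one category (EDSUM = 1)
generate EDSUM = EDLT12 + EDHSG + EDCOL
tabulate EDSUM

* Leave out one category (here EDLT12) to serve as the base group
regress HOURWG EDHSG EDCOL
```

If the dummy variables already exist from the earlier commands, skip the generate lines for them, since generate will not overwrite an existing variable.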

6. Try using any of the other variables to estimate relationships and
interpret the meaning of the results you obtain.
