Assignment 2 Economics 31 Fall 1999
Regression: Calculation and Analysis, Simple, Multiple, Dummies
Reading Assignment: Mirer Chpt. 5, pp.105-111, pp.132-151.
The objective of the first part of this assignment is to have you calculate directly the coefficients of some least-squares regressions and to interpret their meaning. You should be able to calculate the coefficients asked for with the information you are given, using the formulae presented in the text, lectures, and the Notes.
Table 1 presents some data on aggregate features of the U.S. economy from 1959 through 1997. Table 2, which you will use to estimate the least-squares regressions, presents the sums of squares and cross products of these data as well as, in the final row, the sums of the variables over the 20 years.
1. Using the information provided and the method of least squares, estimate:
a. A linear equation with consumption as the dependent variable and GDP as the independent variable.
b. A linear equation with consumption as the dependent variable and the interest rate as the independent variable.
c. A linear regression with GDP as the dependent variable and the interest rate as the independent variable.
For all parts present the formulae you used and the numerical estimates for each coefficient.
d. Give an economic interpretation of the coefficients you find for each of the linear regressions.
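As a reminder of how the entries of Table 2 feed into the calculation, here is the standard least-squares result written in terms of sums. The notation is an assumption and may differ from that in Mirer or the Notes: Y is the dependent variable, X the independent variable, n the number of observations, a the intercept, and b the slope.

```latex
\[
b = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2},
\qquad
a = \frac{\sum Y_i - b\sum X_i}{n}
\]
```

Each sum on the right-hand side corresponds to one entry of the table of sums of squares and cross products, or to its final row of sums.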
2. In light of the results from 1a, 1b, and 1c, indicate what sort of result you think you would get if you estimated a multiple regression with consumption as the dependent variable, GDP as one independent variable, and the interest rate as a second independent variable. (You could estimate the multiple regression coefficients with the information given in the table, though it would be a bit tedious. Do it if you wish, but you can just make some qualitative guesses as to how the multiple regression coefficients would differ from the simple regression coefficients.) Give an economic interpretation of the multiple regression results.
In Table 3, the data from Table 1 have been converted into real terms by dividing consumption and GDP by the price deflator and calculating a real interest rate by subtracting the rate of increase of the deflator from the interest rate. Table 4, which again you will use for estimating the least-squares regressions, shows the sums of squares and cross products for the real variables as well as their sums.
3. Calculate the simple regressions relating real consumption first to real GDP and second to the real interest rate. Give your numerical results for these coefficients. Comment on any differences between the results using the nominal values from Tables 1 and 2 and those using the real values from Tables 3 and 4.
4. What dummy variable, if any, would be a logical addition for the equation in problem 3 relating real consumption to real GDP? Supposing there were an applicable dummy variable, how would you calculate its coefficient?
5. For extra credit, derive the expression for the simple regression coefficient, beginning with the normal equations. Explain each mathematical step in your own words.
Part 2 Economics Fall 1999
Regressions: Simple, Multiple, Dummies
This part of the assignment will require the use of a statistical software program. The accompanying handout explains how to use the STATA statistical analysis program, which is available through the college network. For additional resources with in-depth instructions, see
http://www.princeton.edu/~data/datalib/datalib.html
or
http://www.stata.com/info/session/
The data file you will be using is titled cps22an. This is an extract of 574 cases taken from the Current Population Survey, which is done every month in order to estimate the unemployment rate. This extract includes only individuals between the ages of 25 and 99 who worked at least once during 1997.
The data file will be available as a STATA file entitled cps22an. Here are the variables in this file.
FILENAME: cps22an
Source: Extract of March 1998 CPS
1. ED = years of education
2. SOUTH = 1 if lives in south
3. FE = 1 if female
4. MARR = 1 if married with spouse present (in household)
5. WID = 1 if widowed
6. DIV = 1 if divorced and no spouse present
7. EX = years of labor market experience (= AGE-ED-6)
8. HOURWG = average hourly wage in 1997
9. EXSQ = years of labor market experience squared
10. AGE = age in years
11. AMIND = 1 if worker is of Native American ancestry
12. MANUF = 1 if working in manufacturing industry for longest job in 1997
13. CONST = 1 if working in construction industry for longest job in 1997
14. RETAIL = 1 if working in retail/wholesale trade for longest job in 1997
15. AGFOR = 1 if working in agriculture/forestry/fishing for longest job in 1997
16. FINANCE = 1 if working in finance, insurance, or real estate for longest job in 1997
17. VET = 1 if worker is a veteran
18. SERV = 1 if working in personal, entertainment, or professional services for longest job in 1997
19. BLACK = 1 if race of person is Black
20. ASIAN = 1 if race of person is Asian
21. HISP = 1 if person is of Hispanic origin
22. TOTALWGS = total wages in 1997 (including self employment and farm income)
Number of Observations: 574
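Before running any regressions it may help to confirm that the file loaded correctly. The following is a minimal sketch, assuming the file is stored as cps22an.dta in STATA's current working directory and that the variable names are in upper case exactly as listed above; adjust the path or names if your copy differs.

```stata
* Load the extract (assumes cps22an.dta is in the working directory)
use cps22an, clear

* List the variables and their storage types
describe

* Means, standard deviations, and ranges for a few key variables
summarize HOURWG ED AGE

* Number of observations; this should match the 574 cases stated above
count
```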
1. Simple Regression
a. Using the data in the file, estimate the linear relationship with the hourly wage as the dependent variable and years of education as the independent variable. Show your results.
b. Give an economic interpretation of each of the coefficients.
c. Estimate the linear relationship between hourly wage as the dependent variable and age as the independent variable. Give an economic interpretation of the coefficients.
d. Estimate the linear relationship between education and age.
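The simple regressions in this section can be run with STATA's regress command, which takes the dependent variable first and the independent variable after it. This is only a sketch: it assumes the upper-case variable names from the list above, and because part d does not say which variable is dependent, the choice of AGE in the last line is an assumption that you may need to reverse.

```stata
* Part a: hourly wage on years of education
regress HOURWG ED

* Part c: hourly wage on age
regress HOURWG AGE

* Part d: direction not specified in the assignment; AGE as the
* dependent variable is an assumption made for this sketch
regress AGE ED
```

In the output, the row labeled _cons is the constant term and the row labeled with the variable name is the slope.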
2. Multiple Regression
a. Estimate the linear relationship with hourly wage as the dependent variable and education and age as independent variables. Show your results.
b. Write down the formula for each of the coefficients.
c. Give an economic interpretation of each of the coefficients.
d. Compare the coefficient on education which you obtained in the simple regression with the coefficient for education which you obtain in the multiple regression, and give both a statistical and an economic or social explanation of why the value of the coefficient changed the way it did (you can use the results of part 1.d. to help you with this).
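For the multiple regression, additional independent variables are simply listed after the dependent variable. The sketch below assumes the same upper-case variable names as in the variable list; the display lines use STATA's _b[] notation to pull out individual coefficients after estimation, which can be handy when comparing them with the simple regression results.

```stata
* Hourly wage on education and age together
regress HOURWG ED AGE

* Show individual coefficients from the regression just run
display _b[ED]
display _b[AGE]
display _b[_cons]
```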
3. Dummy Variable Regressions
a. Estimate the linear relationship between the hourly wage and MARR. Show your results.
b. Give an economic interpretation of each of the coefficients, including the constant.
c. Estimate the linear relationship with the hourly wage as the dependent variable and MARR and FE as independent variables. Show your results.
d. Give an economic interpretation of each coefficient. Explain the change in the coefficient for MARR from the first to the second regression.
e. Create a new dummy variable for married females and run the linear regression with hourly wage as the dependent variable and MARR, FE, and the product FE*MARR as independent variables. To make the new variable in STATA, give it a name without the * character, which STATA does not allow in variable names; for example:
. generate FEMARR=FE*MARR
Discuss the meaning of the coefficients you get. Do these coefficients make sense to you?
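The steps in part e might look like the sketch below. The name FEMARR is a hypothetical choice for the interaction variable (STATA variable names may contain only letters, digits, and underscores, so a name containing * would be rejected). The final tabulate line is an optional check, not part of the assignment: it shows the mean wage in each of the four female-by-married cells, which you can compare with the values implied by your coefficients.

```stata
* Interaction dummy: equals 1 only for married women
* (skip this line if you have already created the variable)
generate FEMARR = FE*MARR

* Hourly wage on MARR, FE, and their interaction
regress HOURWG MARR FE FEMARR

* Optional check: mean hourly wage in each FE-by-MARR cell
tabulate FE MARR, summarize(HOURWG) means
```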
4. Mixed Continuous and Dummy Variables
a. Estimate the linear relationship with hourly wage as the dependent variable and education and female as independent variables. Interpret the coefficients you get.
b. Create a new variable, call it FMED, which is the product of the variable for female and the variable for education, and rerun the regression you just did in part a but add this new variable as an independent variable. Discuss the meaning of all the coefficients you get, including the constant term.
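A minimal sketch of the two regressions in this section, assuming the upper-case variable names FE, ED, and HOURWG from the variable list and the name FMED given in part b:

```stata
* Part a: hourly wage on education and the female dummy
regress HOURWG ED FE

* Part b: product of the female dummy and years of education
* (skip this line if you have already created FMED)
generate FMED = FE*ED

* Rerun with the product term added
regress HOURWG ED FE FMED
```

Since FMED equals ED for women and 0 for men, its coefficient lets the slope on education differ between the two groups, while FE lets the intercept differ.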
5. Create 3 new dummy variables for education: one for less than 12 years of education, one for 12 to 15 years of education, and one for 16 or more years of education. Here's the command for creating the variable for education less than 12 years:
. generate EDLT12=ED<12
for the 12 to 15 years variable:
. generate EDHSG=ED>11 & ED<16
for 16 or more:
. generate EDCOL=ED>15
a. Estimate the relationship between the hourly wage and these levels of education. Note that when you run the regression you must leave out the variable for one of the categories of education (see the Notes on dummy variables).
b. Interpret the coefficients you get.
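One way to set up the regression in part a is sketched below, assuming the three dummies EDLT12, EDHSG, and EDCOL have already been created with the commands given above. Which category to leave out is your choice; omitting EDLT12 here is an assumption made for illustration, and it makes "less than 12 years" the reference group.

```stata
* Optional check: every observation should fall in exactly one category
assert EDLT12 + EDHSG + EDCOL == 1

* Leave out EDLT12, so coefficients are measured relative to that group
regress HOURWG EDHSG EDCOL
```

One caution: STATA treats a missing value as larger than any number, so ED>15 would code an observation with missing ED as 1. If your data might contain missing values of ED, adding "if ED < ." to each generate command leaves the dummy missing in those cases instead.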
6. Try using any of the other variables to estimate relationships and interpret the meaning of the results you obtain.