These days teachers find themselves swept up in the cross currents of an education debate about how to evaluate and pay teachers, a debate that grows more polarizing and uglier by the day. Some days the debate generates much more heat than light, and this topic is greatly in need of illumination. Without a doubt, change is needed. The single salary schedule, which mandates the same salaries for teachers regardless of field, creates shortages of math, science, and special education teachers, and prevents some of the best from entering teaching. It makes sense for “star” teachers to earn more at the same level of experience. Tenure ought not to endlessly protect teachers who end up performing poorly or worse. Yet that doesn’t mean that every proposed solution is a good idea, and some could turn things from bad to worse. Teachers should take the lead in helping the public understand their jobs and what works. For example, they can explain that teaching is not just a matter of each teacher working independently in her own classroom, and they can describe the many ways in which test scores don’t capture all that is important about a child’s education.
The most recent example in need of illumination is a proposal released this week by New Jersey’s acting Education Commissioner Christopher Cerf. I find the basic ideas he proposes a pretty good list: more nuanced evaluations of teachers, an end to moving poorly performing teachers into “less important” teaching positions rather than out of schools altogether, and clearer reliance on supervisor observations and children’s learning to evaluate teachers. Doing this well is not going to be easy, however. Simply calling for teachers to be judged on value-added (VA) evaluations won’t do the job. Broadly speaking, VA calls for using student test scores in deciding how well teachers are doing. This approach has already become policy in some districts, and it is beginning to affect which teachers stay, which teachers go, and whether they get a raise. Yet it is highly questionable whether any progress in improving the teacher corps can be made with VA as it is currently done. Too much depends on the children assigned to the teacher, and our ability to correctly estimate the teacher’s contribution is far too weak. It matters who else teaches in the same school. First- and second-year teachers are early works in progress, and their trajectory matters as much as the level of their performance.
The move to VA teacher evaluation across the country appears to be driven more by political agendas dedicated to blaming teachers and their unions than by a search for effective solutions. I don’t know what else could explain the way VA’s proponents gloss over the fact that, as currently implemented, it is built on a shaky scientific foundation. Still, it seems likely that public servants like New Jersey’s Commissioner have not been fully informed about these limitations, which presents an opportunity for more illumination to benefit the policy debate.
The VA method calls for estimating through analyses of standardized test scores how much any given teacher helps or hinders the academic progress of students. The Los Angeles Times drew national attention to VA evaluation in a series in which an economist paid by the newspaper rated elementary school teachers according to this method, using a detailed set of data from the Los Angeles Unified School District. Based on this work, the Times then published a 6,000-name list of teachers and their ratings.
Enter the National Education Policy Center (NEPC) at the University of Colorado. Researchers there analyzed the work done at the Times’ behest and found that, while the VA model yielded different outcomes for different teachers, it could not show whether those outcomes measured what is important (teacher effectiveness) or something else, such as whether students benefited from other learning resources outside of school. One way to test the validity of a VA model is to check whether, under the model, a student’s future teacher appears to have an effect on that student’s test performance in the past, something that is impossible in the world most of us inhabit. The researchers found that, under the Times’ model, future teachers did indeed appear to affect the past learning of students, especially in reading, which indicates that the model is faulty.
Next, the NEPC researchers devised an alternative VA model aimed at correcting the biases in the one the Times used. It controlled for a longer history of a student’s test performance, peer influence, and school-level factors. For reading outcomes, more than half the teachers had a different effectiveness rating under the alternative model. The researchers also found it highly likely that there were a significant number of “false positives” and “false negatives,” in which teachers rated as good or bad were in fact average.
Some have discounted the NEPC work because it was paid for in part by the Great Lakes Center for Education Research and Practice, whose membership includes affiliates of the National Education Association. Yet this is exactly the kind of work the teachers’ unions ought to be supporting to bring more light to this debate. Moreover, this isn’t the first analysis questioning the value of VA. The Board on Testing and Assessment of the National Research Council has cautioned that there is too little evidence at this point to support the validity of these methodologies. Jesse Rothstein of the University of California, Berkeley, has written a number of papers demonstrating the limitations of VA measures. He has found that teacher ratings vary widely from one test to another (the same teacher might be at the 80th percentile on one test and the 30th percentile on another) and that, because of the ways children are assigned to classrooms, even the best feasible VA estimates may be substantially biased.
The merits of VA will be argued for a long time to come. With time we may be able to greatly improve it. Meanwhile, in the cold calculus of research, some teachers may end up as false negatives, but in communities across the country, they will simply be people who did their jobs and got pulled under by the unpredictable tides of a school reform that was not ready for prime time. Little wonder, then, that many who are good at what they do and answered teaching’s call from the best of motives are having second thoughts about their line of work. That will make schools worse rather than better. School reformers would do well to borrow from medicine the principle: “First do no harm.”