Showing posts with label grading. Show all posts

Monday, October 20, 2008

But the Answer was Right!

The cry of the Lucky Equation Grabber.

Not a relevant issue when the solution was wrong.

I grade the solution, not the answer.

This is a physics class, not a lucky guess class.

You can imagine the wailing and gnashing of teeth, but (as I wrote in the comments in a related thread over at Becky Hirta's) if you wanted to take the class from someone who doesn't care if you learn the physics, you went to the wrong place.

You need to be over at Wannabe Flagship where they give multiple choice tests and let kids use a crib sheet for F=ma and the other 100 equations you can derive from it, one for each possible problem on the exam.

They also don't notice that two algebra errors resulting in a correct answer equals two errors, not zero errors. I do.



Sunday, June 29, 2008

Lab Writing

This is effectively the second part of an article on the objectives of lab classes, where I had limited the discussion to everything except lab reports. I also discussed some of this in the past, stimulated by two articles decrying empty thesis statements and overly effusive language (using verbiage to replace thought) in papers for upper division or grad-level classes in ed (history of education) and English (Victorian literature). Their complaints are familiar to us in the sciences; I regularly learn new things from folks on that other side of the blog campus and sometimes on my own. [One interesting side result of what I have picked up in academic blogs is the realization that I should talk to people who teach composition or in the social sciences or humanities about these sorts of teaching issues.]

As I see it, the written parts of lab reports address two learning objectives: technical writing, where they have to learn to use numerical results (with uncertainties and units) in complete sentences and clearly define quantities in English rather than with symbols, and critical thinking, where they have to learn to draw conclusions based on quantitative results that have a vague fuzziness due to experimental uncertainties, as well as to separate the important from the unimportant.

Let's look at these two issues, being careful to include ease of effective grading (assessment is the new magic word) in the design of the task we set our students ... and our TAs (who likely don't have English as a first language).

But first, one word on philosophy: The majority of my students are going on to become engineers, so I am looking more toward that kind of corporate environment (with its memos and reports) than academic research. I hope I can get The Thomas to provide his critique of these thoughts from where he sits in Corporate America, either in a long comment here or in his own blog.

Technical Writing.

I think of this as the requirement that they produce work that looks professional (using correct symbols and notation, such as SI prefixes or superscripts for powers of ten, and grouping the value and its uncertainty inside parentheses) and that communicates their answer without ambiguity. This means using technical terms instead of a catch phrase, and not using a symbol (is it V or v, is it velocity or voltage or volume?) that has not been defined by the writer or in the question.
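As a rough illustration of the "value and its uncertainty inside parentheses" convention, here is a minimal Python sketch. The function name `format_measurement` and the rule of matching the value's decimal places to the leading digit of the uncertainty are my assumptions for illustration, not a rule stated above. It does not round values to tens or hundreds when the uncertainty is 10 or larger.

```python
import math


def format_measurement(value, uncertainty, units):
    """Format a result as "(value ± uncertainty) units".

    Assumed rule for this sketch: show as many decimal places as are
    needed to reach the leading digit of the uncertainty.
    """
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    # Position of the leading digit of the uncertainty (e.g. 0.05 -> -2).
    exponent = math.floor(math.log10(uncertainty))
    decimals = max(0, -exponent)
    return f"({value:.{decimals}f} ± {uncertainty:.{decimals}f}) {units}"


print(format_measurement(9.7834, 0.05, "m/s^2"))
```

The point of grouping inside parentheses is that the units then apply unambiguously to both the value and its uncertainty.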

This requirement is unambiguous and applies everywhere, but I don't take off points every time I see an error made. [I do know profs who take off 0.1 point for every instance of even a minor error, but I don't have that much time to devote to grading.] Instead, I focus attention on specific sections of the report for specific things, such as making sure values are reported correctly (with units) in the calculation section and in the conclusions sections. Those instances are highlighted on my version of the report. [Because of the particular, rather efficient, system I use for grading, this might be one of only two things I am looking for on a particular page of the report.] Nonetheless, I always keep my eye open for an oversight somewhere else in an otherwise good paper. I want everyone to be paranoid about errors of omission such as units or sig figs.

There is also an expectation that answers to questions and some lab exam problems be stated as complete sentences.


Critical Thinking.

This is the real challenge. It takes time and energy to do this right, so I often look at this part of the lab report first or devote a separate day to it, so I can do it while I am fresh. It also requires that the task for the student be well defined, so I try to focus it into just a few areas of the report.

One of these areas is actually easy to grade: the post-lab questions. These are a good place to address "conclusion" questions about the implications of the lab results, particularly the ones that require the use of uncertainties to address the significance of an "error" between what is expected and what is found. The key requirement is that they get the right answer and give an appropriate explanation (such as using the size of the standard deviation to judge the precision of their result). A specific question ensures that the student has to address the topic and also means I don't have to go hunting for it somewhere in two pages of conclusions.
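The kind of reasoning such a post-lab question asks for can be sketched in a few lines of Python. This is only an illustration: the function names, the 2-sigma threshold, and the numbers in the example call are assumptions of mine, not values from any actual lab.

```python
def discrepancy_in_sigmas(measured, expected, std_dev):
    """Return how many standard deviations separate measured from expected."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return abs(measured - expected) / std_dev


def is_consistent(measured, expected, std_dev, n_sigma=2.0):
    """True if the discrepancy is within n_sigma standard deviations.

    The default 2-sigma threshold is an assumed convention for this sketch.
    """
    return discrepancy_in_sigmas(measured, expected, std_dev) <= n_sigma


# Made-up illustrative numbers: g measured as 9.70 m/s^2 with a standard
# deviation of 0.08 m/s^2, compared with an expected 9.81 m/s^2.
print(is_consistent(9.70, 9.81, 0.08))
```

A student answer would then state the comparison in a complete sentence, with units, rather than just reporting a percent "error".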

I also require that they address specific issues in specific places in the conclusions part of the report. One section has to identify and summarize the most important results of the experiment. It has to be a single paragraph of modest length, like the abstract for a research paper or a memo in the real world. They are marked off for not putting the right things in there or for technical writing errors noted above. I sometimes require that this be a cover memo, while other times I have them call it an abstract. (That helps catch plagiarists.)

The other section must address a particular aspect of a typical "conclusion", such as what followup experiment could be done (and describing it in detail), identifying a real-world situation where this effect is important, explaining what might have caused inaccuracies in their results and how to avoid them in the future, or identifying specific procedures they used that might have contributed to the precision of their results. I vary this question from semester to semester to make it harder on lazy kids who try to use a file from a student in last year's lab.

The real trick is keeping track of students who say the same thing (e.g. didn't read the manual before class to be sure they knew what they were doing) rather than learning from their mistakes and improving their skills in the lab! If they are thinking, these answers should be more than a throwaway line.


Grading Rubrics.

These are essential, and it is essential that they be designed so students who do the experiment and complete all of the calculations with reasonable accuracy and attempt to answer all of the post-lab questions will get a minimum grade of C. Not only is that passing for our college, but it is all that the nearby engineering colleges need to see. That limits somewhat the deductions for egregious errors, but still leaves lots of room to encourage improvement.

It also has to work for me and the adjuncts who work under my supervision. Thirty or so lab reports a week can be a big load for me, particularly when they come due on a week when I also have to grade 250 pages (or more) of exam solutions. It might be an even bigger load for my adjuncts, who have other jobs as well. This requires focus on specific spot checks and clearly defined tasks, as noted above.

I long ago quit cutting them much slack on the initial reports. A warning without a deduction of points has no effect on future behavior. However, I do cut the penalty points for "critical thinking" types of errors to about half of the norm. We go over tech writing skills from day 1, so they are expected to do that part correctly. We also drop the lowest report, so that encourages improvement (but won't help if they skip one of the labs).

Apropos a point Matt made in his blog recently (see below for the link), the max deduction for omission of the separate "conclusion" writing assignments is 20%, although the deductions for flawed contributions are usually around 10% of the total grade. However, other items that require critical thinking make up at least another 20%, if not 30%, of the total. I also include questions of this type (interpret a certain result or write a summary of certain results) on the lab exams.


Other voices.

Matt's recent article about improving lab report conclusions, from the perspective of a graduate student at an R1 university, offers an interesting suggestion: collect some good conclusion sections from physics research papers. (I suppose I could use some of my own!) I'd also like to have a similar collection from industry, since those examples are generally unknown within the physics research or teaching community. As noted below, I would put more emphasis on showing them a good abstract rather than some good conclusions.

Chad Orzel provided his thoughts on the writing style used in lab reports last year, from the perspective of a selective liberal arts college with a large physics program. I agree 100% on the evils of the passive voice, but I know where it comes from: our chemistry department. [Comment #15 makes that same observation.] They insist on it. In contrast, I insist on simple declarative sentences that state (in the correct past tense) that a specific quantity was measured, giving a specific result. [Comment #10 gives a nice example of good and bad ways of saying the same thing.] Of equal interest is how a number of comments came from composition teachers who regularly fight the same strange view of what makes good academic writing in their classes. Maybe the problems start in high school!

I also like Chad's emphasis on framing. In effect, that is what I do by trying to make them start out by stating the most important result, whether it is a measured value or the verification that energy was conserved to within 10%. A good abstract tells you the important result(s) in a "Just the facts, ma'am" style, much like the headline / sub-headline sequence in the NYTimes.



Saturday, June 14, 2008

Predictable Grades

Profgrrrl blogged about being able to predict where her summer (grad) students would fall on the grade distribution for a test about 2/3 of the way through a short semester. In the comments, Belle (who had a wonderful recent article about nonsense in some AP history? exams on the grading table) asked how that might be done.

I think Profgrrrrl's observation was about the sort of informal approach many of us take, where we have learned to expect some continuity in both "A" and "D" performance throughout the semester, but I've gone a bit further as a teaching tool.

In one gen-ed class, I make a point of recording the quasi-midterm percentage grade I report to them after one of the exams. I put it in a place in my gradebook where I can easily compare it to the final grade, pretty much at a glance. It is useful to know that most students who are borderline (particularly on the pass line) improve their grade on the final exam - mostly because the first exam is harder than they expect, but they can get those kinds of questions right on the final if they change how they study for the class. I can use that to encourage them to keep improving.

In my first semester physics class, I have been doing a retrospective study for a number of years, building a histogram of sorts where I put the final grade in the class next to the numerical score on the first hour exam. A pattern emerges where most bad scores fail and most good scores do well, but there are always a few cases where a 100 gets an F or a 40 gets a B. (The latter is possible because I follow one of many standard grading schemes where the worst exam is either dropped or replaced by the score on the final exam. The former is possible because some students don't keep doing what got them the good grade on the first test.) The extremes tend to track pretty well.

The interesting result is that the most critical group of students are the ones in the C-D range. For them, what matters the most is what they do about their performance on the first test. Do they sit down and work it all out correctly the very next night? Do they figure out the relationship between exam questions and key topics covered in the homework and adjust their study methods accordingly? Do they realize that copying an example in the book or getting someone else to tell them what formula to use to get the homework correct is not quite the same as "doing" the homework with the goal of "learning" how to do problems on a test? If they do, their grades generally go up into the B-C range, sometimes even into the A range. If they don't, their grades generally go down into the D-F range. A person with a 70 on that first exam is almost literally sitting on the Continental Divide of performance, sliding down the razor blade of life.

Again, conveying this sort of information can make the act of returning the first exam into a teachable moment. Not that all of them pay attention or choose to put in the effort needed to learn, but enough clearly do that I find it to be a useful effort on my part.
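The replacement scheme mentioned above, where the worst exam is replaced by the score on the final exam, is what makes a 40 on the first test recoverable. A minimal sketch in Python follows. The function name, the percentage scale, and the choice to replace only when the final is higher are assumptions for illustration, not a statement of the exact policy.

```python
def exam_scores_with_replacement(hour_exams, final_exam):
    """Return hour-exam scores with the worst one replaced by the final.

    Assumes all scores are percentages and that replacement happens only
    when the final exam score is higher than the worst hour exam.
    """
    if not hour_exams:
        raise ValueError("need at least one hour exam score")
    scores = list(hour_exams)  # copy so the caller's list is untouched
    worst_index = scores.index(min(scores))
    if final_exam > scores[worst_index]:
        scores[worst_index] = final_exam
    return scores


# A student with a 40 on the first exam who improves by the final.
print(exam_scores_with_replacement([40, 75, 80], 85))
```

The "drop the worst exam" variant would simply remove the lowest score instead of overwriting it.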



Friday, May 30, 2008

Exam Design and Grading

Matt posted a partial solution to a basic conservation of energy problem that is part of a set of bogus solutions that has made the rounds over the years. My comment focused on the digression into whether the sample solution really deserved a zero, and some other things about grading an exam, but Matt's reply to another comment made me realize something more was involved than just how to assign partial credit.

Rather than run on at length there, I figured I should blog it here. I should also back-link to an earlier article of mine on assembly-line grading, since a similar approach was advocated by another comment in that thread. (A comment from someone who blogs about Mandelstam variables! Gotta add that in my blogroll right now. IMHO, "s" rules.)

The observation that made me realize more needs to be said was the following:

For this particular problem, either from memory or from a (probably large) formula sheet the student did manage to pick the two right equations and only those two. ... It is likely that my 5/10 is too generous, but either way it’s probably still a D at best.

I would be concerned that 5/15 for this problem, although an F for this particular problem in isolation, might accumulate into a score that does not reflect the physics knowledge of this student. (Of course, it is highly probable that even liberal partial credit will result in a score of 35 or 40% for the entire test for this student.)

What they brought with them:

I don't allow formula sheets or the functional equivalent (a calculator with formula sheets stored in it). If I did, there would be zero partial credit for this problem because the student did not have to pull U = mgh out of his or her head and, more importantly, because the student did not pull the ONE right equation (Ef = Ei + W) off of that sheet. Without one, I might give one point for knowing to calculate the potential energy but not knowing what units it has.

Exam Design:

Problem grading also has to be considered in the context of how you write the exam in the first place. You can write an exam consisting of fairly easy problems that are either right or wrong (this is often done in large classes with m.c. exams) and get a reasonable grade distribution if the students don't study very well or don't know how to use their calculator properly. But if the students are prepared, such an exam will not have a normal distribution. [My first exam in Physics 1 is such an exam, and the predictable result can be seen in a blog article linked below.] Worse, those kinds of tests don't require any critical thinking.

This particular problem is an extremely elementary problem. If (actually, when) I put it on an exam, it is part of the C-level part of the test. I use a straight scale with no tolerance on the 70 line for a C, so the exams are matched to that objective while taking my historic partial credit policies into account. This is a problem where I would expect a C student to get all 15 points, losing at most three points for leaving off units or making a computational error that would have to be made up elsewhere on the test (by getting somewhere on an A or B problem) to pass the test. I'd probably also take off 3 points for leaving off the Ef = Ei + W (or just Ef = Ei) starting point on part a. Someone who does not know they need to conserve energy deserves to lose all of those 15 points.

Note added for clarity:
The basic idea behind this exam design is to ensure that a student who deserves to pass can get about 70 on the test. That might mean 60 points of "C" problems, with most of the rest being "B+" problems. A C student is expected to get some partial credit from the B problems, and a B student is expected to miss some points on the B problems. I usually include one "A" problem, with significant critical thinking required, but limit it to 5% of the exam totals on average. This means that a perfectly prepared student can get an A without being able to do the sort of challenging problems they would see at Harvey Mudd, but that has proven more than good enough for the Wannabe Flagship U they mostly attend.


Partial credit is for harder problems, where the student might figure out what kind of problem it is but make a major error in setting it up and get nowhere as a result: problems with a significant critical thinking component coupled to a complicated calculation. However, my own design decisions tend to decouple a problem with a complicated setup (say a vector integral for the electric field from a charge distribution) from its actual solution. The whole thing might be in the homework, but an exam question will stop with the setup of the integral.

Grade distributions on tests:

Most of my exams are bi-modal. A non-representative sample for a pair of tests was shown in an article about the relationship between doing homework and doing well on exams that I wrote some months ago.

Concerning Matt's critique of the no-partial credit philosophy "No partial credit! You engineer, you make mistake, bridge fall down, people die!!" posted in the comments, which was "I understand his reasoning, but also a working engineer is also able to consult references, double check math with computers, and discuss with a team.", I have three observations:

A working engineer has to pass a licensing exam where this particular problem would need to be answered correctly in about 30 seconds, to allow more than two minutes to solve a symmetric step ladder problem. That is not enough time to look up a formula for a trivial physics relationship.

The computer program can be wrong. GIGO can take several forms, including bad data tables hidden in the code.

If you rely on the team to catch your mistakes, who will catch theirs? The most recent example is the I-35 bridge collapse in Minnesota, which was the ultimate consequence of an elementary error made when the bridge was designed 40 years before. It is not yet known (and may never be known) where that error was made, but even if it was made by a draftsman there were multiple places where it should have been caught in a properly run professional organization.

I give partial credit, but that is coupled to an exam design where that credit is primarily for minor computational or algebra errors and should not allow someone to pass who has too much elementary physics left to be taught by their engineering professors. They have enough to do to teach the next level of physics and calculus encountered in actual engineering problems.

Link back:
Noticed this interesting discussion, Grading Policy, Sir! about teaching at the Navy nuclear power school in Orlando. I really like the use of a special grading shorthand described there!



Thursday, December 20, 2007

Efficient grading (physics and math)

This will have to be the short version, since I have to do some real work tonight.

My experience is with quantitative subjects like physics and math, but since it seems to have some value for stuff that requires reading (like lab reports), it can probably be adapted for other subject areas. Details below the fold.

The starting point is that you only grade one problem (or even sub-problem or portion of a report) at a time. That was a given when grading 750 final exams in one day, because each person does one problem while piles of papers get pushed around a big conference room table, but it is also how I grade exams solo.

Here I will assume you are grading the entire exam of 40 or 50 papers yourself, maybe 400 distinct problem solutions. There are some differences when you are only going to grade one problem on 750 exams, so I will summarize those at the end.

My exams all have a cover page for the total, version, etc, so the first step is to turn the page to the first problem to be graded and invert the stack so the first papers turned in are at the top. If not, just set it up for the first problem, which might not be problem 1 on the test. If there are multiple versions, set them up so you are grading the "same" problem on all versions. This is key.

I always start grading with an 'easy' problem. I never start with one that might have N-8 distinct wrong answers. Too depressing, and it defeats the key step.

Sort the exams by answer. That is, make one pass through the tests looking only at the bottom line, the answer and its units. Completely correct answers go in one pile, ones with minor numerical variations or missing units go in another, and ones with wrong answers go in a third. Keep exams with the same wrong answer together.

Reassemble into one big stack and work through them starting at the top, or in the middle if your rubric has a key intermediate result or formula identified for that problem. Just because the answer is right does not mean the solution is right! This goes quickly for 'simple' problems, but requires a bit of care for ones where there are ways for two wrongs to make it right. It does take some experience to know which problems are likely to have 'magic' algebra steps where negative signs mysteriously change as needed, for example.
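For readers who think in code, the sorting pass can be sketched as below. This is only an analogy for a by-hand process: the tuple layout, the function name `sort_into_piles`, and the 1% tolerance for a "minor numerical variation" are all assumptions I made for the illustration.

```python
from collections import defaultdict


def sort_into_piles(papers, correct_value, correct_units, tolerance=0.01):
    """Return papers ordered: correct, near misses, then grouped wrong answers.

    Each paper is a (student, value, units) tuple. The tuple layout and the
    relative tolerance for a "minor numerical variation" are assumptions.
    """
    right, near = [], []
    wrong = defaultdict(list)  # same wrong answer -> papers kept together
    for paper in papers:
        _, value, units = paper
        close = abs(value - correct_value) <= tolerance * abs(correct_value)
        if value == correct_value and units == correct_units:
            right.append(paper)
        elif close:
            near.append(paper)  # small numeric slip or missing/wrong units
        else:
            wrong[(value, units)].append(paper)
    # Reassemble into one stack: correct on top, wrong answers at the bottom.
    stack = right + near
    for group in wrong.values():
        stack.extend(group)
    return stack
```

Note that this pass looks only at the bottom line. A paper in the "correct" pile still needs its solution checked, since a right answer does not guarantee a right solution.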

[Side remark: It is crucial that the solution be checked along with the answer. Many of the algebra weaknesses I see could only have made it into physics and calculus because an algebra or trig teacher did not check that they took the square root of a negative number and just made it into the correct real answer. I don't want that kid designing a bridge I have to drive over!]

There are two advantages here. One, the exams that deserve the same partial credit are next to each other. I may make a note on my key to say how certain errors are dealt with just for future reference, but I rarely have to consult them or look back to see how some weird thing got graded. Two, the exams are usually in the optimal order when it comes time to grade the next problem.

Next problem, same process. However, now the odds are that you have correct answers already at the top. The fact that you see a lot of correct exams before getting to the dregs is great for grader morale, which improves my efficiency. You do have to be super careful not to overlook errors on good papers or be too tough on the bottom. When a "top" paper has an error, it goes to the very bottom and when a "bottom" paper has the right answer, it goes to the very top. Sometimes I cut the deck, as it were, just to be sure I am fair (particularly for a long exam like a final).

The key is that you do not look through the entire solution for every paper, just the ones where you need to figure out what they did. Even then, you often get the efficiency of knowing that four papers all made the same kind of error in step 3 or 4 of the solution. You only have to figure it out once. After that, your eyes go directly to the error, flag it, and move on.

Handling each exam twice is more than made up for by less time spent on each one. I am also less likely to overlook an error in units or significant figures if I check that detail in my first pass. That means I also save time by not having to go back and look at an exam that was already graded.

Changes for Really Big Classes:

We would pre-grade a few dozen exams, picking out specific students from our own sections if that was practical. That would set up the rubric and give us some idea of what we had to watch for, but we also kept a sheet of paper with wrong answers listed along with the partial credit it got (and the reason why) when new ones showed up. This makes up for not being able to sort the exams. The basic idea was still the same: work from bottom to middle to top (or bottom to top to middle) of the solution to check the answer, the algebraic process used, and the physics.
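The sheet of wrong answers is essentially a lookup table that keeps partial credit consistent across hundreds of papers. A minimal Python sketch of that idea follows. The class name, its methods, and the sample entry in the usage lines are hypothetical; the real thing was a sheet of paper.

```python
class WrongAnswerSheet:
    """Running list of wrong answers and the partial credit each received."""

    def __init__(self):
        self._entries = {}  # answer -> (credit, reason)

    def lookup(self, answer):
        """Return (credit, reason) if this answer was seen before, else None."""
        return self._entries.get(answer)

    def record(self, answer, credit, reason):
        """Add a new wrong answer; refuse to silently change an old ruling."""
        if answer in self._entries:
            raise ValueError(f"{answer!r} already has a ruling")
        self._entries[answer] = (credit, reason)


# Hypothetical usage: the first time an answer shows up, work out the
# credit once and write it down; after that, just look it up.
sheet = WrongAnswerSheet()
if sheet.lookup("4.9 m") is None:
    sheet.record("4.9 m", 7, "dropped a factor of 2")
print(sheet.lookup("4.9 m"))
```

Refusing to overwrite an existing entry mirrors the goal of the paper sheet: the same error gets the same credit no matter when it turns up in the pile.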



Grading

Profgrrrl has a thread about how people compute their grades. I commented there, but thought I would elaborate here.

Like her, I use a spreadsheet ... but I use Quattro Pro to do the grade computation rather than M$ XL. I also use an old-fashioned grade book, which stays in my briefcase. The grade book is only for transient records (I don't carry a computer around with me), answering student questions, flagging something that will need correction (whether late assignments or a grading error) and a bit of permanence should something crash.


There was a gap where I only did research, so I can't tie my transition from paper to a spreadsheet to any specific change in the available tools. My history doing grades is quite a long one:

  • Undergrad TA: All grades were calculated by hand, since calculators did not exist at the time and slide rules can't do addition. Exams and quiz scores were added up by hand, as were final averages. Some habits carry over from those times. I still keep a grade book in that style, and I still transfer the sub-total for each page to the front of the exam for final computing. That was when I was teaching math, and lots of math faculty still do it this way.
  • Grad Student: I had a calculator, so most serious arithmetic was done that way. Exam scores were usually added up in my head, because I could do it faster than a calculator. Never bothered to write a grading program in Fortran because they already had one (or was it in COBOL?) for the really big lecture classes that did not have any homework grade.
  • Side comment: That was where I was taught how to grade exams really efficiently. It was only recently that I learned this had a name: rubric. But it is more than that, it is an assembly-line-like process. I may blog about it if asked. Once you can grade one problem on 750 exams, accurately and efficiently, you can grade anything.
  • Research Faculty: I taught some classes while on a full-time appointment that allowed that in addition to doing research. That was where I developed a grading spreadsheet in Quattro Pro that I still use today. It had three sheets (summary, homework, exams) just like mine does today.
  • CC Faculty: Nothing much changed, so nothing much changed. Once you have a PC, there is not much else to do but put it on your phone. But the security issues if that phone got lost? Priceless.
My comment on Profgrrrrl's blog about not having to remember how to do formulas comes from this experience. I have not built a totally new spreadsheet since the first one, maybe 15 years ago. I've only had to make minor adjustments as my grading formula changes.

So my grading process has two stages. I first enter grades into the grade book, even if I am home where the computer version lives. They go in the grade book with the same pen I have in my hand, and are easily entered in random order. (I alphabetize exams before handing them back, but not labs or homework, so that is an important detail. I can see an entire class in my grade book, but can't see the entire list on a computer screen.) I then copy them into the computer, now alphabetically, and cross check if it is important, like an exam.

At the end of the year, I "print" the final spreadsheet info to pdf and keep it on two different computers (home and work) and in two separate hard-copy files (one with the final exams at work, the other is the actual gradebook).

I can't imagine not using a spreadsheet. Besides the advantage of getting the numeric grades as soon as the final exam numbers go into the computer, it is also easy to generate midterm and other grades for the students. I don't use Bb for this because I use a different course management system and our college does not automate grade transfer from Bb to our on-line grade entry system. I'd have to transfer too much info from one place to another to make that work.

A big part of the grade comes from HW managed through a non-proprietary course management system. I print its gradebook for each block of HW when I give an exam, and use that for two things. One, it is a hard record of those grades that get transferred into my grade book and computer, and two, it is a way to keep records on possible correlations between specific exam questions and HW performance on related problems. Are they learning? I've learned that it matters more whether they tried the problem seriously than if they got it right. Kids who get it right by using their book, notes, or (more likely) advice from friends might not do as well as kids who got it wrong but figured out what they were doing wrong after they saw the answer. In fact, I have evidence that this is exactly what happens.
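One simple way to look at records like these is to compare exam performance on a question between students who seriously attempted the related HW problem and those who did not. The Python sketch below shows the bookkeeping only. The record layout and function name are hypothetical, and it says nothing about what the real data show.

```python
from statistics import mean


def mean_exam_score_by_attempt(records):
    """Compare exam points for students who did or did not attempt the HW.

    `records` is a list of (attempted_hw, exam_points) pairs, an assumed
    layout for this sketch. Returns (mean_if_attempted, mean_if_not), with
    None for an empty group.
    """
    tried = [points for attempted, points in records if attempted]
    skipped = [points for attempted, points in records if not attempted]
    return (
        mean(tried) if tried else None,
        mean(skipped) if skipped else None,
    )
```

The same grouping could be done on "got the HW right" versus "got it wrong" to see which of the two splits tracks exam performance more closely.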

