Tuesday, September 26, 2006

Database Designed to Thwart Plagiarists

When McLean High School students write this year about Othello or immigration policy, their teachers won't be the only ones examining the papers. So will a California company that specializes in catching cheaters.

The for-profit service known as Turnitin checks student work against a database of more than 22 million papers written by students around the world, as well as online sources and electronic archives of journals. School administrators said the service, which they will start using next week, is meant to deter plagiarism at a time when the Internet makes it easy to copy someone else's words.

The service has grown dramatically, Barrie said, and is now used by more than 6,000 academic institutions in 90 countries. Barrie, who is president and chief executive of iParadigms, said 60,000 student assignments are added to the database daily.

A friend of mine who teaches at a university told me how many of her students were turning in secondhand reports they had bought off the internet, and what a pain it was to check for this. My instant thought was that the solution to any problem caused by technology is always more technology. :) If students can use the internet to get papers, teachers should be able to use the internet to verify that a paper is really original.

Turnitin has an interesting way to solve the problem, but not the one I would have gone with. To catch every copied paper, this technique requires every paper ever written to be in the database. That seems unlikely. The database will also continue to grow larger each year, making searches slower. To beat the system, you just need to verify that the paper you are copying is not in the system. It seems pretty easy for any online paper seller to surreptitiously get an account and verify that a paper is "clean". You could also beat the system by outsourcing the paper to a writer in India, who would write it from scratch.
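
To make the weakness concrete, here is a rough sketch of how a database check could work. This is only my guess at the general idea, not Turnitin's actual algorithm (the article doesn't describe it), and the names `shingles` and `overlap` are made up for illustration. It splits each paper into overlapping runs of n words and reports what fraction of the new paper's runs already appear in a stored paper.

```python
import re

def shingles(text, n=5):
    """Return the set of overlapping n-word runs in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(paper, database, n=5):
    """Fraction of the paper's n-word runs found in any stored paper."""
    own = shingles(paper, n)
    if not own:
        return 0.0  # paper too short to compare
    known = set()
    for doc in database:
        known |= shingles(doc, n)
    return len(own & known) / len(own)
```

A paper that is already stored scores 1.0, while a paper nobody has submitted yet scores 0.0 no matter where it came from, which is exactly the hole a paper seller could exploit.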

The way I would have gone about this is to look at the words and sentence structure of the student's previous work to create a unique "signature" and compare it with the new paper. Each person has a unique vocabulary, word frequency and style of writing. You could write a program that compares the current paper to the student's previous work and gives you a probability that the student really wrote the paper. This would not have the problems of the database system that I listed above.
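
Here is a minimal sketch of what such a signature check might look like, assuming the signature is just relative word frequencies plus average sentence length. The function names `signature` and `similarity` are hypothetical, and the score is a plain cosine similarity between 0 and 1, not a calibrated probability. A real tool would need far more features and a lot of sample writing per student.

```python
import math
import re
from collections import Counter

def signature(text):
    """Return (relative word frequencies, average words per sentence)."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1
    freqs = {w: c / total for w, c in Counter(words).items()}
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    avg_len = len(words) / (len(sentences) or 1)
    return freqs, avg_len

def similarity(sig_a, sig_b):
    """Cosine similarity of the two word-frequency vectors."""
    fa, fb = sig_a[0], sig_b[0]
    dot = sum(fa[w] * fb.get(w, 0.0) for w in fa)
    norm = (math.sqrt(sum(v * v for v in fa.values()))
            * math.sqrt(sum(v * v for v in fb.values())))
    return dot / norm if norm else 0.0
```

You would build one signature from everything the student has handed in before, another from the new paper, and flag the paper when the similarity is low or the average sentence length is far from the student's usual.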

Another advantage is that by taking a look at the makeup of the paper, the software could analyze how good the writer is and give suggestions on how to improve. It could act as a thesaurus to improve the vocabulary of the writing as well as give grammatical and other structural hints on how to make the writing better.

I have always dreamed of software that can take my writing and instantly improve it by changing the sentence structure, substituting words and fixing all spelling and grammatical problems. Such software would break the signature check that I was talking about, and raise the question of who really wrote the paper. But judging by the state of the art of the Clippy "help" in Microsoft Word, that day is probably still a ways off.

via The Washington Post


crush41 said...

A philosophy TA I know ran his own paper for a course he was taking through the system just to see how it would work (so he'd be familiar with it for his own students). Apparently it not only scours the internet but also scours the text of other papers that have previously been submitted.

Consequently, the TA's professor was alarmed when the paper he'd written came back as "90% plagiarized"!

mping said...

That is interesting, but it really raises the question: why only 90%?
