Winnowing

Saul Schleimer (University of Illinois Chicago), Daniel Shawcross Wilkerson (University of California, Berkeley), Alex Aiken (University of California, Berkeley)
June 9, 2003
Cited by 1,104

Abstract

Digital content is for copying: quotation, revision, plagiarism, and file sharing all create copies. Document fingerprinting is concerned with accurately identifying copying, including small partial copies, within large sets of documents. We introduce the class of local document fingerprinting algorithms, which seems to capture an essential property of any fingerprinting technique guaranteed to detect copies. We prove a novel lower bound on the performance of any local algorithm. We also develop winnowing, an efficient local fingerprinting algorithm, and show that winnowing's performance is within 33% of the lower bound. Finally, we give experimental results on Web data and report experience with MOSS, a widely used plagiarism detection service.
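
The abstract names winnowing without spelling out its selection rule, so the following is a minimal sketch of the commonly described formulation, not the authors' implementation: hash every substring of length k (a k-gram), slide a window over w consecutive hashes, and keep the minimum hash of each window, taking the rightmost one on ties and recording each selected position only once. The function names `kgram_hashes` and `winnow`, the parameters `k` and `w`, and the truncated SHA-1 hash are assumptions made for illustration; a production system would more likely use a rolling hash.

```python
import hashlib


def kgram_hashes(text, k):
    """Return one hash per k-gram (substring of length k) of text.

    Truncated SHA-1 is a stand-in chosen for this sketch only.
    """
    return [
        int.from_bytes(
            hashlib.sha1(text[i:i + k].encode("utf-8")).digest()[:8], "big"
        )
        for i in range(len(text) - k + 1)
    ]


def winnow(hashes, w):
    """Select fingerprints as (position, hash) pairs.

    For each window of w consecutive hashes, pick the minimum value,
    preferring the rightmost occurrence on ties. A position is recorded
    only the first time it is selected.
    """
    fingerprints = []
    last_pos = -1
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        # Index of the rightmost occurrence of the minimum in this window.
        offset = w - 1 - window[::-1].index(smallest)
        pos = start + offset
        if pos != last_pos:
            fingerprints.append((pos, smallest))
            last_pos = pos
    return fingerprints


if __name__ == "__main__":
    # Illustrative hash sequence with window size w = 4.
    sample = [77, 74, 42, 17, 98, 50, 17, 98, 8, 88, 67, 39, 77, 74, 42, 17, 98]
    print(winnow(sample, 4))
    # prints [(3, 17), (6, 17), (8, 8), (11, 39), (15, 17)]
```

Because the choice of fingerprint depends only on the contents of each window, the same shared passage produces the same selected hashes regardless of what surrounds it, which is the locality property the abstract refers to. Comparing two documents then reduces to intersecting their fingerprint sets.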

