It’s the cycle of scientific dissemination: research leads to publications, which produce intellectual property that, inevitably, can be plagiarized.
Every day, hundreds of papers are added to ArXiv, the massive public database of scientific research. Given the sheer volume of content and the need to protect authors’ intellectual property, the database uses an algorithm to detect text reused from existing articles.
“The algorithm is such that it can compare over 500 new articles per day to roughly one million already in the database in a matter of seconds,” ArXiv founder Paul Ginsparg told The Atlantic.
Of the papers submitted over a one-month period, about three percent, or 250 papers, were flagged for plagiarism. At that rate, roughly 3,000 papers are flagged each year.