[NIFL-TECHNOLOGY:2659] Distributed Proofreading - help needed

From: Steve Linberg (steve@silicongoblin.com)
Date: Mon Nov 11 2002 - 23:11:39 EST



(Note: I sent this to the list last week, but it seems to have gotten
eaten by the list processor.  Trying again.)

-------------------------------------------------------------------

Fellow NIFL-Tech'ers,

Want to help save the world by enriching the public domain?

You may have heard of Project Gutenberg - one of the heroic jewels of the
internet.  It's been around for a long time, and its goal is to enrich the
public domain by scanning and transcribing books and texts that have
entered the public domain and making them freely available for download.
I used its texts a lot when I was teaching.

It's an entirely volunteer effort to help preserve the public domain and
make public-domain books accessible to anyone who wants them.  I believe
copyright law currently protects written work for 70 years, which means
anything copyrighted in 1932 or earlier is in the public domain.  There
are efforts from the publishing industry to change this to "life plus 70",
meaning 70 years from the death of the author, not from the date of
publication.  If this succeeds, we'll see another huge rollback in the
public domain, where only books whose AUTHORS died in 1932 or earlier
would be eligible, but I digress.

It's a huge amount of work to transcribe a book.  I've done it.  You
either type it out by hand, or you scan the book's pages with a scanner
and then run them through an OCR program (Optical Character Recognition)
that tries to turn the IMAGE of the text into actual text, analyzing the
shape of the letters.  Depending on the quality of the original, this is
usually faster than typing, but still requires heavy proofing.  It's very
time-consuming and a lot of beginning efforts die on the vine because it's
so hard.
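
For the curious, here is a rough sketch of what the proofing step amounts
to, written in Python.  It is only an illustration, not how any particular
project does it: the function name and the sample sentence (with two
typical OCR misreads) are made up.  It compares raw OCR output with a
proofer's corrected version, word by word, and lists what changed.

```python
import difflib


def word_corrections(ocr_text, proofed_text):
    """Return (ocr_words, proofed_words) pairs for each span a proofer changed."""
    ocr_words = ocr_text.split()
    proofed_words = proofed_text.split()
    # Compare the two versions as sequences of words rather than characters.
    matcher = difflib.SequenceMatcher(a=ocr_words, b=proofed_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append(
                (" ".join(ocr_words[i1:i2]), " ".join(proofed_words[j1:j2]))
            )
    return changes


# Made-up sample: "b" misread as "h", and "m" misread as "rn".
ocr_page = "It was the hest of tirnes, it was the worst of times"
proofed_page = "It was the best of times, it was the worst of times"

for wrong, right in word_corrections(ocr_page, proofed_page):
    print(f"{wrong!r} -> {right!r}")
```

Misreads like these are easy to miss when skimming, which is why a
careful human pass is still needed.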

There's a new project up online that you can join to help the effort to
get scanned books proofed.  They ask "a page a day" of volunteers.  I just
joined and did my first page; it took about ten minutes.  If you feel, as
I do, that a rich public domain is an indispensable cornerstone of any
literacy effort, particularly in an era of ever-tightening budgets and
programs run on a shoestring, please consider kicking in a few minutes of
your time to help.  It seems like a really worthy cause, and they've made
it VERY easy to contribute.

The site is at this address:

http://texts01.archive.org/dp

This is how it works: you go there and create an account.  (You can also
read about the project and so forth.)  With your account you can log in
immediately, and you are then presented with a list of books in need of
proofing.  You can pick the one you want to work on (I chose one about
copyright authors, a statistical work that's dry but important for the
project), but there's also poetry, literature (someone's doing The Iliad,
among others), and more.  You click "project info and start proofing",
read the project manager's notes for that book, and then you're given a
scanned page to look at and a text box with the scanned text from the
page.  You go through and correct the errors as best you can, and then you
submit it.  That's it!  You can also review the pages you've done and go
over them again if you want.  It also keeps your statistics and you can
see how the overall project is doing.  Very cool.

Personally, I really enjoyed the simplicity of being able to do one
discrete, tangible bit of good so easily in a time so generally rife with
bad news.

Cheers,

Steve

-- 
Steve Linberg, Chief Goblin 
Silicon Goblin Technologies 
http://silicongoblin.com 
Be kind.  Remember, everyone you meet is fighting a hard battle. 



This archive was generated by hypermail 2b30 : Fri Jan 17 2003 - 14:44:49 EST