Digitizing Books For Google: No Quick Task
Posted by admin
In a dimly lit back room on the second floor of the University of Michigan library’s book-shelving department, Courtney Mitchel helped a giant desktop machine digest a rare, centuries-old Bible.
Mitchel is among hundreds of librarians from Minnesota to England making digital versions of the most frail of the books to be included in Google Inc.’s Book Search, a portal that will eventually lead users to all the estimated 50 million to 100 million books in the world.
The manual scanning - at up to 600 pages a day - is much slower than Google’s regular process.
“It’s monotonous,” the 24-year-old said.
Then she knit her career hopes into the work.
“But it’s still something that I’m learning about - how to interact with really old materials and working with digital imaging, which is relevant to art history.”
The unusually tight binding on the early-16th-century polyglot Bible made it hard to expose the portions toward the book’s middle as Mitchel spread each pair of pages for the scanner. Librarians believe it is the oldest Bible in the world with Arabic type.
Google, the Internet’s leader in search and advertising, says the process it developed and is using for scanning the majority of the books in Book Search is proprietary. Employees will not discuss it except to say it is much faster than what Mitchel is doing and it’s not destructive.
“It took us quite a while to develop it so we do keep that confidential,” said a library manager for Book Search, Ben Bunnell, who declined even to say where Google does the scanning.
Many libraries began digitizing books a decade ago to preserve them. Funding from Google allows the 28 libraries it’s working with to cut their digitizing costs because they don’t have to pay for scanning the books Google wants to include in Book Search.
Through Book Search, users can track down a book on any topic they’re interested in and read a small portion. If the book’s not protected by copyright, users can download the whole thing. If it is, or if they just want to read a printed copy, they can use Book Search to find copies to buy or borrow.
More than 1 million rare or fragile books have been digitized through the Google-Michigan partnership since it began in 2004, with an estimated 6 million to go.
Book Search has the support of many publishers, authors and librarians, including Cambridge University Press and Wisdom Publications. But some publishers and authors have sued, claiming the service violates their copyrights. Google says Book Search is aboveboard because Web surfers can retrieve only snippets of copyrighted material through the service.
Brewster Kahle, founder and digital librarian of the Internet Archive at the Open Content Alliance, said Google may be trying to “lock up the public domain” by making proprietary copies of works whose copyrights have expired - which includes the vast majority of the world’s books.
Kahle said there’s a core value in the project, in preserving material indefinitely and enabling broad access to it. But he questioned whether Google will share the works it digitizes with other search engines.
“We believe there should be many libraries, many publishers, many search engines, many types of users from different points of view,” Kahle said.
John Price Wilkin, Michigan’s associate university librarian, called Kahle’s stance “theoretical.”
“Our volumes are entirely open in the sense that people can find them, read them, use them, do all the things that they would do in scholarship or pleasure,” Wilkin said.
In the room where Mitchel and colleague Chava Israel, an artist, work, the temperature is always in the 60s.
Each technician has a slightly angled table with a flexible middle that cradles books and holds them still while two overhead cameras photograph the pages. Sometimes the women play music or listen to news online, but they often work in silence, save the clicks of their computers and scanners.
Mitchel glides in a rolling chair back and forth between scanner and computer, computer and scanner, turning page upon page and clicking her mouse to shoot each pair. Once the images reach the computer, the women use the book scanning software Omniscan from Germany’s Zeutschel GmbH to clean them up.
A final click of the mouse sends each digitized book to Google for optical character recognition processing, which makes the text searchable. Google then returns a copy of the images and data to the library and posts another to the Web.
Israel, 44, who has been scanning books for three years, takes a philosophical view of the project.
“My favorite part is working with older books and being able to preserve a lot of the knowledge and help bring more people access,” Israel said. “I turn pages. It’s kind of meditative.”