Democratic Underground

I've been helping to decode old books.. and didn't know it.. You probably are too..

This topic is archived.
General Discussion
Fumesucker Donating Member (1000+ posts) Sat Sep-24-11 05:47 AM
Original message
In a nutshell, some of those blurred-out words you have to enter to post on some websites are actually scans from old books that have been digitized but that the computer can't decipher. Humans are evidently still better at reading funky text than computers are.

http://news.bbc.co.uk/2/hi/technology/7023627.stm

A weapon used to fight spammers is now helping university researchers preserve old books and manuscripts. Many websites use an automated test to tell computers and humans apart when signing up to an account or logging in.

The test consists of typing in a few random letters in an image and is designed to fight spammers.

Carnegie Mellon is using this test to help decipher words in books that machines cannot read, by letting sites use those words to authenticate log-ins.

The test, known as a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart), was originally designed at Carnegie Mellon to help keep out automated programs known as "bots."

Read the rest of the article at the link.




bananas Donating Member (1000+ posts) Sat Sep-24-11 05:53 AM
Response to Original message
1. Whoa!
What a great idea!
Thanks for letting us know!
 
Frank Cannon Donating Member (1000+ posts) Sat Sep-24-11 05:58 AM
Response to Original message
2. Wait a minute. If the computer doesn't know what the word is...
then how can someone's input into the CAPTCHA be verified? :shrug:
 
Fumesucker Donating Member (1000+ posts) Sat Sep-24-11 06:01 AM
Response to Reply #2
4. They show the same word to more than one person..
If two random people give the same answer, it's almost certainly correct..

If the two people don't agree, they go on to show it to a lot more people. I assume they then use some statistical process..
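A minimal Python sketch of the agreement scheme described above. Everything here is an assumption for illustration: the function name `resolve_word`, accepting a reading when the first two people agree, escalating to five answers otherwise, and requiring a strict majority at that point. It is not reCAPTCHA's actual code.

```python
from collections import Counter
from typing import Optional


def resolve_word(answers: list[str], escalate_to: int = 5) -> Optional[str]:
    """Hypothetical consensus rule for one scanned word nobody can read yet.

    answers: what different people typed for the same image, in arrival order.
    Returns the accepted reading, or None if more answers are still needed.
    """
    # Ignore case and stray whitespace so "Morning" and "morning " agree.
    norm = [a.strip().lower() for a in answers if a.strip()]
    if len(norm) < 2:
        return None  # one reading on its own proves nothing

    # If the first two independent readers agree, accept that reading.
    if norm[0] == norm[1]:
        return norm[0]

    # They disagreed: keep showing the word until enough answers come in.
    if len(norm) < escalate_to:
        return None

    # Then accept the most common reading only if it has a strict majority.
    word, count = Counter(norm).most_common(1)[0]
    return word if count * 2 > len(norm) else None
```

With these defaults, `resolve_word(["morning", "Morning"])` returns `"morning"`, `resolve_word(["morning", "moming"])` returns `None`, and once five answers are in, `["morning", "moming", "morning", "morning", "rnorning"]` resolves to `"morning"` (3 of 5 votes).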
 
joshcryer Donating Member (1000+ posts) Sat Sep-24-11 06:04 AM
Response to Reply #4
7. Here's the paper (it has grown exponentially since it started):
 
Leopolds Ghost Donating Member (1000+ posts) Sun Sep-25-11 11:36 AM
Response to Reply #4
17. FASCINATING! In theory, this could have profound implications --
The wisdom of crowds could be used in various areas of cryptography, and it sounds like this principle could be extended to cover all CAPTCHAs by using crowd wisdom to determine the correct answer.
 
joshcryer Donating Member (1000+ posts) Sat Sep-24-11 06:01 AM
Response to Reply #2
5. It's tested against other users' input, not the computer.
 
begin_within Donating Member (1000+ posts) Sat Sep-24-11 06:54 AM
Response to Reply #2
9. I've deliberately misspelled the words a few times and still been let through!
 
Frank Cannon Donating Member (1000+ posts) Sun Sep-25-11 05:33 AM
Response to Reply #9
15. That's very interesting. I never tried that.
From now on, I'm only putting in obscenities.
 
joshcryer Donating Member (1000+ posts) Sat Sep-24-11 06:00 AM
Response to Original message
3. This is from 2007. reCAPTCHA is now owned by Google and handles 200+ million CAPTCHAs a day.
http://en.wikipedia.org/wiki/ReCAPTCHA

In due course all data will be digitized. I'm kinda glad it's Google that's doing it as opposed to other companies or groups.
 
Fumesucker Donating Member (1000+ posts) Sat Sep-24-11 06:02 AM
Response to Reply #3
6. I wasn't aware of it. That was the first article that popped up when I Googled..
 
joshcryer Donating Member (1000+ posts) Sat Sep-24-11 06:05 AM
Response to Reply #6
8. No worries, it's cool, man. It's a pretty remarkable service.
Google uses similar methods for their translation services (though they have a larger pool of data to go by, they basically compare translated texts to one another).
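As a toy illustration of what "comparing translated texts to one another" can mean, the Python sketch below scores word pairs by how often they show up together in sentence pairs that are translations of each other, using the Dice coefficient. This is a textbook co-occurrence heuristic, not Google's actual system; the corpus and the function names `dice_scores` and `best_match` are made up for illustration.

```python
from collections import Counter
from itertools import product


def dice_scores(pairs):
    """Score (source_word, target_word) pairs by the Dice coefficient:
    2 * (sentence pairs containing both) / (pairs containing source + pairs containing target)."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words = set(src.lower().split())
        t_words = set(tgt.lower().split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        # Every source word co-occurs with every target word in the same sentence pair.
        joint.update(product(s_words, t_words))
    return {(s, t): 2 * n / (src_count[s] + tgt_count[t]) for (s, t), n in joint.items()}


def best_match(scores, word):
    """Return the target word with the highest score for `word`, or None if unseen."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


# Made-up English/German sentence pairs, for illustration only.
corpus = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a book", "ein buch"),
]
scores = dice_scores(corpus)
print(best_match(scores, "book"))  # buch
```

Even with three sentence pairs, "book" lines up with "buch" (score 1.0) rather than "das" (0.5), because "buch" appears in exactly the pairs where "book" does. Real systems use far more data and more sophisticated alignment models.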
 
Leopolds Ghost Donating Member (1000+ posts) Sun Sep-25-11 11:38 AM
Response to Reply #3
19. Don't be glad. Google is a monopoly and is becoming more of one with their browser etc.
They also own YouTube now.
 
eilen Donating Member (1000+ posts) Sat Sep-24-11 07:01 AM
Response to Original message
10. They should just hire out-of-work nurses
We have been decoding physician handwriting for years. We are highly skilled.
 
begin_within Donating Member (1000+ posts) Sun Sep-25-11 10:51 AM
Response to Reply #10
16. Hats off to you
My Mom's doctor seems incapable of writing a simple prescription without at least one error that causes the pharmacy to reject it.
 
hunter Donating Member (1000+ posts) Sun Sep-25-11 11:37 AM
Response to Reply #10
18. !
:rofl:
 
Celebration Donating Member (1000+ posts) Sat Sep-24-11 07:36 AM
Response to Original message
11. Must be why some of those are so hard to read
On some sites, those things are so hard to read that I have to try them several times. On other sites, they are really easy. Or maybe it depends on which book we are decoding????
 
JVS Donating Member (1000+ posts) Sat Sep-24-11 08:21 AM
Response to Reply #11
14. Sometimes they are gibberish.
Edited on Sat Sep-24-11 08:22 AM by JVS
There are a couple ways of doing things. Some have one word that needs to be decoded, and some have two words that need to be decoded.

In the two-word ones, there is always the key word (they know it, and reproducing it will get you in) and the unknown word. They figure that if you put in the key word correctly, you'll put in the unknown word correctly too. What I find interesting is that sometimes the unknown word isn't even in a standard English font. For example, I've seen plenty of words written in the old German script. Since I know that script, I go ahead and enter them correctly. Other times I have seen parts of mathematical formulae involving superscripts and subscripts, or, even worse, Hebrew letters, about which I have no clue (and even if I knew them, what am I to do, transliterate into the Latin alphabet?). I just type random letters, and it lets me through as long as I have the key word correct.

In order to make it uncertain which is the key word and which is the word they're trying to find out, they do distort the image of the key word a bit.
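The two-word scheme JVS describes can be sketched in Python roughly as follows. All names here (`Challenge`, `GuessStore`, `check_submission`, the `scan-0001` id) are hypothetical, and the rule of recording the unknown-word reading only when the key word passes is an assumption drawn from the description above, not reCAPTCHA's published code.

```python
from dataclasses import dataclass, field


@dataclass
class Challenge:
    control_answer: str  # the "key word": the site already knows what it says
    unknown_id: str      # hypothetical id of the scanned word nobody has read yet


@dataclass
class GuessStore:
    # Readings collected so far for each unknown word, keyed by its id.
    guesses: dict[str, list[str]] = field(default_factory=dict)


def check_submission(ch: Challenge, typed_control: str,
                     typed_unknown: str, store: GuessStore) -> bool:
    """Return True if the user passes; record the unknown-word reading if so."""
    # Only the key word decides whether the user gets in.
    passed = typed_control.strip().lower() == ch.control_answer.strip().lower()
    if passed:
        # Someone who read the key word correctly is assumed to be a human
        # who probably read the unknown word too, so keep it as one vote.
        store.guesses.setdefault(ch.unknown_id, []).append(typed_unknown.strip())
    return passed


store = GuessStore()
ch = Challenge(control_answer="upon", unknown_id="scan-0001")
print(check_submission(ch, "Upon", "qwzx", store))    # True: random letters still get in
print(check_submission(ch, "upn", "whatever", store))  # False: key word wrong
print(store.guesses)                                     # {'scan-0001': ['qwzx']}
```

This also matches what begin_within and JVS noticed: typing random letters for the unknown word still lets you through, because in this sketch it only adds one vote that other readers' answers would have to outvote.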
 
HysteryDiagnosis Donating Member (1000+ posts) Sat Sep-24-11 08:14 AM
Response to Original message
12. They could scan "nuclear" and gw would never ever be able to type it out correctly. n/t
 
Fumesucker Donating Member (1000+ posts) Sat Sep-24-11 08:16 AM
Response to Reply #12
13. Eh, if you gave a million monkeys a million keyboards..
They would all sound more intelligent than Dubya.

 
Leopolds Ghost Donating Member (1000+ posts) Sun Sep-25-11 11:38 AM
Response to Reply #13
20. One *chimp is enough. n/t
Edited on Sun Sep-25-11 11:39 AM by Leopolds Ghost
 
FredStembottom Donating Member (1000+ posts) Sun Sep-25-11 11:51 AM
Response to Original message
21. I noticed about a year ago that I was getting real words.
not just random characters.

Words like:

exonerate
indubitably
fallacious

That was a change that made me wonder!
 


Powered by DCForum+ Version 1.1 Copyright 1997-2002 DCScripts.com
Software has been extensively modified by the DU administrators


Important Notices: By participating on this discussion board, visitors agree to abide by the rules outlined on our Rules page. Messages posted on the Democratic Underground Discussion Forums are the opinions of the individuals who post them, and do not necessarily represent the opinions of Democratic Underground, LLC.


© 2001 - 2011 Democratic Underground, LLC