Here at MobileWorks, the next-generation replacement for Amazon's Mechanical Turk, we've processed nearly half a million crowdsourcing and human computation tasks since launching in mid-2011.
We've built a platform that's powerful and versatile, but we're still constantly surprised by the ingenious uses of human brainpower that our customers come up with.
So, what did people crowdsource through MobileWorks in 2011? Here's a breakdown of the 7 most popular and most innovative tasks we saw:
#1: Human-powered search and information collection
Organizing the world's information constituted nearly 30% of our brainpower usage this past year. While modern search engines do a great job at gathering the individual pages you want, they can't yet deliver the exact information you need.
What do you do if you want to find the homepages of 1000 people working as software engineers and willing to move across the country to take a new job? How about if you want to generate an up-to-the-minute database of the most popular products sold on Craigslist in San Francisco? What if you need to generate a verified list of leads for a sales team, like 100,000 restaurant phone numbers in New York City?
In these situations, Google may be able to deliver individual pages for you, but it isn't able to extract and collect exactly what you're looking for. A trained crowd is still the best way to get it done.
We aren't surprised that this task is so popular, as the ability to quickly gather this sort of information is enormously valuable for recruiting, lead generation, and maintaining databases.
(If you're interested, MobileWorks has a dedicated application for web scraping: the Excavator.)
#2: Text extraction: OCR, handwriting recognition, form digitization & data entry
Digitization exploded in 2011, fueled in part by the medical industry's push to move paper records to digital ones.
Why use the crowd for a classic AI problem like OCR? Although plenty of software-based OCR solutions exist, they're often expensive and aren't great at understanding handwriting that's a little sloppy or unusual. They trip up on documents with mixed typefaces or changing orientations. Simply put, they can't handle corner cases. Crowds provide a completeness and robustness that traditional OCR doesn't.
On top of that, lab research shows us that crowd solutions can digitize documents that are simply beyond the capabilities of OCR software. In one experiment by our friends at MIT, a crowd deciphered this handwritten note:
image credit: Greg Little, Lydia B. Chilton, Robert C. Miller, and Max Goldman, HCOMP 2010
You misspelled several words. Please spellcheck your work next time. I also notice a few grammatical mistakes. Overall your writing style is a bit too phoney. You do make some good points, but they got lost amidst the writing. (signature)
Let's see OCR do that!
In 2012, we expect to see more and more companies with piles of paper choosing to send their work to the MobileWorks crowd to be digitized, rather than investing in developers or OCR software solutions.
#3: Tagging and sorting
As more user-generated content is produced on the web (photos, podcasts, and more), more of it needs to be tagged by humans. Don't just think of your pictures on Flickr and Facebook: MobileWorks received requests to tag user-generated content in games, relevant segments in videos, and customer phone calls for automatic follow-up.
#4: Content filtering
Was that post naughty or nice? Machines do a good job keeping pill ads out of your email, but for almost all other kinds of content – marketplace listings, advertisements, and more – humans beat machines every time in checking whether content is relevant, interesting, and on-topic. More companies are finding that it's simpler to use a one-line crowd solution than try to fight a technological battle against spam.
#5: Advertising analysis
The popularity of ad-distribution networks sometimes means that advertisers don't know where their ads end up on the web. It's critical to understand the content of an ad and an article to avoid slip-ups like this:
Image credits: The Stupidity of Scripts, James Duncan, Flickr
We had two common advertising-related tasks: judging which company was behind a particular advertisement, and judging whether an ad was appropriate on a certain page. The cost of slip-ups in ad placement is fairly high for advertisers and content producers alike, and it's very simple for humans to determine whether a particular combination is appropriate or inappropriate.
#6: Image recognition and description
We saw a good number of tasks that involved recognizing objects in images, classifying objects in pictures, or tagging images for a machine learning system to analyze later. These tasks came from a variety of sources, including a number of mobile phone applications letting users ask questions about documents or objects in the real world seen through their cameras.
This is another application where lab research paved the way: Rochester's VizWiz project allowed visually impaired people to upload pictures from their mobile phones and get descriptions of what they photographed back in seconds.
#7: Audio transcription and analysis
Whenever a podcast is produced or an interview goes up on a tube site, advertisers want to be able to place relevant ads in it. And until speech recognition can deal with complex words, music, and accents, it's still better to use qualified humans.
And, without further ado, 2011's Task Of The Year...
Human-powered website testing took the title of the most interesting task we saw in 2011.
It's sometimes difficult when maintaining large projects to write test cases that simulate extended human interaction with all parts of a website.
But it's dead simple to ask a human on MobileWorks to carry out a list of tasks on your site and make sure everything's working OK. It's so cheap that many developers wanted to include a call to MobileWorks in their "automated" test routines, as a final deployment step.
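To make the idea concrete, here's a rough sketch of what a human check as a final deployment step might look like. Everything in it is an assumption for illustration: `human_test_gate`, `StepResult`, and `fake_crowd` are hypothetical names, and `fake_crowd` is a local stand-in for a real crowd request, not the MobileWorks API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    """One worker's verdict on a single test step (hypothetical shape)."""
    step: str
    passed: bool
    note: str = ""


def human_test_gate(steps: List[str],
                    ask_crowd: Callable[[str], StepResult]) -> bool:
    """Return True only if a human reports every step as passing."""
    results = [ask_crowd(step) for step in steps]
    for r in results:
        status = "ok" if r.passed else "FAILED"
        print(f"[{status}] {r.step} {r.note}".rstrip())
    return all(r.passed for r in results)


def fake_crowd(step: str) -> StepResult:
    """Stand-in for a real crowd call; pretends checkout is broken."""
    broken = "checkout" in step.lower()
    return StepResult(step, passed=not broken,
                      note="button does nothing" if broken else "")


steps = [
    "Sign up with a new email address",
    "Add an item to the cart",
    "Complete checkout with the test card",
]

# In a deploy script, a False result here would stop the release.
deploy_ok = human_test_gate(steps, fake_crowd)
```

Passing the crowd call in as a function keeps the gate itself independent of any particular service, so the same check can run against a stub in development and a real crowd in production.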
What do you want to see crowdsourced in 2012?
Crowdsourcing is growing by leaps and bounds, and we can't wait to see what the next year will bring.