As published on Technorati
Human Touch Data
Years ago, the federal government launched a project to assess millions of documents for possible declassification. The job was much too large for human curation alone, so a method had to be designed and implemented that would ensure one hundred percent accuracy in determining whether a document should in fact be declassified. The gist of the design was to use Google-style document database handling in conjunction with plagiarism-matching algorithms weighted for classified material, with the objective of differentiating classified from declassifiable content. In the end, it was determined that a human element was always necessary to ensure accuracy.
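That triage pattern is easy to picture in code. Below is a minimal, purely hypothetical sketch of the idea: score each document against weighted "classified marker" terms, auto-clear the obviously clean cases, and route anything ambiguous to a human reviewer. The marker phrases, weights, and threshold are my own illustrative assumptions, not details of the actual government system.

```python
# Hypothetical sketch of weighted-term triage with a human fallback.
# All terms, weights, and thresholds are illustrative assumptions.

CLASSIFIED_MARKERS = {
    "top secret": 5.0,
    "codeword": 3.0,
    "sources and methods": 4.0,
}

AUTO_DECLASSIFY_BELOW = 1.0  # assumed cutoff for automatic clearance

def score(text: str) -> float:
    """Sum the weights of classified-marker phrases found in the text."""
    lowered = text.lower()
    return sum(w for term, w in CLASSIFIED_MARKERS.items() if term in lowered)

def triage(text: str) -> str:
    """Auto-clear clearly clean documents; send everything else to a human."""
    return "declassify" if score(text) < AUTO_DECLASSIFY_BELOW else "human review"

print(triage("Routine logistics memo, nothing sensitive."))  # declassify
print(triage("Contains TOP SECRET codeword material."))      # human review
```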
Years later, search engine provider Blekko is proving the same theory: involving a human element is what ensures accuracy.
What Makes Blekko Different?
I had the great fortune to mingle with the who’s-who of CTOs and CIOs at the NoSQL Now! conference last month in San Jose. I met Blekko CTO Greg Lindahl at dinner one night, and he explained it to me:
In the world of search, it’s not just about the data, it’s how you enable people to access it.
Using the same coarse granularity Google uses, Blekko retrieves pertinent results based on the user’s query. Once the results are retrieved, they are sorted and presented to the user in a familiar text format. Lindahl explains that the company’s search rests on three distinguishing elements:
- Algorithm inclusive of user input
- Customizable search engine settings
- Transparency
Collectively, these elements are what set Blekko apart from its larger search engine rivals.
I. Data Set
To achieve more useful results, the data set that queries are applied to is not built on a “more is better” philosophy; it is instead a smaller, filtered, more precise data set. By eliminating spam sites, link farms, and black-hat SEO sites, Blekko allows the cream to rise to the top. A large part of this is due to human curators.
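To make the “filtered, precise data set” idea concrete, here is a minimal sketch of curator-driven filtering: a human-maintained deny-list of spam and link-farm domains is applied before results ever reach the user. The domain names and result format are illustrative assumptions, not Blekko’s actual data or API.

```python
# Minimal sketch of curator-maintained deny-list filtering.
# Domains and result shapes are assumed for illustration.

from urllib.parse import urlparse

CURATOR_DENY_LIST = {"linkfarm.example", "spam-seo.example"}  # assumed

def curated(results: list[dict]) -> list[dict]:
    """Drop any result whose domain a curator has flagged as spam."""
    return [r for r in results
            if urlparse(r["url"]).netloc not in CURATOR_DENY_LIST]

raw = [
    {"url": "https://www.dell.com/deals", "title": "Laptop deals"},
    {"url": "https://linkfarm.example/buy-cheap", "title": "CHEAP!!!"},
]
print(curated(raw))  # only the dell.com result survives
```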
II. Wikipedia-like Policing
Using human curators allows for much more narrowed and applicable search results, but it can only go so far. Through individual preference, an end user can design a personal query set that becomes more relevant with extended use. A user can create a ‘slashtag’ data set of the sites they feel are most relevant to their particular needs. For example, a user querying the phrase ‘cheapest computers’ might get results from sites like craigslist.org or eBay. Applying a ‘slashtag’ containing Dell, Sears, Target, CompUSA, and the like, that same Blekko search will generate a more accurate result, as sketched below.
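Here is a minimal sketch of how a slashtag-style allow-list could restrict a query’s results to user-chosen domains. The slashtag name, domain list, and result shape are assumptions for illustration only; this is not Blekko’s implementation.

```python
# Minimal sketch of a user-defined slashtag as a domain allow-list.
# Tag name, domains, and result format are illustrative assumptions.

from urllib.parse import urlparse

SLASHTAGS = {
    "/computers": {"dell.com", "sears.com", "target.com", "compusa.com"},
}

def apply_slashtag(results: list[dict], tag: str) -> list[dict]:
    """Keep only results whose domain is in the slashtag's allow-list."""
    allowed = SLASHTAGS[tag]
    return [r for r in results
            if urlparse(r["url"]).netloc.removeprefix("www.") in allowed]

raw = [
    {"url": "https://www.dell.com/cheap-computers", "title": "Dell outlet"},
    {"url": "https://craigslist.org/search/computers", "title": "Used PCs"},
]
print(apply_slashtag(raw, "/computers"))  # only the dell.com result remains
```

The design choice mirrors Lindahl’s point: the ranking algorithm still runs, but human-chosen inputs narrow what it ranks.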
An example Lindahl provided, a luxury goods product search, demonstrates how Blekko’s human-influenced engine generates more desirable results.
Search: “Gucci handbags” (https://blekko.com/ws/+gucci+handbags)
Result:
Blekko’s results include websites selected by a slashtag editor, with known spam sites removed, and even a relevant blog entry on the joys of owning a Gucci handbag. Each of these factors delivers on-target search results for the consumer seeking to buy a genuine luxury good.
In the final analysis, Blekko has paved the way and raised the bar for better, more desirable search results. Google is realizing this, and recognizes that end users are growing tired of the spam, advertising, and irrelevant data in their query results. The next step for Blekko should be to allow the owner of a website to submit the site for curator evaluation. The owner could then receive a report from the curator and adjust the site accordingly. It is this type of open architecture that will make Blekko successful.