KMWorld 2008: Information Discovery Trends

Presented by Theresa Regli, Principal, CMS Watch.

When people think of search, they think of Google: simple, give me what I’m looking for, now.

The problem with information discovery and enterprise search is much more complex than that. We have many different systems and repositories that need to be accessed.

The idea is that you should be able to find information no matter where it is, and to find information that you don’t know is there. This often requires a discovery process that is not very direct; it’s much more complex than “I’m feeling lucky”.

CMS Watch uses a vendor risk profile to evaluate the wide range of tools out there. For each solution provider, Vendor Evolution is measured against Product Development.

Most vendors who historically called themselves search companies are now calling themselves Information Access Platforms – they are trying to integrate with other systems in their clients’ organization. While the marketing message has changed, the core focus remains the same.

Theresa cautions people to be wary of the marketing message: the ultimate window to knowledge does not yet exist. These platforms don’t have access to every repository from one place – yet.

Security is one of the most significant limitations of these systems – who has permission to view what, at the document and repository levels. Vendors are focused on addressing this issue.
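To make the two permission levels concrete, here is a minimal sketch of filtering a hit list by permissions. It is not any vendor's implementation; the `Document` class, the group-based access lists, and the `trim_results` function are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    repository: str
    # Groups allowed to read this document (hypothetical access-list model).
    allowed_groups: set = field(default_factory=set)


def trim_results(hits, user_groups, repo_access):
    """Keep only hits the user may see at both repository and document level."""
    visible = []
    for doc in hits:
        # Repository-level check: the user must be allowed into the repository.
        if doc.repository not in repo_access:
            continue
        # Document-level check: the user must share a group with the document.
        if doc.allowed_groups & user_groups:
            visible.append(doc)
    return visible


hits = [
    Document("d1", "wiki", {"staff"}),
    Document("d2", "hr", {"hr-team"}),
    Document("d3", "wiki", {"executives"}),
]
visible = trim_results(hits, user_groups={"staff"}, repo_access={"wiki"})
print([d.doc_id for d in visible])
```

In this sketch a document is hidden if either check fails, so a user never sees a hit from a repository they cannot open, even when the document itself is unrestricted for their group.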

How vendors deal with structured and unstructured content is also an important consideration.

Most solutions require a significant amount of tweaking and specialization to make them work the way you need them to.

Non-textual assets present a particular challenge – how do you index and search these assets (maps, images, audio, video, etc.)? Search tools have a hard time figuring out how to describe these assets. Metadata is part of the solution; in some cases, OCR and speech-to-text may be used.

Rights management for rich media is also an issue – how to manage the metadata for rights and reuse of these assets.

Auto-categorization is also a trend: using software to index content automatically. These tools are useful but not foolproof (for example, they must differentiate a picture of Tiger Woods from a picture of a tiger in the woods). It takes time and training, and search vendors continue to develop the technology to be more contextual.

Vendors are integrating standards such as Dublin Core metadata to aid in categorizing information.

User Interface Trends:

Vendors such as Oracle have made their search results resemble Google’s; there was a study that indicated that if your results look like Google’s, users will trust them more.

Vendors are also introducing filters and facets to narrow down the results. The facets must be customized for them to be effective.
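As a rough illustration of what facets do, the sketch below counts how many results fall under each value of a field and then narrows the list by one value. The field names (`department`, `type`) and both helper functions are assumptions made up for this example, not a real product's API.

```python
from collections import Counter


def facet_counts(results, facet):
    """Count how many results carry each value of the given facet field."""
    return Counter(r[facet] for r in results if facet in r)


def apply_filter(results, facet, value):
    """Narrow the result list to items whose facet field equals value."""
    return [r for r in results if r.get(facet) == value]


# Hypothetical result set with two facet fields.
results = [
    {"title": "Q3 sales report", "department": "Sales", "type": "pdf"},
    {"title": "Sales onboarding", "department": "Sales", "type": "doc"},
    {"title": "Travel policy", "department": "HR", "type": "pdf"},
]

print(facet_counts(results, "department"))
print([r["title"] for r in apply_filter(results, "type", "pdf")])
```

Which fields are worth exposing as facets depends on the organization's own content, which is why the facets need to be customized.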

Another trend is saving searches over time, so that saved searches display refreshed results.

Coveo produces results that look like Google’s, but also includes the other topics each result falls under. Filters are also listed on the left to refine the results.

Theresa’s favourite search results page is Exalead’s, which provides a dashboard of results.

Funnelback, a vendor from Australia, provides mashups with other apps such as Google Maps.

Social Search is another trend; Vivisimo is doing the most with social search, but Theresa isn’t sure how many people are using it. The hit list provides you not only with rated search results, but also with a list of profiles of people who may know more about how to find the answer.

Search and Business Intelligence is another trend, where data is sliced and diced to enable analysis. It tends to be used with structured information such as sales data to help the organization decide what to do next.

Reporting and analytics provide a dashboard that describes how people are using the search engine. This information will help you adjust your taxonomy and metadata to improve the search performance. It also shows when searches are occurring.
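A small sketch of the kind of report such a dashboard might be built on: given a query log, list the most frequent queries and the queries that returned nothing. The log format (query text plus result count) and the `summarize_log` function are assumptions for illustration only.

```python
from collections import Counter


def summarize_log(log):
    """Return the most frequent queries and the queries that found nothing."""
    # Normalize case and whitespace so variants of a query are counted together.
    normalized = [(query.strip().lower(), hits) for query, hits in log]
    top_queries = Counter(query for query, _ in normalized).most_common(3)
    zero_hits = sorted({query for query, hits in normalized if hits == 0})
    return top_queries, zero_hits


# Hypothetical log entries: (query text, number of results returned).
log = [
    ("expense form", 12),
    ("Expense Form", 12),
    ("vacation policy", 0),
    ("org chart", 4),
    ("pto policy", 0),
]
top_queries, zero_hits = summarize_log(log)
print(top_queries)
print(zero_hits)
```

Queries that return nothing are often the most useful signal here, since they point at terms the taxonomy or metadata does not yet cover.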

Tuning field ratings lets search administrators tweak how attributes and variables are weighted to improve the relevance of the search results.
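The effect of field weighting can be sketched with a toy scoring function: each field contributes its weight times the number of query terms it contains, so changing the weights changes the ranking. The scoring rule, the field names, and the weight values are simplifying assumptions, not how any particular engine computes relevance.

```python
def score(doc, query_terms, weights):
    """Sum, per field, the field's weight times the number of query terms found."""
    total = 0.0
    for field_name, weight in weights.items():
        words = doc.get(field_name, "").lower().split()
        matches = sum(1 for term in query_terms if term.lower() in words)
        total += weight * matches
    return total


def rank(docs, query, weights):
    """Order documents from highest to lowest score for the query."""
    terms = query.split()
    return sorted(docs, key=lambda d: score(d, terms, weights), reverse=True)


# Hypothetical documents with a title field and a body field.
docs = [
    {"id": "a", "title": "Travel policy", "body": "Rules for booking flights"},
    {"id": "b", "title": "Staff handbook", "body": "Includes the travel policy and expense rules"},
]

title_heavy = {"title": 3.0, "body": 1.0}
body_heavy = {"title": 0.5, "body": 2.0}

print([d["id"] for d in rank(docs, "travel policy", title_heavy)])
print([d["id"] for d in rank(docs, "travel policy", body_heavy)])
```

With the title-heavy weights the document whose title matches ranks first; with the body-heavy weights the order flips, which is the kind of adjustment an administrator would be making.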

The importance of Scenarios: you must understand what situation you’re in to determine which tool(s) will work best for you. Consider structured vs. unstructured information; data, text, and multimedia; and types of employees, from knowledge workers to informed transaction processors.

Most vendors do not do well in more than 3 areas. Get customer referrals and case studies. Try before you buy via 30-day demos that can be tested in your own environment. Make sure the vendor you choose has done work in your industry, with organizations with similar information types, user types, and challenges.