Omniware

Linguistic Technology

The next breakthrough in Information Technology is going to be the ability to actually "understand" (not just scan) the texts posted on the World-wide Web. This will radically redefine the way we work with the Internet and what we do with it. The first benefit will be to relieve users of the "information overload" problem, the excess information returned by search engines. Because it can "understand" the text, Netcogito will select only the webpages that are truly relevant, and will then display a summary for each webpage. Another benefit will be to make search largely language-independent. Netcogito will be capable of generating a summary in any language: any webpage written in any language can be summarized by Netcogito in any other language.

Our technology consists of three main components.
• Searcher. Because the technology understands the text, we can search webpages better. We can tell whether a webpage talks about William Shakespeare the Palo Alto plumber or William Shakespeare the British writer.
• Summarizer. Because we have understood the concepts the text is about, we can produce a short summary of each webpage.
• Translator. Because of the way the Summarizer works, and the format in which the summary is stored, we can produce the summary in different languages. (We cannot provide a one-to-one translation of the original text, but we can provide a summary of the original text in any language.)
These components, besides providing a better user experience on the web, enable a whole new class of e-services (see later).

The main appeal of the search-engine market is the sheer number of users. The search engine is the number-one tool used by the public to navigate the internet. "Consumers rank search as the second most important function on the web, besides email" (Upside Magazine, Apr 2000). According to a study by Zona Research, search engines are used 77 percent of the time by people looking to find information on the web. The exponential increase in the size of the web is likely to cause an exponential increase in the use of search engines. The web currently has more than 800 million pages (Nature, July 1999) and will grow to 13 billion pages by 2003 (IDG, May 2000). This phenomenon calls for a larger number of players and for continuous technological progress. Another appeal of the search-engine market is the relatively low entry barrier (the technology employed by existing search engines is rudimentary) and the relatively low cost of marketing a new search engine (word of mouth is the main marketing tool).

Relevance ranking of webpages is already critical and will become ever more critical. Most search engines use "search term frequency" as the primary way of determining how relevant a document is. In other words, they search "by keyword". Some search engines index web documents by the meta tags in the documents' HTML code. A recent trend is to rely on analyzing the behavior of the "aggregate population" of web users (in other words, how many pages link to a page). But all existing search engines rank pages, first and foremost, "by keyword". This causes two problems: overloading and underloading.
• Overloading. Too many non-relevant pages are found when searching "by keyword": they happen to have the right keywords but they are about another subject (e.g., "an earthquake hit..." and "the latest hit by the rock band Earthquake..." are completely different concepts even if they share the same keywords "earthquake" and "hit").
• Underloading. Too many relevant pages are not found when searching "by keyword": they don't have the right keywords even if they are about the right subject (e.g., "plane crash" and "air disaster" are the same concept even if they use completely different keywords).
A "meaning-based" search engine would, on the other hand, consider the "concepts" (not the words) that are mentioned in the webpage and search "by concept". It would greatly reduce both underloading and overloading. Besides providing a better user experience when navigating the web, such a meaning-based search engine would enable a whole new class of e-services (see later).
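The difference between the two approaches can be illustrated with a toy example. The Python sketch below is only an illustration under stated assumptions: the phrase-to-concept table `CONCEPTS`, the sample `PAGES` and the function names are all hypothetical, and a hand-written phrase lookup stands in for real linguistic analysis. It is not a description of how Netcogito is implemented.

```python
# Hypothetical toy lexicon: maps a phrase to a concept identifier.
# A real system would disambiguate words in context; this sketch uses
# hand-written phrase entries instead.
CONCEPTS = {
    "plane crash": "AVIATION_ACCIDENT",
    "air disaster": "AVIATION_ACCIDENT",
    "an earthquake hit": "SEISMIC_EVENT",
    "hit by the rock band": "MUSIC_RELEASE",
}

# Hypothetical sample pages (id -> text).
PAGES = {
    "news-1": "An earthquake hit the coast this morning",
    "music-1": "The latest hit by the rock band Earthquake",
    "news-2": "Investigators study the air disaster",
}


def keyword_search(query, pages):
    """Return ids of pages that contain every word of the query."""
    words = query.lower().split()
    return sorted(
        pid for pid, text in pages.items()
        if all(w in text.lower().split() for w in words)
    )


def concepts_of(text):
    """Return the set of concept ids whose phrase occurs in the text."""
    text = text.lower()
    return {concept for phrase, concept in CONCEPTS.items() if phrase in text}


def concept_search(query, pages):
    """Return ids of pages that share at least one concept with the query."""
    wanted = concepts_of(query)
    return sorted(
        pid for pid, text in pages.items()
        if wanted & concepts_of(text)
    )


# Overloading: the keyword search also returns the page about the rock band.
print(keyword_search("earthquake hit", PAGES))     # ['music-1', 'news-1']
print(concept_search("an earthquake hit", PAGES))  # ['news-1']

# Underloading: the keyword search misses the page that says "air disaster".
print(keyword_search("plane crash", PAGES))        # []
print(concept_search("plane crash", PAGES))        # ['news-2']
```

In this sketch the keyword search suffers from both problems, while the concept lookup returns only the page about the earthquake and still finds the "air disaster" page for the query "plane crash".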

Netcogito's search engine combines the ability to "index" the web by concepts and the ability to "search" the index by concepts. Netcogito's "cognitive mapper", based on Artificial Intelligence techniques for syntactic and semantic analysis, disambiguation and knowledge representation, is capable of creating a "cognitive map" for each webpage by identifying the concepts and the actions expressed in the text of the webpage. Each webpage is reduced to a list of concepts and actions, ranked by "concept proximity". Linguistic knowledge and real-world knowledge are used to map the text into an internal ("meaning-based") representation. Rather than searching this index for a keyword, the Netcogito search engine searches the index for concepts that match the user's query.

Thanks to Netcogito's cognitive mapper, different words with the same meaning (teacher, instructor) have the same representation. Identical words with different meanings (rose as in flower, rose as in color) have different representations. This internal representation of the text is purely logical (i.e., language-independent). Because the internal representation is based on concepts, the search engine can rank pages based on concept proximity. Furthermore, the search engine can display more than just the URL and the title of each page: it can display a list of the page's main concepts (which basically constitute a summary of the webpage). A more expressive listing of webpages helps the user select the most relevant ones.

Because the internal representation is language-independent, the concepts can be "rendered" in any language. For example, the webpage can be in English and the user can request a summary in German. Netcogito is therefore capable of providing a summary of the webpage in any language, regardless of what language was used in writing the webpage.
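As a rough illustration of the idea of a concept index, the Python sketch below maps words to concept identifiers, indexes pages by concept, and renders a page's concepts in another language. Everything in it is an assumption made for the example: the `SENSES` and `LABELS` tables, the cue-word disambiguation rule and the function names are hypothetical, and ranking by concept proximity is not modeled.

```python
from collections import defaultdict

# Hypothetical word-sense lexicon: word -> {context cue word: concept id}.
# The cue None gives the default sense when no cue word appears in the text.
SENSES = {
    "teacher": {None: "EDUCATOR"},
    "instructor": {None: "EDUCATOR"},
    "rose": {"garden": "ROSE_FLOWER", "paint": "ROSE_COLOR", None: "ROSE_FLOWER"},
}

# Hypothetical surface forms used to render a concept in a given language.
LABELS = {
    "EDUCATOR": {"en": "teacher", "de": "Lehrer"},
    "ROSE_FLOWER": {"en": "rose (flower)", "de": "Rose (Blume)"},
    "ROSE_COLOR": {"en": "rose (color)", "de": "Rosa (Farbe)"},
}


def cognitive_map(text):
    """Return the concept ids found in the text, in order, without duplicates."""
    words = text.lower().split()
    found = []
    for word in words:
        senses = SENSES.get(word)
        if senses is None:
            continue  # word is not in the toy lexicon
        # Pick the first sense whose cue word occurs in the text, else the default.
        cue = next((c for c in senses if c is not None and c in words), None)
        concept = senses[cue]
        if concept not in found:
            found.append(concept)
    return found


def build_index(pages):
    """Index pages by concept: concept id -> set of page ids."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for concept in cognitive_map(text):
            index[concept].add(page_id)
    return index


def search(index, query):
    """Return ids of pages that contain any concept expressed by the query."""
    hits = set()
    for concept in cognitive_map(query):
        hits |= index.get(concept, set())
    return sorted(hits)


def summarize(text, lang):
    """Render the page's concepts in the requested language."""
    return [LABELS[concept][lang] for concept in cognitive_map(text)]


pages = {
    "a": "the instructor planted a rose in the garden",
    "b": "paint the wall rose",
}
index = build_index(pages)
print(search(index, "teacher"))     # ['a']
print(summarize(pages["a"], "de"))  # ['Lehrer', 'Rose (Blume)']
print(cognitive_map(pages["b"]))    # ['ROSE_COLOR']
```

Here a query for "teacher" finds the page that says "instructor", the two uses of "rose" receive different concept identifiers, and the English page can be summarized with German labels because the index stores concept identifiers rather than words.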

The same technology used in the Netcogito search engine enables a new generation of applications.
• A "point & search" variant of the search engine lets users run the search engine without a browser. Browser-free searching is a more natural way of interacting with the web and it can be embedded in any other application. The user can select and right-click on any phrase displayed by any application (word processor, spreadsheet, e-mail, etc.). The selected text is then automatically fed to the search engine.
• Because webpages are indexed according to concepts, Netcogito can also "classify" them automatically into predefined categories. Netcogito will therefore serve as a tool to automate the creation and update of web directories (which today are compiled manually by the likes of Yahoo!). Netcogito can easily provide automatic indexing of a webpage into industry categories based on the concepts in the webpage's cognitive map.
• Netcogito's technology for extracting concepts from texts can also be applied to "whole document query": find all documents that are similar to a given document. Netcogito can easily build "personalization databases" that can be used to customize the user's experience when navigating the web and when shopping with an online catalog. Netcogito's personalization engine will remember what concepts the user is interested in and (upon request) customize all the searches according to those "preferences". This personalization engine can be embedded in e-commerce solutions that aim at providing a customized catalog for each user.
• By roaming and scanning the World-wide Web, Netcogito's search engine can gather information from newsgroups, press releases, reports, product reviews, etc., that helps create Corporate Intelligence Analysis.
• Personalization is required by most e-commerce applications. Netcogito is capable of building a "vocabulary" of concepts frequently "searched" by the user and therefore customizing her Internet shopping.
• Last but not least, Netcogito will cause a revolution in the desktop metaphor, reorganizing the desktop for the internet age. While today's desktop uses a (static) icon as a reference to an application which resides on the PC, tomorrow's desktop will be primarily the link to the world of information, and its icons will be mainly "dynamic" icons, references not to applications but to sources of information spread around the internet. A dynamic icon is an icon that causes a search of the internet to be executed by the search engine, not an application to be run from the hard disk by the operating system. A dynamic icon is a natural evolution of the browser's bookmarks.

In conclusion, because of its understanding of the text, Netcogito provides a better user experience (concept-based ranking, summarizer) and, at the same time, enables a whole new class of e-services: browser-less search, whole document query, B2B mediator, personalization engine, etc.
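Two of the applications above, automatic classification and "whole document query", can be sketched once each page is reduced to a set of concepts. The Python example below is a minimal illustration under stated assumptions: the category definitions in `CATEGORIES`, the function names, the sample corpus and the use of Jaccard overlap as the similarity measure are all hypothetical choices for the example, not a description of Netcogito's actual method.

```python
# Hypothetical category definitions: each category is a set of concept ids.
CATEGORIES = {
    "travel": {"AIRLINE", "HOTEL", "AVIATION_ACCIDENT"},
    "music": {"ROCK_BAND", "MUSIC_RELEASE", "CONCERT"},
}


def jaccard(a, b):
    """Overlap of two concept sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(concepts, categories=CATEGORIES):
    """Return the category sharing the most concepts with the page, or None."""
    best = max(categories, key=lambda name: len(concepts & categories[name]))
    return best if concepts & categories[best] else None


def similar_documents(target, corpus, threshold=0.3):
    """Whole-document query: ids of documents similar to target, best first."""
    scored = [(jaccard(target, concepts), doc_id)
              for doc_id, concepts in corpus.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True)
            if score >= threshold]


# Hypothetical corpus: document id -> concepts in its cognitive map.
corpus = {
    "d1": {"AIRLINE", "HOTEL"},
    "d2": {"ROCK_BAND", "CONCERT"},
    "d3": {"AIRLINE", "AVIATION_ACCIDENT", "HOTEL"},
}

print(classify(corpus["d2"]))                            # music
print(similar_documents({"AIRLINE", "HOTEL"}, corpus))   # ['d1', 'd3']
```

The same concept sets could in principle also serve as a per-user "vocabulary" for personalization, by comparing the concepts a user searches for most often against each page's concepts.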

Auditing IT investments

Omniware partners with NestPlan to audit your IT investments and recommend how to maximize your ROI.
"Get better returns on your existing IT investments rather than new big initiatives"
Ask Omniware specialists to audit your IT investments.
OMNIWARE INTERNATIONAL & PARTNERS