Dutch scientists believe that the "normalized Google distance" between terms can be used to automatically teach an artificial intelligence the meaning of words
How can computers learn language, including the meaning of words and the relationships between them? This problem of semantics is a formidable task that has so far been only partially solved, since words and word combinations often have several or even many meanings, which moreover depend on the linguistic context. The two Dutch researchers Paul Vitanyi and Rudi Cilibrasi of the National Institute of Mathematics and Computer Science in Amsterdam (An artificial consciousness from simple statements) propose an elegant solution: simply look things up on the Internet, the largest, if roughest, database in existence, using Google.
While objects like a mouse can be identified by their name, "mouse", the meaning of general terms must be learned from context. A semantic web for the representation of knowledge consists of the possible connections that objects and their names can enter into. In reality, of course, new names are created, but also new meanings and thus new connections. Language is alive and flexible.
To teach an artificial intelligence all the meanings of words, one would have to build up a huge database of possible semantic networks and constantly update it with the help of human experts, or rather many employees. But this is not necessary, because with the Web there exists not only the largest and largely free-to-use semantic database; it is also constantly updated by countless Internet users. In addition, there are search engines such as Google, which quantitatively measure the probability of connections between words, and thus their context of meaning, by reporting the number of web pages on which they are found.
Using a method previously developed by Paul Vitanyi and others, the normalized information distance (NID), which measures the relationship between objects, the closeness between arbitrary objects (images, words, patterns, intervals, genomes, programs, etc.) can be analyzed on the basis of all their properties and determined by their dominant common property. Similarly, the commonly used, though not necessarily "true", meanings of names can be unlocked with Google searches.
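For reference, the normalized information distance mentioned here can be sketched in terms of Kolmogorov complexity K, following Vitanyi and colleagues' earlier work (the equality holds up to logarithmic additive terms):

```latex
\mathrm{NID}(x, y) = \frac{\max\{K(x \mid y),\; K(y \mid x)\}}{\max\{K(x),\; K(y)\}}
```

Since K is uncomputable, practical applications replace it with real-world compressors or, as here, with search-engine page counts.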
At this moment one database stands out as the pinnacle of computer-accessible human knowledge and the most inclusive summary of statistical information: the Google search engine. There can be no doubt that Google has already enabled science to accelerate tremendously and has revolutionized the research process. It has dominated the attention of Internet users for years and has recently attracted substantial attention from Wall Street investors, even reshaping their ideas of company financing.
Paul Vitanyi and Rudi Cilibrasi
If one enters a word like "horse", Google returns 4,310,000 indexed pages. For "rider" there are 3,400,000 pages. If both terms are combined, 315,000 pages are still recorded. For the co-occurrence of "horse" and "Bart", for example, there are still an astonishing 67,100 pages, but one can already see that "horse" and "rider" belong more closely together. This yields a certain probability for the co-occurrence of terms. From this frequency, compared with the maximum number of indexed pages (5,000,000,000), the two scientists have developed a rough statistical measure, which they call the "normalized Google distance" (NGD) and which usually lies between 0 and 1. The lower the NGD, the more closely two terms are related. "This is automatic meaning generation," Vitanyi told New Scientist. "This could well be a way to make a computer understand things and act semi-intelligently."
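The calculation behind this measure can be sketched directly from the page counts quoted in the article. A minimal sketch in Python, following Cilibrasi and Vitanyi's published NGD formula (the counts for "Bart" alone are not given in the article, so only the horse/rider pair is computed here):

```python
import math

# Normalized Google distance (NGD) from page counts:
#   f_x, f_y  - number of pages containing each term alone
#   f_xy      - number of pages containing both terms
#   n         - total number of pages indexed by the search engine
#
#   NGD(x, y) = (max(log f_x, log f_y) - log f_xy)
#               / (log n - min(log f_x, log f_y))

def ngd(f_x, f_y, f_xy, n):
    """Return the normalized Google distance; usually between 0 and 1."""
    lx, ly, lxy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Page counts quoted in the article (such counts change over time):
horse_rider = ngd(4_310_000, 3_400_000, 315_000, 5_000_000_000)
print(round(horse_rider, 3))  # roughly 0.36: "horse" and "rider" are close
```

A low value indicates closely related terms; unrelated word pairs, whose joint count is tiny compared to their individual counts, score closer to 1.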
If such searches are performed again and again, a map of word connections can be created. From this map, it is hoped, a computer can in turn grasp the meaning of individual words in different natural languages and contexts. For example, test searches have shown that a computer can distinguish colors from numbers, tell 17th-century Dutch painters apart, separate terms of the eighteenth century from those of the nineteenth, distinguish emergencies from near-emergencies, and recognize electrical or religious terms. A simple automatic English-Spanish translation was also accomplished.
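Such a map of word connections amounts to a matrix of pairwise NGD values. A minimal sketch with invented page counts (the numbers below are illustrative assumptions, not real search results; a real system would obtain them by querying the search engine for every term and pair):

```python
import math

def ngd(f_x, f_y, f_xy, n):
    """Normalized Google distance computed from raw page counts."""
    lx, ly, lxy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

N = 5_000_000_000  # assumed size of the search engine's index
# Invented single-term and pair page counts, for illustration only.
hits = {"red": 2_000_000, "blue": 1_800_000,
        "three": 900_000, "four": 850_000}
pair_hits = {("red", "blue"): 700_000, ("red", "three"): 40_000,
             ("red", "four"): 35_000, ("blue", "three"): 38_000,
             ("blue", "four"): 30_000, ("three", "four"): 400_000}

# The "map" of word connections: all pairwise distances.
dist = {pair: ngd(hits[pair[0]], hits[pair[1]], f, N)
        for pair, f in pair_hits.items()}

# Sorting by distance groups colors with colors, numbers with numbers.
for pair in sorted(dist, key=dist.get):
    print(pair, round(dist[pair], 2))
```

Clustering the words by these distances is exactly how, under this scheme, a computer could separate colors from numbers without any built-in dictionary.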
In this way, the scientists hope, it will become possible to learn the meaning of words, to improve speech recognition, to create a semantic web and, of course, finally to achieve better automatic translation from one language to another.