15 Feb. 2005

Jérôme Test - Yahoo spins its wheels in the yahoourt...

After reading two glowing posts about Yahoo!'s new tool, Y!Q Search, "Y!Q: an astonishing and innovative contextual tool!" (Abondance) and "Y!Q Search de Yahoo!" (Kesako?), I was eager to try it out.

First contact

The first contact is, of course, discovering the search interface. Sober and minimalist: just a search field, a logo and a button. In short, a Google-style interface. Perfect; one asks nothing more of a search engine. What particularly catches my attention is that the usual INPUT field (a single-line text input) found in nearly every search interface is replaced here by a TEXTAREA (a multi-line text input). Is Yahoo! trying to suggest that I will be able to search with complex queries of more than two or three words, or even in natural language?

First tests

During my first tests, like my two fellow bloggers mentioned above, I can only confirm the impression that the results are relevant.
Note: I will probably discuss the concept of relevance in a forthcoming post, to try to define the term in the context of Internet search engines. It is relatively easy to evaluate the relevance of a search engine on a closed corpus (as in the TREC campaigns, Text REtrieval Conference), but on an open corpus like the Web, which methodologies and metrics should be used? How can one evaluate the relevance of a search engine without being able to put it in a Petri dish and study its behavior in a closed environment?

Several things worry me. First, the number of results is rather low. Is this a tool with its own index (independent of Yahoo!'s, and therefore perhaps less comprehensive), or does it use Yahoo!'s index? So I take a few quick measurements of the number of results returned by Y!Q and by Yahoo!:

| Query | Y!Q results | Yahoo! results |
|---|---|---|
| prime numbers | 19,400 | 110,000 |
| nail varnish | 14,100 | 26,800 |
| language technologies | 789 | 251,000 |

The finding is immediate: Y!Q returns far fewer documents than Yahoo!. But on closer inspection, the gap seems to disappear for exact-phrase queries. Could the essential difference between Y!Q and Yahoo! be that the former searches only for the exact phrase rather than for each word separately? A second experiment is needed:

| Query | Y!Q results | Yahoo! results |
|---|---|---|
| "prime numbers" | 19,400 | 19,400 |
| "nail varnish" | 14,100 | 14,100 |
| "language technologies" | 780 | 734 |

Good. I think things are clear on this point:
Y!Q's designers remembered what my teacher Christian Fluhr often told me (one never listens enough to one's masters!) about the importance of taking compound and idiomatic expressions into account for the relevance of an information retrieval tool.
Y!Q simply searches for the query as an exact phrase.
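To make the difference concrete, here is a minimal sketch contrasting the two matching modes on a tiny corpus. The documents are invented for illustration and have nothing to do with the real counts above; this is not Y!Q's or Yahoo!'s implementation, only the principle: bag-of-words matching (every query word appears somewhere in the document) versus exact-phrase matching (the words appear contiguously and in order).

```python
import re

# Toy corpus, invented for illustration only (not real search results).
DOCS = [
    "a table of prime numbers up to one million",
    "numbers that are prime: a short introduction",
    "prime time television ratings and other numbers",
    "nail varnish colours for spring",
    "how to remove varnish from a nail",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"\w+", text.lower())


def match_all_words(query: str, doc: str) -> bool:
    """Bag-of-words matching: every query word occurs somewhere in the doc."""
    return set(tokenize(query)) <= set(tokenize(doc))


def match_exact_phrase(query: str, doc: str) -> bool:
    """Phrase matching: the query words occur contiguously and in order."""
    q, d = tokenize(query), tokenize(doc)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


def count(matcher, query: str) -> int:
    """Number of corpus documents accepted by the given matcher."""
    return sum(matcher(query, doc) for doc in DOCS)


for query in ("prime numbers", "nail varnish"):
    print(query, count(match_all_words, query), count(match_exact_phrase, query))
```

Since a document containing the exact phrase necessarily contains every word of it, phrase matching can only return a subset of the bag-of-words matches, which is consistent with Y!Q returning fewer results than Yahoo!.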

Query preprocessing

Where Y!Q stands out is that it preprocesses the search query. It filters the query to eliminate useless terms (sometimes rather high-handedly, as you will see below), then splits it into several sub-expressions, which can then be selected or deselected to adjust the search filter.
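As a rough illustration of what such a step might look like, here is a minimal sketch that drops stop words and treats each run of remaining words as a candidate sub-expression. The stop-word list and the splitting rule are assumptions made for the example; Y!Q's actual rules are not known, and the tests that follow show it behaves quite differently.

```python
# Hypothetical stop-word list; Y!Q's actual list and rules are unknown.
STOP_WORDS = {"and", "in", "the", "of", "a", "an"}


def sub_expressions(query: str) -> list[str]:
    """Split a query into candidate sub-expressions at stop words.

    Each maximal run of consecutive non-stop words becomes one candidate;
    the stop words themselves are dropped.
    """
    candidates, current = [], []
    for word in query.lower().split():
        if word in STOP_WORDS:
            if current:  # close the run in progress
                candidates.append(" ".join(current))
                current = []
        else:
            current.append(word)
    if current:  # flush the last run
        candidates.append(" ".join(current))
    return candidates


print(sub_expressions("prime numbers and language technologies in search engines"))
```

Even this naive splitter keeps "language technologies" as a sub-expression of that query.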

For example, for the query "prime numbers and language technologies in search engines", here are the search topics Y!Q proposes:

Yahoo! is not very fair-play with Jean Véronis, who not long ago used it as a linguistic telescope (cf. comments) in his excellent post "Yahoo and the yahoourts": without batting an eyelid, Y!Q completely eliminates the "language technologies" part of the query (just as it eliminated the stop words... sic!).

So let's push the tests of this preprocessing a little further...
Now I run a search with the same query, "prime numbers and language technologies in search engines", to which I have only added two stop words. And lo and behold, Y!Q revises its previous analysis. Not that "language technologies" has become a relevant expression for the search; rather, since the term "les" (French for "the") now appears three times in my query, Y!Q decides that this term must surely be relevant to what I am looking for!

If the number of occurrences of the query words matters in the preprocessing, I will try adding several occurrences of the expression "language technologies"...
I have to add two occurrences of this expression (so there are now three of them) for Y!Q to add the term "language" to my search... Let's continue... I am not stubborn (or am I?), but I absolutely want it to keep "language technologies" among its search criteria... Nothing doing: at the fourth occurrence, the term "les" disappears (???) from the list of expressions retained for the search, but "language technologies" is still not retained. Even with more than ten occurrences in the query, I cannot get there!
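The hypothesis being tested can be made explicit with a toy model: if term selection depended only on how often each word occurs in the query, repeating "language technologies" would quickly push it to the top. The sketch below is a hypothetical model for illustration (word-level counts, keep the top k), not Y!Q's actual algorithm.

```python
from collections import Counter


def top_terms(query: str, k: int = 3) -> list[str]:
    """Hypothetical selection rule: keep the k most frequent query words.

    Ties are broken by first occurrence in the query, because
    Counter.most_common orders equal counts by insertion order.
    """
    counts = Counter(query.lower().split())
    return [word for word, _ in counts.most_common(k)]


base = "prime numbers and language technologies in search engines"
print(top_terms(base, k=2))
repeated = base + " language technologies" * 2  # three occurrences in total
print(top_terms(repeated, k=2))
```

Y!Q only partly behaves like this model: repetition does bring in "language", but the full expression is never retained, so occurrence counts cannot be the only factor.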

Then I think the expression "search engines" is probably too "strong" compared with the other expressions in the query, so I remove it. No luck: "prime numbers" still crushes "language technologies"...
Same result with the query "language technologies in search engines": only the expressions "search engines" and "les" are retained for the search...
One last test. I give up and run a search with only "language technologies", and still nothing. Only the two terms "language" and "les" are retained for the search.

Y!: look twice...

In the end, my first impressions of relevance quickly turned into an impression of "I have no idea what this @#!%ù§& Y!Q is doing with my queries!!!!"

After this first encounter, I find it a little hard to appreciate Y!Q's sense of humour...

... because Y!Q really is magic; it is even the great illusion!!!!

