The X(ML) Files – Finding Data in the Deep Web

“Since the user interface of the commercial product was not as flexible as we required, we needed to build our own user interface layer and use the application program interface (API) of the commercial application to handle the connections to multiple sources, the searching, merging of search results, de-duplication, and ranking,” said Tennant.

“This also required us to work with the vendor and the product user community to create a prioritized list of enhancements to the vendor’s API and wait for those enhancements to be provided (which they were).”
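The layer Tennant describes sends a query to several sources, merges what comes back, removes duplicates, and ranks the combined list. The Python sketch below shows that flow under stated assumptions: the `Hit` record, the source callables, the URL-based duplicate key, and the title-word score are all hypothetical stand-ins, not the commercial product's API or its actual ranking method.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Hit:
    """One search result (hypothetical record type)."""
    source: str
    url: str
    title: str


# Hypothetical: each source is a callable taking a query and returning hits.
# A real deployment would wrap each remote database's own search API here.
Source = Callable[[str], list[Hit]]


def normalize_url(url: str) -> str:
    """Crude de-duplication key: lowercase, drop the scheme and trailing slash."""
    key = url.strip().lower()
    for prefix in ("https://", "http://"):
        if key.startswith(prefix):
            key = key[len(prefix):]
            break
    return key.rstrip("/")


def score(hit: Hit, query: str) -> int:
    """Toy relevance score: number of query terms that appear in the title."""
    title_words = set(hit.title.lower().split())
    return sum(1 for term in query.lower().split() if term in title_words)


def federated_search(query: str, sources: dict[str, Source]) -> list[Hit]:
    # 1. Send the query to every source in parallel.
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = [pool.submit(search, query) for search in sources.values()]
        result_lists = [f.result() for f in futures]

    # 2. Merge the lists, keeping only the first copy of each document.
    seen: set[str] = set()
    merged: list[Hit] = []
    for hits in result_lists:
        for hit in hits:
            key = normalize_url(hit.url)
            if key not in seen:
                seen.add(key)
                merged.append(hit)

    # 3. Rank by score, highest first; the sort is stable, so ties keep merge order.
    return sorted(merged, key=lambda h: score(h, query), reverse=True)
```

If two sources both return `example.org/paper1`, one over `http` and one over `https`, the merged list contains that paper once. A production system would more likely match duplicates on identifiers such as DOIs or on fuzzy title comparison, and would normalize each source's own relevance scores; URL matching simply keeps the sketch short.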

Relevant Results

The work doesn’t stop once a site is up and running. The U.S. Department of Energy’s Office of Scientific and Technical Information (OSTI) maintains the science.gov site, which provides a common public search interface to thirty scientific databases from a dozen federal agencies, as well as the newly launched worldwidescience.org site, which searches the scientific databases of ten countries.

In February, OSTI released version 4.0 of science.gov (created and maintained by Deep Web Technologies), which added relevance ranking based on the full text of each document rather than just its metadata and summary.
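To see why ranking on full text matters, consider a document whose title never mentions the query term but whose body does. The sketch below uses textbook TF-IDF scoring, an assumption made for illustration only, since the article does not say what scoring science.gov uses. It scores the same two hypothetical documents once on metadata alone and once on full text.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf_rank(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Score each document by summed TF-IDF of the query terms; best first."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n = len(docs)
    terms = tokenize(query)

    # Smoothed inverse document frequency: rarer terms weigh more, and a term
    # found in every document still gets weight 1 rather than 0.
    idf: dict[str, float] = {}
    for term in set(terms):
        df = sum(1 for c in counts.values() if term in c)
        if df:
            idf[term] = math.log((1 + n) / (1 + df)) + 1

    scores: dict[str, float] = {}
    for doc_id, c in counts.items():
        length = sum(c.values()) or 1  # avoid dividing by zero on empty text
        scores[doc_id] = sum((c[t] / length * idf[t] for t in terms if t in idf), 0.0)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical records: what the metadata says vs. what the body contains.
metadata = {
    "doc1": "Annual report of the laboratory",
    "doc2": "Neutron scattering study",
}
full_text = {
    "doc1": "Annual report of the laboratory. Much of this year's work "
            "used neutron scattering, and neutron beam time doubled.",
    "doc2": "Neutron scattering study of thin films.",
}

print(tf_idf_rank("neutron", metadata))   # doc1 scores 0.0 on metadata alone
print(tf_idf_rank("neutron", full_text))  # doc1 now gets a nonzero score
```

With metadata only, the annual report can never surface for "neutron"; scoring the body makes it findable. Real engines typically add refinements such as length normalization schemes like BM25, which this sketch leaves out.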

“Adding full-text relevance ranking was the most significant improvement, but there were others,” said OSTI director Walt Warnick. “We also added alert services where you can put a query in and each week you get an email about anything new that has turned up in any of the thirty databases, without repeating what you found previously.”
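An alert that never repeats earlier finds has to remember, for each saved query, what it has already reported. A minimal sketch, assuming every result carries a stable document ID and that `search` is a hypothetical callable returning IDs for a query; composing and sending the weekly email is left out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SavedAlert:
    """A stored query plus the IDs already sent to the user (hypothetical)."""
    query: str
    seen_ids: set[str] = field(default_factory=set)

    def new_results(self, result_ids: list[str]) -> list[str]:
        """Return IDs not reported in any earlier run, then remember them."""
        # dict.fromkeys drops repeats within this batch but keeps their order.
        fresh = [rid for rid in dict.fromkeys(result_ids) if rid not in self.seen_ids]
        self.seen_ids.update(fresh)
        return fresh


def run_weekly(alerts: list[SavedAlert],
               search: Callable[[str], list[str]]) -> dict[str, list[str]]:
    """Re-run every saved query and collect only the new hits for the digest.

    `search` is a hypothetical callable mapping a query to document IDs,
    e.g. a wrapper around a federated search. Sending the email is omitted.
    """
    digest: dict[str, list[str]] = {}
    for alert in alerts:
        fresh = alert.new_results(search(alert.query))
        if fresh:
            digest[alert.query] = fresh
    return digest
```

Keeping the seen set on each alert rather than globally means two different saved queries can each report the same new paper, while neither reports it twice.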

And that, as Fuess said, is the key to developing an effective federated search engine: including all the relevant data sources, but without burying the user in more hits than he can possibly look at in the time available.