Can Google tame the law?

By Nick Holmes on December 4, 2009
Filed under Search, Semantic web

I recently posted on the FreeLegalWeb blog about Legal Opinions on Google Scholar. This was principally to question the assertion that the new service will empower the average citizen. But there are bigger questions to answer about Google’s ability meaningfully to address the needs of legal researchers.

For Google, scale is everything: index everything, analyse it with fancy algorithms, and the results will speak for themselves. While this certainly seems to work for broad, mainstream data sets, can it work for scholarly and professional domains, where accuracy, reliability and domain-specific semantics are much more important, even essential? Can it work for the law?

Peter Jacso, professor in the Department of Information and Computer Sciences at the University of Hawaii at Manoa, believes that with Scholar, Google has produced a “metadata train-wreck”, and he’s not optimistic that things will improve. Writing for Library Journal in “Google Scholar’s Ghost Authors, Lost Authors, and Other Problems”, he concludes:

It must have taken some time to create such an imbecile parser. In the early days the GS [Google Scholar] developers decided not to use the metadata readily available from most of the scholarly publishers. …

The press and the public were so enamored of anything with the word Google in it that GS developers apparently believed they could create a parser to identify the metadata better than the human indexers at the publishers, repositories, and indexing/abstracting services who assigned metadata by listing author, title, journal name, publication year, and other metadata elements.

GS designers have sent very under-trained, ignorant crawlers/parsers to recognize and fetch the metadata elements on their own. Not all of the indexing/abstracting services are perfect and consistent, but their errors are dwarfed by the types and volume of those in GS. This is the perfect example of the lethal mix of ignorance and arrogance GS developers applied to metadata and relevance ranking issues.

It may be difficult for some to see why Goog would eschew explicit, accurate, publisher-provided metadata and rely instead on automatic recognition, which, amongst many other failings, attributes articles to such authors as P Login (from “Please Login”), N Subscriber (from “New Subscriber”), and I Background and X Conclusions (from section headings). Jacso cites numerous other examples of spurious Google-generated metadata.

But that’s the way Goog works: it does not index data the way traditional indexing and abstracting services do, and it is confident (arrogant?) about its approach. Legal Opinions on Google Scholar undoubtedly opens up legal research to more people and provides a useful, complementary way to search (primarily US federal) opinions and link them together via citations. But its appeal will be limited: it will not empower the average citizen, and at present it will not satisfy the seasoned legal researcher.

As with all things Goog, even if we see many flaws in its initial beta offerings, we know it has to be taken seriously, as it has the financial, infrastructure and intellectual resources to make things happen over time. To satisfy the more demanding, it would have to beef up its editorial input and effectively become a legal publisher, and that would mean treading on the toes of those it depends on. But as web data becomes more discoverable via linked data (aka the semantic web), so Google’s approach will bear fruit and start to challenge the “traditional” methods.

One comment

I really do not understand this. From GS’s About page:

“Google Scholar aims to rank documents the way researchers do, weighing the full text of each document, where it was published, who it was written by, as well as how often and how recently it has been cited in other scholarly literature.”

That, for me, is precisely what it does. If it recognises your IP address as belonging to a subscribing institution, it automatically hooks into the databases your institution subscribes to and links straight to a PDF download of the article. Search just works. The following sentence by Jacso in particular has me puzzled:

“Not all of the indexing/abstracting services are perfect and consistent, but their errors are dwarfed by the types and volume of those in GS.”

Really? For myself, I always start with GS when searching for articles on a particular topic. Only after I have exhausted those possibilities do I turn to Westlaw etc., because, to my mind, the search power of GS is far more advanced than that of any of the closed proprietary databases that dominate legal research.

by Martin George on 4 December 2009 at 3:30 pm