Taking the juice out of Google

When Google launched its Custom Search Engine service 18 months ago, I expected thousands of CSEs to pop up all over. That’s happened, but I’m not aware of any in the areas I monitor that have made a mark. Why so?

In the UK legal arena I know of only a few CSEs:

  • I put together a number focussed on UK blawgs, cases, legislation and government sites.
  • Nearly Legal’s LawSearch includes selected statute and case law, government guidance, reports, commentary, other resources and help and law blogs.
  • More recently Struan Robertson at OUT-Law launched LawTrawlUK, which currently includes 95 major law firm sites and some significant public sector sites.

I’ve also tried out a number of others in related fields and, without exception, they all disappoint. This is not to detract from the genuine effort that has gone into them. They fail to excite for one of three reasons: either one doesn’t know the scope of one’s search precisely enough, or, if one does, one would prefer a different selection of sites, or the results feel unbalanced. And all the while one knows one will be missing some key results, along with those unexpected nuggets that a well-crafted global Google search would serve up. Narrowing the domain searched often takes away more than it gives.

For those reasons, when setting up my experimental CSEs, I figured that tightly defined scopes might be a fruitful path to follow. Within some I spent time pointing to specific folders and folder patterns rather than just the sites, and for some I added tags so that results could be refined. I did not get into weighting the results; that was time I wasn’t willing to spend initially.

Most CSEs I’ve come across are fairly basic, simply including a selected list of sites in the results. But producing a CSE that does the business requires considerable thought, and time spent implementing the advanced features: carefully and methodically selecting specific folders and/or file types using lists or wildcards, labelling the entries and weighting them.
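To make the idea of folder patterns, labels and weights a little more concrete, here is a toy model in Python. It is not Google’s implementation or the CSE configuration format: the site patterns, labels and weights below are hypothetical, and the ranking simply sorts in-scope results by weight, which is a deliberate simplification that does not attempt to model how Google actually applies weights.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Entry:
    pattern: str   # URL pattern without scheme; "*" is a wildcard (also matches "/")
    label: str     # tag used to refine results
    weight: float  # higher means the entry is ranked higher in this toy model


# Hypothetical entries: the sites, labels and weights are illustrative only.
ENTRIES = [
    Entry("www.example-court.org.uk/judgments/*", "cases", 1.0),
    Entry("www.example-gov.uk/guidance/*.pdf", "guidance", 0.6),
    Entry("blog.example.co.uk/*", "blawgs", 0.3),
]


def strip_scheme(url):
    """Drop "http://" or "https://" so patterns can be written without a scheme."""
    return url.split("://", 1)[-1]


def best_match(url, entries, label=None):
    """Return the highest-weighted entry matching url (optionally within one label), or None."""
    u = strip_scheme(url)
    matches = [
        e for e in entries
        if fnmatchcase(u, e.pattern) and (label is None or e.label == label)
    ]
    return max(matches, key=lambda e: e.weight, default=None)


def filter_and_rank(urls, entries, label=None):
    """Keep only URLs inside the engine's scope, then order them by entry weight."""
    scored = []
    for url in urls:
        entry = best_match(url, entries, label)
        if entry is not None:
            scored.append((entry.weight, url))
    # sort() is stable, so results with equal weights keep their original order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored]


results = [
    "http://blog.example.co.uk/post-1",
    "https://www.example-gov.uk/guidance/housing.pdf",
    "http://www.example-gov.uk/news/today.html",
    "https://www.example-court.org.uk/judgments/2008/1.html",
]
print(filter_and_rank(results, ENTRIES))
# ['https://www.example-court.org.uk/judgments/2008/1.html', 'https://www.example-gov.uk/guidance/housing.pdf', 'http://blog.example.co.uk/post-1']
```

The news page is dropped because no pattern covers it, which is exactly the trade-off described above: a narrower scope removes noise but can also remove the unexpected nugget. Passing `label="guidance"` restricts the results to the entries carrying that tag, in the same spirit as refining by label.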

4 thoughts on “Taking the juice out of Google”

  1. Lots of great points Nick. I also expected more of these to pop up, and expected better quality results. The default functionality seems to cover about 95% of the collections I’ve seen to date.

    I think the points you make in the last paragraph are all very important, especially weighting. And interestingly, that’s where the tech/human division lies. Weighting requires subject expertise and the crafting of an actual collection. It’s nice to know the tech will only take us so far.

    Hopefully your post will inspire some new collections in the legal market. :)

  2. I have rather let mine go un-updated for a while, for two reasons. One is the labour involved when other demands (however self-imposed) have taken priority; the second, and more important, is precisely the delimitation issue that you raise.

    My CSE offers only a certain prioritisation over a standard Google search and nothing over a well-crafted search phrase. CSEs are perhaps best targeted at clearly delimited (and quite small) fields. I think I might pull my CSE from my site for that reason.

  3. Bang on the money, Nick.

    A couple of thoughts about making CSEs useful (as they stand):

    The personal CSE is an idea worth considering and can work really well when combined with Delicious.

    For example, check out

    http://www.search.deligoo.com/

    It creates a CSE for any tag, user or combination thereof, which makes it an excellent way to search all of the websites you’ve tagged with Delicious. It also comes as a nifty Firefox plugin.

    It’s definitely something they need to build into Google Desktop.

    Alternatively, a group of people could use the collaborative features of the CSE technology to build a search engine around a topic. However, the results I’ve seen from CSE aren’t yet relevant enough to warrant serious investigation.

    For example, I created an AM Law 100 search, but the results aren’t great. Looking for law firm publications gives you a warped view of what’s valuable, as only the highly linked pages get a lot of hits (new content seems to be relegated to the bottom of the list).
