Developing a search strategy

A Page on the Web, published in the Solicitors Journal, February 2000

In July last year we looked at how to find things on the web, ie using search engines. This included some detail on how to refine searches using Boolean operators and other advanced techniques. Librarians, information professionals and a minority of other readers may be comfortable with this, but research shows that the vast majority of web users simply type a few words into a search box and see what happens. It seems sensible therefore to concentrate on two key factors determining the relevance of your search results: choosing the ‘right’ search engine and choosing search terms carefully.

Choosing a search engine

Most search engines will return results for pages that contain any of your search terms, ranking results by weighting some or all of the following:

  • the number of search terms matched
  • the frequency of occurrence of the terms
  • the part of the page in which the terms are found (ie title, body text, keywords etc)
  • the proximity of the terms
  • the order in which you entered the terms

Though these search engines will return millions of hits, the most relevant sites will generally appear on the first half-dozen pages of results. Even so, a number of results on those first pages will (for one reason or another) not be relevant, and many relevant results will be buried much deeper down the list, likely never to be seen. The ranking applied may also seem haphazard at best.
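For readers who like to see the mechanics, the weighting described above can be sketched in a few lines of Python. This is a toy illustration only, not any real search engine's formula: the function names, the "title" and "body" fields and all the weights are my own assumptions, and it covers just three of the factors listed (terms matched, frequency and position on the page), leaving out proximity and word order.

```python
import re

def tokenize(text):
    """Lower-case a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(page, query, title_weight=3.0):
    """Toy relevance score for one page against one query.

    `page` is a dict with hypothetical "title" and "body" keys.
    The weights (10, 1 and title_weight) are arbitrary illustrations.
    """
    terms = set(tokenize(query))
    title = tokenize(page["title"])
    body = tokenize(page["body"])
    # Factor 1: how many distinct search terms appear anywhere on the page.
    matched = sum(1 for t in terms if t in title or t in body)
    # Factor 2: how often the terms occur in the body text.
    frequency = sum(body.count(t) for t in terms)
    # Factor 3: terms found in the title count extra.
    in_title = sum(1 for t in terms if t in title)
    return 10.0 * matched + 1.0 * frequency + title_weight * in_title

pages = [
    {"title": "Family holidays",
     "body": "Cheap family breaks for 1991 prices."},
    {"title": "Family Proceedings Rules 1991",
     "body": "The rules on ancillary relief were amended."},
]
query = "Family Proceedings Rules 1991"
ranked = sorted(pages, key=lambda p: score(p, query), reverse=True)
for p in ranked:
    print(score(p, query), p["title"])
```

Note that both pages are returned, because each contains at least one of the search terms; only the ordering separates the relevant page from the holiday site.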

Google

Google, a relatively new search engine, uses an entirely different approach and in my view stands head and shoulders above the rest.

Google returns results for pages that contain all your search terms, ranking pages according to their popularity (measured by how many other pages link to them) and according to the proximity of your search terms. Google also excerpts the text that matches your query.

Google’s page ranking method deserves further explanation. In essence, Google interprets a link from page A to page B as a vote by page A for page B. Google assesses a page’s importance by the votes it receives and also analyzes the page that casts the vote. Votes cast by pages that are themselves important weigh more heavily and help to make other pages important.

Google’s page ranking system is therefore a general indicator of importance and does not depend on a specific query. Rather, it is a characteristic of a page itself based on data from the web that Google analyzes. It precludes human interference – so no one can ‘purchase’ a higher rank or commercially alter results.
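The voting idea can be illustrated with a short Python sketch. It is a simplified model in the spirit of the description above, not Google's actual implementation: the function name, the tiny six-page ‘web’, the damping value of 0.85 and the number of iterations are all assumptions made for illustration.

```python
def link_votes(links, damping=0.85, iterations=50):
    """Score pages by the links ("votes") they receive.

    `links` maps each page to the list of pages it links to.
    `damping` and `iterations` are assumed values for illustration.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with every page equal
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its vote evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical web: A and C both link to B; B links to D; E links to F.
web = {"A": ["B"], "C": ["B"], "B": ["D"], "E": ["F"]}
rank = link_votes(web)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this example D and F each receive exactly one link, but D scores higher because its single vote comes from B, a page that is itself well linked. The scores are computed from the links alone, before any query is entered.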

Choosing your search terms

It is of course very difficult to generalise about the use of search terms, but it is always worth bearing in mind that a web search engine has indexed the whole of the web. You should therefore be aiming as much to screen out or demote the irrelevant or low value sites as to find the relevant ones.

A few pointers therefore for selecting terms for ‘legal’ searches:

  • Use terms that lawyers rather than laymen would use, eg conveyance rather than purchase and sale. This will screen out for example many ‘law for the layman’ sites.
  • As an extension of the above, use the titles of Acts, SIs etc which are bound to be referred to. This will screen out or demote for example low value information pages on law firm ‘brochure’ sites.
  • If you are researching local law, include UK or England or Scotland as appropriate as a search word. This will demote sites concerned with overseas jurisdictions, including of course US sites, but also Australian and Commonwealth sites whose terminology is more similar.

A worked example

I am looking for information on the new rules relating to ancillary relief in matrimonial proceedings – both the new rules themselves and related commentary.

I decide to use the search words Family Proceedings Rules 1991 as I know this is the title of the governing instrument which will have been amended. By using these words, I will also screen out low value sites.

Typing these words into the AltaVista search engine produces 7,425,420 results (ie pages where one or more of these words occur). The top half-dozen pages do appear to include many relevant sites, but these are spoilt by a large number of irrelevant or low value results and the ranking of the results seems haphazard.

To get a better result in AltaVista I specify that all words must occur – by typing +Family +Proceedings +Rules +1991. This of course produces far fewer hits (1,886) and the first few pages are mostly relevant sites though the listing is still flawed. I then try the phrase “Family Proceedings Rules 1991”. This produces 31 results, but by being so precise I may well have excluded some relevant sites. So finally I try an advanced search using the NEAR operator between each word. This produces surprisingly few hits (41). All are highly relevant though in apparently random order.
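Why each of these searches returns a different number of hits can be seen from a small Python sketch of the four matching modes. This is my own illustration, not AltaVista's code: the five sample ‘pages’ are invented, and the assumption that NEAR means ‘within ten words’ is built in as a parameter you can change.

```python
import re

def words(text):
    """Lower-case a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match_any(doc, terms):
    """Plain search: at least one term occurs."""
    return any(t in doc for t in terms)

def match_all(doc, terms):
    """+term +term search: every term occurs somewhere."""
    return all(t in doc for t in terms)

def match_near(doc, terms, window=10):
    """NEAR search: each consecutive pair of terms occurs within
    `window` words of each other (window size is an assumption)."""
    positions = {t: [i for i, w in enumerate(doc) if w == t] for t in terms}
    return all(
        any(abs(i - j) <= window for i in positions[a] for j in positions[b])
        for a, b in zip(terms, terms[1:])
    )

def match_phrase(doc, terms):
    """Quoted phrase search: the terms occur together, in order."""
    n = len(terms)
    return any(doc[i:i + n] == terms for i in range(len(doc) - n + 1))

# Invented sample pages, for illustration only.
docs = [
    "The Family Proceedings Rules 1991 govern ancillary relief.",
    "Rules for family proceedings were revised in 1991.",
    "Family proceedings rules are discussed at length in this guide "
    "which was first written by our firm back in 1991.",
    "Family holidays in Florida.",
    "Conveyancing fees explained.",
]
terms = words("Family Proceedings Rules 1991")
modes = {"any": match_any, "all": match_all,
         "near": match_near, "phrase": match_phrase}
counts = {name: sum(1 for d in docs if fn(words(d), terms))
          for name, fn in modes.items()}
print(counts)
```

Each mode is stricter than the one before, so the count falls at every step, just as it did in the searches above: the phrase search is the most precise but misses the page that words the title differently.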

Now using Google I type in the words Family Proceedings Rules 1991. This produces 28,900 hits. On the first few pages all hits are highly relevant and their ranking seems very reasonable. The first non-relevant site is at position 28, and this is because Florida appears to have Family Law Rules of Procedure dated (or numbered?) 1991.

Conclusion: I have achieved a far better result with my first try with Google than I have with several, more complex, searches using AltaVista. Try it!