Sunday, August 26, 2007

Google Search Appliance.

We've just got one here. It sucks. It really shows that PageRank is the secret sauce that makes Google work. Without PageRank, all you have is an unsorted list of documents that contain the keyword. Worse than useless.

So, let's say that you are looking for the design documents for a piece of code. Let's call it Foo. You search for Foo, figuring that the design documents would be considered important and at least listed on the first page.

Nope.

First page is full of release notes for patches that delivered Foo. Then you get test exit reports. Then the list of audit reports showing the customers that have it installed.

This is undoubtedly because it is sorting based on date, and the dates on these files are newer than the ones on the Word document. However, release notes have no real importance unless you are searching for them specifically. Same goes for test exit reports. They are both write-once documents. Even the audits are easier to find elsewhere.
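To make the complaint concrete, here is a minimal sketch of the difference between sorting by date and sorting by how much a kind of document matters. The documents, dates and weights are all made up for illustration; this is not how the appliance actually ranks anything.

```python
from datetime import date

# Hypothetical documents: (title, kind, last_modified). Not taken from any real index.
DOCS = [
    ("Foo design document", "design", date(2005, 3, 1)),
    ("Foo patch 12 release notes", "release_notes", date(2007, 8, 1)),
    ("Foo test exit report", "test_report", date(2007, 7, 15)),
    ("Foo customer audit", "audit", date(2007, 6, 30)),
]

# Assumed weights: how useful a kind of document is when nobody asked for it by name.
KIND_WEIGHT = {"design": 3, "audit": 1, "release_notes": 0, "test_report": 0}


def by_date(docs):
    """Newest first, which is what the results page appears to be doing."""
    return sorted(docs, key=lambda d: d[2], reverse=True)


def by_kind_then_date(docs):
    """Most important kind first, newest first within the same kind."""
    return sorted(docs, key=lambda d: (KIND_WEIGHT.get(d[1], 0), d[2]), reverse=True)


if __name__ == "__main__":
    print([title for title, _, _ in by_date(DOCS)])
    print([title for title, _, _ in by_kind_then_date(DOCS)])
```

With the date sort the old design document lands at the bottom of the list; with even a crude weight per document kind it comes out on top.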

I never did find the design document for it. The first architectural document that might even be related was on page 2. It was a 5 page .pdf of diagrams.

Anyways, it just re-affirms my opinions. Google with PageRank? Good. Google Appliance? Terrible.

12 comments:

Anonymous said...

Have you tried the 'filetype:pdf' thing, if you know that the document is a PDF? It'll cut through those txt files at any rate.

Jason Pollock said...

Except that the document I'm looking for isn't likely to be a pdf... It might be a pdf, it might be a doc, it might be a page on the wiki.

So, let's say it's a web page. All of the automated build results are also pages, and there are a lot more of them than there are documents. Same goes for the customer audits and even the source code.

Yuck.

Anonymous said...

Jason I think you need to focus on how the functionality on the GSA can address your problem.

You should try creating collections that cover the document types you need. e.g. a collection for release documents, one for design documents. You can then specify which type of documents you want to search through the Search UI.

Here is one example of how the collections can be selected with radio buttons or tabs to help organise the information.
http://support.plato.com/search.asp?client=blank&proxystylesheet=blank&output=xml_no_dtd&proxyreload=1&q=network%20configuration&site=PlatoSupport-KB
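Going by the parameters visible in that link, the collection appears to be picked with the `site` parameter and the keywords go in `q`. A rough sketch of building such a URL follows; the host name and the collection name `design_docs` are placeholders, and the parameter meanings are inferred from the link above rather than confirmed against the appliance's documentation.

```python
from urllib.parse import urlencode, quote


def search_url(base, query, collection, frontend="blank"):
    """Build a search URL restricted to one collection.

    Parameter names are copied from the example link; the values passed in
    by the caller are hypothetical.
    """
    params = {
        "q": query,                   # the search keywords
        "site": collection,           # assumed: name of the collection to search
        "client": frontend,
        "proxystylesheet": frontend,
        "output": "xml_no_dtd",
    }
    # quote_via=quote encodes spaces as %20, matching the example link.
    return base + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(search_url("http://gsa.example.com/search", "Foo design", "design_docs"))
```

A search page with radio buttons or tabs would then only need to swap the `collection` value between requests.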

You should be aware that PageRank is only one of many ways that documents are ranked on the GSA, and in the enterprise it is rarely the most important. There are many thousands of GSA customers who seem to rave about the relevance, even on file systems and document management systems where PageRank has no role.

Don't give up just yet!

Jason Pollock said...

Hey, wow, Google Marketing! Hello Google Marketing!

Just a tip. Anonymous doesn't suit you. ;) Oh, and the link? Not clickable.

I'm not the guy who set it up, just (it seems) the only guy to use it.

I'll have to ask the guys who bought it to read the documentation that came with it. Not surprising that they just plugged it in and turned it on.

Perhaps you guys need to look at some more obvious default settings?

Until I see it working, I still say, "Stay away from Google Search Appliance".

Anonymous said...

Sounds like the usual choice between (a.) researching the market, buying a good search engine and allocating sufficient resources to implement it properly, or (b.) paying for a big name brand, plugging it in and crossing your fingers.

Like they say - good, cheap, fast. Pick any two...

Enterprise search software is a mature market, having been around for decades, whereas Internet search is relatively new. Since when did an immature product with lots of marketing hype win over stable mature products with better functionality? Oh yeah - Windows. Doh!

Determinist said...

I did notice on that same page that there is a "sort by relevance/date" selection in the upper right hand corner. Still - I tried the same search as you did and had no luck with either sorting.

I asked my boss (a huge google engine fan), and he said that it was easy, just put in fooindex and out pops the document.

I don't know how he knew how to put that in - not exactly intuitive.

Anonymous said...

Google marketing! Nice guess, but wrong. Just someone who understands that you may need to actually configure a solution to take account of your requirements. Maybe you should provide a more informed opinion if you want people to read your blog.

Jason Pollock said...

It must be in the documentation somewhere. Documentation that isn't available from the home page of the appliance...

Oh, and Determinist? He put FooIndex in because it was the name of a page that was missing from another page (based on the trouble ticket I raised on the subject). If you look, it wasn't FooIndex that he was searching for but BarIndex. (Unless I misunderstood.)

I get the feeling that searching by filename is very accurate. Probably almost as accurate as "locate"!

Jason

Unknown said...
This comment has been removed by the author.
Unknown said...

Long angry rant removed and shortened to the sound bites:

Jason,
we did not plug and pray with the Google Mini, and I believe you've sadly misrepresented us, your IT guys, and the Google Mini, so let me flesh out the background story a bit:

It was purchased in a 50k document licence livery as a proof of concept, and we've been pushing for an upgrade to either a 300k Mini licence or a 500k Google Search Appliance since.

Ideally the company would have 4-6 GSAs (which stack, providing a single search UI, and they can directly address actual db's where the Mini can only do the likes of Access and jfox), but someone at a higher level than me incorrectly guesstimated that we'd need more like 12, and manglement balked at the price before insisting that we reinvent the wheel, à la:
http://www.rawiriblundell.com/?p=841

The 50k licence is the major limiting factor that is causing all the heartache. If we had 300k, we'd actually be able to do useful things with the Mini and really get it working for us - instead it is sitting practically idle, and that's a real shame, as well as a wasted investment.

At the moment we have it trawling a mere handful of things in the NZ office only, and through some elegant regular expression exclusions we've got the number of dupes down from the millions to the dozens (or the nones, we're very close), and we've implemented collections so that you guys can narrow your searches to a particular branch, e.g. Archive, Release, Documents etc.
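For anyone wondering what that kind of exclusion looks like, here is a rough sketch in plain Python. The patterns and URLs are invented for illustration; the actual expressions on the Mini are not shown in this thread, and the appliance has its own pattern syntax that this does not try to reproduce.

```python
import re

# Hypothetical exclusion patterns: URLs that only repeat content indexed elsewhere.
EXCLUDE = [
    re.compile(r"[?&]sort=", re.IGNORECASE),       # same listing in a different sort order
    re.compile(r"[?&]sessionid=", re.IGNORECASE),  # same page under a different session
    re.compile(r"/\.svn/"),                        # version-control metadata
]


def should_crawl(url):
    """Return True if no exclusion pattern matches the URL."""
    return not any(pattern.search(url) for pattern in EXCLUDE)


def filter_urls(urls):
    """Keep only the URLs worth spending document licences on."""
    return [u for u in urls if should_crawl(u)]


if __name__ == "__main__":
    candidates = [
        "http://intranet.example.com/docs/foo.html",
        "http://intranet.example.com/docs/?sort=date",
        "http://intranet.example.com/src/.svn/entries",
    ]
    print(filter_urls(candidates))
```

Every URL that gets excluded this way is one less document counted against the licence limit.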

On top of that, Google hacks and trickeries work, such as inurl:keyword, which I unsuccessfully argued would work as an internal document security tool.

So yes, you're partly right, but you're misrepresenting the appliance based on an initial impression before we'd even knuckled down the configuration, and without taking into account the other factors such as the severe tied-hands of the 50k licence, and the chi-wrecking from manglement.

Please, feel free to get some background information from us next time. And to anyone reading this on the blogorama: take blog posts with a grain of salt and get yourself a wide range of facts and opinions before making a decision.

Your friend in hating meetings :)

Ra

ps. The Mini's OS has been upgraded twice now, so results relevance and sorting should be a lot better than when we first got it.

Jason Pollock said...

Ra,

I wasn't slagging you guys off. The marketing materials for the search appliance basically lead buyers to believe that it can be plugged in and turned on. I was saying that you guys shouldn't be faulted for trying exactly that. That it failed to "work" is _Google's_ fault, not yours.

From what I've seen of how the GSA has evolved at eServ, it started out terrible, got better for a little while, and has become less useful as time moves on. This is probably because the documents are shifting to non-indexed locations.

I am still firm on my opinion. GSA - not as good as people think. It isn't a magic fairy wand that you wave over your network to solve your document retrieval problems.

I'm continually amazed which posts have legs! This one's from 2007!

Unknown said...

Hi Jason,
We have used the GSA and faced some issues, but with newer releases it's getting better. Value for money? Not sure... but since the investment is 'done', even small gems like actually finding what you are searching for are heartwarming!
Collections work magic.

thanks
HT