Generally speaking, when you have search on a site, one of two things is going to happen:
1) the index for the search will be quite large in order to keep server load down
2) the server load will be quite high because of a smaller index

This essentially means nothing to people who don’t care about such things… which for the most part means “everyone”.

Those who do care tend to put a search on the site that makes use of Google. Google has a feature that lets you limit a search to a single website: you add “site:” before the base domain name, and then whatever you wanted to search for.
So say I wanted to look for “puppies” on EphBlog: I would go to Google and search for “site:” plus EphBlog’s domain, followed by “puppies”, and it would show me whatever results it could find.

We had a form in our sidebar which did this for us – if you entered search words, it would send you over to Google with the query structured in that way.
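
For the curious, here is roughly what a form like that boils down to. This is only a sketch in Python, not the code we actually ran: the function name is made up, and "example.com" is a stand-in domain.

```python
from urllib.parse import urlencode


def google_site_search_url(domain, words):
    """Build a Google search URL restricted to a single site.

    `domain` is the base domain (a placeholder here, not the real one)
    and `words` is whatever the visitor typed into the sidebar form.
    """
    # The "site:" operator goes in front of the domain, then the search words.
    query = f"site:{domain} {words}"
    # urlencode escapes the colon and turns the space into a plus sign.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(google_site_search_url("example.com", "puppies"))
```

The sidebar form only has to send the browser to that URL; everything after that is Google's problem, which is both the appeal and the catch.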

The problem, as apparently many people have noticed, is that I don’t own Google. I have no control over how, when, or why they index our site. We have the “Full Archives” link up so that Google can go in there, get every single page, and theoretically index our whole site, which would give us a cheap (read: free) and easy way to search it.
But unfortunately it wasn’t working for some search terms and I have no clue why.

So now we are using our own search system on the site. The form over on the sidebar looks just like it did before, except that now it doesn’t go out to Google and instead stays right on the site and uses our server to process the query instead.
The good side is that we can format the results pretty much however we want.
The bad side is that theoretically it provides an avenue for a much higher server load for very little return. That said, it probably isn’t something that will get hammered since our userbase appears to be under 1000 people at this point and I doubt all of you are going to search for something all at the same time.
On top of that, I think the new MovableType (which we are using) builds a database index anyway, so we should be in that first condition I mentioned up top: more disk space, less CPU load (at search time; still high at index-building time).
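
If the disk-versus-CPU trade sounds abstract, here is a toy version in Python. It is a sketch of the general idea only, not how MovableType does it (I have not looked), and the posts and function names are invented for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def scan_search(posts, word):
    """No index: re-read every post on every query (more CPU per search)."""
    word = word.lower()
    return sorted(pid for pid, text in posts.items() if word in tokenize(text))


def build_index(posts):
    """Do the reading once, up front, and store word -> post ids (more space)."""
    index = defaultdict(set)
    for pid, text in posts.items():
        for word in tokenize(text):
            index[word].add(pid)
    return index


def index_search(index, word):
    """With an index, a query is a single dictionary lookup."""
    return sorted(index.get(word.lower(), set()))


# Invented sample posts, keyed by a made-up post id.
posts = {
    1: "Puppies on the quad",
    2: "Notes on the new search form",
    3: "More puppies",
}
index = build_index(posts)

if __name__ == "__main__":
    print(scan_search(posts, "puppies"))
    print(index_search(index, "puppies"))
```

Both searches return the same posts. The difference is when the work happens: the scan pays on every query, while the index pays once when it is built and then takes up room afterwards.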

Anyway, that is probably not sufficiently nerdy for those of you that really truly know this sort of thing (*cough*DeWitt Clinton*cough*) and probably far too nerdy for those of you that don’t really care (everyone else).

Have at it.
