How much disk space do I need for Seeker indexes?

May 25, 2011

As we continue to roll out new instances of the Dovetail Seeker search engine to our customers, one question that frequently arises is: How much disk space do I need for the Dovetail Seeker indexes?

A little background

Dovetail Seeker contains two major components: indexing and searching. Before you can search for your data, you need to index it.

An index is a collection of searchable data organized into documents, each having many fields of information. Every document in the index is a potential search result, and each of its fields may contain one or more searchable terms.

For example, you will likely want to search for cases. For each case in the system, the indexing application adds a document to the index containing details about that case. The document contains at least an id, title, and case summary. Once a document for a case is present in the index, it can be searched, and the case id, title, and summary are available as search results.
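To make the document/field/term vocabulary concrete, here is a toy in-memory index in Python. It is a hypothetical illustration only, not how Seeker or Lucene.Net actually store data on disk: the `ToyIndex` class, the naive `tokenize` function, and the sample case ids and text are all made up for this sketch. Each case becomes a document with `id`, `title`, and `summary` fields, and every term in every field points back to the documents that contain it.

```python
from collections import defaultdict

# Hypothetical, simplified model of a search index. This is NOT Seeker's or
# Lucene.Net's on-disk format; it only illustrates documents, fields, and terms.


def tokenize(text):
    """Split text into lowercase alphanumeric terms (a very naive analyzer)."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


class ToyIndex:
    def __init__(self):
        self.documents = []               # each document is a dict of field -> value
        self.postings = defaultdict(set)  # (field, term) -> set of document numbers

    def add_document(self, fields):
        """Store the document and record which terms appear in which field."""
        doc_number = len(self.documents)
        self.documents.append(fields)
        for field, value in fields.items():
            for term in tokenize(str(value)):
                self.postings[(field, term)].add(doc_number)
        return doc_number

    def search(self, field, term):
        """Return the stored documents whose given field contains the term."""
        matches = self.postings.get((field, term.lower()), set())
        return [self.documents[n] for n in sorted(matches)]


# Usage with two made-up cases
index = ToyIndex()
index.add_document({"id": "12345", "title": "Printer jams on startup",
                    "summary": "Customer reports paper jam after firmware update."})
index.add_document({"id": "12346", "title": "Login fails",
                    "summary": "User cannot log in after password reset."})
print([doc["id"] for doc in index.search("summary", "after")])  # prints ['12345', '12346']
```

Even in this toy version, the index keeps both the searchable terms and the stored field values, which is why its size grows with the amount of text being indexed.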

Files can also be indexed. When the indexer encounters a CRM attachment, or is told to index a directory, it adds a file document to the index and uses the text extracted from the file as that document's searchable summary.

The indexes used by Seeker are stored on disk, not in the Clarify/Dovetail database.

So how much space do I need?

My typical consultant answer: it depends.

It depends on the size of the data that's being indexed. For example, let's say we're indexing a case, which includes the case history. Is the case history small, such as just a few notes? Or is it large, with tons of notes, phone logs, inbound and outbound email logs, etc.? One case might be only a few kilobytes in size, while others might be 100 KB or more.

The larger the amount of data that needs to be indexed, the larger the index.

How about some guidelines?

I’ve asked some of our customers who use Seeker to share their specific details. I’ll share their numbers below.

Averaging out customer data, a good general guideline seems to be about 2 KB/document, where a document is a case, contact, subcase, solution, etc.

Based on that estimate:

if you index 100,000 documents, then the space required would be 195 MB
if you index 1 million documents, then the space required would be 1.9 GB
if you index 10 million documents, then the space required would be 19 GB
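The estimate above is simple multiplication, so it is easy to rerun for your own document count. The sketch below assumes the 2 KB/document rule of thumb and binary units (1 MB = 1024 KB, 1 GB = 1024 MB), which is what the 195 MB figure for 100,000 documents implies. The function name `estimate_index_size` is just an illustrative helper, not part of Seeker.

```python
# Rule of thumb from the customer data: about 2 KB of index per document.
KB_PER_DOCUMENT = 2


def estimate_index_size(document_count, kb_per_document=KB_PER_DOCUMENT):
    """Return (KB, MB, GB) using binary units: 1 MB = 1024 KB, 1 GB = 1024 MB."""
    kb = document_count * kb_per_document
    mb = kb / 1024
    gb = mb / 1024
    return kb, mb, gb


for count in (100_000, 1_000_000, 10_000_000):
    kb, mb, gb = estimate_index_size(count)
    print(f"{count:,} documents: about {mb:,.0f} MB ({gb:.1f} GB)")
```

This reproduces the 195 MB and 1.9 GB figures, and about 19 GB for 10 million documents. If your cases carry unusually large histories, pass a larger `kb_per_document` value to get a more conservative estimate.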

Overall, the storage required is relatively small. Since Seeker uses the excellent Lucene.Net search library, we owe much of this efficiency to them.

How about some real-world data?

The following data comes from real-world customer deployments:

Total documents	Document breakdown	Total index size	Index size per document
4,794,208	4.6M cases, 75,000 solutions, 1,500 contacts, 75,000 subcases	7.66 GB	1.6 KB/document
1,729,402	1.3M cases, 39,000 CRs, 20,000 problems, 4,000 subcases, 319K logistics	1.35 GB	0.81 KB/document
141,356	72,000 cases, 4,000 solutions, 64,000 CRs, 1,200 subcases	264 MB	1.9 KB/document
8,667	8,667 cases	25.2 MB	3 KB/document
35,085	30,237 cases, 4,848 contacts, 896 file attachments	76.3 MB	2.23 KB/document
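The last column is just the total index size divided by the number of documents. The short Python check below recomputes it from the first and third columns, assuming binary units (1 GB = 1024 MB, 1 MB = 1024 KB). The results agree with the table to within rounding, and the simple (unweighted) average of the five rows comes out at about 1.9 KB/document, in line with the 2 KB guideline. The `kb_per_document` helper is only for illustration.

```python
from statistics import mean

# (total documents, total index size in MB), copied from the table above.
# The two GB values are converted with 1 GB = 1024 MB.
deployments = [
    (4_794_208, 7.66 * 1024),
    (1_729_402, 1.35 * 1024),
    (141_356, 264),
    (8_667, 25.2),
    (35_085, 76.3),
]


def kb_per_document(total_documents, index_mb):
    """Average index size per document in KB (1 MB = 1024 KB)."""
    return index_mb * 1024 / total_documents


ratios = [kb_per_document(n, mb) for n, mb in deployments]
for (n, _), ratio in zip(deployments, ratios):
    print(f"{n:>10,} documents: {ratio:.2f} KB/document")
print(f"range: {min(ratios):.2f} to {max(ratios):.2f} KB/document, "
      f"simple mean {mean(ratios):.2f}")
```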

As you can see, there’s some variation around the general guideline (from 0.81 to 3 KB/document), so my noncommittal “it depends” still stands. Regardless, the guideline is in the right order of magnitude.

How about external files?

In addition to objects from the database, Seeker can also index external files. These could be attachments (such as case or subcase attachments), or a collection of files, such as product documentation, whitepapers, etc.

The following data comes from indexing two sets of files:

File set	Total file size	Index size for these files	Index size as a percentage of file size
56 PDF files	90.5 MB	3.75 MB	4.1 %
56 MS Word DOC files	32.3 MB	196 KB	0.6 %
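To turn these percentages into a rough budget for your own file collection, you can bound the index size by the two observed ratios. The sketch below recomputes the percentages from the table (binary units again) and then projects a range for a hypothetical 10 GB collection. Treat the result as a very loose estimate: it assumes your files behave like these two small samples, and the real ratio depends on how much text the indexer extracts from each file. The helper names are illustrative only.

```python
# (total file size in MB, index size in MB), copied from the table above
filesets = {
    "56 PDF files": (90.5, 3.75),
    "56 MS Word DOC files": (32.3, 196 / 1024),  # 196 KB expressed in MB
}


def index_percentage(file_mb, index_mb):
    """Index size as a percentage of the raw file size."""
    return 100 * index_mb / file_mb


def estimate_file_index_mb(total_file_mb, percentage):
    """Project an index size for a file collection from an observed percentage."""
    return total_file_mb * percentage / 100


percentages = {name: index_percentage(*sizes) for name, sizes in filesets.items()}
for name, pct in percentages.items():
    print(f"{name}: {pct:.1f} %")

# Hypothetical example: a 10 GB (10,240 MB) collection, bounded by the observed ratios
low, high = min(percentages.values()), max(percentages.values())
print(f"10 GB of files: roughly {estimate_file_index_mb(10_240, low):.0f} "
      f"to {estimate_file_index_mb(10_240, high):.0f} MB of index")
```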

Have data to share?

If you have Seeker implemented in your environment and haven’t previously shared your Seeker sizing stats with us, please do so. We’d love to hear your specific numbers. Email me at gary@dovetailsoftware.com

Postlude

Hopefully this information is useful when planning out your Dovetail Seeker implementation.

Rock on.
