In content management, metadata is used to uniquely identify content objects, improve search, and manage the life cycle of content. In some cases, it can even reference information that is not necessarily explicit in the content object, such as a project ID number that doesn’t appear within the text of a document. We use metadata to describe and provide context to our content. In essence, metadata is about language.
To describe content objects consistently and to facilitate retrieval, we must have vocabulary control. As we discussed last year, developing a taxonomy or classification scheme allows an organization to apply the same controlled vocabulary to all content across the enterprise.
Just think about this: The average person has a vocabulary of 20,000 unique words. Without a controlled vocabulary, we could end up with a multitude of terms (some contradictory, inaccurate, or confusing), which creates more confusion than clarity. On the other hand, if we limit ourselves to 10% of that vocabulary, we are left with a much more manageable 2,000 terms.
The reality is that metadata can be developed by users through a folksonomy (tagging without rules). After all, folksonomies are easy to develop, cheap, and the “metadata of the masses.” However, they also become ineffective over time because of inconsistencies in structure, vocabulary, and spelling. In this light, metadata by itself could be meaningless.
In this age of powerful search engines, you might wonder why we need taxonomies at all. In fact, an executive at Mozilla asked this very question. I’ll tell you the same thing that I told him: Consider an organization with two million documents in a system, and suppose you’re looking for an invoice for a part that was ordered last year. A full-text search for that invoice and part number might take a while, expend system resources, and return hundreds of documents, or none, along with too many false positives because the parameters are too broad.
When using a hierarchical taxonomy or classification, the search is narrowed to “accounts payable,” then “invoice,” then the “previous year,” and then the “vendor.” The search would be fast and more accurate. However, for that search to be effective, the system and the search interface must include the right metadata.
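The difference between the two searches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Document` fields, the sample records, the part number, and the helper names `full_text_search` and `narrow` are all hypothetical and do not describe any particular content management system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """A content object with hypothetical metadata fields."""
    doc_id: str
    department: str
    doc_type: str
    year: int
    vendor: str
    text: str


def full_text_search(docs, term):
    """Broad search: match the term anywhere in the document text."""
    return [d for d in docs if term.lower() in d.text.lower()]


def narrow(docs, **facets):
    """Taxonomy-style search: keep documents whose metadata matches every facet."""
    return [d for d in docs if all(getattr(d, k) == v for k, v in facets.items())]


# Made-up sample records for illustration only.
DOCS = [
    Document("D1", "accounts payable", "invoice", 2023, "Acme", "Invoice for part 4471-B"),
    Document("D2", "accounts payable", "invoice", 2022, "Acme", "Invoice for part 4471-B"),
    Document("D3", "engineering", "spec", 2023, "Acme", "Specification for part 4471-B"),
    Document("D4", "accounts payable", "purchase order", 2023, "Acme", "PO for part 4471-B"),
    Document("D5", "accounts payable", "invoice", 2023, "Globex", "Invoice for part 9000-X"),
]

# The broad search matches every document that mentions the part number.
broad = full_text_search(DOCS, "4471-B")

# The narrowed search walks the hierarchy: department, type, year, vendor.
precise = narrow(
    DOCS,
    department="accounts payable",
    doc_type="invoice",
    year=2023,
    vendor="Acme",
)

print(len(broad), [d.doc_id for d in precise])
```

In this toy data set the text search returns four documents, three of them false positives, while the metadata filter returns only the invoice being sought. The narrowing works only because each record carries the department, type, year, and vendor fields in the first place.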
Search engines, such as Google, Yahoo, Bing, and others, have spent a gazillion dollars developing taxonomies and indexes in powerful databases, as well as ranking algorithms to determine what appears in the results. The bottom line is that organizations need to invest in taxonomies to support navigation and findability.
The key takeaway is that metadata needs to be managed, normalized, and controlled. User metadata or tags can be mapped to the master data plan and used to improve and refine the plan. Metadata should be expected to add value above and beyond the content it describes.
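One way to picture mapping user tags to a controlled vocabulary is the minimal sketch below. The vocabulary, its synonym lists, and the function names `normalize_tag` and `map_tags` are assumptions made up for this example; a real master data plan would be far larger and would be maintained by the organization.

```python
import re

# Hypothetical controlled vocabulary: preferred term -> accepted variants.
CONTROLLED_VOCABULARY = {
    "invoice": {"invoice", "invoices", "inv", "bill"},
    "purchase order": {"purchase order", "purchase orders", "po"},
    "accounts payable": {"accounts payable", "payables", "ap", "a/p"},
}

# Reverse lookup: variant -> preferred term.
SYNONYM_TO_TERM = {
    variant: term
    for term, variants in CONTROLLED_VOCABULARY.items()
    for variant in variants
}


def normalize_tag(tag):
    """Return the preferred term for a user tag, or None if it is unknown."""
    cleaned = re.sub(r"\s+", " ", tag.strip().lower())
    return SYNONYM_TO_TERM.get(cleaned)


def map_tags(tags):
    """Split user tags into mapped preferred terms and unmapped leftovers."""
    mapped, unmapped = set(), set()
    for tag in tags:
        term = normalize_tag(tag)
        if term is None:
            unmapped.add(tag)
        else:
            mapped.add(term)
    return sorted(mapped), sorted(unmapped)


mapped, unmapped = map_tags(["Invoices", " PO ", "A/P", "invioce", "bill"])
print(mapped)
print(unmapped)
```

Here "Invoices" and "bill" both collapse to the preferred term "invoice", while the misspelled "invioce" lands in the unmapped list. Reviewing that unmapped list is one way user tags could feed back into the plan, either as new accepted variants or as candidates for new terms.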