Stop Buying Storage. Start Managing Information

This article is more than 10 years old.

There is a point in every budgeting lifecycle where the CIO squirms when asked the question, “How much storage will we need going forward?”

At this point, the CIO has two options: 1) Make a guess or 2) Admit that the current information management systems can’t really provide a good answer to that question.

Rarely can a CIO actually answer this question based on a thorough understanding of how much data is being created, how much data must be kept in fast forms of storage, how rapidly each type of data must be archived, and the rate at which both fast and slower forms of storage are growing.

CEOs and CFOs should stop accepting incomplete or partially realized information management processes and start insisting that these capabilities be improved so that data can be treated as an asset, not just as a liability, and as a center of value and profit, not a cost center.

What Does Bad Information Management Look Like?

Clear evidence of bad information management is the storage surprise. “Oops, we needed to buy a whole bunch of storage that was not planned for.” Another form of evidence is not having access to data that is just not that old. “Why do we only have one year’s worth of this data at the ready?” If you don’t know what data you have and cannot search your data in a convenient way, it is likely your information management capabilities need work.

Bad information management means none of these questions has a clear answer:

  • What different types of data are we storing?
  • What can each type of data tell us?
  • Which types create the most value?
  • What types of data are archived?
  • How do we know what is archived?
  • How can we restore data quickly?
  • What type of data restoration should be self service?
  • How can we support mobile use of content and data?
  • How can we move and replicate data to support different uses?

What Does Good Information Management Look Like?

With a good information management system, there is a clear answer to all the questions just posed, which also means there is a well-defined process for managing data as an asset.

But to really master your data, you need a system built to encompass a wide range of requirements in an integrated manner. Here are the capabilities that a top-notch information management system should provide.

Understanding data. Profiling data to understand what type it is and understanding the meaning of the data are key areas of differentiation for information management systems. The best systems can tell you a huge amount about what your data says based on trainable machine learning. To start managing data, it is vital to know what data you have.

Protecting and migrating data through its lifecycle. Perhaps the most fundamental information management task is making sure that data is protected and will not be lost due to a technology failure or catastrophic event. This protection must be applied to active data but also to data as it moves through various stages of archiving. This capability is vital to meet regulatory and legal requirements.

Scaling in an elegant way through the lifecycle. The amount of data in both databases and filesystems is growing in unpredictable ways. A good information management system will have a plan for handling the scale of all types of data as it moves through each stage of the lifecycle.

Managing and exploiting heterogeneous storage devices. When you buy a company and find a bunch of creaky old tape drives used for archiving, a good information management system will be able to incorporate them until they are ready for retirement. But if you find a state-of-the-art SAN, then the system should be able to order up snapshots and track what is contained in each.

Removing duplication. It turns out that in almost all large repositories of information there is substantial duplication at many different levels. Physical storage requirements can be cut by 50 percent or more with a powerful deduplication technology.
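To make the deduplication idea concrete, here is a minimal sketch in Python. It assumes the simplest possible scheme: data is cut into fixed-size chunks, and a chunk is stored only once per unique SHA-256 digest. The function name dedup_stats and the 4 KB chunk size are illustrative choices, not taken from any product mentioned in this article, and commercial systems generally use more sophisticated techniques.

```python
import hashlib


def dedup_stats(blobs, chunk_size=4096):
    """Return (logical_bytes, physical_bytes) for a list of byte strings.

    Hypothetical illustration: each blob is split into fixed-size chunks,
    and a chunk is kept only once per unique SHA-256 digest.
    """
    store = {}   # digest -> chunk actually kept on "disk"
    logical = 0  # bytes the users believe they are storing
    for blob in blobs:
        for start in range(0, len(blob), chunk_size):
            chunk = blob[start:start + chunk_size]
            logical += len(chunk)
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # ignore chunks already stored
    physical = sum(len(chunk) for chunk in store.values())
    return logical, physical
```

Comparing the two numbers that come back gives a rough measure of how much physical storage a repository with many repeated files could save.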

Implementing sophisticated automation, policies, and workflows. A good information management system is able to determine the value of different types of information so that only important information is archived. Many classes of information such as operating system files should never become part of a company’s permanent archive.
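A policy of this kind can be pictured as an ordered list of rules. The sketch below is only an illustration under stated assumptions: the path patterns, the action names ("exclude", "archive", "review"), and the function archive_decision are all hypothetical and do not describe how any particular product expresses its policies.

```python
from fnmatch import fnmatchcase

# Hypothetical rules, checked in order; the first matching pattern wins.
POLICIES = [
    ("/usr/*", "exclude"),      # operating system files: never archive
    ("/windows/*", "exclude"),
    ("*.tmp", "exclude"),       # scratch files
    ("/finance/*", "archive"),  # assumed to be high-value business records
]


def archive_decision(path, policies=POLICIES, default="review"):
    """Return the action of the first rule whose pattern matches the path."""
    normalized = path.lower()
    for pattern, action in policies:
        if fnmatchcase(normalized, pattern):
            return action
    return default  # nothing matched: leave the decision to a person
```

Keeping the rules as plain data rather than code is one way to let records managers adjust what is archived without touching the automation itself.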

Supporting self-service. As automation and transparency expand in information management, so should the possibility of self-service. For certain common events, such as losing a laptop or replacing a desktop, end-users should be able to restore data from backups.

Distributing and synchronizing content. Large repositories of data, in databases or in filesystems, are often needed in part on mobile devices or in the equivalent of data marts. An information management system should enable distribution of content and synchronization of changes.

Preserving security. Making a repository of information searchable and usable by a large group can be dangerous. It is crucial to preserve security and access control so that something that you cannot see when searching a filesystem is also unavailable when searching archives. Information management systems must preserve access control information so the end users can only see what they are entitled to.

Providing comprehensive search. If you put your data in an archive and cannot search to find what is where, it is very much like deleting the information. A good information management system allows active and archived data to be searched.
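The link between search and security can be sketched in a few lines of Python. This is a simplified illustration, assuming each document carries the set of groups allowed to read it and that archived documents keep that set. The Document class, its fields, and the search function are hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    path: str
    text: str
    allowed_groups: frozenset  # access control kept alongside the content
    archived: bool = False


def search(documents, query, user_groups):
    """Return paths of active and archived documents the user may see."""
    groups = set(user_groups)
    needle = query.lower()
    return [
        doc.path
        for doc in documents
        # A hit is returned only if the user shares a group with the document.
        if needle in doc.text.lower() and doc.allowed_groups & groups
    ]
```

Because the filter runs on every hit regardless of whether the document is archived, a user who cannot open a file in the live filesystem does not find it in the archive either.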

Supporting and integrating with applications. A good information management system must also be able to search and prepare data to be usable by a specific application. Depending on the application, this task may simply amount to finding and moving data, or in the case of applications such as e-discovery, advanced concept recognition and categorization are required, in addition to capabilities such as placing a hold on information. Information management systems often differentiate through application-specific capabilities.

Providing analytics. A massive amount of value from an information management system comes from understanding who is using what information. A good information management system can provide analytics on both active and archived information.

This is a large set of capabilities with significant interactions and overlap. For example, to deliver comprehensive search you must preserve security and access control information in the archive. Even if you buy a system that has all of these capabilities, learning to use them is rarely something that can happen rapidly. A staged approach that moves through various levels of information management maturity is required.

In addition, every company will have a different set of priorities about various information management capabilities. The key question is: What information management capabilities are most urgent and how do we acquire them as part of a long-term roadmap for improvement so that data can become a valuable asset?

A Look at Three Approaches

Given that the information management challenge varies from business to business, it is impossible to provide a one-size-fits-all approach. Instead, we will take a look at three broad approaches to information management that are embedded in three leading products. While these three choices represent just a small sample of many that are available in the marketplace, each has its own sweet spot.

Intelligent Categorization: With deep roots in e-discovery, Recommind has expanded to provide broader information management and governance that can provide a crucial capability: knowing what you have. Recommind is able to categorize unstructured data using artificial intelligence techniques that are perfected through a process in which experts teach the algorithms how to categorize the data. Recommind’s e-discovery application uses this power to sift through huge troves of documents and find those that are relevant to answer various questions. But the same techniques can also be used to categorize and manage all of the unstructured data at a company. Recommind can define a lifecycle for data based on various categories and other metadata, then manage that lifecycle as data moves from active repositories through archiving and deletion, all the while maintaining a hold on data deemed important for specific purposes. Recommind helps turn massive collections of unstructured data from a liability into an asset.

Data Defined Storage: In the data-centric approach implemented by Tarmin, data can be accessed through a Global Namespace that virtualizes standard protocols such as CIFS or NFS. It also provides access through common cloud interfaces such as S3. Tarmin provides a highly scalable system using an Enterprise Object Storage layer that creates virtualized storage pools. Each pool can utilize robust information management features such as policy-based data deduplication and compression, data retention, and multi-site data replication and migration, all with enterprise search and discovery.

This means that policies for archiving can be applied and data can be managed under the covers, without participation of the applications. This approach can be used to reduce the cost of storage, meet data governance regulations, and reduce risk through increased information management capabilities.

Tarmin’s Data Defined Storage approach, with its scale-out grid design and metadata indexing, supports the integration of big data analytics tools. The MetaBase (Distributed Metadata Repository) efficiently exposes content throughout the grid, allowing users to point analytics tools to data-in-place.

Tarmin is finding purchase where companies have a ballooning collection of unstructured data and are seeking a better way to manage and extract value from it while reducing costs.

Backup-powered Information Management: In this approach, which is implemented by CommVault, the entire data lifecycle is managed from creation through backup and eventually to archiving. The top-to-bottom approach throws a net over all storage devices, whether on-premises or in the cloud, and also creates a process that allows data in heterogeneous repositories to be managed and controlled through one system. Information management capabilities such as search are implemented across the lifecycle, from active data to various stages of archiving. In addition, the platform supports functions for various types of applications such as e-discovery, compliance, and secure distribution of content to mobile devices. The sweet spot for CommVault is companies that are seeking to create, through a single platform, a mature and fully functional information management capability that will remain resilient through rapid growth and acquisitions. In addition to supporting other specialized information management applications, CommVault is also using its platform to create its own versions of those applications. CommVault finds that it often lands at a company to solve a specific problem such as backup, and then gradually expands to implement a comprehensive information management platform.

The journey to a robust information management capability is never simple, but the rewards of making data an asset and not a liability are immense. If you don’t know how much storage you will need in the next year or cannot answer the questions listed above, it is clearly time to get started on the road to better information management.

If you found these ideas useful, let me know. I’m always looking for new perspectives.

Dan Woods is CTO and editor of CITO Research, a publication where early adopters find technology that matters. For more stories like this one visit www.CITOResearch.com. Dan has performed research for Recommind.