In today’s data-centric world, where data volumes continue to grow at a staggering rate, organisations are looking to capture data and draw knowledge from it faster than ever before. However, data stores that were previously highly capable may now be buckling under the pressure.

Many organisations are starting to find that it doesn’t take much for their highly efficient, scalable database to transform into a complex, unwieldy and unstable platform. Industry experts even outside the four walls of our organisation are debating ways of stopping this from happening.

I have worked alongside many organisations that have struggled to keep pace with this perpetual demand for faster and more powerful data-processing capabilities. In this article I share my tips to ensure your database continues to run as effectively as possible.

For starters, storage is only part of the requirement for such databases: analytics operations on data sets of this size are complex and call for large-scale processing. In effect, the need for additional computing capacity rises steeply as big data continues to swell.

Having reviewed many different platforms on an annual basis, we are beginning to notice a series of trends across them. Our reviews cover every aspect of the infrastructure, from the hardware and operating system setup right through to the architecture of the database, and we take note of any part of the infrastructure that could degrade the overall performance of the database.

Given how quickly data volumes have grown and how data usage has changed in recent years, it’s not hard for our databases to become congested, cluttered, inefficient and corrupt. The good news is that simple fixes can do wonders for your performance, so getting back on track to good data health can be just as easy.

To date, the most common issue comes in the form of a change in user patterns: organisations want different things out of their data, and they want the data to be accessible by more users for longer periods of time. Component failure, network outages and the entire pantheon of other all-too-common IT disasters are potential stumbling blocks for a system designed to cope with big data.

High levels of redundancy and strong failover capability are essential elements of a big data framework. Just as BYOD and tablets were adopted at a surprisingly fast rate, data volumes and access requirements are now going through the roof, which quickly throws even the best-planned databases out of balance.

In some cases, it’s a commercial software package that causes the hiccups. Over the past year, we worked with a number of health care providers that were using a commercial package for managing medical practices, offices and small clinics. This particular package embeds a database that, if not well maintained, tends to develop problems with performance and data integrity over time.

This happens slowly in the background as the practice grows, specifically in situations where users fail to conduct proactive maintenance or upgrades. Users were reporting such problems as data loss, data corruption, and recovery issues. In some cases, it’s unclear if these problems were a result of intrusions that were allowed to happen because the software wasn’t upgraded regularly.

Below is a list of checkpoints I would recommend any data professional keep an eye on:

  • Memory Configuration: The wrong memory configuration can have drastic effects on the performance of a server.
  • Operating System Parameters: These need to be set optimally to support new releases and changing usage profiles.
  • Partitioning: Data loads change over time as an application is used, and a partitioning strategy that was terrific for the data load you had 18 months ago may no longer support the data load you have today. It’s best to check that the partitioning strategy in place still suits the needs of the task.
  • SQL Queries and Indexes: Poorly written SQL queries and misaligned indexes on commonly executed tasks can have a significant impact. A thorough analysis of where queries perform sequential scans, have sub-optimal execution plans or could be supported by better index strategies often works wonders.
  • Misaligned Java Queries: Often queries are generated by the Java code, but the queries may not be well supported by the table structures, the partitions or the indexes.
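As an illustration of the query and index checkpoint above, here is a minimal sketch of how a sequential scan can be spotted and then removed with an index. It uses Python's built-in sqlite3 module purely for convenience; the `visits` table, its columns, the index name and the `query_plan` helper are hypothetical, and the wording of the plan text varies between SQLite versions. Other database engines offer their own equivalents of the plan inspection shown here.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the plan detail strings SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description of each plan step is the last column.
    return [row[-1] for row in rows]


# Hypothetical table standing in for a commonly queried application table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (id INTEGER PRIMARY KEY, patient_id INTEGER, visit_date TEXT)"
)
conn.executemany(
    "INSERT INTO visits (patient_id, visit_date) VALUES (?, ?)",
    [(i % 500, f"2024-01-{(i % 28) + 1:02d}") for i in range(10_000)],
)

sql = "SELECT * FROM visits WHERE patient_id = ?"

# Without an index on patient_id, the plan reports a full scan of the table.
plan_before = query_plan(conn, sql, (42,))

# Add an index aligned with the filter column, then inspect the plan again.
conn.execute("CREATE INDEX idx_visits_patient ON visits (patient_id)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

Comparing the two plans before and after a change like this is a cheap way to confirm that an index is actually being used by the query it was meant to support.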

Time after time we have seen data professionals make some or all of the above adjustments, and we have always seen a positive outcome.

While all of this will help, by far the worst scenarios result from deficient backup and recovery strategies. It is therefore essential that backup strategies plan for all eventualities, including operator error, data corruption or a complete system failure. Addressing backup and recovery as well as the above areas has been shown to improve performance by as much as 1,000 times. So as data volumes continue to grow, make sure your database doesn’t become congested, cluttered, inefficient or corrupt by carrying out these simple fixes. It really is that easy to get back on track to good data health.
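To make the backup point concrete, here is a minimal sketch of one habit worth building into any strategy: checking that a backup can actually be read back and matches its source. It again uses Python's built-in sqlite3 module as a stand-in; the `backup_and_verify` helper and the `patients` table are hypothetical, and a production system would use its own database's backup tooling and restore to separate storage rather than to memory.

```python
import sqlite3


def backup_and_verify(source, table):
    """Copy `source` into a fresh in-memory database and check the copy."""
    target = sqlite3.connect(":memory:")
    source.backup(target)  # Connection.backup is available from Python 3.7

    # integrity_check yields a single row containing "ok" when no damage is found.
    integrity = target.execute("PRAGMA integrity_check").fetchone()[0]

    # Table names cannot be bound as parameters, so `table` must come from trusted code.
    count_sql = f'SELECT COUNT(*) FROM "{table}"'
    source_rows = source.execute(count_sql).fetchone()[0]
    target_rows = target.execute(count_sql).fetchone()[0]
    target.close()

    return integrity == "ok" and source_rows == target_rows


# Hypothetical source database with a handful of rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany(
    "INSERT INTO patients (name) VALUES (?)", [("A",), ("B",), ("C",)]
)
source.commit()

verified = backup_and_verify(source, "patients")
print("backup verified:", verified)
```

A row count and an integrity check are only a first line of defence, but running even this much on a schedule catches backups that silently fail long before they are needed.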