Data Quality and the Importance of Data Stewardship

Having understandable, clean, and compliant data is a necessity for business success. With the proliferation of Big Data, highly distributed systems, and an ever-changing sea of regulations, companies are operating in a world of increasing digital complexity and growing data supplies. Unfortunately, more data doesn’t imply higher quality or less need for expertise; often, quite the opposite is true.

Specific care is needed to ensure that analyses made on the basis of data are reliable and offer value to an organization. It is incredibly easy for a simple oversight to completely invalidate a lot of hard work.

For example, one may forget to remove test accounts, fail to consider crawlers’ impact on the number of visits to a website, miss errors from data import and export, or overlook details such as differing time zones and currencies. It can even happen that data that look perfect at first glance don’t actually contain the expected information because of inconsistencies in naming and documentation.

Each of these issues is straightforward to fix, but without conscious attention any one of them can lead to misleading conclusions and wasted resources.
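To make these pitfalls concrete, here is a minimal Python sketch of a cleaning step that addresses several of them at once. Everything in it is an assumption made for illustration: the `Visit` record, the test-account prefixes, the crawler markers, and the static exchange rates are hypothetical and would differ in any real system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical conventions, chosen only for this example.
TEST_ACCOUNT_PREFIXES = ("test_", "qa_")
CRAWLER_MARKERS = ("bot", "crawler", "spider")
RATES_TO_EUR = {"EUR": 1.0, "USD": 0.9}  # illustrative rates, not real ones


@dataclass
class Visit:
    account: str
    user_agent: str
    timestamp: datetime  # expected to be timezone-aware
    amount: float
    currency: str


def clean_visits(visits):
    """Drop test accounts and crawlers; normalize time zone and currency."""
    cleaned = []
    for v in visits:
        # Skip internal test accounts.
        if v.account.lower().startswith(TEST_ACCOUNT_PREFIXES):
            continue
        # Skip traffic whose user agent looks like a crawler (crude heuristic).
        if any(marker in v.user_agent.lower() for marker in CRAWLER_MARKERS):
            continue
        # Refuse to guess: a naive timestamp has no known time zone.
        if v.timestamp.tzinfo is None:
            raise ValueError(f"naive timestamp for account {v.account!r}")
        # Refuse to guess: an unknown currency cannot be converted.
        if v.currency not in RATES_TO_EUR:
            raise ValueError(f"unknown currency {v.currency!r}")
        cleaned.append(
            Visit(
                account=v.account,
                user_agent=v.user_agent,
                timestamp=v.timestamp.astimezone(timezone.utc),
                amount=round(v.amount * RATES_TO_EUR[v.currency], 2),
                currency="EUR",
            )
        )
    return cleaned


# Example: three raw records, only one of which is a genuine visit.
cest = timezone(timedelta(hours=2))
raw = [
    Visit("alice", "Mozilla/5.0", datetime(2019, 8, 1, 9, 0, tzinfo=cest), 10.0, "USD"),
    Visit("test_bob", "Mozilla/5.0", datetime(2019, 8, 1, 9, 5, tzinfo=cest), 5.0, "EUR"),
    Visit("carol", "Googlebot/2.1", datetime(2019, 8, 1, 9, 10, tzinfo=cest), 0.0, "EUR"),
]
cleaned = clean_visits(raw)
```

The sketch raises an error on naive timestamps and unknown currencies instead of silently guessing, since a silent guess is exactly the kind of oversight that can invalidate an analysis later.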

In this context, the role of a Data Steward is becoming ever more valuable. Data Stewards have a combination of data science and data management knowledge, and are responsible for everything related to data collection, maintenance, organization, and compliance. They may handle tasks ranging from creating data policies and guidelines to managing the data architecture and ensuring that data collection and usage are compliant with regulations. Essentially, they are an organization’s “go-to” for all things data-related.

Beyond simply being an important resource, Data Stewards can offer value to an organization in a variety of other ways:

  • In companies with less-defined hierarchies and roles, Data Stewards can create rules and accountability around data.
  • Data Stewards can help to build a data-focused culture, encouraging the use of high-quality data across the organization, and increasing awareness about data’s strategic role in organizational success.
  • Clear and consistent guidelines can facilitate interdepartmental communication. By ensuring that databases are complete and dependable, organizations can avoid confusion and miscommunication, and focus priorities more accurately.
  • High data quality is essential for making trustworthy analyses. Data Stewards can help a data science team efficiently create reliable insights for the business.
  • Dealing with customer data is a matter of trust, and with that comes very high stakes. Data Stewards reduce the likelihood of errors, which can affect both internal systems and, if data protection mistakes occur, a company’s reputation. Beyond avoiding financial losses and damaged customer trust, data protection compliance is critical in the era of the GDPR and other modern data protection requirements, and is much easier to attain efficiently with dedicated expert knowledge and attention.

Data Steward roles take many different forms. Some may have special domain knowledge, or may be focused on a specific aspect of the organization (such as a particular business function, project, or system).

In general, though, the tasks of a Data Steward may include:

  • Maintaining data: Data maintenance takes place throughout the lifecycle of the data. Here, a Data Steward may identify new sources of data, fix integration problems, develop quality control processes, analyze abnormalities, and implement corrections.
  • Establishing rules: An important aspect of data stewardship is the creation and enforcement of internal guidelines around data usage to ensure not only internal efficiency but also compliance with regulatory obligations. Internal rules may cover aspects such as data processing and storage, to whom and how data can be sent, proper documentation, and consistent terminology. Externally, these rules ensure that all privacy, retention, archival, and disposal requirements are met, in line with local and international regulations.
  • Improving with data: Data Stewards may monitor data usage, share best practices, and assist teams in making better use of the available data.
  • Facilitating communication: One of the key functions of Data Stewards is to enhance communication between departments, particularly between business and IT, and to promote integration. Data Stewards are also an important resource for staff at all levels to get information about their data - such as what information is available and why, where the data are located, what the data mean, whether they are trustworthy, and any potential restrictions around the use of the data.
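As one illustration of how some of these tasks might be supported in practice, the following Python sketch models a single data catalog entry that records what a dataset means, why it exists, where it lives, whether it is considered trustworthy, and which restrictions and retention period apply. The `CatalogEntry` fields, the `is_past_retention` helper, and the example values are all hypothetical; they are not taken from any particular catalog tool or regulation.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class CatalogEntry:
    """Hypothetical catalog record answering the usual questions about a dataset."""
    name: str
    description: str   # what the data mean
    purpose: str       # why the data are collected
    location: str      # where the data are stored
    owner: str         # whom to ask about the data
    trusted: bool      # whether the data passed the steward's quality checks
    restrictions: List[str] = field(default_factory=list)
    retention_days: Optional[int] = None  # None means no retention limit recorded


def is_past_retention(entry: CatalogEntry, collected_on: date, today: date) -> bool:
    """Return True if data collected on `collected_on` should no longer be kept."""
    if entry.retention_days is None:
        return False
    return today > collected_on + timedelta(days=entry.retention_days)


# Example entry with made-up values.
web_visits = CatalogEntry(
    name="web_visits",
    description="One row per page visit, timestamps in UTC",
    purpose="Traffic reporting",
    location="warehouse.analytics.web_visits",
    owner="data-steward@example.com",
    trusted=True,
    restrictions=["No export of raw IP addresses"],
    retention_days=365,
)

expired = is_past_retention(web_visits, date(2018, 1, 1), date(2019, 8, 1))
still_valid = is_past_retention(web_visits, date(2019, 1, 1), date(2019, 8, 1))
```

Keeping the retention period next to the description and restrictions means a single record can answer both the "what does this mean?" questions from colleagues and the "may we still keep this?" questions that arise from regulation.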

Data stewardship is not only a technical function; it can also play an important role in advancing an organization’s goals by effectively leveraging data.

Despite the clear value-add that Data Stewards can offer, positions are still relatively rare. As of August 2019, a search for the term on Indeed.com returns only 29 results in Germany, in comparison to more well-known positions such as Data Scientist (with 1,161 listings) or Data Analyst (with 2,365 listings). Even in a major tech hub like San Francisco, Data Steward-related positions number only 213, versus 2,558 Data Scientist and 3,942 Data Analyst listings.

While having a dedicated person or team certainly has advantages, data stewardship doesn’t necessarily need to rest with a single individual responsible for every aspect of data quality and maintenance. Rather, it can take the form of distributed responsibilities throughout an organization or data science department. With the proper organizational culture, the principles of data stewardship can, and should, also be upheld company-wide.

The key is to recognize the critical nature of high-quality data, and the need to actively pursue and prioritize maintaining high standards - not only to avoid the potential negative costs from faulty or mishandled data, but also as a key way to ensure organizational competitiveness and long-term success in a data-based world.