Data Governance & Data Quality

Data quality can be a real competitive advantage for companies that get it right. However, many data professionals who realise the long-term benefits of data accuracy still struggle to gain support for comprehensive and effective data quality programs.
Many organisations are facing unprecedented pressure in today’s Amazon-dominated world. The reality is that there is a “garbage in, garbage out” cycle of inaccuracies plaguing supply chain data, which can both create inefficiency and negatively impact the consumer experience. For example, one small measurement error can mean a shipment will not fit into the warehouse space assigned, causing a company to incur thousands of dollars in unnecessary costs. Additionally, a product ingredient missing from a product listing can cause an adverse reaction in a particularly vocal consumer using social media, leading to long-term damage to the brand’s reputation.

There are three pillars that each promote product information accuracy:

  • Data governance – By focusing on data governance to support the creation and maintenance of product data based on global standards, organisations can take one of the most important steps towards a culture that values data as a strategic asset. Data governance programs serve an important function within an enterprise: setting the parameters for data creation, management and usage, creating processes for resolving data issues, and enabling business users to make decisions based on high-quality data. A solid data governance program formalises accountability for data management across the organisation and ensures that the appropriate people are involved in the process.

  • Education and training protocol – Industries including grocery, retail, healthcare, and food-service leverage global GS1 standards in their supply chains to provide a common foundation for uniquely identifying products, capturing information about them, and sharing data with other companies. Adoption of these standards and best practices can help eliminate manual processes that are susceptible to error, enable better data interoperability with other organisations, and increase speed-to-market by making data more actionable. Maintaining internal knowledge about standards and proper application of them for data quality is essential for success.

  • Attribute audit – Attributes are the characteristics used to describe products, and they can play an essential role in how organisations stay vigilant about data quality. Organisations can validate data governance processes and institutional knowledge through routine physical audits that compare an actual product to the most recent information shared about that product.
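
As a rough illustration of the attribute audit described above, the sketch below compares a published product record with values measured during a physical audit. The attribute names, the sample values and the 2% tolerance are all hypothetical; they are not taken from any GS1 specification.

```python
# Hypothetical tolerance: allow a 2% relative deviation on measured numbers.
NUMERIC_TOLERANCE = 0.02


def audit_attributes(published, measured, tolerance=NUMERIC_TOLERANCE):
    """Return (attribute, published_value, measured_value) for each mismatch."""
    mismatches = []
    for name, pub_value in published.items():
        if name not in measured:
            # The audit did not capture this attribute at all.
            mismatches.append((name, pub_value, None))
            continue
        meas_value = measured[name]
        both_numeric = isinstance(pub_value, (int, float)) and isinstance(
            meas_value, (int, float)
        )
        if both_numeric:
            # Numeric attributes are compared within a relative tolerance.
            if abs(pub_value - meas_value) > abs(pub_value) * tolerance:
                mismatches.append((name, pub_value, meas_value))
        elif pub_value != meas_value:
            # Everything else (text, ingredient lists) must match exactly.
            mismatches.append((name, pub_value, meas_value))
    return mismatches


# Made-up sample data: the most recently shared record vs. the physical audit.
published = {
    "height_cm": 30.0,
    "width_cm": 20.0,
    "ingredients": ("wheat", "sugar"),
}
measured = {
    "height_cm": 30.2,
    "width_cm": 23.0,
    "ingredients": ("wheat", "sugar", "soy"),
}

for attribute, shared, actual in audit_attributes(published, measured):
    print(f"{attribute}: shared={shared!r} actual={actual!r}")
```

With this sample data the width and the ingredient list are flagged, which mirrors the two failure cases mentioned earlier: a measurement error and a missing ingredient.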

Root Cause

Human error is never a root cause, but systems can always be improved and made more resilient.

When analysing an incident or problem, it can be tempting to use human error as a root cause. If we dig in deeper, though, what appears to be human error is caused by an underlying failure of process or environment. How can that be? Here are some possibilities:

– A fragile, poorly instrumented, or overly complex system can cause humans to make mistakes

– A process that doesn’t take into account human needs, such as sleep, context or skill can also cause humans to make mistakes

– A process of hiring and training operators may be broken, allowing the wrong operators into the environment.

Furthermore, “root cause” is itself a problematic term, as there is rarely a single issue that leads to errors and incidents. Complex systems lead to complex failures, and adding humans into the mix complicates things further. Instead of thinking in terms of root cause, I suggest you consider a list of contributing factors, prioritised by risk and impact.
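
One possible way to keep such a list is sketched below. The 1 to 5 scales and the likelihood × impact score are assumptions made for illustration, as are the example factors and their scores; the text above does not prescribe a scoring method.

```python
from dataclasses import dataclass


@dataclass
class ContributingFactor:
    description: str
    likelihood: int  # hypothetical scale: 1 (rare) to 5 (frequent)
    impact: int      # hypothetical scale: 1 (minor) to 5 (severe)

    @property
    def priority(self) -> int:
        # Assumed scoring rule: likelihood multiplied by impact.
        return self.likelihood * self.impact


def prioritise(factors):
    """Return the factors sorted from highest to lowest priority."""
    return sorted(factors, key=lambda factor: factor.priority, reverse=True)


# Made-up factors, loosely based on the possibilities listed above.
factors = [
    ContributingFactor("Gaps in operator training", likelihood=2, impact=3),
    ContributingFactor("Poorly instrumented system", likelihood=4, impact=4),
    ContributingFactor("On-call rota ignores sleep", likelihood=3, impact=5),
]

for factor in prioritise(factors):
    print(f"{factor.priority:>2}  {factor.description}")
```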

Why MTTR Over MTBF?

Being able to recover quickly from failure is more important than having failures less often. This is in part due to the increased complexity of failures today.

When you create a system that rarely breaks, you create a system that is inherently fragile. Will your team be ready to do repairs when the system does fail? Will it even know what to do? Systems that have frequent failures that are controlled and mitigated such that their impact is negligible have teams that know what to do when things go sideways. Processes are well documented and honed, and automated remediation becomes actually useful rather than hiding in the dark corners of your system.

While I’m definitely not saying failure should be an acceptable condition, I’m positing that since failure will happen, it’s just as important (or in some cases more important) to spend time and energy on your response to failure as on trying to prevent it.
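
The argument above is about team readiness, but the standard steady-state availability formula, MTBF / (MTBF + MTTR), points the same way. The sketch below uses made-up figures for two hypothetical systems: one that fails rarely but recovers slowly, and one that fails ten times as often but recovers quickly.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Made-up figures for illustration only.
rarely_fails = availability(mtbf_hours=1000.0, mttr_hours=10.0)
fails_often = availability(mtbf_hours=100.0, mttr_hours=0.5)

print(f"rarely fails, slow recovery: {rarely_fails:.4%}")
print(f"fails often, fast recovery:  {fails_often:.4%}")
```

Under these assumed numbers the system that fails more often is still available for a larger share of the time, because each outage is so much shorter.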

The Dance Floor and the Balcony

Ronald Heifetz is the King Hussein bin Talal Senior Lecturer in Public Leadership at Harvard University’s John F. Kennedy School of Government. For the past twenty years, he has generated critical works that have influenced leadership theory in every domain. Heifetz often draws on the metaphor of the dance floor and the balcony.

Let’s say you are dancing in a big ballroom. . . . Most of your attention focuses on your dance partner, and you reserve whatever is left to make sure you don’t collide with dancers close by. . . . When someone asks you later about the dance, you exclaim, “The band played great, and the place surged with dancers.”

But, if you had gone up to the balcony and looked down on the dance floor, you might have seen a very different picture. You would have noticed all sorts of patterns. . . you might have noticed that when slow music played, only some people danced; when the tempo increased, others stepped onto the floor; and some people never seemed to dance at all. . . . the dancers all clustered at one end of the floor, as far away from the band as possible. . . . You might have reported that participation was sporadic, the band played too loud, and you only danced to fast music.

. . .The only way you can gain both a clearer view of reality and some perspective on the bigger picture is by distancing yourself from the fray. . . .

If you want to affect what is happening, you must return to the dance floor.*

So you need to be both among the dancers and up on the balcony. That’s where the magic is, going back and forth between the two, using one to leverage the other.

_______

* Heifetz, R., and Linsky, M. Leadership on the Line: Staying Alive Through the Dangers of Leading. Boston: Harvard Business School Press, 2002.

Test Your Changes

Following on from my previous post, “There’s no such thing as a small change”…

Please do not make any changes to a production system – a live system – without first testing for any side effects. For example, please do not read a blog post or a book chapter, and then check your system and find you are using manual memory management – and then just turn on automatic memory management. Query plans may change and performance may be impacted. One of three things could happen:

  • Things run exactly the same
  • Things run better than they did before
  • Things run much worse than they did before

Exercise caution before making changes; test the proposed change first!

Querying the alert log via SQL

Quick tip regarding the Oracle database alert log: from 11g onwards, you can query it through the fixed table X$DBGALERTEXT:


SQL> select message_text from X$DBGALERTEXT where rownum <= 30;

MESSAGE_TEXT
-----------------------------------------------------------------------------------------------------------------------------------------
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 2
Number of processor cores in the system is 2
Number of processor sockets in the system is 1
Shared memory segment for instance monitoring created
CELL communication is configured to use 0 interface(s):
CELL IP affinity details:
NUMA status: non-NUMA system
cellaffinity.ora status: N/A
CELL communication will use 1 IP group(s):
Grp 0:
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DEST
Autotune of undo retention is turned on.
IMODE=BR
ILAT =27
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options.

ORACLE_HOME = /u01/app/oracle/product/11.2.0/orcl
System name:Linux
Node name:ODIGettingStarted
Release:2.6.39-400.17.1.el6uek.x86_64
Version:#1 SMP Fri Feb 22 18:16:18 PST 2013
Machine:x86_64
Using parameter settings in client-side pfile /u01/app/oracle/admin/orcl/pfile/init.ora on machine ODIGettingStarted
System parameters with non-default values:

30 rows selected.

My personal opinion? This can be useful if you're looking to create some custom alert log monitoring. However, I still prefer to monitor my alert logs using shell scripts, since querying this X$ table requires the instance to be up and operational. But if you don't have access to the OS, this could be useful.
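
For comparison, here is a minimal sketch of the OS-level approach. I do this with shell scripts; Python is used here only for illustration. It scans the text alert log for ORA- codes and needs nothing from the instance. The function names are my own, the caller supplies the log path, and the second sample line is made up.

```python
import re

# Matches codes such as ORA-00600 anywhere in a line.
ORA_ERROR = re.compile(r"ORA-\d+")


def find_ora_errors(lines):
    """Yield (line_number, error_code, line_text) for each ORA- code found."""
    for number, line in enumerate(lines, start=1):
        for match in ORA_ERROR.finditer(line):
            yield number, match.group(0), line.rstrip("\n")


def scan_alert_log(path):
    """Scan a text alert log on disk; the database does not need to be up."""
    # errors="replace" keeps the scan going if the file has odd bytes.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(find_ora_errors(handle))


# In-memory sample; the ORA- line is invented for the example.
sample_lines = [
    "Starting ORACLE instance (normal)\n",
    "ORA-00600: internal error code (made-up sample line)\n",
    "Autotune of undo retention is turned on.\n",
]

for number, code, text in find_ora_errors(sample_lines):
    print(f"line {number}: {code}: {text}")
```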

I also found the following Metalink note:
High CPU for Queries on X$DBGALERTEXT (Doc ID 2056666.1)

APPLIES TO:

Oracle Database – Enterprise Edition – Version 11.2.0.1 and later
Information in this document applies to any platform.

SYMPTOMS

  • Query on X$DBGALERTEXT consumes high CPU taking a long time to complete. For example:
SELECT count(*)
FROM X$DBGALERTEXT
WHERE to_date(to_char(originating_timestamp, 'dd-mon-yyyy hh24:mi'), 'dd-mon-yyyy hh24:mi') > to_date(to_char(systimestamp - .00694, 'dd-mon-yyyy hh24:mi'), 'dd-mon-yyyy hh24:mi') /* last 10 minutes */
AND (
message_text = 'ORA-00600'
OR message_text LIKE '%fatal%'
OR message_text LIKE '%error%'
OR message_text LIKE '%ORA-%'
OR message_text LIKE '%terminating the instance%'
);
  • It can also cause ORA-700 [dbgrfafr_1].
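
A side note on the .00694 constant in the example query: Oracle date arithmetic treats a subtracted number as a count of days, so ten minutes has to be expressed as a fraction of a day. The short check below, with a helper name of my own choosing, reproduces the value.

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def minutes_as_day_fraction(minutes: float) -> float:
    """Express a number of minutes as a fraction of one day."""
    return minutes / MINUTES_PER_DAY


ten_minutes = minutes_as_day_fraction(10)
print(round(ten_minutes, 5))  # 0.00694
```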