Database Design and Tuning

PHYSICAL DATABASE DESIGN IN RELATIONAL DATABASES
Physical design: is an activity where the goal is to design the appropriate structuring of data in storage.

It is not possible to make meaningful physical design decisions and performance analyses until we know the queries, transactions, and applications that are expected to run on the database in a high-level form.

Factors That Influence Physical Database Design

  • Analyzing the Database Queries and Transactions
    For each query, we should specify the following:
    • The tables that will be accessed by the query.
    • The attributes on which any selection conditions for the query are specified.
    • The attributes on which any join conditions or conditions to link multiple tables or objects for the query are specified.
    • The attributes whose values will be retrieved by the query.
  • Analyzing the Expected Frequency of Invocation of Queries and Transactions
    • Expected rates of invocation with attributes in each file as a selection attribute or a join attribute, over all the queries and transactions.
    • Generally “80-20 rule” applies. It is sufficient to determine the 20 percent or so most important ones that may use 80% of processing.
  • Analyzing the Time Constraints of Queries and Transactions
    • May have stringent performance constraints.
    • The selection attributes used by queries and transactions with time constraints –> higher-priority candidates for primary access structures.
  • Analyzing the Expected Frequencies of Update Operations
    • A minimum number of access paths (e.g. index) should be specified for a file that is updated frequently, because updating the access paths themselves slows down the update operations.
  • Analyzing the Uniqueness Constraints on Attributes
    • Access paths should be specified on all candidate key attributes-or sets of attributes-that are either the primary key or constrained to be unique.

Physical Database Design Decisions

  • Whether to index an attribute:
    The attribute must be a key, or there must be some query that uses that attribute either in a selection condition (equality or range of values) or in a join.

    One factor in favour of setting up many indexes is that some queries can be processed by just scanning the indexes without retrieving any data.
  • What attribute(s) to index on:
    An index can be constructed on one or multiple attributes.
    For example, (garment_style_#, color) in a garment inventory database, a multiattribute index is warranted.

    The ordering of attributes within a multiattribute index must correspond to the queries. For example, the above index assumes that queries would be based on an ordering of colors within a garment_style_# rather than vice versa.
  • Whether to set up a clustered index:

A clustered index defines the order in which data is physically stored in a table. Table data can be sorted in only way, therefore, there can be only one clustered index per table.

Please note: There is no such thing as create clustered index in Oracle. The Oracle database calls this concept index-organized tables (IOT), other databases use the term clustered index.

If a table requires several indexes, the decision about which one should be a clustered index is needed.

– Range queries benefit a great deal from clustering.
– If several attributes require range queries, relative benefits must be evaluated which attribute to cluster on.
– If a query is to be answered by doing an index search only (without retrieving data records), the corresponding index should not be clustered, since the main benefit of clustering is achieved when retrieving the records themselves.

  • Denormalization as a Design Decision for Speeding Up Queries

The normalization sometimes is sacrificed in favour of faster execution of frequently occurring queries and transactions.

This process of storing the logical database design (which may be in BCNF or 4NF) in a weaker normal form, say 2NF or 1NF, is called denormalization.

Typically, the designer adds attributes to a table that are needed for answering queries or producing reports so that a join with another table, which contains the newly added attribute, is avoided.

This reintroduces a partial functional dependency or a transitive dependency into the table, thereby creating the associated redundancy problems.

Other forms of denormalization consist of storing extra tables to maintain original functional dependencies that are lost during a BCNF decomposition.


DATABASE TUNING IN RELATIONAL SYSTEMS

  • Inputs to the tuning process include (internally collected in DBMS):
    • Sizes of individual tables.
    • Number of distinct values in a column.
    • The number of times a particular query or transaction is submitted/executed in an interval of time.
    • The times required for different phases of query and transaction processing (for a given set of queries or transactions).
    • Storage/tablespace statistics
    • I/O and device performance statistics
    • Query/transaction processing statistics
      • Execution times of queries and transactions, optimization times during query optimization.
    • Locking/redo logging related statistics
      • Rates of issuing different types of locks, transaction throughput rates, and redo log records activity.
    • Index statistics
      • Number of levels in an index, number of non-contiguous leaf pages, etc
  • Tuning Indexes

The initial choice of indexes may have to be revised for the following reasons:

– Certain queries may take too long to run for lack of an index.
– Certain indexes may not get utilized at all.
– Certain indexes may be causing excessive overhead because the index is on an attribute that undergoes frequent changes.

  • Tuning the Database Design

The database design has to be driven by the processing requirements as much as by data requirements. Dynamically changed processing requirements need to be addressed by making changes to the conceptual schema if necessary and to reflect those changes into the logical schema and physical design.

We may revert to the logical database design, make adjustments to the logical schema, and remap it to a new set of physical tables and indexes.

e.g. Existing tables may be joined (denormalized) because certain attributes from two or more tables are frequently needed together:
This reduces the normalization level from BCNF to 3NF, 2NF, or 1NF

  • Tuning Queries

Mainly two indications to suggest that query tuning may be needed:

1. A query issues too many disk accesses (for example, an exact match query scans an entire table).

2. The query plan shows that relevant indexes are not being used.

Many query optimizers do not use indexes in the presence of:
– arithmetic expressions (such as SALARY/365 > 10.50);
– numerical comparisons of attributes of different sizes and precision (such as AQTY = BQTY where AQTY is of type INTEGER and BQTY is of type SMALLINTEGER);
– NULL comparisons (such as BDATE IS NULL), and
– substring comparisons (such as LNAME LIKE “%MANN”).

Some DISTINCTS may be redundant and can be avoided without changing the result (the result is distinct). A DISTINCT often causes a sort operation and must be avoided as far as possible.

If multiple options for join condition are possible, choose one that uses a clustering index and avoid those that contain string comparisons.

One idiosyncrasy with query optimizers is that the order of tables in the FROM clause may affect the join processing. One may have to switch this order so that the smaller of the two relations is scanned and the larger relation is used with an appropriate index.

A query with multiple selection conditions that are connected via OR may not be prompting the query optimizer to use any index. It may be split up and expressed as a union of queries.

For example:
SELECT FNAME, LNAME, SALARY, AGE
FROM EMPLOYEE
WHERE AGE> 45 OR SALARY < 50000;

Splitting previous query up as:
SELECT FNAME, LNAME, SALARY, AGE
FROM EMPLOYEE
WHERE AGE> 45
UNION
SELECT FNAME, LNAME, SALARY, AGE
FROM EMPLOYEE
WHERE SALARY < 50000;

may utilize indexes on AGE as well as on SALARY.

WHERE conditions may be rewritten to utilize the indexes on multiple columns. For example,

SELECT REGION#, PROD_TYPE, MONTH, SALES
FROM SALES_STATISTICS
WHERE REGION# = 3 AND
((PRODUCT_TYPE BETWEEN 1 AND 3) OR
(PRODUCTTYPE BETWEEN 8 AND 10));

May use an index only on REGION# and search through all leaf pages of the index for a match on PRODUCT_TYPE:

SELECT REGION#, PROD_TYPE, MONTH, SALES
FROM SALES_STATISTICS
WHERE
(REGION# = 3 AND (PRODUCT_TYPE BETWEEN 1 AND 3)) OR
(REGION# = 3 AND (PRODUCT_TYPE BETWEEN 8 AND 10));

May use a composite index on (REGION#, PRODUCT_TYPE) and work much more efficiently.

First Steps When Performance Tuning

A well-planned methodology is the key to success in performance tuning. Without a proper diagnosis of the issue any troubleshooting step taken will further delay the resolution. Example, a DBA can not just blindly start tracing/tuning a SQL/Sessions when all symptoms on careful analysis points towards issues with Disk/Memory/Network etc.

Also remember that most common performance gains are attained through SQL/Code/application tuning and increasing hardware resources should always be your last resort. Developers play an important role in writing applications with effective SQL statements so it is important that the DBA and Developers should work together when facing a database/application issue.

When performance issues are reported to a DBA, the first step is to get as much information as possible that is related to the issue. You can ask below example preliminary questions to users/developer to collect the first level of information..

  1. What operations/program are executed?
  2. Is it Oracle seeded or custom program?
  3. How much time it used to take earlier?
  4. Is the run time increased over time or you are seeing sudden increase in run time?
  5. Was there any recent code change/migration?
  6. Is it always slow or for certain time of the day only?
  7. Is it slow for all parameters or for some specific parameters?
  8. How much data is getting processed?
  9. What is the frequency of job execution? Was there any change in frequency?
  10. Does the problem happens on both their test and production systems?

Asking above kind of questions will help you in deciding what part of system you should target.

  • Target the whole system
  • Target a single session
  • Target a single SQL statement

Your goal should be to answer below three questions:

Where is time spent?
You need to identify where the time is getting consumed. Is it the code or network latency or disk I/O bottlenecks or the CPU/Memory crunch

How is time spent?
You need to find out how the time is spent and how much time is spend in each layer – database, application, network, disk etc .

How to reduce the time spent?
Finally, based on above information see where the major chunk of time is spent and how to reduce it.

High level steps for System level performance issues:

Use vmstat top/prstat to identify system wide CPU/Memory consumption.
Use iostat to verify if disks are the bottleneck
Use netstat/tnsping etc to verify if network is issue.
Verify if any other resource intensive processes are running on server.
Verify filesystem space.
Check alert logs, application logs, traces etc.
Check database locks
Generate AWR reports to see what is eating up resources.
Check if increasing application/database memory/redo/undo/temp/SGA will help.

High level steps for Session level performance issues:

Find the Top SQLs executing under session.
Apply SQL optimization techniques on top SQLs.
Verify locking at session level
Generating AWR/ASH for that duration may help in providing useful information.

High level steps for SQL level performance issues:

Apply trace, Generate TKPROF file and analyze for top resource consuming SQL.
Avoid full table scans on large tables
Possible indexes on columns contained in the WHERE clause
Use AWR/ASH reports to get collective information
Also use SQLTRPT/SQLT
Verify is statistics are current
Verify if indexes are valid and are adequate.
Verify is parallelism/materialized views/Baselines/SQL Profiles will help
Monitor V$SESSION_LONGOPS to detect long running operations
Decide on using Hints if those are helping.
Table partitioning can be thought of as an option based on kind and size of tables.

Perspective on SQL Tuning

Fire up your favorite search engine, enter “SQL tuning” as the search term, and you are likely to detect a common theme in the results. They typically relate to modification of the database structure, such as adding or removing indexes; modification of the execution environment, such as gathering or modifying optimizer statistics; or more-substantial modifications such as rewriting the SQL or changing the physical design of the database. What is often missing in those same search engine results is a warning that any such modifications carry an inherent risk to the entire database environment in which the SQL statement is running. The larger the modification, the higher the risk.

As Oracle Database has evolved, the number of tuning tools has grown and the level of sophistication of each of these tools has also increased, giving developers and DBAs a plethora of options to explore. One consequence is that it is easy to dive straight into utilizing these tuning tools without stepping back and asking a few key questions about the SQL statement first:

  • What is the business functional requirement being served by the SQL?
  • Is the SQL correct?
  • Can this SQL ever run fast enough to meet the requirements?

Even without diving into low-level tools, it is easy to forget these questions. I’m frequently in meetings to assist with performance tuning issues, and upon my arrival, often the first thing presented to me is a single SQL statement, with no context or explanation surrounding it, and this plea: “This is the problem! Please solve it.” It may seem counterintuitive, but the first step of SQL tuning is to forget about the SQL.

The Business Requirement

No organization I’m aware of has ever had a business model of “Let’s make sure SQL runs fast,” unless that business was a SQL tuning consultancy! SQL statements, the applications that run them, and the IT departments that build and support those applications exist to meet a suite of business functional requirements. Those requirements may be created by the business as part of its desire to thrive commercially, or they may be imposed on the business by regulatory bodies. In either case, satisfying business requirements must be kept uppermost in a developer’s mindset when it comes to tuning SQL, because it drives the very decisions made by developers in creating the database design and the SQL that runs on it. Ignoring the underlying business requirements is a catalyst for poorly performing SQL.

SQL Correctness

Once the business functional requirement has been confirmed, you can turn your attention to the SQL statement. Just because a SQL statement returns without error, or even if the statement returns the correct results, that does not constitute a guarantee that the SQL statement is actually correct. Often the cause of poorly performing SQL is malformed construction, which can easily slip through testing cycles if the query results are still plausible (especially with small test datasets). Hence, when I’m asked to tune a SQL statement, I will spend a few minutes before tackling any performance-related avenues making sure that the SQL statement does not have any obvious syntax errors. Here are some of the common errors I see that typically cause SQL statements to be misdiagnosed as performance problems.

Summary

For IT professionals, there is always the temptation of “Let’s just jump into the code,” no matter what language the code is written in, and SQL code is no exception. And when it comes to the tuning of SQL, that temptation is even greater, considering that Oracle Database offers such a plethora of SQL tuning facilities. Throughout the various versions of Oracle Database, technologies such as SQL profiles, stored outlines, SQL plan baselines, tuning advisors, and optimizer hints all provide enormous opportunities for developers to improve SQL code performance.

Data Governance & Data Quality

Data quality can be a real competitive advantage for companies that get it right. However, many data professionals who realise the long-term benefits of data accuracy still struggle to gain support for comprehensive and effective data quality programs.
Many organisations are facing unprecedented pressure in today’s Amazon-dominated world. The reality is that there is a “garbage in, garbage out” cycle of inaccuracies plaguing supply chain data, which can both create inefficiency and negatively impact the consumer experience. For example, one small measurement error can mean a shipment will not fit into the warehouse space assigned, causing a company to incur thousands of dollars in unnecessary costs. Additionally, a product ingredient missing from a product listing can cause an adverse reaction in a particularly vocal consumer using social media, leading to long-term damage to the brand’s reputation.

There are three pillars that each promote product information accuracy:

  • Data governance – By focusing on data governance to support the creation and maintenance of product data based on global standards, organisations can take one of the most important steps to setting up a culture that values data as a strategic asset. Data governance programs serve an important function within an enterprise: setting the parameters for data creation, management and usage, creating processes for resolving data issues, and enabling business users to make decisions based on high-quality data. A solid data governance program formalizes accountability for data management across the organization and ensures that the appropriate people are involved in the process.

 

  • Education and training protocol – Industries including grocery, retail, healthcare, and food-service leverage global GS1 standards in their supply chains to provide a common foundation for uniquely identifying products, capturing information about them, and sharing data with other companies. Adoption of these standards and best practices can help eliminate manual processes that are susceptible to error, enable better data interoperability with other organisations, and increase speed-to-market by making data more actionable. Maintaining internal knowledge about standards and proper application of them for data quality is essential for success.

 

  • Attribute audit – Attributes are the characteristics used to describe products, and they can play an essential role in how organisations stay vigilant about data quality. Organisations can validate data governance processes and institutional knowledge through routine physical audits that compare an actual product to the most recent information shared about that product.
Aside

Root Cause

Human error is never a root cause, but systems can always be improved upon and made to be more resilient.

When analysing an incident or problem, it can be tempting to use human error as a root cause. If we dig in deeper, though, what appears to be human error is caused by an underlying failure of process or environment. How can that be? Here are some possibilities:

– A fragile, poorly instrumented, or overly complex system can cause humans to make mistakes

– A process that doesn’t take into account human needs, such as sleep, context or skill can also cause humans to make mistakes

– A process of hiring and training operators may be broken, allowing the wrong operators into the environment.

Furthermore, “root cause” itself is a problematic statement, as there is rarely a single issue that leads to errors and incidents. Complex systems lead to complex failures, and adding humans into the mix complicates things further. Instead of thinking in terms of root cause, I suggest you consider a list of contributing factors, prioritised by risk and impact.

Aside

Why MTTR Over MTBF?

Being able to recover quickly from failure is more important than having failures less often. This is in part due to the increased complexity of failures today.

When you create a system that rarely breaks, you create a system that is inherently fragile. Will your team be ready to do repairs when the system does fail? Will it even know what to do? Systems that have frequent failures that are controlled and mitigated such that their impact is negligible have teams that know what to do when things go sideways. Processes are well documented and honed, and automated remediation becomes actually useful rather than hiding in the dark corners of your system.

While I’m definitely not saying failure should be an acceptable condition, I’m positing that since failure will happen, it’s just as important (or in some cases more important) to spend time and energy on your response to failure rather than trying to prevent it.

Aside

The Dance Floor and the Balcony

Ronald Heifetz is the King Hussein bin Talal Senior Lecturer in Public Leadership at Harvard University’s John F. Kennedy School of Government. For the past twenty years, he has generated critical works that have influenced leadership theory in every domain. Heifetz often draws on the metaphor of the dance floor and the balcony.

Let’s say you are dancing in a big ballroom. . . . Most of your attention focuses on your dance partner, and you reserve whatever is left to make sure you don’t collide with dancers close by. . . . When someone asks you later about the dance, you exclaim, “The band played great, and the place surged with dancers.”

But, if you had gone up to the balcony and looked down on the dance floor, you might have seen a very different picture. You would have noticed all sorts of patterns. . . you might have noticed that when slow music played, only some people danced; when the tempo increased, others stepped onto the floor; and some people never seemed to dance at all. . . . the dancers all clustered at one end of the floor, as far away from the band as possible. . . . You might have reported that participation was sporadic, the band played too loud, and you only danced to fast music.

. . .The only way you can gain both a clearer view of reality and some perspective on the bigger picture is by distancing yourself from the fray. . . .

If you want to affect what is happening, you must return to the dance floor.*

So you need to be both among the dancers and up on the balcony. That’s where the magic is, going back and forth between the two, using one to leverage the other.

_______

* Heifetz, R., and Linsky, M. Leadership on the Line: Staying Alive Through the Dangers of Leading.Boston: Harvard Business School Press, 2002.

Aside

Test Your Changes

Following on from my previous post on there’s no such thing as a small change…

Please do not make any changes to a production system – a live system – without first testing for any side effects. For example, please do not read a blog post or a book chapter, and then check your system and find you are using manual memory management – and then just turn on automatic memory management. Query plans may change and performance may be impacted. One of three things could happen:

  • Things run exactly the same
  • Things run better than they did before
  • Things run much worse than they did before

Exercise caution before making changes; test the proposed change first!

There is no such thing as a small change

“We want to limit the length of a review in the product to 140 characters, because we may want to use SMS at some stage. That’s a small change, right?”

Wrong.

There are no small changes when you’re committed to delivering quality software. Let’s look at the above case. A naïve programmer may well get this coded in three minutes—after all it’s just an if-statement.

A background in consulting, where you are paid for your time, teaches you to ask a few questions before proceeding with ‘small changes’. Let’s start with some easy questions.

What happens when the review is above 140 characters? Do we crop the string, or display an error message to the user? If we display an error, where does it appear? What does it say? Who is going to write the error message? How do we explain to the user why we’re limiting them to 140 characters? How will these errors look? Do we have a style defined? If not, who is designing it?

But wait, there’s more…

In the unlikely event that we have answers to hand for all of the above concerns, we’re still not finished. Just doing this server-side is a messy way to handle an error. We should do this client-side. But if we’re going to do client-side validation then I’d have a few more questions…

Who’s writing the JavaScript? Does the JavaScript display the same type of error as the server-side code? If not, what’s the new style? How does it behave without JavaScript? How do we ensure that any new update to the 140 character requirement affect both client-side and server-side validation?

We’re still not done. Look at this from a users point of view. They’re already frustrated by having to limit a review to 140 characters for a bizarre reason they won’t understand, and now we’re asking them to guess how long their message is? There must be a better way. Let’s give them a character counter. Oh, well that raises a few more questions…

Nearly there…

Who is going to write this character counter? If we’re using one we found on the net, then who wants to test it in our target browsers (i.e. not just Chrome 27 and beyond).

Also, where is the count of letters displayed on the screen? What does the count look like? Of course, the style should change as the user approaches zero characters, and should definitely look erroneous when they’ve used more than 140 characters—or should it stop accepting input at that point? If so, what happens when they paste something in? Should we let them edit it down, or alert them?

When we’ve implemented the character counter, styled all the errors, implemented the server-side validations, and checked it in all of our supported browsers then it’s just a case of writing tests for it and then deploying it. Assuming your time to production is solid, this bit will be straightforward.

All of this happily ignores the fact that users will wonder why someone wrote an eighty word review just before them and now they’re only allowed write a 140 character one. Obviously we’ll need to keep support in the loop on this, and update our documentation, API, iPhone, and Android apps. Also, what do we do with all the previous reviews? Should we crop them, or leave them as is?

Don’t get me started on how we’re gonna deal with all the funky characters that people use these days… good luck sending them in a text message. We’ll probably need to sanitize the input string of rogue characters, and this means new error messages, new server-side code… the list goes on.

Once you get through all of this you will have your feature in place, and this is just for a character count. Now try something that’s more complex than an if-statement. There are no tiny features when you’re doing things properly. This is why as a UX designer you need a good understanding of what it takes to implement a feature before you nod your head and write another bullet point.

You can’t be serious…

Yes, this was a rant. Yes, most of the above decisions can be made on the fly by experienced developers, but not all of them. Yes, you can use maxlength, but this only addresses one of the points above, and even then only in an HTML5 context.

Often what seems like a two minute job can often turn into a two hour job when the bigger picture isn’t considered. Features that seemed like ‘good value’ at a two minute estimate are rightfully out of scope at two hours.

Key point: Scope grows in minutes, not months. Look after the minutes, and the months take care of themselves.

Agreeing to features is deceptively easy. Coding them rarely is. Maintaining them can be a nightmare. When you’re striving for quality, there are no small changes.

Red means stop, green means go… snow means…?

One more example to remind you that there is no such thing as a “small” change.

LED lights are an excellent lighting solution due to their longevity and power efficiency. Replace all traffic lights with LED lights, “small” change right? Wrong.  It turns out that they may not be the best choice in all conditions. Normally, the excess heat generated by incandescent bulbs is enough to melt the snow off lights so that they remain visible even in freezing conditions. Traffic lights that employ LED lighting, while far more power efficient and reliable than older ones, aren’t able to melt the snow that accumulates.

Snow blocking traffic signals is a significant problem as it has already led to dozens of accidents and at least one fatality.