Database Design and Tuning

PHYSICAL DATABASE DESIGN IN RELATIONAL DATABASES
Physical design is the activity whose goal is to choose an appropriate structuring of the data in storage.

Meaningful physical design decisions and performance analyses cannot be made until we know, in at least high-level form, the queries, transactions, and applications that are expected to run on the database.

Factors That Influence Physical Database Design

  • Analyzing the Database Queries and Transactions
    For each query, we should specify the following:
    • The tables that will be accessed by the query.
    • The attributes on which any selection conditions for the query are specified.
    • The attributes on which any join conditions or conditions to link multiple tables or objects for the query are specified.
    • The attributes whose values will be retrieved by the query.
  • Analyzing the Expected Frequency of Invocation of Queries and Transactions
    • This analysis yields the expected rate at which each attribute in each file is used as a selection attribute or a join attribute, over all the queries and transactions.
    • The “80-20 rule” generally applies: it is sufficient to analyze the 20 percent or so most important queries and transactions, which typically account for 80 percent of the processing.
  • Analyzing the Time Constraints of Queries and Transactions
    • Some queries and transactions may have stringent performance constraints.
    • The selection attributes used by queries and transactions with time constraints are higher-priority candidates for primary access structures.
  • Analyzing the Expected Frequencies of Update Operations
    • The number of access paths (e.g., indexes) should be kept to a minimum for a file that is updated frequently, because updating the access paths themselves slows down the update operations.
  • Analyzing the Uniqueness Constraints on Attributes
    • Access paths should be specified on all candidate keys (attributes, or sets of attributes) that are either the primary key or constrained to be unique.
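
In practice, declaring the constraints typically causes the DBMS to build the corresponding unique indexes automatically. A minimal sketch, assuming a hypothetical EMPLOYEE table:

-- The PRIMARY KEY and UNIQUE constraints make the DBMS create unique
-- indexes (access paths) on SSN and EMAIL, the two candidate keys.
CREATE TABLE EMPLOYEE (
    SSN   CHAR(9)      PRIMARY KEY,  -- unique index created implicitly
    EMAIL VARCHAR(100) UNIQUE,       -- second candidate key, also indexed
    FNAME VARCHAR(30),
    LNAME VARCHAR(30)
);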

Physical Database Design Decisions

  • Whether to index an attribute:
    The attribute must be a key, or there must be some query that uses that attribute either in a selection condition (equality or range of values) or in a join.

    One factor in favour of setting up many indexes is that some queries can be processed by just scanning the indexes without retrieving any data.
  • What attribute(s) to index on:
    An index can be constructed on one or multiple attributes.
    For example, in a garment inventory database, a multiattribute index on (garment_style_#, color) may be warranted.

    The ordering of attributes within a multiattribute index must correspond to the queries. For example, the above index assumes that queries would be based on an ordering of colors within a garment_style_# rather than vice versa.
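
A minimal sketch of such a multiattribute index, assuming a hypothetical STOCK table in the garment inventory database (the # character is legal in Oracle-style identifiers):

-- The index lists GARMENT_STYLE# first, so it serves queries that fix a
-- style and then select a color (or a range of colors), but it does not
-- help queries on COLOR alone.
CREATE INDEX STOCK_STYLE_COLOR_IX
    ON STOCK (GARMENT_STYLE#, COLOR);

-- Served efficiently by the index above:
SELECT QTY_ON_HAND
FROM STOCK
WHERE GARMENT_STYLE# = 1234 AND COLOR = 'NAVY';
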
  • Whether to set up a clustered index:

A clustered index defines the order in which data is physically stored in a table. Table data can be sorted in only one way; therefore, there can be only one clustered index per table.

Please note: there is no CREATE CLUSTERED INDEX statement in Oracle. Oracle implements this concept as index-organized tables (IOTs); other database systems use the term clustered index.

If a table requires several indexes, a decision is needed about which of them should be the clustered index.

– Range queries benefit a great deal from clustering.
– If several attributes require range queries, the relative benefits must be evaluated to decide which attribute to cluster on.
– If a query is to be answered by doing an index search only (without retrieving data records), the corresponding index should not be clustered, since the main benefit of clustering is achieved when retrieving the records themselves.
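
As an illustration of how the same physical decision is expressed in different systems (table and column names are hypothetical):

-- SQL Server / Sybase style: an explicit clustered index physically
-- orders the rows of ORDERS by ORDER_DATE, benefiting date-range queries.
CREATE CLUSTERED INDEX ORDERS_DATE_CIX ON ORDERS (ORDER_DATE);

-- Oracle style: the same idea is declared at table-creation time as an
-- index-organized table (IOT); the rows are stored in primary key order.
CREATE TABLE ORDERS (
    ORDER_ID   NUMBER PRIMARY KEY,
    ORDER_DATE DATE,
    AMOUNT     NUMBER
) ORGANIZATION INDEX;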

  • Denormalization as a Design Decision for Speeding Up Queries

Normalization is sometimes sacrificed in favour of faster execution of frequently occurring queries and transactions.

This process of storing the logical database design (which may be in BCNF or 4NF) in a weaker normal form, say 2NF or 1NF, is called denormalization.

Typically, the designer adds attributes to a table that are needed for answering queries or producing reports so that a join with another table, which contains the newly added attribute, is avoided.

This reintroduces a partial functional dependency or a transitive dependency into the table, thereby creating the associated redundancy problems.

Other forms of denormalization consist of storing extra tables to maintain original functional dependencies that are lost during a BCNF decomposition.
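
A minimal sketch of the first form of denormalization, assuming hypothetical EMPLOYEE and DEPARTMENT tables: the department name is copied into EMPLOYEE so that a frequent report avoids the join, at the cost of reintroducing the transitive dependency SSN -> DNUMBER -> DNAME.

-- Add the redundant attribute and populate it from DEPARTMENT:
ALTER TABLE EMPLOYEE ADD DNAME VARCHAR(30);

UPDATE EMPLOYEE E
SET DNAME = (SELECT D.DNAME
             FROM DEPARTMENT D
             WHERE D.DNUMBER = E.DNUMBER);

-- The frequent report is now answered from EMPLOYEE alone, without a join;
-- the price is that every department rename must now update EMPLOYEE too.
SELECT LNAME, FNAME, DNAME
FROM EMPLOYEE;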


DATABASE TUNING IN RELATIONAL SYSTEMS

  • Inputs to the tuning process include the following, collected internally by the DBMS:
    • Sizes of individual tables.
    • Number of distinct values in a column.
    • The number of times a particular query or transaction is submitted/executed in an interval of time.
    • The times required for different phases of query and transaction processing (for a given set of queries or transactions).
    • Storage/tablespace statistics
    • I/O and device performance statistics
    • Query/transaction processing statistics
      • Execution times of queries and transactions, optimization times during query optimization.
    • Locking/redo logging related statistics
      • Rates of issuing different types of locks, transaction throughput rates, and redo log records activity.
    • Index statistics
      • Number of levels in an index, number of non-contiguous leaf pages, etc.
  • Tuning Indexes

The initial choice of indexes may have to be revised for the following reasons:

– Certain queries may take too long to run for lack of an index.
– Certain indexes may not get utilized at all.
– Certain indexes may be causing excessive overhead because the index is on an attribute that undergoes frequent changes.
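
In Oracle, for instance, index usage can be monitored before deciding to drop an apparently unused index; a hedged sketch (the index name is hypothetical):

-- Turn on usage monitoring, let a representative workload run,
-- then check whether the index was ever used.
ALTER INDEX EMP_BDATE_IX MONITORING USAGE;

-- After the observation window (V$OBJECT_USAGE in older releases;
-- DBA_INDEX_USAGE supersedes it in Oracle 12.2 and later):
SELECT INDEX_NAME, USED
FROM V$OBJECT_USAGE
WHERE INDEX_NAME = 'EMP_BDATE_IX';

-- An index that is never used but slows down frequent updates can be dropped:
DROP INDEX EMP_BDATE_IX;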

  • Tuning the Database Design

The database design has to be driven by the processing requirements as much as by the data requirements. Dynamically changing processing requirements need to be addressed by making changes to the conceptual schema, if necessary, and by reflecting those changes in the logical schema and physical design.

We may revert to the logical database design, make adjustments to the logical schema, and remap it to a new set of physical tables and indexes.

For example, existing tables may be joined (denormalized) because certain attributes from two or more tables are frequently needed together; this reduces the normalization level from BCNF to 3NF, 2NF, or 1NF.

  • Tuning Queries

There are mainly two indications that query tuning may be needed:

1. A query issues too many disk accesses (for example, an exact match query scans an entire table).

2. The query plan shows that relevant indexes are not being used.

Many query optimizers do not use indexes in the presence of:
– arithmetic expressions (such as SALARY/365 > 10.50);
– numerical comparisons of attributes of different sizes and precisions (such as AQTY = BQTY, where AQTY is of type INTEGER and BQTY is of type SMALLINT);
– NULL comparisons (such as BDATE IS NULL); and
– substring comparisons (such as LNAME LIKE '%MANN').
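
Where the requirement allows, such predicates can be rewritten into an index-friendly form. A minimal sketch for the first case above: moving the arithmetic from the column to the constant preserves the meaning and lets an index on SALARY be used.

SELECT FNAME, LNAME, SALARY
FROM EMPLOYEE
WHERE SALARY > 10.50 * 365;   -- equivalent to SALARY/365 > 10.50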

Some DISTINCTs are redundant and can be removed without changing the result (when the result is already distinct). Because a DISTINCT often causes a sort operation, it should be avoided as far as possible.
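
For example, when the select list already contains a key of the table, every result row is unique and the DISTINCT is redundant (SSN is assumed to be the key of EMPLOYEE):

SELECT DISTINCT SSN, LNAME FROM EMPLOYEE;  -- DISTINCT adds a needless sort
SELECT SSN, LNAME FROM EMPLOYEE;           -- same result, no sort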

If multiple options for join condition are possible, choose one that uses a clustering index and avoid those that contain string comparisons.

One idiosyncrasy with query optimizers is that the order of tables in the FROM clause may affect the join processing. One may have to switch this order so that the smaller of the two relations is scanned and the larger relation is used with an appropriate index.

A query with multiple selection conditions that are connected via OR may not be prompting the query optimizer to use any index. It may be split up and expressed as a union of queries.

For example:
SELECT FNAME, LNAME, SALARY, AGE
FROM EMPLOYEE
WHERE AGE > 45 OR SALARY < 50000;

Splitting previous query up as:
SELECT FNAME, LNAME, SALARY, AGE
FROM EMPLOYEE
WHERE AGE > 45
UNION
SELECT FNAME, LNAME, SALARY, AGE
FROM EMPLOYEE
WHERE SALARY < 50000;

may utilize indexes on AGE as well as on SALARY.

WHERE conditions may be rewritten to utilize the indexes on multiple columns. For example,

SELECT REGION#, PRODUCT_TYPE, MONTH, SALES
FROM SALES_STATISTICS
WHERE REGION# = 3 AND
((PRODUCT_TYPE BETWEEN 1 AND 3) OR
(PRODUCT_TYPE BETWEEN 8 AND 10));

The query above may use an index only on REGION# and search through all the leaf pages of the index for a match on PRODUCT_TYPE, whereas the rewritten form:

SELECT REGION#, PRODUCT_TYPE, MONTH, SALES
FROM SALES_STATISTICS
WHERE
(REGION# = 3 AND (PRODUCT_TYPE BETWEEN 1 AND 3)) OR
(REGION# = 3 AND (PRODUCT_TYPE BETWEEN 8 AND 10));

may use a composite index on (REGION#, PRODUCT_TYPE) and work much more efficiently.

First Steps When Performance Tuning

A well-planned methodology is the key to success in performance tuning. Without a proper diagnosis of the issue, any troubleshooting step taken will only delay the resolution. For example, a DBA should not blindly start tracing or tuning SQL statements and sessions when careful analysis of the symptoms points towards issues with disk, memory, network, and so on.

Also remember that the most common performance gains are attained through SQL, code, and application tuning; increasing hardware resources should always be your last resort. Developers play an important role in writing applications with effective SQL statements, so it is important that the DBA and the developers work together when facing a database or application issue.

When performance issues are reported to a DBA, the first step is to gather as much information as possible related to the issue. You can ask users and developers preliminary questions such as the following to collect the first level of information:

  1. What operations/programs are being executed?
  2. Is it an Oracle-seeded or a custom program?
  3. How long did it take previously?
  4. Has the run time increased gradually, or is this a sudden increase?
  5. Was there any recent code change or migration?
  6. Is it always slow, or only at certain times of the day?
  7. Is it slow for all parameters or only for some specific parameters?
  8. How much data is being processed?
  9. What is the frequency of job execution? Was there any change in frequency?
  10. Does the problem happen on both the test and production systems?

Asking questions of this kind will help you decide which part of the system to target:

  • Target the whole system
  • Target a single session
  • Target a single SQL statement

Your goal should be to answer the following three questions:

Where is the time spent?
You need to identify where the time is being consumed: in the code, network latency, disk I/O bottlenecks, or a CPU/memory crunch.

How is the time spent?
You need to find out how the time is spent and how much time is spent in each layer: database, application, network, disk, etc.

How can the time spent be reduced?
Finally, based on the above information, see where the major chunk of time is spent and how to reduce it.

High-level steps for system-level performance issues:

Use vmstat and top/prstat to identify system-wide CPU/memory consumption.
Use iostat to verify whether the disks are the bottleneck.
Use netstat, tnsping, etc. to verify whether the network is the issue.
Verify whether any other resource-intensive processes are running on the server.
Verify filesystem space.
Check alert logs, application logs, traces, etc.
Check database locks.
Generate AWR reports to see what is eating up resources (see the sketch after this list).
Check whether increasing application/database memory/redo/undo/temp/SGA will help.
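
For instance, an AWR report in Oracle is generated between two snapshots; a minimal sketch from SQL*Plus:

-- Take a manual snapshot before and after the problem window
-- (snapshots are also taken automatically, hourly by default):
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;

-- Generate the report; the script prompts for the begin/end
-- snapshot IDs and the output file name:
@?/rdbms/admin/awrrpt.sql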

High-level steps for session-level performance issues:

Find the top SQL statements executing under the session (see the sketch after this list).
Apply SQL optimization techniques to the top SQL statements.
Verify locking at the session level.
Generating AWR/ASH reports for that duration may provide useful information.
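
A sketch for finding what a given session is currently running (SID 123 is a placeholder; the join may return one row per child cursor):

SELECT S.SID, S.STATUS, Q.SQL_ID, Q.SQL_TEXT, Q.ELAPSED_TIME, Q.EXECUTIONS
FROM V$SESSION S
JOIN V$SQL Q ON Q.SQL_ID = S.SQL_ID
WHERE S.SID = 123;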

High-level steps for SQL-level performance issues:

Apply trace, generate a TKPROF report, and analyze it for the top resource-consuming SQL (see the sketch after this list).
Avoid full table scans on large tables.
Consider possible indexes on columns contained in the WHERE clause.
Use AWR/ASH reports to get collective information.
Also use SQLTRPT/SQLT.
Verify that statistics are current.
Verify that indexes are valid and adequate.
Verify whether parallelism, materialized views, baselines, or SQL profiles will help.
Monitor V$SESSION_LONGOPS to detect long-running operations.
Decide on using hints if they help.
Consider table partitioning as an option, based on the kind and size of the tables.
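
Two of the steps above in sketch form (the SID and serial# values are placeholders):

-- Trace one session, reproduce the slow operation, then format the
-- resulting trace file with TKPROF at the operating system level:
EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE(session_id => 123, serial_num => 456);
-- ... reproduce the problem, then:
EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE(session_id => 123, serial_num => 456);
-- OS prompt: tkprof <tracefile>.trc report.txt sort=exeela

-- Detect long-running operations still in flight:
SELECT SID, OPNAME, TARGET, SOFAR, TOTALWORK,
       ROUND(SOFAR / TOTALWORK * 100, 1) AS PCT_DONE, TIME_REMAINING
FROM V$SESSION_LONGOPS
WHERE TOTALWORK > 0 AND SOFAR < TOTALWORK;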

Perspective on SQL Tuning

Fire up your favorite search engine, enter “SQL tuning” as the search term, and you are likely to detect a common theme in the results. They typically relate to modification of the database structure, such as adding or removing indexes; modification of the execution environment, such as gathering or modifying optimizer statistics; or more-substantial modifications such as rewriting the SQL or changing the physical design of the database. What is often missing in those same search engine results is a warning that any such modifications carry an inherent risk to the entire database environment in which the SQL statement is running. The larger the modification, the higher the risk.

As Oracle Database has evolved, the number of tuning tools has grown and the level of sophistication of each of these tools has also increased, giving developers and DBAs a plethora of options to explore. One consequence is that it is easy to dive straight into utilizing these tuning tools without stepping back and asking a few key questions about the SQL statement first:

  • What is the business functional requirement being served by the SQL?
  • Is the SQL correct?
  • Can this SQL ever run fast enough to meet the requirements?

Even without diving into low-level tools, it is easy to forget these questions. I’m frequently in meetings to assist with performance tuning issues, and upon my arrival, often the first thing presented to me is a single SQL statement, with no context or explanation surrounding it, and this plea: “This is the problem! Please solve it.” It may seem counterintuitive, but the first step of SQL tuning is to forget about the SQL.

The Business Requirement

No organization I’m aware of has ever had a business model of “Let’s make sure SQL runs fast,” unless that business was a SQL tuning consultancy! SQL statements, the applications that run them, and the IT departments that build and support those applications exist to meet a suite of business functional requirements. Those requirements may be created by the business as part of its desire to thrive commercially, or they may be imposed on the business by regulatory bodies. In either case, satisfying business requirements must be kept uppermost in a developer’s mindset when it comes to tuning SQL, because it drives the very decisions made by developers in creating the database design and the SQL that runs on it. Ignoring the underlying business requirements is a catalyst for poorly performing SQL.

SQL Correctness

Once the business functional requirement has been confirmed, you can turn your attention to the SQL statement. Just because a SQL statement returns without error, or even returns correct results, that is no guarantee that the statement is actually correct. Often the cause of poorly performing SQL is malformed construction, which can easily slip through testing cycles if the query results are still plausible (especially with small test datasets). Hence, when I'm asked to tune a SQL statement, I spend a few minutes, before tackling any performance-related avenues, making sure that the statement does not have any obvious construction errors. Errors of this kind typically cause SQL statements to be misdiagnosed as performance problems.
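
A classic hypothetical illustration is a join with a forgotten join predicate. On a small test dataset the inflated result can still look plausible, so the statement passes testing and later surfaces as a "slow query":

-- Intended: one row per employee with the department name.
-- The join condition was omitted, producing a Cartesian product.
SELECT E.LNAME, D.DNAME
FROM EMPLOYEE E, DEPARTMENT D;   -- missing: WHERE D.DNUMBER = E.DNUMBER

-- Corrected statement:
SELECT E.LNAME, D.DNAME
FROM EMPLOYEE E
JOIN DEPARTMENT D ON D.DNUMBER = E.DNUMBER;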

Summary

For IT professionals, there is always the temptation of “Let’s just jump into the code,” no matter what language the code is written in, and SQL code is no exception. And when it comes to the tuning of SQL, that temptation is even greater, considering that Oracle Database offers such a plethora of SQL tuning facilities. Throughout the various versions of Oracle Database, technologies such as SQL profiles, stored outlines, SQL plan baselines, tuning advisors, and optimizer hints all provide enormous opportunities for developers to improve SQL code performance.