Database Design and Tuning

PHYSICAL DATABASE DESIGN IN RELATIONAL DATABASES
Physical design: is an activity where the goal is to design the appropriate structuring of data in storage.

It is not possible to make meaningful physical design decisions and performance analyses until we know the queries, transactions, and applications that are expected to run on the database in a high-level form.

Factors That Influence Physical Database Design

  • Analyzing the Database Queries and Transactions
    For each query, we should specify the following:
    • The tables that will be accessed by the query.
    • The attributes on which any selection conditions for the query are specified.
    • The attributes on which any join conditions or conditions to link multiple tables or objects for the query are specified.
    • The attributes whose values will be retrieved by the query.
  • Analyzing the Expected Frequency of Invocation of Queries and Transactions
    • Expected rates of invocation with attributes in each file as a selection attribute or a join attribute, over all the queries and transactions.
    • Generally “80-20 rule” applies. It is sufficient to determine the 20 percent or so most important ones that may use 80% of processing.
  • Analyzing the Time Constraints of Queries and Transactions
    • May have stringent performance constraints.
    • The selection attributes used by queries and transactions with time constraints –> higher-priority candidates for primary access structures.
  • Analyzing the Expected Frequencies of Update Operations
    • A minimum number of access paths (e.g. index) should be specified for a file that is updated frequently, because updating the access paths themselves slows down the update operations.
  • Analyzing the Uniqueness Constraints on Attributes
    • Access paths should be specified on all candidate key attributes-or sets of attributes-that are either the primary key or constrained to be unique.

Physical Database Design Decisions

  • Whether to index an attribute:
    The attribute must be a key, or there must be some query that uses that attribute either in a selection condition (equality or range of values) or in a join.

    One factor in favour of setting up many indexes is that some queries can be processed by just scanning the indexes without retrieving any data.
  • What attribute(s) to index on:
    An index can be constructed on one or multiple attributes.
    For example, (garment_style_#, color) in a garment inventory database, a multiattribute index is warranted.

    The ordering of attributes within a multiattribute index must correspond to the queries. For example, the above index assumes that queries would be based on an ordering of colors within a garment_style_# rather than vice versa.
  • Whether to set up a clustered index:

A clustered index defines the order in which data is physically stored in a table. Table data can be sorted in only way, therefore, there can be only one clustered index per table.

Please note: There is no such thing as create clustered index in Oracle. The Oracle database calls this concept index-organized tables (IOT), other databases use the term clustered index.

If a table requires several indexes, the decision about which one should be a clustered index is needed.

– Range queries benefit a great deal from clustering.
– If several attributes require range queries, relative benefits must be evaluated which attribute to cluster on.
– If a query is to be answered by doing an index search only (without retrieving data records), the corresponding index should not be clustered, since the main benefit of clustering is achieved when retrieving the records themselves.

  • Denormalization as a Design Decision for Speeding Up Queries

The normalization sometimes is sacrificed in favour of faster execution of frequently occurring queries and transactions.

This process of storing the logical database design (which may be in BCNF or 4NF) in a weaker normal form, say 2NF or 1NF, is called denormalization.

Typically, the designer adds attributes to a table that are needed for answering queries or producing reports so that a join with another table, which contains the newly added attribute, is avoided.

This reintroduces a partial functional dependency or a transitive dependency into the table, thereby creating the associated redundancy problems.

Other forms of denormalization consist of storing extra tables to maintain original functional dependencies that are lost during a BCNF decomposition.


DATABASE TUNING IN RELATIONAL SYSTEMS

  • Inputs to the tuning process include (internally collected in DBMS):
    • Sizes of individual tables.
    • Number of distinct values in a column.
    • The number of times a particular query or transaction is submitted/executed in an interval of time.
    • The times required for different phases of query and transaction processing (for a given set of queries or transactions).
    • Storage/tablespace statistics
    • I/O and device performance statistics
    • Query/transaction processing statistics
      • Execution times of queries and transactions, optimization times during query optimization.
    • Locking/redo logging related statistics
      • Rates of issuing different types of locks, transaction throughput rates, and redo log records activity.
    • Index statistics
      • Number of levels in an index, number of non-contiguous leaf pages, etc
  • Tuning Indexes

The initial choice of indexes may have to be revised for the following reasons:

– Certain queries may take too long to run for lack of an index.
– Certain indexes may not get utilized at all.
– Certain indexes may be causing excessive overhead because the index is on an attribute that undergoes frequent changes.

  • Tuning the Database Design

The database design has to be driven by the processing requirements as much as by data requirements. Dynamically changed processing requirements need to be addressed by making changes to the conceptual schema if necessary and to reflect those changes into the logical schema and physical design.

We may revert to the logical database design, make adjustments to the logical schema, and remap it to a new set of physical tables and indexes.

e.g. Existing tables may be joined (denormalized) because certain attributes from two or more tables are frequently needed together:
This reduces the normalization level from BCNF to 3NF, 2NF, or 1NF

  • Tuning Queries

Mainly two indications to suggest that query tuning may be needed:

1. A query issues too many disk accesses (for example, an exact match query scans an entire table).

2. The query plan shows that relevant indexes are not being used.

Many query optimizers do not use indexes in the presence of:
– arithmetic expressions (such as SALARY/365 > 10.50);
– numerical comparisons of attributes of different sizes and precision (such as AQTY = BQTY where AQTY is of type INTEGER and BQTY is of type SMALLINTEGER);
– NULL comparisons (such as BDATE IS NULL), and
– substring comparisons (such as LNAME LIKE “%MANN”).

Some DISTINCTS may be redundant and can be avoided without changing the result (the result is distinct). A DISTINCT often causes a sort operation and must be avoided as far as possible.

If multiple options for join condition are possible, choose one that uses a clustering index and avoid those that contain string comparisons.

One idiosyncrasy with query optimizers is that the order of tables in the FROM clause may affect the join processing. One may have to switch this order so that the smaller of the two relations is scanned and the larger relation is used with an appropriate index.

A query with multiple selection conditions that are connected via OR may not be prompting the query optimizer to use any index. It may be split up and expressed as a union of queries.

For example:
SELECT FNAME, LNAME, SALARY, AGE
FROM EMPLOYEE
WHERE AGE> 45 OR SALARY < 50000;

Splitting previous query up as:
SELECT FNAME, LNAME, SALARY, AGE
FROM EMPLOYEE
WHERE AGE> 45
UNION
SELECT FNAME, LNAME, SALARY, AGE
FROM EMPLOYEE
WHERE SALARY < 50000;

may utilize indexes on AGE as well as on SALARY.

WHERE conditions may be rewritten to utilize the indexes on multiple columns. For example,

SELECT REGION#, PROD_TYPE, MONTH, SALES
FROM SALES_STATISTICS
WHERE REGION# = 3 AND
((PRODUCT_TYPE BETWEEN 1 AND 3) OR
(PRODUCTTYPE BETWEEN 8 AND 10));

May use an index only on REGION# and search through all leaf pages of the index for a match on PRODUCT_TYPE:

SELECT REGION#, PROD_TYPE, MONTH, SALES
FROM SALES_STATISTICS
WHERE
(REGION# = 3 AND (PRODUCT_TYPE BETWEEN 1 AND 3)) OR
(REGION# = 3 AND (PRODUCT_TYPE BETWEEN 8 AND 10));

May use a composite index on (REGION#, PRODUCT_TYPE) and work much more efficiently.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s