ax·i·om  n. 
1. A self-evident or universally recognized truth. 
2. An established rule, principle, or law. 
3. Abbr. ax. A self-evident principle or one that is accepted as true without proof as the basis for argument; a postulate.
 
Etymology: from Greek axios, worthy

TM

 Login/Password  

Experience Expertise

SM
     

  ABOUT US

  IT SOLUTIONS

  METHODOLOGY

  DW GLOSSARY

  CUSTOMERS

  CAREERS

  CONTACT US
     
     
     

 

A Veteran-Owned Company

 

 

 

 

 

 

 

 

 

 

 

 

DATA WAREHOUSING (DW) GLOSSARY
 blank.gif (807 bytes)

Knowledge
is Power

 

blank.gif (807 bytes)


A

Ad-Hoc Query

An Ad-Hoc Query is a database query (or request for information) that is dynamically entered by a database user. Static queries that exist in production reports or provide data results for user-interface screens are not considered Ad-Hoc Queries.

Agent

An Agent is an application or automated routine that searches data sets for pre-defined thresholds or patterns. Upon encountering a targeted condition, the Agent performs a set action to report or record the event (also called an Alert). Examples of resulting actions include report generation, e-mail notification, and trigger execution.

Aggregate

An Aggregate is summary information stored in a Data Warehouse. Instead of repeatedly summarizing the same information to perform multiple analyses, information in a warehouse may be pre-summarized for performance. The two primary uses for Aggregates include:

  • Saving storage space - Data warehouses can get very large. The use of aggregates greatly reduces the space needed to store data. Historical information aggregation saves storage when details are not relevant or required.

  • Improving Performance – Less I/O means less processing time.

Warehouses must contain the lowest level of detail that is required to support specific analyses. Query Redirection, Materialized Views, or manually User Interface coding provides access to the level of detail requested by end-users. Having both the low-level and summarized information available provides the fastest query response time for the information There are five aggregate functions defined in standard SQL (Structured Query Language): SUM, COUNT, MIN (the lowest value), MAX (the highest value), and AVG (the average value).

Aggregate Navigation

Also Known As: Materialized View

Alert

A message sent automatically by a computer system when a certain situation occurs. One of the greatest benefits of data warehousing is the ability to set alerts. Practical applications of Alerts include:

  • A store manager can be automatically informed when a certain product's sales fall below or rise above a specified range.

  • A factory manager can be automatically informed when the failure rate of a product exceeds a specified level.

  • A sales manager can be automatically informed when a member of his staff achieves a personal high level of sales for a time period.

Alerts allow a company to receive critical business information in the quickest possible time.

Atomic Data

Atomic Data are the data elements that represent the lowest level of detail. For example, in a daily sales report, the individual items sold would be atomic data, while rollups such as invoice and summary totals from invoices are aggregate data.

Attribute

See Data Model

 

 

B

Business Intelligence (BI) Tools

Business Intelligence Tools are software programs enabling business users to see and use large amounts of complex information. Classes of BI tools include:

  • Multi-Dimensional Analysis Software - Also Known As: OLAP (On-Line Analytical Processing) - Software that gives the user the opportunity to look at the data from a variety of different dimensions.

  • Query Tools- Software that allows the user to ask questions about patterns or details in the data.

  • Data Mining Tools - Software that automatically searches for significant patterns or correlation in the data

Business Rule

A Business Rule is a calculation or documented procedure applied to business attributes. Defining of Business Rules occurs during Data Warehouse design. Procedures or Functions within the ETL process enforce these business rules.

 

back to top


 

C

Calculated Measure

Also Known As: Derived Data

Cardinality

Cardinality is the degree of even distribution of values across a specific table column. Low cardinality distributions contain few unique values over a set or subset of rows. High cardinality distributions contain many unique values. A unique primary key represents the highest level of cardinality.

Cell

A Cell is a single data value stored at the intersection of a set of dimensional values. An example of a cell would be the value for a measure Revenue in a Cube for a specific Product, Time, and Customer intersection. Cells may be blank if there is no measure for the intersection.

Client/Server

Client/Server is a distributed technology architecture where process load is distributed between two sides of a transaction. Software on a client (desktop PC) is responsible for much of the user-interface rendering activities. The server component is only responsible for providing the data sets needed by the client component.

Column

A Column is a single attribute of a table, view, or query.

Cube

Also Known As: HyperCube or Multi-Dimensional Cube. A Cube is a data structure and access method for multi-dimensional information. Cubes represent an efficient storage algorithm for routinely aggregated or drilled into analytical data. Desktop OLAP tools package multi-dimensional information into Cubes distributable across the network or to individual computers.

 

back to top


 

D

Dashboard

Also Known As: Digital Dashboard or EIS

A Dashboard provides an executive view of specific Key Performance Indicators within an organization. Dashboards typically display high-level measurements and do not provide drill-down capability.

Data-Based Knowledge

Knowledge derived from data using Business Intelligence Tools and the process of Data Warehousing. In the past, business users based decisions on a combination of experience, perception, and intuition. Decision Support Systems consolidated business information into a single system that mirrors the decision maker’s thought processes. Data based knowledge has the following advantages over experience/intuition based knowledge:

  • Accuracy because it is a consolidation from several sources,

  • Currency because the data gathering and publishing process is automatic,

  • Flexibility because different perspectives of the same information are available depending on the business view needed,

  • Insightful because complex data patterns emerge from data mining that otherwise go unnoticed,

  • Less subjective because conclusions are tied directly to the physical data.

Data Cleansing

Also Known As: Data Quality Assurance

Data Dictionary

The Data Dictionary is the catalog of data elements within a database system. Data Dictionaries maintain the metadata needed to manage and define data elements.

Data Extraction

Data Extraction is the process of extracting specific source data to satisfy a data loading or reporting needs. Typically, Data Warehouse data extraction occurs on one or more source OLTP systems. Business rules and filtering applied during the data extraction process minimize the volume of data transferred to the data warehouse or data warehouse stage area.

Data, Information, and Knowledge

Data is the reality that a computer records, stores, and processes. The use of computers to manipulate, report, and analyze this reality is referred to as data processing. Data only has meaning when combined with information that describes the data’s relevance or application (also known as: metadata). For example, the raw numbers within a spreadsheet have no meaning without the column and row headers defining a relationship.

Information is what a person is able to understand or deduce about data based on the available metadata.

Knowledge is the result of combining information with experience and is what a business uses to make decisions. The process of organizing information in such a way as to create data-based knowledge is called Data Warehousing. Business Intelligence Tools are software products that present knowledge to business users.

Every year the amount of raw data in the world is approximately doubling. The goal of business intelligence and data warehousing is changing data into knowledge and incorporating the knowledge back into a business decision making process.

Data Mart

A Data Mart is a database having the same characteristics as a data warehouse, but focused on the data for one division or one business unit within an enterprise. The Data Mart may precede or succeed the Data Warehouse depending on the architecture and implementation plan.

Typically, the data mart is the prototype or the first step of a data warehousing process. An enterprise picks the division or group that would most benefit from data-based knowledge and constructs a data mart to satisfy specific business needs. Proper planning performed during the design incorporates attributes needed by other business requirements, thus facilitating future integration. As business needs expand, additional subject areas are added to the design and eventually the data mart becomes a data warehouse.

Data Marts may also be created after a Data Warehouse for a number of accessibility and performance reasons.

Data Migration

Data Migration is the act of moving data from one environment to another. In Data Warehousing, migration occurs when information transfers from a legacy system into a data warehouse.

Data Mining

Data Mining is the process of finding hidden patterns and relationships in the data. Analyzing data involves the recognition of significant patterns and relationships. At a simple level, Business Analysts recognize relationships between common business objects such as: customer, product, and time relating to a specific sales transaction. Business Analysts can effectively see patterns in small data sets where viewing all factors simultaneously is possible. As information volumes grow, however, there is a need for complex analyses to identify patterns and relationships that are not apparent. Data Mining tools interrogate information and identify relationships that may become very strategic when planning campaigns or profiling behavior.

One of the primary advantages of Data Mining for Data Warehousing is that users do not need to have a preexisting hypothesis about the information to analyze it. Without Data Mining, Analysts must “look” for trends in information. With Data Mining, the system “finds” patterns using techniques like “classification”, “association”, or “clustering”.

Data Model

A Data Model is a logical representation of data element (Also Known As: Attribute) groupings, relationships, and meta data independent of software or hardware. Data Models may exist in a Logical or Physical form. A Logical Data Model defines a design in terms of business processes and business relationships. A Physical Data Model represents the physical implementation of the Logical design in the host RDBMS system.

Modeling tools such as ER Win or System Architect allow designers to develop a data model that is compatible with multiple RDBMSs.

An Entity within a data model consists of logical groupings of attributes. Entities of a Physical Data Model directly corresponds to a table or view within a database system. Meta data for each attribute and entity for a data warehouse project should be contained within the data model.

Data Quality Assurance

Also Known As: Data Cleansing or Data Scrubbing

The process of checking the quality of the data being imported into the data warehouse.

Data quality assurance is one of the greatest challenges in the process of data warehousing. If the data-based knowledge generated by the data warehouse is to be trusted, the data entered into the warehouse must be complete and accurate - "garbage in, garbage out".

Data quality can be a challenge for several reasons:

  • The data is being consolidated from a variety of legacy sources that may have differing definitions of key concepts such as "customer" or "profit".

  • The legacy data was not originally collected for the purpose of decision support so some of the key data might be missing, incomplete, or not as accurate as desired.

  • There might be times when all the data is not received from one of the legacy systems. This could make comparisons between time periods invalid.

A significant portion of time in the development process should be set aside for setting up the data quality assurance process and implementing whatever data cleansing is needed.

In a production environment, there should be a data quality report generated after each data warehouse import. There should be provision for rolling back an import if data quality testing indicates that the data is unacceptable.

Data Scrubbing

Also Known As: Data Quality Assurance

Data Transformation

Transformation is the modification of data during the data extraction and loading of a data warehouse. This modification can include:

  • Validation performed during Data Quality Assurance

  • Processing Calculations

  • Changing Data Types

  • Format Changes

  • Replacing Codes with Actual Values

  • Aggregating the Data

Data Warehouse

Also Known As: Information Warehouse or Warehouse

A Data Warehouse is a physical staging area for decision-support information. It collects data from various applications in an organization's operational systems, integrates the data into a logical model of business subject areas, stores the information in a manner that is accessible and understandable to non-technical decision makers, and delivers information to decision makers across the organization through various report writing and query tools.

The term “Data Warehouse” was coined by Bill Inmon in 1990 as a database that is a subject-oriented, integrated, time-variant and non-volatile collection of data to support a decision making process. Specific definition of these terms is:

  • Subject-oriented - Data that gives information about a particular subject instead of about a company's on-going operations,

  • Integrated - Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole,

  • Time-variant - All data in the data warehouse is identified with a particular time period,

  • Non-volatile - Data is stable in a data warehouse. Information is added, but data is never removed.

This enables management to gain a consistent picture of the business.

Data Warehousing

Data Warehousing is the process of visioning, planning, building, using, managing, maintaining, and enhancing data warehouses and/or data marts. There are many steps in the data warehousing process:

  • Visioning - Having an idea about what could be accomplished.

  • Learning - Studying the potential of data warehousing.

  • Justifying - Developing a business purpose for the process.

  • Budgeting - Counting the cost.

  • Deciding - Making a commitment to develop and use data-based knowledge.

  • Gathering Information - Examining legacy systems.

  • Interviewing Users - Finding what information is needed.

  • Choosing Tools - Choosing the hardware, the database management system, the data extraction tools, and the Business Intelligence tools..

  • Building, Using, Testing, and Evaluating the Prototype - Repeat this step and the above steps as necessary.

  • Deploying - Putting the system into operation.

  • Training - Helping users make full use of the Business Intelligence tools.

  • Managing - Keeping track of scheduled data replication, system usage, and query performance.

  • Adding, Modifying, On-Going Development - As the system is used, new possibilities will be discovered. Consider also all the actions that take place as a part of the data warehousing process.

  • Data Replication - Periodic copying of legacy data.

  • Data Transformation - Transforming the legacy data into the form in which it will be stored in the data warehouse.

  • Data Quality Assurance - Testing the data for inconsistencies and errors.

  • Data Storage - Storing the data in a DBMS (Database Management System).

  • Metadata Storage - Storing the description of the data - the data about the data.

  • Data Mart Population - Populating all the data marts that receive their data from the warehouse.

  • Setting Up Business Intelligence Tools - Giving users access to the data through multi-dimensional analysis, querying, and data mining.

  • Setting Alerts - Establishing conditions that result in an automatic message being sent.

  • Data Warehousing Management - Keeping track of how well all the other actions are being carried out.

Database Management System (DBMS)

The DBMS is the software used to store, access, and manage data. Examples of these software include Oracle, SQL Server, and Sybase. There are two main types of Database Management Systems used for business intelligence and data warehousing - specialized Multi-Dimensional Database Management Systems (MDBMS) and the more widely used general purpose Relational Database Management Systems (RDBMS).

Data Webhouse

A Data Webhouse is a data warehouse built specifically for collecting and analyzing web page user information. The primary source of information in a Webhouse is the web server logs. Within these logs is a plethora of information that defines every action performed during a user session. When integrated with other subject areas of traditional data warehouses, Webhouses relate customer information or history to web site behavior.

DBMS

Also Known As: Database Management System

Decision Support System (DSS)

A DSS is a computer system designed to assist an organization in making decisions. The data store behind a DSS is typically a Data Warehouse or Data Mart, however, some DSS do not require the capabilities found in a warehouse design.

De-Normalization

De-Normalization is reversing the application of Normalization techniques to information. Normalization enforces data integrity and minimizes storage space usage. The downside to a normalized design is additional processing overhead incurred during query processing to re-assemble the information before display or analysis. De-Normalizing a design eliminates this overhead by joining frequently accessed information into the same table. In a Data Warehouse, dimensions are very often De-Normalized for query performance improvements.

Density

Opposite of: Sparsity

Derived Data

Also Known As: Calculated Measure

Data not directly supplied by source systems, but calculated according to business rules. Derived data includes aggregated data and other data transformed from its original state.

Desktop OLAP (DOLAP)

Desktop OLAP is a form of OLAP tool where a Cube containing a summary of multi-dimensional data exists on an individual user's computer.

Dimension

A Dimension is a component of a star schema containing the criteria by which analyses are constrained. During data warehouse design, we group attributes according to relationship and level of detail. These groupings form hierarchies and classifications by which business measure are compared and summarized. Each physical grouping collectively is a dimension.

Examples of a dimension found in most all data warehouses is Calendar. A Calendar dimension contains a single record for every day within the domain of a data warehouse. Each day has hierarchies of month, year, quarter, season, day of week, fiscal period, and day of month. Business measure, or fact, occurring on a single day group or summarize to any level hierarchy found in the Calendar Dimension.

Most dimension tables in a star schema are intentionally de-normalized. This typically does not add a significant amount of storage space to the database because the overall size of the dimension tables is very small when compared to the size of the fact table.

Dimensional Modeling

Dimensional Modeling is the process of taking business requirements and transforming them into a data model specifically designed for analytical reporting. The term Dimensional Modeling comes from one of the two primary components of a dimensional model, data warehouse dimensions. The other primary component are facts.

Drill Down, Drill Up, and Drill Across

Drilling though data is the ability to move between levels of data hierarchies when viewing data within a data warehouse.

  • Drill Down - Changing the view of the data to a greater level of detail.

  • Drill Up - Changing the view of the data to a higher level of aggregation.

  • Drill Across – Changing the view to another dimensional value at the same level as the current view.

Drilling down and drilling up allow an analyst to move down and up levels of summary to see how the information at the various levels is related. A powerful use of this capability is to view information at its highest level of summarization to identify obvious trends in the information. To further investigate the cause of a trend a user simply drills down one or more levels to uncover the detail behind the summarized information

Dynamic Queries

Dynamic Queries are dynamically constructed SQL statements (Also Known As: Queries) based on conditions derived from results of other queries. Desktop query tools often use this approach to simplify the interface between the tool and the database. Dynamic Queries parse at run time.

 

back to top


 

E

End Users

Also Known As: Business Users

End users are the business users that interface to information through a front-end program or set of reports.

Enterprise Data Warehouse

Enterprise Data Warehouse is a term used to describe a relational data warehouse integrating strategic knowledge from several departments within an enterprise. Often, Enterprise-level Data Warehouse planning identifies subject areas and clearly defines monetary priority for implementation. The Enterprise Data Warehouse implementation occurs as development of integrated Data Mart components.

Enterprise Information System / Executive Information System (EIS)

Also Known As: Digital Dashboard

An EIS is a computer system or user-interface that presents a summary of a company's important data on a routine basis.

Entity or Entity Relationship Diagram (ERD)

See Data Model

Extract Transform and Load (ETL)

ETL is the complete data movement process from source systems to the production data warehouse. Components of ETL are Data Extraction, information consolidation or cleansing, and loading.

Extract Specification

The Extract Specification is the standard defined interface of information between source systems and the data warehouse. This specification lists the individual attributes and sources of data elements needed to maintain production loads within the data warehouse system. Attributes are grouped by their business characteristics and required data loading strategy.

 

back to top


 

F

Fact

Facts are data elements associated with business activity or business performance measurements (Also Known As: Key Performance Indicators or KPI). Typically, facts are quantitative values such as gross sales dollars or costs dollars.  Cube cells contain fact values.

Fact Group

A fact group is a group of fact tables that collectively make up a subject area.

Fact Table

In a star schema, the central table containing Facts has foreign-key relationships to all dimension that are relevant to the fact measurement. There are two types of fields in a fact table:

  • Fields storing the foreign keys that connect each particular fact to the appropriate dimension value,

  • Fields storing the individual facts.

 

back to top


 

G

Granularity

Granularity is the level of detail of facts stored in a data warehouse. The granularity of the fact table is one of the most significant design decisions in creating a data warehouse. Facts should be at an aggregation level low enough to support the lowest level of analysis. For example, if data warehouse requirements state it is necessary to summarize product SKU sales, the fact table granularity must be at an invoice line-item level, not the invoice ticket level.

 

back to top


 

H

Hash

Hashing is a mathematical method of random distribution. In databases, hashing mathematically assigns one of a fixed number of codes to an attribute value in an attempt to evenly distribute values across multiple data partitions. Hashing is not effective when a data set has a large volume with low Cardinality.

Hierarchy

A hierarchy is a relationship between attributes in a table consisting of sets and sub-sets. Hierarchies allow traversing of aggregation levels within data by querying with SQL using a GROUP BY clause. Moving between the levels of a hierarchy is called drilling up and drilling down.

Hybrid OLAP (HOLAP)

HOLAP is a hybrid architecture that combines the advantages of Relational OLAP (ROLAP) and Multi-Dimensional OLAP (MOLAP) architectures. In HOLAP architecture, aggregated levels of hierarchies are stored in MOLAP cubes for query efficiency. Detail information (data at the lowest level of aggregation) that would be too cumbersome to store in cubes resides in ROLAP architecture in the production data warehouse. HOLAP provides connectivity to the large data sets in the relational database while taking advantage of the faster performance of the multi-dimensional aggregation storage.

HyperCube

Also Know As: Cube

 

back to top


 

I

Index

An Index is a database structure enabling rapid table searching by one or more column criteria. Indexing subsets of columns in a table enabling queries to access only the necessary database blocks, thus reducing disk access (and time). A query processor utilizes an index only if doing so reduces the number of disk reads needed to return the query results.

Two common types of indexing strategies used by RDBMSs are Btree and Bitmap. Btree indexes are tree-like data structures sorted by the index key field(s). Theoretically, the tree structure has many branches to minimize the number of comparisons necessary to reach a leaf node. Bitmap indexes are effective for columns with low cardinality distribution of values. A bitmapped index contains a record for each record in the table. The bit count in a bitmap index record matches the number of distinct values in the column(s) indexed (plus overhead). Binary comparisons of bitmaps make searches extremely fast.

 

back to top


 

L

Legacy System

See Also: Source Database

A Legacy System is an established computer system that is an integral part of a company’s daily business operations. One of the biggest challenges of the data warehousing process is to extract and consolidate data from a variety of legacy systems through out an enterprise.

 

back to top


 

M

 

Materialized Views

Also Known As: Aggregate Navigation

A Materialized view is an Oracle RDBMS feature that provides summary-level information using a database snapshot. Once created, Materialized Views use Query Redirection to transparently process a query request in the most efficient manner.

Measure

Also Known As: Fact

Meta Data

Meta Data is information describing the application, constraints, source, transformation, meaning, units, relationships, and relevance of data in the warehouse. A common definition of Meta Data is that it is data about data. In Data Warehousing, Meta Data is any information about a database attribute needed to support and manage the operation of the data warehouse.

Middleware

Middleware is a communications layer allowing applications to interact across hardware and network environments.

Multi-Dimensional Analysis

Also Known As: OLAP (On-Line Analytical Processing)

Multi-Dimensional Cube

Also Known As: Cube

Multi-Dimensional Database Management System (MDBMS)

An MDBMS is a database management system designed specifically to support efficient storage and retrieval of OLAP data. The MDBMS interfaces with or contains a user-interface that supports drilling through hierarchies in the data. The structure where an MDBMS stores information is a cube.

Multi-Dimensional On-Line Analytical Programming (MOLAP)

Also Known As: OLAP

 

back to top


 

N

Natural Key

A Natural Key is a primary key within a source system. Natural keys cannot always be primary keys in a data warehouse dimension because multiple versions of a record may need to be stored.

Non-Volatile

Non-Volatile data is data that does not change over time. Most all data in a data warehouse is non-volatile. Since data warehouses record and reconstruct history, changes to existing data are rare. Non-volatility is one of the original defining characteristics of a data warehouse.

Normalization

Normalization is the process of organizing data to eliminate redundancy. In OLTP systems, normalization enforces referential integrity, minimizes storage requirements, and optimizes the data insertion performance.

The downside of normalization is the increased overhead required to assemble data sets at query time. An example of a normalized design is a contact database where addresses are stored in a separate table from contact name. Two contacts with the same address would join to a single address record rather than the address information duplicated in the same table as the contact name.

Normalization exists in Data Warehouse Fact Tables. Dimension Tables in a data warehouse de-normalized information to increase query performance.

 

back to top


 

O

OLAP (On-Line Analytical Processing)

The term OLAP distinguishes data warehousing activities from On-Line Transaction Processing (OLTP) business activities. In its broadest usage, OLAP is a synonymous to data warehousing. In a narrow usage, OLAP refers to the tools used for Multi-Dimensional Analysis.

The OLAP process organizes and aggregates data into levels of business information (facts) and reporting these levels by business classifications (dimensions). A spreadsheet is a two-dimensional analysis tool. OLAP provides spreadsheet-like analysis simultaneously spanning as many dimensions as the business needs to support. Several software packages facilitate manipulating and displaying this information that is too complex to represent in two-dimensions.

OLTP (On-Line Transaction Processing)

OLTP is a collective term encompassing the operations and systems that process transactions as part of a company’s daily business activities. OLTP systems (Also Known As: Legacy) are the information sources for data warehouses.

Operational Data Store (ODS)

An ODS is an integrated database of operational data. ODS sources may include OLTP systems or any system containing current or near-term data. One qualifier of an ODS is data volatility (frequent updates) and the absence of historical information.

An ODS is a resource for operational reporting. Often, an ODS is created to consolidate information from multiple sources into a repository that is used only for short-term reporting. Once consolidated, the information in an ODS can be used as a source for a data warehouse system.

Operational System

Also Known As: OLTP system.

 

back to top


 

P

Parse

Parsing is the act of translating commands into a language native to an interfacing system. In SQL, English commands are parsed into the host RDBMS by an interpreter. The query is translated, appropriate information retrieved, and results are returned to the user.

Partition

A partition is a physical segmentation of information. Partitions organize information into smaller, manageable segments primarily for performance reasons.

Pivot Table

A table used to summarize data and rotate rows and column titles to obtain different views of the data. Rows can be “layered” to reflect the different levels of data that can be viewed.

Primary Key

A primary key is a column or columns of a data set containing as many distinct values as there are rows. Most tables in a relational database contain a single primary key.

 

back to top


 

Q

Query

A Query is a request for information asked of a RDBMS. The result of a query is typically a single row or set of rows from a table.

Query Redirection

Query Redirection is an Oracle feature that automatically changes a user’s SQL statement to increase efficiency. This operation is done when the query optimizer determines that a materialized view or other table can process a query request with less overhead than the requested table. A practical example of query redirection is a user query that aggregates a base table and groups by a particular column. If a materialized view exists for the same table that has the same query summarized already, the query optimizer accesses the materialized view instead of the base table.

 

back to top


 

R

RAID Disk Management

Redundant Array of Inexpensive Disks (RAID) is a storage technology that allows virtual disk management of commodity disk drives. RAID technology provides performance and/or reliability with the flexibility to utilize disks in a manner that supports business and computational needs. The various configurations of RAID each have specific benefits and liabilities.

  • RAID 0 – Provides non-redundant disk striping. Each disk is divided into smaller units (stripes) and stripes from multiple disks are concatenated into logical volumes. The benefit of RAID 0 is Input/Output is dispersed across multiple disks, thus removing the chance of performance problems due to disk saturation. Sequential read and write operations are performed well, but random read operations may be slower. The liability of RAID 0 is no fault tolerance.

  • RAID 1 – Provides exact mirroring of one disk to another for fault tolerance. This approach allows one disk to fail while the data integrity remains intact. The benefits of RAID 1 are read performance (since smart controllers can retrieve the data sectors on the disk closest to the present head location) and complete redundancy. The liability of RAID 1 is the cost to implement because twice the physical disks are needed.

  • RAID 01 or 10 – RAID 10 is a combination of RAID 0 and RAID 1 (striping with mirroring). This approach does not use any more disks than RAID 1, but provides performance gains with fault tolerance.

  • RAID 2 – No longer used, technology was improved and replaced by RAID 3.

  • RAID 3 – Provides RAID 0 striping with redundancy via parity checking instead of mirroring. The striping is performed at the byte-level. In RAID 3, a single disk is dedicated to storing parity information. If any single disk fails, the data can be reconstructed from the remaining disk using an XOR (exclusive OR) operation. Besides the redundancy, benefits of RAID 3 include superior performance doing random read operations. The liability of this approach is the additional write overhead to maintain the parity disk.

  • RAID 4 – Similar to RAID 3 except that the striping is performed at the block-level.

  • RAID 5 – Combines RAID 0 and RAID 4 techniques to provide block striping with redundancy via parity checking but instead of having a dedicated parity disk, the parity may exist on any of the disks in the stripe. The benefit of this approach is disk economy (as opposed to full mirroring) with security. The liability of this approach is the additional write overhead (especially for sequential write operations) to maintain the parity, but this liability is spread across all the disks in the stripe.

  • RAID 6 – Extends RAID 5 to contain two distributed parity disks. Having two parity disks allows for dual disk failure with no loss of data integrity. The liability of this approach is that it takes three writes to save one block of information.

  • RAID 7 or RAID S – Hardware implementations of RAID 5 with improved performance.

Referential Integrity (RI)

Referential Integrity is the process of ensuring that relationships between database objects are valid and remain valid. Enforcement of RI rules occurs automatically by the RDBMS during key field insertion, update, or delete.

Relational Database Management System (RDBMS)