|
A
Ad-Hoc Query
An Ad-Hoc Query is a database
query
(or request for information) that
is dynamically entered by a database user. Static queries that
exist in production reports or provide data results for
user-interface screens are not considered Ad-Hoc Queries.
Agent
An Agent is an application or automated routine
that searches data sets for pre-defined thresholds or patterns. Upon encountering a targeted
condition, the Agent performs a set action to report or record
the event (also called an
Alert).
Examples of resulting actions include report generation, e-mail
notification, and trigger execution.
Aggregate
An Aggregate is summary information stored in a
Data Warehouse. Instead of
repeatedly summarizing the same information to perform multiple analyses,
information in a warehouse may be pre-summarized for performance. The two primary uses for Aggregates
include:
Improving Performance – Less I/O means less processing time.
Warehouses must contain the lowest level of
detail that is required to support specific analyses.
Query Redirection,
Materialized Views, or
manually User Interface coding provides access to the level of
detail requested by
end-users.
Having both the low-level and summarized information available
provides the fastest query response time for the information There are five aggregate functions defined in
standard
SQL (Structured Query
Language): SUM, COUNT, MIN (the lowest value), MAX (the highest value),
and AVG (the average value).
Aggregate Navigation
Also Known As :
Materialized View
Alert
A message sent automatically by a computer
system when a certain situation occurs. One of the greatest benefits of data warehousing is the ability to
set alerts. Practical applications of Alerts include:
-
A store
manager can be automatically informed when a certain product's
sales fall below or rise above a specified range.
-
A
factory manager can be automatically informed when the failure
rate of a product exceeds a specified level.
-
A sales
manager can be automatically informed when a member of his staff
achieves a personal high level of sales for a time period.
Alerts allow a company to receive critical
business information in the quickest possible time.
Atomic Data
Atomic Data are the data elements that represent
the lowest level of detail. For example, in a daily sales report, the individual items sold would be
atomic data, while rollups such as invoice and summary totals
from invoices are aggregate data.
Attribute
See Data Model
B
Business Intelligence (BI) Tools
Business Intelligence Tools are software
programs enabling business users to see and use large amounts of complex information. Classes of BI tools
include:
-
Multi-Dimensional Analysis Software -
Also Known As:
OLAP (On-Line Analytical
Processing) - Software that gives the user the opportunity to
look at the data from a variety of different dimensions.
-
Query
Tools- Software that allows the user to ask questions about
patterns or details in the data.
-
Data
Mining Tools - Software that automatically searches for
significant patterns or correlation in the data
Business Rule
A Business Rule is a calculation or documented
procedure applied to business
attributes.
Defining of Business Rules occurs during Data Warehouse
design. Procedures or Functions within the ETL process enforce these business rules.
back to top
C
Calculated Measure
Also Known As:
Derived Data
Cardinality
Cardinality is the degree of even distribution
of values across a specific table column. Low cardinality distributions contain few unique values over a
set or subset of rows. High cardinality distributions contain many unique values. A unique primary key
represents the highest level of cardinality.
Cell
A Cell is a single data value stored at the
intersection of a set of dimensional values. An example of a
cell would be the value for a
measure
Revenue in a
Cube
for a specific Product, Time, and
Customer intersection. Cells may be blank if there is no measure for
the intersection.
Client/Server
Client/Server is a distributed technology
architecture where process load is distributed between two sides
of a transaction. Software on a client (desktop PC)
is responsible for much of the user-interface rendering activities. The server component is only
responsible for providing the data sets needed by the client component.
Column
A Column is a single
attribute
of a
table,
view, or query.
Cube
Also Known As:
HyperCube or Multi-Dimensional
Cube. A Cube is a data structure and access method for
multi-dimensional information. Cubes represent an efficient storage algorithm for routinely
aggregated
or
drilled
into analytical data. Desktop
OLAP
tools package multi-dimensional information into Cubes
distributable across the network or to individual computers.
back to top
D
Dashboard
Also Known As:
Digital Dashboard or
EIS
A Dashboard provides an executive view of
specific Key Performance Indicators within an organization. Dashboards typically display high-level
measurements and do not provide drill-down capability.
Data-Based Knowledge
Knowledge derived from data using
Business
Intelligence Tools and the
process of
Data Warehousing. In the past, business users based decisions on a
combination of experience, perception, and intuition. Decision Support Systems consolidated business
information into a single system that mirrors the decision maker’s thought processes. Data based knowledge
has the following advantages over experience/intuition based knowledge:
-
Accuracy because it is a consolidation from several sources,
-
Currency because the data gathering and publishing process is
automatic,
-
Flexibility because different perspectives of the same
information are available depending on the business view needed,
-
Insightful because complex data patterns emerge from
data mining
that otherwise go
unnoticed,
-
Less
subjective because conclusions are tied directly to the physical
data.
Data Cleansing
Also Known As:
Data Quality Assurance
Data Dictionary
The Data Dictionary is the catalog of data
elements within a database
system. Data Dictionaries
maintain the
metadata
needed to manage and define data elements.
Data Extraction
Data Extraction is the process of extracting
specific
source data to satisfy a
data loading or reporting needs. Typically,
Data Warehouse
data extraction occurs on
one or more source
OLTP
systems. Business rules and filtering applied during the data extraction
process minimize the volume of data transferred to the data warehouse or data
warehouse stage
area.
Data, Information, and Knowledge
Data is the reality that a computer records,
stores, and processes. The use of computers to manipulate, report, and analyze this reality is referred to
as data processing. Data only has meaning when combined with information that describes the data’s relevance
or application ( also
known as:
metadata).
For example, the raw numbers within a spreadsheet have no meaning
without the column and row headers defining a relationship.
Information is what a person is able to
understand or deduce about data based on the available metadata.
Knowledge is the result of combining information
with experience and is what a business uses to make decisions. The process of organizing information
in such a way as to create data-based knowledge is called Data Warehousing. Business Intelligence Tools
are software products that present knowledge to business users.
Every year the amount of raw data in the world
is approximately doubling. The goal of business intelligence and
data warehousing
is changing data into
knowledge and incorporating the knowledge back into a business decision making process.
Data Mart
A Data Mart is a database having the same
characteristics as a data warehouse, but focused on the data for one division or one business unit within an
enterprise. The Data Mart may precede or succeed the
Data Warehouse
depending on the architecture and implementation plan.
Typically, the data mart is the prototype or the
first step of a data warehousing process. An enterprise picks the division or group that would most benefit
from
data-based knowledge and
constructs a data mart to satisfy specific business needs. Proper planning
performed during the design incorporates attributes needed by other business requirements, thus
facilitating future integration. As business needs expand,
additional subject areas are added to the design and
eventually the data mart becomes a data warehouse.
Data Marts may also be created after a Data
Warehouse for a number of accessibility and performance reasons.
Data Migration
Data Migration is the act of moving data from
one environment to another. In
Data Warehousing,
migration occurs when information transfers from a legacy
system into a data warehouse.
Data Mining
Data Mining is the process of finding hidden
patterns and relationships in the data.
Analyzing data involves the recognition of significant patterns and
relationships. At a simple level, Business Analysts recognize relationships between common business objects
such as: customer, product, and time relating to a specific sales transaction. Business Analysts can
effectively see patterns in small data sets where viewing all
factors simultaneously is possible. As information
volumes grow, however, there is a need for complex analyses to identify patterns and relationships that are not
apparent. Data Mining tools interrogate information and identify relationships that may become very
strategic when planning campaigns or profiling behavior.
One of the primary advantages of Data Mining for
Data
Warehousing is that users
do not need to have a preexisting hypothesis about the information to analyze it.
Without Data Mining, Analysts must “look” for trends in information. With Data Mining, the
system “finds” patterns using techniques like “classification”, “association”, or “clustering”.
Data Model
A Data Model is a logical representation of data
element ( Also
Known As: Attribute)
groupings, relationships, and
meta data
independent of software or
hardware. Data Models may exist in a Logical or Physical form. A Logical Data Model defines a design in terms of
business processes and business relationships. A Physical Data Model represents the physical
implementation of the Logical design in the host RDBMS system.
Modeling tools such as ER Win or System
Architect allow designers to develop a data model that is compatible with multiple
RDBMSs.
An Entity within a data model consists of
logical groupings of attributes. Entities of a Physical Data
Model directly corresponds to a table or view within a
database system. Meta data for each attribute and entity for a data warehouse project should be contained
within the data model.
Data Quality Assurance
Also Known As:
Data Cleansing or Data Scrubbing
The process of checking the quality of the data
being imported into the data warehouse.
Data quality assurance is one of the greatest
challenges in the process of data warehousing. If the data-based knowledge generated by the data warehouse is to
be trusted, the data entered into the warehouse must be complete and accurate - "garbage in, garbage
out".
Data quality can be a challenge for several
reasons:
-
The
data is being consolidated from a variety of legacy sources that
may have differing definitions of key concepts such as "customer" or "profit".
-
The
legacy data was not originally collected for the purpose of
decision support so some of the key data might be missing, incomplete, or not as
accurate as desired.
-
There
might be times when all the data is not received from one of the
legacy systems. This could make comparisons between time periods invalid.
A significant portion of time in the development
process should be set aside for setting up the data quality
assurance process and implementing whatever data cleansing is
needed.
In a production environment, there should be a
data quality report generated after each data warehouse import. There should be provision for rolling
back an import if data quality testing indicates that the data
is unacceptable.
Data Scrubbing
Also Known As:
Data Quality Assurance
Data Transformation
Transformation is the modification of data
during the
data extraction
and loading of a
data warehouse.
This modification can include:
Data Warehouse
Also Known As:
Information Warehouse or
Warehouse
A Data Warehouse is a physical staging area for
decision-support information. It collects data from various applications in an organization's
operational
systems, integrates the
data into a logical
model
of business subject areas, stores the information in a
manner that is accessible and understandable to non-technical decision makers, and delivers information to
decision makers across the organization through various report writing and query tools.
The term “Data Warehouse” was coined by Bill
Inmon in 1990 as a database that is a subject-oriented, integrated, time-variant and non-volatile
collection of data to support a decision making process.
Specific definition of these terms is:
-
Subject-oriented - Data that gives information about a
particular subject instead of about a company's on-going operations,
-
Integrated - Data that is gathered into the data warehouse from
a variety of sources and merged into a coherent whole,
-
Time-variant - All data in the data warehouse is identified with
a particular time period,
-
Non-volatile - Data is stable in a data warehouse. Information
is added, but data is never removed.
This enables management to gain a consistent
picture of the business.
Data Warehousing
Data Warehousing is the process of visioning,
planning, building, using, managing, maintaining, and enhancing
data warehouses
and/or data marts. There
are many steps in the data warehousing process:
-
Visioning - Having an idea about what could be accomplished.
-
Learning - Studying the potential of data warehousing.
-
Justifying - Developing a business purpose for the process.
-
Budgeting - Counting the cost.
-
Deciding - Making a commitment to develop and use data-based
knowledge.
-
Gathering Information - Examining legacy systems.
-
Interviewing Users - Finding what information is needed.
-
Choosing Tools - Choosing the hardware, the database management
system, the data extraction tools, and the Business Intelligence tools..
-
Building, Using, Testing, and Evaluating the Prototype - Repeat
this step and the above steps as necessary.
-
Deploying - Putting the system into operation.
-
Training - Helping users make full use of the Business
Intelligence tools.
-
Managing - Keeping track of scheduled data replication, system
usage, and query performance.
-
Adding,
Modifying, On-Going Development - As the system is used, new
possibilities will be discovered. Consider also all the actions that take place as
a part of the data warehousing process.
-
Data
Replication - Periodic copying of legacy data.
-
Data
Transformation - Transforming the legacy data into the form in
which it will be stored in the data warehouse.
-
Data
Quality Assurance - Testing the data for inconsistencies and
errors.
-
Data
Storage - Storing the data in a
DBMS
(Database Management System).
-
Metadata Storage - Storing the description of the data - the
data about the data.
-
Data
Mart Population - Populating all the data marts that receive
their data from the warehouse.
-
Setting
Up Business Intelligence Tools - Giving users access to the data
through
multi-dimensional
analysis, querying, and data mining.
-
Setting
Alerts - Establishing conditions that result in an automatic
message being sent.
-
Data
Warehousing Management - Keeping track of how well all the other
actions are being carried out.
Database Management System (DBMS)
The DBMS is the software used to store, access,
and manage data. Examples of these software include Oracle, SQL Server, and
Sybase. There are two
main types of Database Management Systems used for business intelligence and data warehousing -
specialized Multi-Dimensional Database Management Systems (MDBMS) and the more widely used general purpose
Relational Database Management Systems (RDBMS).
Data Webhouse
A Data Webhouse is a data warehouse built
specifically for collecting and analyzing web page user information. The primary source of information
in a Webhouse is the web server logs. Within these logs is a plethora of information that defines every
action performed during a user session. When integrated with other subject areas of traditional
data warehouses,
Webhouses relate customer information or history to web site behavior.
DBMS
Also Known As:
Database
Management System
Decision Support System (DSS)
A DSS is a computer system designed to assist an
organization in making decisions. The data store behind a DSS is typically a
Data Warehouse
or
Data Mart,
however, some DSS do not require the capabilities found in a warehouse design.
De-Normalization
De-Normalization is reversing the application of
Normalization techniques
to information. Normalization enforces data integrity and minimizes storage
space usage. The downside to a normalized design is additional processing overhead incurred during query
processing to re-assemble the information before display or analysis. De-Normalizing a design eliminates
this overhead by joining frequently accessed information into the same table. In a
Data Warehouse,
dimensions
are very often
De-Normalized for query performance improvements.
Density
Opposite of:
Sparsity
Derived Data
Also Known As :
Calculated Measure
Data not directly supplied by source systems,
but calculated according to business rules. Derived data includes
aggregated data
and other data
transformed
from its original state.
Desktop OLAP (DOLAP)
Desktop OLAP is a form of
OLAP
tool where a Cube containing a
summary of multi-dimensional data exists on an individual user's computer.
Dimension
A Dimension is a component of a star schema
containing the criteria by which analyses are constrained. During data warehouse design, we group
attributes
according to relationship and
level of detail. These groupings form
hierarchies
and classifications by
which business measure are compared and summarized. Each physical grouping collectively is a
dimension.
Examples of a dimension found in most all data
warehouses is Calendar. A Calendar dimension contains a single record for every day within the domain of
a
data
warehouse. Each day has
hierarchies of month, year, quarter, season, day of week, fiscal period, and
day of month. Business measure, or
fact,
occurring on a single day group or summarize to any level hierarchy
found in the Calendar Dimension.
Most dimension tables in a
star schema
are intentionally
de-normalized.
This typically does not add a significant amount of storage space to the
database because the overall size of the dimension tables is
very small when compared to the size of the
fact table.
Dimensional Modeling
Dimensional Modeling is the process of taking
business requirements and transforming them into a
data model
specifically designed for analytical reporting. The term
Dimensional Modeling comes from one of the two primary components of a dimensional model,
data warehouse
dimensions.
The other primary component are
facts.
Drill Down, Drill Up, and Drill Across
Drilling though data is the ability to move
between levels of data
hierarchies
when viewing data within a
data warehouse.
-
Drill
Down - Changing the view of the data to a greater level of
detail.
-
Drill
Up - Changing the view of the data to a higher level of
aggregation.
-
Drill
Across – Changing the view to another dimensional value at the
same level as the current view.
Drilling down and drilling up allow an analyst
to move down and up levels of summary to see how the information at the various levels is related. A
powerful use of this capability is to view information at its highest level of summarization to identify
obvious trends in the information. To further investigate the
cause of a trend a user simply drills down one or more
levels to uncover the detail behind the summarized information
Dynamic Queries
Dynamic Queries are dynamically constructed
SQL
statements (Also
Known As:
Queries)
based on conditions derived from results of other queries. Desktop
query tools often use this approach to simplify the interface between the tool and the database. Dynamic
Queries
parse at run time.
back to top
E
End Users
Also Known As:
Business Users
End users are the business users that interface
to information through a front-end program or set of reports.
Enterprise Data Warehouse
Enterprise Data Warehouse is a term used to
describe a relational
data warehouse
integrating strategic
knowledge
from several departments within an enterprise. Often,
Enterprise-level Data Warehouse planning identifies
subject areas
and clearly defines
monetary priority for implementation. The Enterprise Data Warehouse implementation occurs as development
of integrated
Data Mart
components.
Enterprise Information System / Executive
Information System (EIS)
Also Known As:
Digital Dashboard
An EIS is a computer system or user-interface
that presents a summary of a company's important data on a routine basis.
Entity or Entity Relationship Diagram (ERD)
See
Data Model
Extract Transform and Load (ETL)
ETL is the complete data movement process from
source systems to the production
data warehouse. Components of ETL
are Data
Extraction,
information consolidation or
cleansing, and loading.
Extract Specification
The Extract Specification is the standard
defined interface of information between
source
systems and the
data warehouse.
This specification lists the individual attributes and sources
of data elements needed to maintain production loads within the data
warehouse system. Attributes are grouped by their business characteristics and required data loading
strategy.
back to top
F
Fact
Facts are data elements associated with business
activity or business performance measurements ( Also
Known As:
Key Performance Indicators or KPI). Typically, facts are
quantitative values such as gross sales dollars or costs dollars. Cube
cells
contain fact values.
Fact Group
A fact group is a group of
fact tables
that collectively make up
a
subject
area.
Fact Table
In a star schema, the central table containing
Facts
has foreign-key
relationships to all
dimension
that are relevant to the fact measurement. There are two
types of fields in a fact table:
-
Fields
storing the foreign keys that connect each particular fact to
the appropriate dimension value,
-
Fields
storing the individual
facts.
back to top
G
Granularity
Granularity is the level of detail of
facts
stored in a
data warehouse.
The granularity of the fact table is one of the most significant design decisions in
creating a data warehouse. Facts should be at an
aggregation
level low enough to support the lowest level of
analysis. For example, if data warehouse requirements state it
is necessary to summarize product SKU
sales, the fact table granularity
must be at an invoice line-item level, not the invoice ticket level.
back to top
H
Hash
Hashing is a mathematical method of random
distribution. In databases,
hashing mathematically assigns one of a fixed number of codes to an attribute value
in an attempt to evenly distribute values across multiple data
partitions.
Hashing is not effective when a data set has a large volume with
low Cardinality.
Hierarchy
A hierarchy is a relationship between attributes
in a table consisting of sets and sub-sets. Hierarchies allow traversing of
aggregation
levels within data by
querying
with
SQL
using a GROUP BY clause. Moving between the levels of a hierarchy is called
drilling
up and drilling down.
Hybrid OLAP (HOLAP)
HOLAP is a hybrid architecture that combines the
advantages of Relational OLAP ( ROLAP)
and Multi-Dimensional OLAP (MOLAP)
architectures. In HOLAP architecture, aggregated levels of
hierarchies are stored in MOLAP
cubes
for
query
efficiency. Detail information
(data at the lowest level of
aggregation)
that would be too cumbersome to store in cubes
resides in ROLAP architecture in the production
data warehouse.
HOLAP provides connectivity to the large data sets in the
relational database while taking advantage of the faster performance of the
multi-dimensional aggregation storage.
HyperCube
Also Know As:
Cube
back to top
I
Index
An Index is a database structure enabling rapid
table searching by one or more column criteria. Indexing subsets of columns in a table enabling queries
to access only the necessary database blocks, thus reducing disk access (and time). A query processor utilizes an
index only if doing so reduces the number of disk reads needed to return the query results.
Two common types of indexing strategies used by
RDBMSs
are Btree and Bitmap.
Btree indexes are tree-like data structures sorted by the index key field(s).
Theoretically, the tree structure has many branches to minimize the number of comparisons necessary to
reach a leaf node. Bitmap indexes are effective for columns with low
cardinality
distribution of values. A
bitmapped index contains a record for each record in the table. The bit count in a bitmap index
record matches the number of distinct values in the column(s) indexed (plus overhead). Binary comparisons of
bitmaps make searches extremely fast.
back to top
L
Legacy System
See Also:
Source Database
A Legacy System is an established computer
system that is an integral part of a company’s daily business operations. One of the biggest challenges of the
data
warehousing process is to
extract and consolidate data from a variety of legacy systems through out an
enterprise.
back to top
M
Materialized Views
Also Known As:
Aggregate Navigation
A Materialized view is an Oracle
RDBMS
feature that provides
summary-level information using a database snapshot. Once created, Materialized Views use
Query
Redirection to
transparently process a
query
request in the most efficient manner.
Measure
Also Known As:
Fact
Meta Data
Meta Data is information describing the
application, constraints, source, transformation, meaning,
units, relationships, and relevance of data in the
warehouse.
A common definition of Meta Data is that it is data about data. In Data Warehousing, Meta Data is
any information about a database
attribute
needed to support and manage the operation of the data warehouse.
Middleware
Middleware is a communications layer allowing
applications to interact across hardware and network environments.
Multi-Dimensional Analysis
Also Known As:
OLAP (On-Line Analytical
Processing)
Multi-Dimensional Cube
Also Known As:
Cube
Multi-Dimensional Database Management System (MDBMS)
An MDBMS is a database management system
designed specifically to support efficient storage and retrieval of
OLAP
data. The MDBMS interfaces with
or contains a user-interface that supports
drilling through hierarchies
in the data. The structure where
an MDBMS stores information is a
cube.
Multi-Dimensional On-Line Analytical Programming
(MOLAP)
Also Known As:
OLAP
back to top
N
Natural Key
A Natural Key is a primary key within a
source
system. Natural keys cannot
always be primary keys in a data warehouse dimension because multiple versions of
a record may need to be stored.
Non-Volatile
Non-Volatile data is data that does not change over time.
Most all data in a
data warehouse
is non-volatile. Since data warehouses record and reconstruct
history, changes to existing data are rare. Non-volatility is
one of the original defining characteristics of a
data warehouse.
Normalization
Normalization is the process of organizing data to eliminate redundancy.
In
OLTP systems, normalization enforces
referential
integrity, minimizes
storage requirements, and optimizes the data insertion
performance.
The downside of normalization is the increased
overhead required to assemble data sets at
query
time. An example of a normalized design is a contact
database where addresses are stored in a separate table from contact name. Two contacts with the same address
would join to a single address record rather than the address information duplicated in the same table
as the contact name.
Normalization exists in
Data Warehouse Fact Tables.
Dimension
Tables in a data warehouse
de-normalized information to increase query performance.
back to top
O
OLAP (On-Line Analytical Processing)
The term OLAP distinguishes
data warehousing
activities from On-Line
Transaction Processing (OLTP) business activities. In its broadest usage, OLAP
is a synonymous to data warehousing. In a narrow usage, OLAP refers to the tools used for
Multi-Dimensional Analysis.
The OLAP process organizes and
aggregates
data into levels of business
information (facts)
and reporting these levels by business classifications (dimensions).
A spreadsheet is a two-dimensional analysis tool. OLAP provides spreadsheet-like analysis
simultaneously spanning as many dimensions as the business needs
to support. Several software packages facilitate
manipulating and displaying this information that is too complex to represent in two-dimensions.
OLTP (On-Line Transaction Processing)
OLTP is a collective term encompassing the
operations and systems that process transactions as part of a company’s daily business activities. OLTP
systems ( Also
Known As:
Legacy)
are the information sources for data warehouses.
Operational Data Store (ODS)
An ODS is an integrated database of operational
data. ODS sources may include
OLTP
systems or any system containing current or near-term data. One
qualifier of an ODS is data volatility (frequent updates) and the absence of historical information.
An ODS is a resource for operational reporting.
Often, an ODS is created to consolidate information from multiple sources into a repository that is used
only for short-term reporting. Once consolidated, the information in an ODS can be used as a source
for a
data
warehouse system.
Operational System
Also Known As:
OLTP
system.
back to top
P
Parse
Parsing is the act of translating commands into
a language native to an interfacing system. In
SQL,
English commands are parsed into the host
RDBMS
by an interpreter. The
query
is translated, appropriate information retrieved, and results are returned
to the user.
Partition
A partition is a physical segmentation of
information. Partitions organize information into smaller, manageable segments primarily for performance
reasons.
Pivot Table
A
table
used to summarize data
and rotate rows and column titles
to obtain different views of the data. Rows can be “layered” to reflect the different levels
of data that can be viewed.
Primary Key
A primary key is a
column
or columns of a data set
containing as many distinct values as there are
rows. Most
tables
in a relational database contain
a single primary key.
back to top
Q
Query
A Query is a request for information asked of a
RDBMS.
The result of a query is typically a single
row
or set of rows from a
table.
Query Redirection
Query Redirection is an Oracle feature that
automatically changes a user’s
SQL
statement to increase efficiency. This operation is done when the
query optimizer determines that a
materialized view
or other
table can
process a query request with less overhead than the requested
table. A practical example of query redirection is a user query that
aggregates
a base table and groups by a
particular
column.
If a materialized view exists for the same table that has the same
query summarized already, the query optimizer accesses the materialized view instead of the base table.
back to top
R
RAID Disk Management
Redundant Array of Inexpensive Disks (RAID) is a
storage technology that allows virtual disk management of commodity disk drives. RAID technology
provides performance and/or reliability with the flexibility to utilize disks in a manner that supports business
and computational needs. The various configurations of RAID each have specific benefits and
liabilities.
-
RAID 0
– Provides non-redundant disk striping. Each disk is divided
into smaller units (stripes) and stripes from multiple disks are concatenated
into logical volumes. The benefit of RAID 0 is Input/Output is dispersed across multiple disks, thus
removing the chance of performance problems due to disk saturation. Sequential read and write
operations are performed well, but random read operations may be slower. The liability of RAID 0 is no
fault tolerance.
-
RAID 1
– Provides exact mirroring of one disk to another for fault
tolerance. This approach allows one disk to fail while the data integrity
remains intact. The benefits of RAID 1 are read performance (since smart controllers can retrieve the data
sectors on the disk closest to the present head location) and complete redundancy. The liability of RAID 1
is the cost to implement because twice the physical disks are needed.
-
RAID 01
or 10 – RAID 10 is a combination of RAID 0 and RAID 1 (striping
with mirroring). This approach does not use any more disks than RAID
1, but provides performance gains with fault tolerance.
-
RAID 2
– No longer used, technology was improved and replaced by RAID
3.
-
RAID 3
– Provides RAID 0 striping with redundancy via parity checking
instead of mirroring. The striping is performed at the byte-level. In RAID
3, a single disk is dedicated to storing parity information. If any single disk fails, the data
can be reconstructed from the remaining disk using an XOR
(exclusive OR) operation. Besides the redundancy, benefits of
RAID 3 include superior performance doing random read operations. The
liability of this approach is the additional write overhead to maintain the parity disk.
-
RAID 4
– Similar to RAID 3 except that the striping is performed at the
block-level.
-
RAID 5
– Combines RAID 0 and RAID 4 techniques to provide block
striping with redundancy via parity checking but instead of having a
dedicated parity disk, the parity may exist on any of the disks in the stripe. The benefit of this approach is
disk economy (as opposed to full mirroring) with security. The liability of this approach is the
additional write overhead (especially for sequential write operations) to maintain the parity, but this
liability is spread across all the disks in the stripe.
-
RAID 6
– Extends RAID 5 to contain two distributed parity disks. Having
two parity disks allows for dual disk failure with no loss of data
integrity. The liability of this approach is that it takes three writes to save one block of information.
-
RAID 7
or RAID S – Hardware implementations of RAID 5 with improved
performance.
Referential Integrity (RI)
Referential Integrity is the process of ensuring
that relationships between database objects are valid and remain valid. Enforcement of RI rules occurs
automatically by the
RDBMS
during key field insertion, update, or delete.
Relational Database Management System (RDBMS)
|