Materialized views are a data warehousing/decision
support system tool that can increase by many orders of magnitude the speed of queries
that access a large number of records. In basic terms, they allow a user to query
potentially terabytes of detail data in seconds. They accomplish this by transparently
using pre-computed summarizations and joins of data. These pre-computed summaries would
typically be very small compared to the original source data.
In this article you'll find out what materialized
views are, what they can do and, most importantly, how they work - a lot of the ' magic '
goes on behind the scenes. Having gone to the trouble of creating it, you'll find out how
to make sure that your materialized view is used by all queries to which the view is
capable of providing the answer. Sometimes, you know Oracle could use the materialized
view, but it is not able to do so simply because it lacks important
information.
There is one mandatory INIT.ORA parameter
necessary for materialized views to function, this is the COMPATIBLE parameter.
The value of COMPATIBLE should be set to 8.1.0, or above, in order for query rewrites to
be functional. If this value is not set appropriately, query rewrite will not be
invoked.
There are two other relevant parameters that may be
set at either the system-level via the INIT.ORA file, or the session-level via the ALTER
SESSION command.
Unless the value of this parameter is set to TRUE,
query rewrites will not take place. The default value is FALSE.
This parameter controls how Oracle rewrites
queries and may
be set to one of three values:
ENFORCED - Queries will be rewritten using
only constraints and rules that are enforced and guaranteed
by Oracle. There are mechanisms by which we can tell Oracle about other
inferred relationships, and this would allow for more queries to be
rewritten, but since Oracle does not enforce those
relationships, it would not make use of these facts at this level.
TRUSTED - Queries will be rewritten using
the constraints that are enforced by Oracle, as well as any
relationships existing in the data that we have told Oracle about, but are not
enforced by the database.
STALE TOLERATED - Queries will be rewritten
to use materialized views even if Oracle knows the data
contained in the materialized view is ' stale
' (out-of-sync with the details). This might be useful in an environment where the summary tables are refreshed on a
recurring basis, not on commit, and a slightly out-of-sync
answer is acceptable.
The needed privileges are as follows:
- CREATE SESSION
- CREATE TABLE
- CREATE MATERIALIZED VIEW
- QUERY REWRITE
Finally, you must be using the Cost Based Optimizer
CBO in order to make use of query rewrite. If you do not use the CBO, query rewrite will
not take place.
The example will demonstrate what a materialized
view entails. The concept is that of reducing the execution time of a long running query
transparently, by summarizing data in the database. A query against a large table will be
transparently rewritten into a query against a very small table, without any loss of
accuracy in the answer. For the example we create our own big table based on the system
view ALL_OBJECTS.
Prepare the large table BIGTAB:
sqlplus scott/tiger
set echo on
set termout off
drop table bigtab;
create table bigtab
nologging
as
select * from all_objects
union all
select * from all_objects
union all
select * from all_objects
/
insert /*+ APPEND */ into bigtab
select * from bigtab;
commit;
insert /*+ APPEND */ into bigtab
select * from bigtab;
commit;
insert /*+ APPEND */ into bigtab
select * from bigtab;
commit;
analyze table bigtab compute statistics;
select count(*) from bigtab;
COUNT(*)
----------
708456
Run query against this BIGTABLE
Initially this quewry will require a full scan of
the large table.
set autotrace on
set timing on
select owner, count(*) from bigtab group by owner;
OWNER
COUNT(*)
------------------------------ ----------
CTXSYS
6264
ELAN
1272
HR
816
MDSYS
5640
ODM
9768
ODM_MTR
288
OE
2064
OLAPSYS
10632
ORDPLUGINS
696
ORDSYS
23232
OUTLN
168
PM
216
PUBLIC
278184
QS
984
QS_ADM
168
QS_CBADM
576
QS_CS
552
QS_ES
936
QS_OS
936
QS_WS
936
SCOTT
264
SH
4176
SYS
324048
SYSTEM
15096
TEST
4536
WKSYS
6696
WMSYS
3072
XDB
6240
28 rows selected.
Elapsed: 00:00:07.06
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
(Cost=2719 Card=28 Bytes=140)
1 0 SORT (GROUP BY) (Cost=2719 Card=28
Bytes=140)
2 1 TABLE ACCESS (FULL) OF
'BIGTAB'
(Cost=1226 Card=708456 Bytes=3542280)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
19815 consistent gets
18443 physical reads
0 redo size
973 bytes sent via SQL*Net to client
510 bytes received via SQL*Net from
client
3 SQL*Net roundtrips to/from
client
1 sorts (memory)
0 sorts (disk)
28 rows processed
In order to get the aggregate count, we must count
700'000+ records on over 19800 blocks. If you need this summary often per day, you can
avoid counting the details each and every time by creating a materialized view of this
summary data.
Create the Materialized View
sqlplus scott/tiger
grant query rewrite to scott;
alter session set query_rewrite_enabled=true;
alter session set query_rewrite_integrity=enforced;
create materialized view mv_bigtab
build immediate
refresh on commit
enable query rewrite
as
select owner, count(*)
from bigtab
group by owner
/
analyze table mv_bigtab compute statistics;
Basically, what we've done is pre-calculate the object count, and define this
summary information as a materialized view. We have asked that
the view be immediately built and populated with data. You'll
notice that we have also specified REFRESH ON COMMIT and ENABLE
QUERY REWRITE. Also notice that we may have created a materialized view, but when we
ANALYZE, we are analyzing a table. A materialized view creates a real table, and
this table may be indexed, analyzed, and so on.
Now let's see the materialized view in
action by issuing the same query again
set timing on
set autotrace traceonly
select owner, count(*)
from bigtab
group by owner;
set autotrace off
set timing off
28 rows selected.
Elapsed: 00:00:00.03
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
(Cost=2
Card=28 Bytes=252)
1 0 TABLE ACCESS (FULL) OF 'MV_BIGTAB'
(Cost=2 Card=28 Bytes=252)
Statistics
----------------------------------------------------------
11 recursive calls
0 db block gets
17 consistent gets
0 physical reads
0 redo size
973 bytes sent via SQL*Net to client
510 bytes received via SQL*Net from
client
3 SQL*Net roundtrips to/from
client
4 sorts (memory)
0 sorts (disk)
28 rows processed
No physical I/O this time around as the data
was found in the cache. Our buffer cache will be much more efficient now as it has less
to cache. W could not even begin to cache the previous query's
working set, but now I can. Notice how our query plan shows we are now doing a full scan
of the MV_BIGTAB table, even though we queried the detail table BIGTAB. When the SELECT
OWNER, ... query is issued, the database automatically directs it to the materialized view.
Now, add a new row to the BIGTAB table and commit te
change
insert into bigtab
(owner, object_name, object_type, object_id)
values ('Martin', 'Zahn', 'Akadia', 1111111);
commit;
set timing on
set autotrace traceonly
select owner, count(*)
from bigtab
where owner = 'Martin'
group by owner;
set autotrace off
set timing off
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
(Cost=2
Card=1 Bytes=9)
1 0 TABLE ACCESS (FULL) OF 'MV_BIGTAB'
(Cost=2 Card=1 Bytes=9)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
439 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from
client
2 SQL*Net roundtrips to/from
client
0 sorts (memory)
0 sorts (disk)
1 rows processed
The analysis shows that we scanned the materialized
view MV_BIGTAB and found the new row.
By specifying REFRESH ON COMMIT in our original definition of the view, we
requested that Oracle maintain synchronization between the view and the details, the
summary will be maintained as well.
This is relatively straightforward and is answered
in a single word - performance. By calculating
the answers to the really hard questions up front (and once
only), we will greatly reduce the load on the
machine, We will experience:
-
Less physical reads - There is less data
to scan through.
-
Less writes - We will not be
sorting/aggregating as frequently.
-
Decreased CPU consumption - We will not
be calculating aggregates and functions on the data, as we
will have already done that.
-
Markedly faster response times - Our
queries will return incredibly quickly when a summary is
used, as opposed to the details. This will be a function of the amount of work we can
avoid by using the materialized view, but many orders of
magnitude is not out of the question.
Materialized views will increase your need for one
resource - more permanently allocated disk. We need extra storage space to accommodate
the materialized views, of course, but for the price of a little extra disk space, we can
reap a lot of benefit.
Materialized views work best in a read-only, or
read-intensive environment. They are not designed for use in a high-end OLTP environment.
They will add overhead to modifications performed on the base tables in order to capture
the changes.
There are concurrency issues with regards to rising
the REFRESH ON COMMIT option. Consider the summary example from before. Any rows that are
inserted or deleted from this table will have to update one of 28 rows in the summary
table in order to maintain the count in real time. This does not preclude the use of
materialized views in an OLTP environment. For example if you use full refreshes on a
recurring basis (during off-peak time) there will be no overhead added to the
modifications, and there would be no concurrency issues. This would allow you to report
on yesterday's activities, for example, and not query the live OLTP data
for reports.
Materialized views may appear to be hard to work
with at first. There will be cases where you create a materialized view, and you know
that the view holds the answer to a certain question but, for some reason, Oracle does
not. The more meta data provided, the more pieces of information about the underlying
data you can give to Oracle, the better.
So, now that we can create a materialized view and
show that it works, what are the steps Oracle will undertake to rewrite our queries?
Normally, when QUERY REWRITE ENABLED is set to FALSE, Oracle will take your SQL as is,
parse it, and optimize it. With query rewrites enabled, Oracle will insert an extra
step into this process. After parsing, Oracle will attempt to rewrite the query to
access some materialized view, instead of the actual table that it references. If it can
perform a query rewrite, the rewritten query (or queries) is parsed and then optimized along with the original query.
The query plan with the lowest cost from this set is chosen for execution. If it cannot
rewrite the query, the original parsed query is optimized and
executed as normal.
Summary table management, another term for
the materialized view, has actually been around for some time in tools such as Oracle
Discoverer. If you ran a query in SQL*PLUS, or from your Java JDBC client, then the query rewrite would not (could
not) take place. Furthermore, the synchronization between the details (original source
data) and the summaries could not be performed or validated for you automatically, since
the tool ran outside the database.
Furthermore, since version 7.0, the Oracle
database itself has actually implemented a feature with many of the characteristics of
summary tables - the Snapshot. This feature was initially designed to support
replication, but many would use it to ' pre-answer ' large queries. So, we would
have snapshots that did not use a database link to replicate data from database to
database, but rather just summarized or pre-joined frequently accessed data. This was
good, but without any query rewrite capability, it was still problematic. The application
had to know to use the summary tables in the first place, and this made the application
more complex to code and maintain. If we added a new summary then we would have to find
the code that could make use of it, and rewrite that code.
In Oracle 8.1.5 Oracle took the query rewriting
capabilities from tools like Discoverer, the automated refresh and scheduling mechanisms
from snapshots (that makes the summary tables ' self maintaining ' ), and combined these
with the optimizer's ability to find the best plan out of many alternatives. This
produced the materialized view.
With all of this functionality centralized in the
database, now every application can take advantage of the automated query rewrite
facility, regardless of whether access to the database is via SQL*PLUS, JDBC, ODBC,
Pro*C, OCI, or some third party tool. Every Oracle 8i enterprise database can have
summary table management. Also, since everything takes place inside the database, the
details can be easily synchronized with the summaries, or at least the database knows
when they aren't synchronized, and might bypass stale summaries.
|