Sunday, October 10, 2010

Oracle Tuning Tip#10: External Table


TOPIC:
External Table

DEFINITION:

External Table is the another type of table which is supported by Oracle apart from IOT and Cluster table.

External tables allow Oracle to query data that is stored outside the database in flat files. The ORACLE_LOADER (an in-built driver provided by Oracle itself) can be used to access any data stored in any format that can be loaded by invoking SQL*Loader (an in-built utility provided by Oracle itself to load the data from the flat file into the data table).

External tables enable us to read flat-files (stored on the O/S) using SQL. They have been introduced in Oracle 9i as an alternative to SQL*Loader. External tables are essentially stored SQL*Loader control files, but because they are defined as tables, we can access our flat-file data using all available read-only SQL and PL/SQL operations. We can also read flat-files in parallel and join files to other files or tables, views and so on.
 
In this example, the data file (emp.txt) is placed in the location, “C:\test”. Then you have to create the oracle directory from the database that should point to this physical directory. So, now oracle read the data from the flat file through this Oracle Directory and displays the data. Oracle directory is a database object.

So, whenever SELECT statement is issued against the external table, Oracle reads the data from the data file through Oracle directory with the help from ORACLE_LOADER (in-built driver) and SQL*Loader (in-built utility). Unless SELECT statement is issued, Oracle doesn’t access this data file.

This is the reason for calling this type of table as “External Table” as data is stored outside the database and it will be retrieved from that physical directory only if SELECT statement is issued against that external table.

HOW TO VERIFY:
How to verify whether the created table is “External Table” or not? Fire this query,

select * from user_external_tables where table_name = 'EMP'
  • ð  This query will return 1 record if the created table is External Table

select location from user_external_locations where table_name = 'EMP'
  • ð  This query will return all the filenames which you are going to be used against this EMP table. (These filenames should be mentioned while creating the external table)
 LITTLE-KNOWN FACTS TO BE REMEMBERED:
  • ·         External table doesn’t consume any memory space which has been allocated for the Oracle database because it reads the data from the flat file and the flat file is placed outside the database.
  • ·         You can’t create index on the External Table since data is stored outside the database and ROWID become immaterial. With ROWID, index table can’t be created.
 ADVANTAGE:
  • ·         Though data is stored outside the database (especially in the flat file format[.txt or .csv]), this table can be selected, sorted, filtered, joined, intersected, minused, unioned and so on using SQL. Anything we can do in a SELECT statement with a "normal" table can be done with an external table. This makes working with external flat-files very simple
  • ·         Views and synonyms can be created against external tables
  • ·         They are useful in the ETL process of data warehouses since the data doesn't need to be staged.
 DISADVANTAGE:
  • ·         No DML activity can be performed on the external table since this is meant only for read-only purpose.
  • ·         Since index can’t be created against the external table, even searching for a single record (or list of records) would go for table full scan (actually, it is a full file scan).
  • ·         Oracle directory can be created only for the local physical directories but not for any network directories or any remote directories.
EXAMPLE:
Step1: First we have to prepare the data file either in .txt format or in .csv format. So, data file’s content should be like this, (here, it is “emp.txt”)
1,arun,admin,10000
2,shiva,hr,20000
3,deepa,sales,23000
4,jeeva,hr,30000
5,james,sales,35000
6,arju,admin,40000

Step2: In the database, from SYS user (or any user having the privilege to create oracle directory), create an oracle directory which points to the physical directory where this data file is placed,
Create directory testdir as ‘c:\test’;  
As per this example, the data file is going to be placed in “c:\test” directory.

This works only for the windows directory.
For unix, same statement should be like this,
Create directory testdir as ‘home/usr/test’;

Step3: we have to assign the privileges to db users to use this oracle directory. It will be sufficient if only read & write accesses are given. To achieve this,
Grant read,write on directory testdir to scott;

With this statement, Scott user has got both the read and write privilege on this oracle directory.
Please note, the physical directory should not be read-only and it should be accessible to the db users else you can’t create this oracle directory.

Step4: once all this setup is done, we can create the external table. This is the statement to create the external table,
CREATE TABLE emp (
Empid  number,
Empname varchar2(20),
Deptname varchar2(20),
Salary number
)
ORGANIZATION EXTERNAL
(
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY testdir
  ACCESS PARAMETERS
 (
  RECORDS DELIMITED BY NEWLINE
  FIELDS TERMINATED BY ','
  BADFILE 'emp.bad'
  LOGFILE 'emp.log'
  DISCARDFILE ‘emp.dis’
  (
      Empid,empname,deptname,salary
  )
 )
  LOCATION (‘emp.txt')
)
REJECT LIMIT UNLIMITED;

All the keywords used in the external table creation are explained below,
ORGANIZATION EXTERNAL      è         This clause tells Oracle that we are creating an external table;
TYPE ORACLE_LOADER          è         This is the in-built driver, this acts as an interface through which Oracle can interact with anyone outside the database
DEFAULT DIRECTOTY              è         This tells Oracle that all the files which are going to be referred will reside in this named directory (Here, it is “testdir”).
ACCESS PARAMETERS           è         This clause contains the SQL*Loader-style reference to enable Oracle to parse the flat-file into rows and columns. At this time, external tables do not offer the extensive range of parse options that SQL*Loader provides, yet still cater for most loading requirements. Note that if we have made any syntactical errors in our ACCESS PARAMETERS, we can still create the external table. Access parameters themselves are not parsed until we issue a SELECT against the external table.
LOCATION                                è         This clause tells what the name of data file is. (here, it is “emp.txt”). While creating the table, there is no need to place the data file. But before we issue a SELECT Statement against this external table, we should place the file.
BADFILE                                  è         This file contains the rejected records and will be written in the same physical directory where the data file is placed. This file(“emp.bad”) will be created while executing that SELECT statement.
LOGFILE                                   è         This file contains the information on how the input files were parsed, the positions and error messages for any rejected records and some other general information on  the load such as the number of row successfully read and will be written in the same physical directory where the data file is placed. This file(“emp.log”) will be created while executing that SELECT statement.
DISCARDFILE                           è         This file contains the records that failed the LOAD WHEN clause and will be written in the same physical directory where the data file is placed. This file(“emp.dis”) will be created while executing that SELECT statement.

Step5: Now, against this external table, you can write any sql that can be understood in this table.
  • ·         Select * from emp where deptname = ‘sales’
  • ·         Select empname from emp where eid = 1 UNION select empname from emp where eid = 3
  • ·         Select count(*) from emp

Oracle Tuning Tip#9: Cluster Table


TOPIC:
Cluster Table

DEFINITION:
Cluster Table is the another type of table which is supported by Oracle.

A cluster is a schema object that contains one or more tables that all have one or more columns in common. Rows of one or more tables that share the same value in these common columns are physically stored together within the database. A cluster is to store data for more than 1 table in the same block.

Tables which are created upon this cluster are called as “Cluster Table”.

Cluster Index is an index which is created for this cluster.

HOW TO VERIFY:
How to verify whether the created table is “Cluster Table” or not? Fire this query,

Select distinct cluster_name from user_all_tables where table_name IN ('EMPLOYEE',’DEPARTAMENT’)
  • ð  This query would return the name of the cluster (eg., query returns “CLUSTER_DEPT_ID” if it is the created cluster’s name)
LITTLE-KNOWN FACTS TO BE REMEMBERED:
  • ·         Clustered Index doesn’t occupy any memory space though it is created. In this type only, index table is part of data table but deals only with the way data is being stored. It means, functionality of the clustered index in the cluster is, it guides/or force data table to store each distinct cluster key value’s related data from both the tables in the same block
  • ·         If Cluster tables are referred in a query, then in the explain plan, you never see the step, ‘TABLE ACCESS BY ROWID” instead, you would see only “TABLE ACCESS CLUSTER
  • ·         you cannot load data into any clustered tables until you create the cluster index else any DML statement would throw the run time errors
  • ·         If you drop a existing cluster index, data in the cluster (including those cluster tables) remains but becomes unavailable until you create a new cluster index
  • ·         Clusters are of two types, indexed cluster & hast cluster. (in the below eg, I have explained the former)
  • ·         Value of the cluster key is stored only once in the block (this has been explained in the below eg.)

ADVANTAGE:
  • ·         Retrival of data from Cluster Tables would be much faster since the all the related records for each distinct index values are stored adjacent to each other physically.
  • ·         Logical I/O would be much faster since only the less number of blocks have to be referred for a query.
  • ·         Physical I/O would be much faster since only the less number of blocks to be read from the datafile.
  • ·         Amount of memory space required for the storage is reduced since value of the cluster key is only stored once.

DISADVANTAGE:
  • ·         DML activity would consume a lot of time since it couldn’t blindly insert the data in a new block. All the new/or updated record has to find out the cluster key first, traverse to that particular block and then do the respective activity.

EXAMPLE:

Create an EMP table(columns are empid,empname,salary,deptid) and inserts 16 records. Create a DEPT table(columns are deptid,deptname) and insertes 4 records. The data tables will look like this.
 
After doing this, Fire this query against these tables where the requirement is to display the all the employee names working for the deptid:3 along with the department names.
select a.empname,b.deptname from emp a, dept b
where a.deptid = b.deptid and b.deptid = 3;

To execute this query, Oracle has to fetch all the blocks since all the blocks contain an Employee belonging to deptid:3. So, it has to fetch all the blocks of EMP table. So, the traffic would be very high. In order to reduce this traffic, Cluster has to be created.

This is the syntax to create the cluster table,

Creation of “Cluster”:
CREATE CLUSTER cluster_dept_id (
department_id number) SIZE 1024;

Creation of “Clustered Index”:
CREATE INDEX indx_dept_id ON CLUSTER cluster_dept_id;

Creation of “Clustered Tables”:
create table emp
(empid number,
 empname varchar2(100),
 salary number,
 deptid number)
 CLUSTER cluster_dept_id(deptid);

create table dept
(deptid number,
 deptname varchar2(100))
 CLUSTER cluster_dept_id(deptid);

After this, the cluster, clustered index and the cluster tables will be similar to like this,
 
After creating this cluster, fire the same query again,

Now, block3 contains all the required EMP and DEPT records since all the related records of each index values would be stored adjacent physically. Because of this, oracle has to fetch only one block (block3). So, query would return the result set very fast.

Now, explain table will look like this,
SELECT STATEMENT

Cardinality=4
…………….


  TABLE ACCESS CLUSTER
Object name=DEPT
Cardinality=1
   INDEX UNIQUE SCAN
Object name=INDX_DEPT_ID
Cardinality=1
……………………


   TABLE ACCESS CLUSTER
Object name=EMP
Cardinality=4
    INDEX UNIQUE SCAN
Object name=INDX_DEPT_ID
Cardinality=1

If you closely look this explain plan, you can see that “TABLE ACCESS BY ROWID” is not mentioned and only “TABLE ACCESS CLUSTER” is mentioned. In both the cases, cardinality of the index (INDX_DEPT_ID) is 1. So, all the related records of both the tables are grouped together for each distinct cluster values. For DEPT table, explain table has mentioned Cardinality as 1 it means only one department record(deptid:3) is referred. For EMP table, explain table has mentioned Cardinality as 4 it means only four employee records(empid:103,107,111,115) are referred.