Module 2: VSAM Concepts
VSAM Splits - CI Split and CA Split
- When a VSAM dataset is defined, the FREESPACE parameter specifies how much space is to be left free while the file is loaded: the percentage of each control interval and, as a second value, the percentage of control intervals in each control area
- VSAM splits occur when a record is inserted, or when an existing record grows in length, and only in a KSDS
- When a new record has to be inserted (in key sequence) into a Control Interval (CI) that has no free space left, a CI split occurs. Approximately half of the records in the full CI are moved to a free CI in the same Control Area, and the new record is then inserted into whichever CI its key belongs in
- If no free Control Interval exists in the Control Area (CA) for the new record, a CA split occurs. Half of the Control Intervals in the full CA are moved to a free Control Area. This movement makes room for a new CI, and the new record is then inserted in key sequence
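The two split rules above can be sketched with a small toy model. This is only an illustration, not how VSAM is implemented: the dataset is assumed to be a Python list of CAs, each CA a list of CIs, and each CI a sorted list of keys (an empty list is a free CI). The names `find_ci`, `insert` and `CI_CAPACITY` are hypothetical, capacity is counted in records rather than bytes, and the new CA is placed next to the old one in the list only to keep the keys in logical order.

```python
from bisect import insort

CI_CAPACITY = 3  # records per CI; toy value (must be >= 2), real CIs are sized in bytes


def find_ci(cas, key):
    """Return (ca_index, ci_index) of the CI whose key range covers `key`."""
    last = (0, 0)
    for a, ca in enumerate(cas):
        for i, ci in enumerate(ca):
            if ci:                      # skip free (empty) CIs
                last = (a, i)
                if key <= ci[-1]:       # ci[-1] is the highest key in this CI
                    return a, i
    return last                         # key is higher than every stored key


def insert(cas, key):
    """Insert `key` and return the list of splits that were needed.

    Assumes every CA has at least two CIs and that free CIs sit
    after the filled ones inside each CA.
    """
    a, i = find_ci(cas, key)
    ca, ci = cas[a], cas[a][i]
    if len(ci) < CI_CAPACITY:
        insort(ci, key)                 # room available: plain insert
        return []
    if [] in ca:
        # CI split: move the upper half of the records to a free CI of the same CA
        mid = (len(ci) + 1) // 2
        ca.remove([])                   # use up one free CI
        ca.insert(i + 1, ci[mid:])
        del ci[mid:]
        return ["CI split"] + insert(cas, key)
    # CA split: move the upper half of the CIs to a newly allocated CA
    half = len(ca) // 2
    cas.insert(a + 1, ca[half:] + [[] for _ in range(half)])
    ca[half:] = [[] for _ in range(len(ca) - half)]
    return ["CA split"] + insert(cas, key)


# Assumed sample data: one CA with a free CI, one CA that is completely full.
cas = [[[1, 3, 5], []], [[21, 22, 23], [24, 26, 27]]]
print(insert(cas, 4))    # ['CI split']
print(insert(cas, 25))   # ['CA split', 'CI split']
print(cas)
```

In this sketch a CA split is always followed by a CI split, because moving half of the CIs only frees up CIs; the full CI that should receive the record still has to be split before the record fits.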
Let’s understand CI split and CA split with the help of an example
- In the figure above there are 4 control areas in total, and each control area contains two control intervals.
- One sequence set exists for each control area, so there are 4 sequence sets in total. As explained under the ‘Sequence Set’ concept, a sequence set stores the highest key value of each control interval in its control area. In the figure above, only Control Interval-1 of Control Area-1 is filled, and its highest key is ‘5’, so the sequence set holds ‘5’
- There are two levels of index set. The second-level index set contains pointers to the sequence sets, and the topmost-level index set contains pointers to the second-level index set
- CI Split: When a record with key ‘4’ needs to be inserted, it belongs in Control Interval-1 of Control Area-1, together with the existing records in the key range 1 to 5. Since there is no space left in Control Interval-1, a CI split occurs. The records with keys 1 and 3 stay in Control Interval-1, and the records with keys 4 and 5 are moved to a free control interval. In our case Control Interval-2 is free, so they are moved there, and the sequence sets and index sets are updated accordingly.
- CA Split: When a record with key ‘25’ needs to be inserted, it belongs in CI-6. Since there is no free space in CI-6, a CI split would normally be expected, but in our case no free CI is available in Control Area-3, so a Control Area split (CA split) occurs. A new control area is allocated, half of the CIs of Control Area-3 are moved there, and the sequence sets and index sets are updated accordingly
- How can an index make access faster? In the figure above, accessing the 40th record (key 40, in CA-4) needs only 4 I/O operations: the first reads the root index, the second reads the second-level index set, the third reads the record location from the sequence set, and the fourth reads the record itself. With sequential access, the same read could have taken up to 40 I/O operations. On the other hand, reading the first record takes a single I/O with sequential access, but still takes 4 I/O operations through the index.
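The I/O count for a keyed read can be reproduced with a small sketch. It assumes a layout that is similar to, but not the same as, the figure: 8 fully loaded CIs holding keys 1 to 40, one sequence set per CA, two second-level index records and one root. The helpers `data_ci`, `index`, `high_key` and `lookup` are hypothetical names, and each record read is simply counted as one I/O.

```python
from bisect import bisect_left


def data_ci(keys):
    """A data CI: just its sorted keys."""
    return ("data", keys)


def high_key(node):
    """Highest key stored in, or reachable through, a node."""
    kind, entries = node
    return entries[-1] if kind == "data" else entries[-1][0]


def index(children):
    """An index record: one (highest key, pointer) entry per child."""
    return ("index", [(high_key(child), child) for child in children])


def lookup(node, key):
    """Return (found, reads) for a keyed read that starts at `node`."""
    reads = 1                                   # reading this record costs one I/O
    kind, entries = node
    if kind == "data":
        return key in entries, reads
    highs = [high for high, _ in entries]
    pos = bisect_left(highs, key)               # first entry whose highest key >= key
    if pos == len(entries):
        return False, reads                     # key is above every key in this subtree
    found, below = lookup(entries[pos][1], key)
    return found, reads + below


# Assumed layout: 8 CIs of 5 records, 2 CIs per CA, 4 CAs.
cis = [data_ci(list(range(start, start + 5))) for start in range(1, 41, 5)]
sequence_sets = [index(cis[i:i + 2]) for i in range(0, 8, 2)]       # one per CA
second_level = [index(sequence_sets[i:i + 2]) for i in range(0, 4, 2)]
root = index(second_level)

print(lookup(root, 40))   # (True, 4)
print(lookup(root, 1))    # (True, 4)
```

Every keyed read costs the same 4 reads in this model, whichever key is requested, which is the trade-off described above: a large saving for records far into the file and a small penalty for records at the very start.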
- Spanned records are records that are too large to fit into a single CI and therefore span more than one CI
- However, records cannot span across control areas
- Free space left in a CI that holds part of a spanned record cannot be used by any other record, even if that record would logically fit in the unused bytes
- Spanned records are possible only for KSDS and ESDS
- The cluster must be defined with the SPANNED option to allow spanned records to be stored
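The effect of these rules on space usage can be estimated with a rough calculation. The sketch below is an approximation under stated assumptions: `spanned_layout` is a hypothetical helper, and the default of 10 bytes of control information per CI (one 4-byte CIDF plus two 3-byte RDFs for a spanned segment) is an assumption that should be checked against the IBM documentation for your release.

```python
from math import ceil


def spanned_layout(record_len, ci_size, cis_per_ca, overhead=10):
    """Return (CIs needed, unusable bytes) for one spanned record.

    overhead: assumed bytes of control information per CI.
    Raises ValueError if the record would have to cross a CA boundary.
    """
    usable = ci_size - overhead                 # data bytes available in each CI
    cis_needed = ceil(record_len / usable)
    if cis_needed > cis_per_ca:
        raise ValueError("record cannot span across control areas")
    unusable = cis_needed * usable - record_len  # left over in the last CI, lost to other records
    return cis_needed, unusable


# Assumed sample values: a 10,000-byte record, 4,096-byte CIs, 12 CIs per CA.
print(spanned_layout(10_000, 4096, 12))   # (3, 2258)
```

With these sample values the record occupies 3 CIs and leaves 2,258 bytes in the last CI that no other record can use, which is why spanned records are usually worth defining only when records really do exceed the CI size.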