Data sets, access methods and DFSMS

Dave Betten

This presentation is intended to provide some basic information about data sets and access methods
- This is by no means a complete tutorial
  - Rather, I’ve tried to cover some basics and prepare for some follow-on discussions and mentoring
- Some basics on the difference between data sets and access methods
- Explain some of the terminology you’ll probably hear used often
- Discuss some of the benefits of extended format data sets
  - Types of compression, data striping, etc.
- Review some basic concepts of System-Managed Storage (SMS)
- Explain some older technologies that might still be in use today
  - Hiperbatch, VIO

It’s important to understand the difference between data sets and access methods
- Let’s set VSAM aside for now
- There are two main types of non-VSAM data sets
  - Physical sequential data sets, which are commonly referred to as flat files
    - Records are stored sequentially in the data set
    - Can reside on disk or tape
  - Partitioned data sets (PDS)
    - A PDS has a directory that can point to multiple members
    - Each member can be accessed individually, much like you access physical sequential files
- Access methods are software interfaces used to access data sets
  - BSAM – Basic Sequential Access Method
  - QSAM – Queued Sequential Access Method
  - BPAM – Basic Partitioned Access Method
- There are other very old access methods that are rarely used
  - BDAM – Basic Direct Access Method
  - ISAM – Indexed Sequential Access Method
- The same data set can be accessed via different access methods depending on the program
- Some low-level programs build their own channel programs to access the data
  - Commonly referred to as the EXCP access method
  - DFSORT is a common user of EXCP

Data set characteristics
- Record formats (RECFM)
  - Fixed (F)
  - Fixed Blocked (FB)
  - Variable Blocked (VB)
  - Variable Blocked Spanned (VBS)
  - Undefined length (U)
- Record length (LRECL)
  - For F and FB formats, it’s the length of the records
    - Every record has the same length
  - For VB and VBS, it’s the maximum length
    - The length of the records varies but none can exceed the maximum
    - Each record begins with a 4-byte record descriptor word (RDW) whose first 2 bytes contain the length of the record
  - U is normally used for things like load libraries
- Block size (BLKSIZE)
  - The size (in bytes) of each block of records
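To make the variable-length layout concrete, here is a minimal Python sketch of how one RECFM=VB block could be built and split apart. It assumes the standard layout of a 4-byte block descriptor word (BDW) followed by records that each carry a 4-byte record descriptor word (RDW), with lengths that include the descriptor itself. The function names are hypothetical and the sketch ignores spanned records (VBS).

```python
import struct

def build_vb_block(records):
    """Build one RECFM=VB block: a 4-byte BDW followed by RDW-prefixed records."""
    body = b""
    for data in records:
        # RDW: 2-byte length (includes the 4-byte RDW itself) + 2 reserved bytes.
        body += struct.pack(">HH", len(data) + 4, 0) + data
    # BDW: 2-byte block length (includes the 4-byte BDW itself) + 2 reserved bytes.
    return struct.pack(">HH", len(body) + 4, 0) + body

def split_vb_block(block, lrecl):
    """Return the record payloads in a VB block, checking each length against LRECL."""
    block_len, _ = struct.unpack_from(">HH", block, 0)
    records = []
    offset = 4  # skip the BDW
    while offset < block_len:
        rec_len, _ = struct.unpack_from(">HH", block, offset)
        if rec_len < 4 or rec_len > lrecl:
            raise ValueError(f"record length {rec_len} is outside 4..{lrecl}")
        records.append(block[offset + 4:offset + rec_len])
        offset += rec_len
    return records

block = build_vb_block([b"SHORT", b"A LONGER RECORD"])
records = split_vb_block(block, lrecl=80)
print(len(block), records)
```

Note that for variable formats LRECL counts the 4-byte RDW, so the longest data portion is LRECL minus 4.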

Terminology
- BUFFER: An area in a program's virtual storage that holds the data from one physical block of the data set.
- BLOCK: A collection of contiguous records, which is the smallest unit of transfer between a processor and an I/O subsystem.
- I/O REQUEST: The actual command to the I/O subsystem to transfer one or more blocks of data. The number of blocks per I/O is determined by the access method, program, and JCL.
- EXCP: Actually an MVS macro instruction to EXecute a specified Channel Program. For QSAM and BSAM, EXCP count has come to mean the number of blocks transferred.

QSAM uses the z/OS standard GET and PUT macros
- Data is processed at the RECORD level
  - Access method handles blocking and unblocking of records
  - Access method manages synchronization
  - Access method manages overlapping requests
- BUFNO - Buffer Number
  - Number of buffers allocated for storing blocks of data
  - Specified in the program's DCB or in JCL
  - Default BUFNO is 5
  - Buffers allow multiple blocks to be accessed in a single I/O
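The blocking and unblocking that QSAM hides from the program can be pictured with a small Python model for fixed blocked (FB) records. This is only an illustration of the idea, not how the access method is implemented; the function names are hypothetical and the tiny LRECL and BLKSIZE values are chosen just to keep the output readable.

```python
def put_records(records, lrecl, blksize):
    """Pack fixed-length records into blocks of at most BLKSIZE bytes (blocking)."""
    per_block = blksize // lrecl  # whole records that fit in one block
    if per_block == 0:
        raise ValueError("BLKSIZE must be at least LRECL")
    blocks = []
    for start in range(0, len(records), per_block):
        group = records[start:start + per_block]
        if any(len(record) != lrecl for record in group):
            raise ValueError("every FB record must be exactly LRECL bytes")
        blocks.append(b"".join(group))
    return blocks

def get_records(blocks, lrecl):
    """Hand back one record at a time from a sequence of blocks (unblocking)."""
    for block in blocks:
        for offset in range(0, len(block), lrecl):
            yield block[offset:offset + lrecl]

# Seven 4-byte records with a 12-byte block size: three records per block.
source = [f"R{n:03d}".encode() for n in range(7)]
blocks = put_records(source, lrecl=4, blksize=12)
print([len(b) for b in blocks])
print(list(get_records(blocks, lrecl=4)) == source)
```

The caller of `get_records` only ever sees records, which mirrors the record-level view a program gets from GET, while the number of blocks is what shows up as the EXCP count.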

BSAM uses the z/OS standard READ and WRITE macros
- Data is processed at the BLOCK level
- Program must manage blocking and buffering
  - Must move records to and from blocks and maintain variable lengths
  - Must provide the buffer address for READ and WRITE macros
- Program is responsible for synchronizing requests and overlapping processing
  - Program must issue a CHECK, WAIT or EVENTS macro to determine if the requested operation completed successfully
  - Must have sufficient buffers available
    - Some programs are written to exploit whatever buffers are available at run time
    - Other programs only exploit a fixed number of buffers
- NCP - Number of Channel Programs
  - Number of READ or WRITE requests a program may issue before a CHECK is issued
  - Specified in the program's DCB macro (some programs examine NCP or BUFNO in JCL)
  - Default NCP is 1; any more is program dependent

Using standard access methods simplifies coding and maintenance
- Much simpler coding logic
  - Coding channel programs is not for the faint of heart!
  - Handles things like synchronization, recovery, etc.
- Less likely to require changes to user code
  - DFSMS takes care of updating the access method code to support new technology and exploit new features
  - Things like extended format, zHPF, compression, encryption, etc.

There are three types of non-VSAM data sets
- Basic format
  - Maximum of 59 volumes
  - Limited to 64K tracks per volume
  - Maximum of 16 extents per volume
- Large format
  - Similar to Basic but can exceed 64K tracks per volume
- Extended format
  - Extended format relieves some of the limitations
  - Logically the same format
  - Stored differently on the hardware to exploit hardware and software facilities of SMS
  - Must be SMS managed
  - Enabled through allocation or data class parameter
  - Allows some extra features:
    - Compression
    - Data Striping
    - Extended Addressing (larger files)
    - VSAM Allocation and Buffering

Virtual Storage Access Method (VSAM)
- VSAM data sets only reside on disk and are only accessed via the VSAM access method
- They support three types of access
  - Random (or direct)
  - Sequential
  - Skip Sequential
- VSAM functions consist of two major parts
  - Catalog management – extensive information about VSAM data sets is stored in the catalog
  - Record management – this part contains the access method code
- Data is logically stored in Control Intervals (CIs) and Control Areas (CAs)
  - CIs are stored physically in blocks
  - Control Areas contain multiple Control Intervals and are pointed to by index records
  - Maximum size of a CA is one cylinder

There are four types of VSAM data sets
- Key-Sequenced Data Sets (KSDS)
  - Consist of an index and a data component
  - Records contain a key and data
  - The index provides direct access to any record in the data component
  - The data component can also be accessed sequentially
- Entry-Sequenced Data Sets (ESDS)
  - Has no index
  - Records are in the order they were added
  - Can be accessed sequentially or directly using the Relative Byte Address (RBA)
- Relative Record Data Sets (RRDS)
  - Pre-formatted fixed-length records
  - Sequenced by relative number
  - Records accessed by Relative Record Number (RRN)
  - Allows direct and sequential access
- Linear Data Sets (LDS)
  - Byte-addressable storage
  - CI size is a multiple of 4096
  - Similar to a non-VSAM data set with some VSAM facilities
  - Most common user is DB2

VSAM data sets can also be allocated as extended format
- Similar benefits to non-VSAM
  - Compression, striping, etc.
- There are additional benefits unique to VSAM
  - Extended addressability allows a VSAM file to exceed 4GB in size
  - System-managed buffering improves performance
  - And now encryption!
- Certain VSAM data sets cannot be extended format
  - Catalogs
  - System data sets (since DFSMS isn’t active yet at IPL time)
  - Temporary data sets

Extended format - Compression
- Compression is one of the main benefits of extended format
- There are three types of compression
  - Initially there were only two:
    - Generic: A standard dictionary is used as the basis for compression
    - Tailored: Only used for non-VSAM; a tailored dictionary is created based on initial sampling of the data
      - Usually provides better compression ratios
  - zEDC compression addressed one of the main inhibitors to compression – CPU cost
    - Both Generic and Tailored compression drive up CPU usage
    - With zEDC, the compression is performed on a PCIe-attached card, thus offloading the CPU cost
    - Another benefit of zEDC is much improved compression ratios and reduced I/O
- Compressing data sets reduces disk storage usage but also improves performance
  - Less data being transferred between the disk and the processor reduces I/O time
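The point that less data transferred means less I/O can be illustrated with a short Python sketch. It uses the software zlib library (DEFLATE) purely as a stand-in; it does not call zEDC or any DFSMS service, and the sample records are made up for the illustration, so real ratios will differ.

```python
import zlib

# Hypothetical sample data: 1,000 fixed 80-byte records with a lot of repetition,
# which is typical of padded business records.
records = [f"CUST{n:08d} ACTIVE".encode().ljust(80) for n in range(1000)]
data = b"".join(records)

compressed = zlib.compress(data, 6)
ratio = len(data) / len(compressed)

# Fewer bytes on disk also means fewer bytes moved per read of the data set.
print(f"original bytes:   {len(data)}")
print(f"compressed bytes: {len(compressed)}")
print(f"compression ratio: {ratio:.1f}:1")

# Compression must be lossless: the original data comes back unchanged.
restored = zlib.decompress(compressed)
```

In this sketch the CPU does the compression work, which is exactly the cost that an offload card is meant to remove.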

Extended Format - Data Striping: Implements parallel I/O to reduce elapsed times
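As a rough mental model of striping, the Python sketch below spreads the tracks of a data set across stripes in round-robin order and estimates elapsed time if every stripe transfers in parallel. The round-robin placement, the function names, and the fixed time per track are simplifying assumptions for illustration, not a description of how DFSMS lays out a striped data set.

```python
def stripe_tracks(num_tracks, stripe_count):
    """Assign logical track numbers to stripes in round-robin order."""
    stripes = [[] for _ in range(stripe_count)]
    for track in range(num_tracks):
        stripes[track % stripe_count].append(track)
    return stripes

def elapsed_estimate(num_tracks, stripe_count, seconds_per_track):
    """Idealized elapsed time: the busiest stripe determines when the transfer ends."""
    busiest = max(len(stripe) for stripe in stripe_tracks(num_tracks, stripe_count))
    return busiest * seconds_per_track

layout = stripe_tracks(10, 4)
print(layout)
print(elapsed_estimate(10, 1, 1.0), elapsed_estimate(10, 4, 1.0))
```

The estimate ignores channel and device contention, so it shows the best case; real gains depend on the paths and volumes behind each stripe.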

Data sets must be DFSMS managed to be Extended Format
- DFSMS allows for policy-based management of data
- Automatic Class Selection (ACS) routines automate the assignment of data sets to classes and storage groups
  - Data Class – data set allocation and space attributes
    - This is where you can specify extended format, compression, extended addressability, etc.
  - Storage Class – performance goals and availability and accessibility requirements
    - This is where you can control the number of stripes
  - Management Class – management attributes (retention, migration, backup, etc.)
  - Storage Group – a collection of storage volumes and attributes
- The Interactive Storage Management Facility (ISMF) is used to define and manage the classes
  - ISPF-based application
  - On the SYSD test system go to Option W and then Option I
  - You can then list all of the various classes defined on the system
- ACS routines are stored in OSPPGE.SMS.SOURCE
  - We can look at these together in a future call

Most of the Data Class attributes can be overridden with JCL parameters
- The DSNTYPE parameter is the most notable
  - LIBRARY – Partitioned Data Set Extended (PDSE)
  - PDS – Partitioned Data Set
  - HFS – Hierarchical File System
  - LARGE – Creates a large-format sequential data set
  - EXTREQ – Extended format, required
  - EXTPREF – Extended format, preferred
  - BASIC – Basic format sequential data set
- Others might be space allocations, volume count, retention, etc.

So why aren’t more customers exploiting extended format?
- The Using Data Sets manual lists the restrictions for extended format sequential data sets
- Restrictions: The following types of data sets cannot be allocated as extended-format sequential data sets:
  - PDS, PDSE, and direct data sets, except VSAM
  - Non-system-managed data sets
  - VIO data sets
- The following types of data sets should not be allocated as extended-format sequential data sets:
  - System data sets
  - GTF trace
  - Data Facility Sort (DFSORT) work data sets
  - Data sets used with Hiperbatch
  - Data sets accessed with EXCP
  - Data sets used with checkpoint/restart
- And the restrictions listed for VSAM include:
  - An extended format data set does not allow the index to be imbedded with the data (IMBED parameter) or the data to be split into key ranges (KEYRANGES parameter)
  - An open for improved control interval (ICI) processing is not permitted for extended format data sets

Hiperbatch is an older technology we rarely see being used now
- MVS function to retain data in storage
  - Originally intended to exploit expanded storage but now backed by central storage
- Addresses two common characteristics of batch
  - Multiple jobs requesting data from the same data set simultaneously
  - Jobs passing temporary or short-lived data sets to subsequent jobs
- Uses the Data Lookaside Facility (DLF) to store data in a DLF object
- Supports QSAM and VSAM
  - For VSAM KSDS, data component only
  - Note that BSAM and EXCP are not supported
- Does not require application or JCL changes

Hiperbatch Retain and Non-Retain
- Retain
  - Intended for one writer followed by one or more readers
  - Writer creates the file on DASD while a copy is placed in expanded storage concurrently
  - Entire file is placed in the DLF object
  - Later readers retrieve from the DLF object
  - Later writers update the DLF object and DASD concurrently
  - DLF object must be explicitly deleted
  - Retain requires enough storage to load the entire file into memory
  - Beneficial for files accessed concurrently by numerous jobs
- Non-Retain
  - Intended for concurrent readers
  - First reader retrieves from DASD
  - Copy is placed in the DLF object concurrently
  - Following readers retrieve from the copy in storage
  - Entire file need not fit in memory
  - Storage is stolen from just behind the last reader
  - DLF object is deleted when the open count reaches 0
  - Non-Retain has a much smaller storage requirement

VIO allows temporary data sets to be buffered in storage
- Eliminates all I/O to the data set
  - No hardening of data on DASD
  - Track window in address space
- Can be implemented via DFSMS
  - Define VIO storage group(s)
  - Access controlled by the VIOMAXSIZE parameter
    - If primary + all secondaries > VIOMAXSIZE then no VIO
  - Code ACS routines to direct data sets
    - VIO storage group in list of candidate storage groups
- Transparent to users
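The VIOMAXSIZE check above is simple arithmetic, sketched here in Python. The function name and the use of kilobytes are hypothetical, and the default of 15 secondary extents is an assumption made for the example (it is passed as a parameter so it can be changed).

```python
def vio_eligible(primary_kb, secondary_kb, viomaxsize_kb, max_secondaries=15):
    """Return True if primary plus all secondary allocations fit within VIOMAXSIZE.

    max_secondaries is an assumed count of secondary extents, not a value
    taken from the presentation.
    """
    total_kb = primary_kb + secondary_kb * max_secondaries
    return total_kb <= viomaxsize_kb

# 1000 KB primary + 15 x 100 KB secondaries = 2500 KB, under a 3000 KB limit.
small = vio_eligible(primary_kb=1000, secondary_kb=100, viomaxsize_kb=3000)
# 1000 KB primary + 15 x 200 KB secondaries = 4000 KB, over the limit.
large = vio_eligible(primary_kb=1000, secondary_kb=200, viomaxsize_kb=3000)
print(small, large)
```

The practical consequence is that a modest primary allocation with a generous secondary can still push a temporary data set out of VIO.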

DFSORT builds its own channel programs to access sort work data sets
- The complex sort algorithms require DFSORT to directly access blocks of sort work data rather than write and read sequentially.
- A typical sort flow looks something like this:
  1. Read from SORTIN as much as we can fit in a Record Storage Area (RSA) allocated in the program's virtual storage
  2. Sort the data in the RSA and write the sorted string to SORT WORK
  3. Read another bunch of records from SORTIN into the RSA
  4. Sort the current bunch of records and write the sorted string to SORT WORK
  5. Repeat steps 3 and 4 until end of SORTIN
  6. Merge the sorted strings together and write the sorted file to SORTOUT
[Diagram: a DFSORT job reading SORTIN, using SORTWK01 through SORTWK04, and writing SORTOUT]
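The flow above is the classic external merge sort pattern, and a minimal Python sketch of that general pattern follows. It is not DFSORT's algorithm: temporary files stand in for the sort work data sets, `max_in_memory` stands in for the size of the RSA, and all function names are hypothetical.

```python
import heapq
import os
import tempfile

def write_run(chunk, workdir, run_number):
    """Sort one in-memory chunk and write it out as a sorted run (a 'string')."""
    path = os.path.join(workdir, f"run{run_number:04d}.txt")
    with open(path, "w") as run_file:
        for record in sorted(chunk):
            run_file.write(record + "\n")
    return path

def read_run(path):
    """Stream the records of one sorted run back from disk."""
    with open(path) as run_file:
        for line in run_file:
            yield line.rstrip("\n")

def external_sort(records, max_in_memory):
    """Sort text records (without newlines) using sorted runs on disk, then merge."""
    with tempfile.TemporaryDirectory() as workdir:
        run_paths = []
        chunk = []
        for record in records:
            chunk.append(record)
            if len(chunk) == max_in_memory:
                # The in-memory area is full: sort it and write a run.
                run_paths.append(write_run(chunk, workdir, len(run_paths)))
                chunk = []
        if chunk:
            run_paths.append(write_run(chunk, workdir, len(run_paths)))
        # Merge phase: heapq.merge lazily combines the already-sorted runs.
        # The result is collected in a list only to keep the demo simple.
        return list(heapq.merge(*(read_run(path) for path in run_paths)))

result = external_sort(["delta", "alpha", "echo", "charlie", "bravo"], max_in_memory=2)
print(result)
```

With `max_in_memory=2` the five input records produce three runs, which are then merged into the final sorted output.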

So we have to think about why more customers are not exploiting extended format
- Yes, there are restrictions like we’ve just reviewed
  - But there are still many data sets that are eligible
- My experience is that many customers just don’t have the time to convert
  - Requires analysis to make sure all access meets requirements
  - Testing to verify nothing breaks
  - Changes to JCL or DFSMS settings to do the conversion
- Compression was usually the main motivator for clients with extremely large files
  - Even then, they implemented for a limited subset of the eligible data sets
- I haven’t seen striping used in a large number of environments
  - That may be a result of faster channels and disk subsystems reducing the need
- What we need is a way to help with that analysis but also convince them it’s worth the time
  - Compression and striping might provide savings to generate interest
  - Pervasive encryption is certainly a game changer in motivating customers

So where do we go from here?
- We can schedule some time to look at the WSC test system together
  - Look at the current DFSMS settings
  - Understand how to allocate extended format data sets
    - As well as compression and striping
- Look at how we might provide SMF analysis to help customers identify eligible data sets
  - Should probably review zBNA’s current capabilities
  - I have another presentation that gives an overview of what SMF is and how I use it
- I’m happy to help as you come up with your own ideas
