On datasets naming-scheme for batch processing

Jambock · October 6, 2023, 9:37pm

Sure, DBMSs are invaluable, perhaps especially in OLTP and OLAP. Yet I think that BATCH processing is still very valuable in many under-the-hood workings, and that is where well-crafted COBOL batch programs continue to shine in terms of performance and overall convenience.

Although batch COBOL programs include those with embedded SQL for querying DBMSs (typically DB2), the legacy processing of z/OS datasets (either QSAM or VSAM) is still essential. Unfortunately, the dataset naming rules (at most 44 characters, in dot-separated parts of at most 8 characters each, drawn from [A-Z], [0-9] and [$,@,#]; no part beginning with a digit; the dots count toward the 44) are rather limiting and demand (perhaps especially with QSAM) some sort of “ingenious” strategy.
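To make those rules concrete, here is a minimal sketch in Python that checks a name against the rules exactly as listed above. The function name `dsn_problems` is made up for illustration, and the real z/OS rules have further details that this does not attempt to cover.

```python
import re

# One dot-separated part: 1 to 8 characters from A-Z, 0-9, $, @, #,
# and the first character may not be a digit (rules as stated in the post).
QUALIFIER = re.compile(r"[A-Z$@#][A-Z0-9$@#]{0,7}")
MAX_DSN_LENGTH = 44  # the dots count toward this limit


def dsn_problems(name):
    """Return a list of rule violations; an empty list means the name passes."""
    if not name:
        return ["name is empty"]
    problems = []
    if len(name) > MAX_DSN_LENGTH:
        problems.append(f"length {len(name)} exceeds {MAX_DSN_LENGTH}")
    for position, qualifier in enumerate(name.split("."), start=1):
        if not QUALIFIER.fullmatch(qualifier):
            problems.append(f"qualifier {position} ({qualifier!r}) is invalid")
    return problems
```

For instance, `dsn_problems("PROD.PAYROLL.MASTER")` returns an empty list, while `dsn_problems("PROD.2023DATA")` reports the second part because it begins with a digit.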

It seems natural that most dataset names have to follow strict guidelines governed by local policies and project requirements, but my point is the “gaps” not covered by them. When a dataset naming scheme is left to an application programmer without enough time to delve into it, I have the impression that “maintenance issues” are on their way. I should provide an example of what I’m trying to say, but I’ll leave it for a later post.

Thus, IMHO, a good dataset naming scheme strategy is paramount.

aenator · October 14, 2023, 4:48am

TOPIC: BATCH VS. OLTP

I like BATCH processing. I do not like “instant-update” [OLTP] environments, for large, distributed systems.

EXAMPLE

Just take a look at what happened to a human being of my own acquaintance, while using a specific social-media app that creates an interactive/[supposedly-]immediate-update environment that “cascades” its updates, world-wide, immediately:

The user sees the database-update “echoed”, to that user’s own device; and so,
the user “thinks” that the update is visible to ALL users, throughout the system; when, in fact,
there seems to be no way for the user to really know whether all of the “mirror-site” servers are fully-synchronized with one another, at any, given moment, or not. So,…
“content” appeared to be “there”, one minute, and “gone”, a short time later. But, in fact,…
the problem seems to be, that two users, both of whom were looking at the same dataset (probably, on “mirrored” copies that were resident on different servers), WERE SEEING DIFFERENT “CONTENT”.
The impact of that “anomaly” (in the context of other “anomalies” that were brought to light, at that time, which were of a “this-CANNOT-be-happening” nature) was, that an “interpersonal-relationship ‘train-wreck’” occurred, with certain of those relationships becoming “casualties”. In other words, “There was ‘wreckage’.”
Wreckage.

=============
END-EXAMPLE

CAVEAT: I have VERY LITTLE programming experience with an IBM mainframe; Mostly, I programmed on other vendors’ computers; on an IBM mainframe, I was a user of an existing system.

With that said,…

…I like BATCH processing, because --assuming nobody has bribed a tech, to get “advance notice” of the updated data, before the “production” database is unlocked, post-update/synchronization-- the users can have a reasonable expectation that they are all seeing the same thing.

Because, if you’ve got some kind of rotating roster of identical “spiders”, constantly running, one after the other, with each doing an audit and comparison of all the “mirrors” and “stand-alone components”, and signing an “all-clear” log, at the end,…

…and if any one of those “spiders” can shut the whole thing down, and lock the users out, if it detects an anomaly,…

…things can be reasonably safe, reasonably fair-and-equitable.

But, it’d be expensive, to do it that way. And not invulnerable, either, if those “spiders” should be set up in a way that has them “make their rounds” with a specific “interval” in-between, that’d let someone slide in and do hanky-panky, and get out, before the next “spider” came through.

END-TOPIC: BATCH VS. OLTP

TOPIC: NAMING-CONVENTIONS

The entire topic of “Naming-Conventions” is a tiny subset of the corporate IMPERATIVE that the Company be able to survive the departure of ANY of its employees.

There is nothing that will kill a career quicker than being “the only one who can <enter the name of ANY kind of task, here>”.

If programs are “spaghetti”, with variable-names being “standardized” to 3 characters in length, while paragraph-names are 5 characters long, whether they need it or not, the supervisor is probably planning on spending some time with their family.

And, if you can’t find the data-set a “production” program needs (but, you KNOW it’s there, SOMEWHERE)? Not good.

Working in an environment in which data-set names are meaningful, standardized, and MULTITUDINOUS is probably a challenge, for someone who is young and energetic, creative, ambitious, and eager to prove that person’s own worth, not only to that person’s own self, but to that person’s colleagues and boss.

But,…

…if there is a process in existence, for modifying the “Naming Conventions”, when a “gap” is discovered --maybe, with the name of the programmer who discovered that “gap” being noted, in the “Naming Conventions” document (as well as a notation made, in the personnel record, etc.)-- the “chafing” and/or “champing at the bit” might be able to be kept at manageable levels.

I don’t know.

BUT, the problem with “meaningful” data-set names is probably similar to the problems that led design-consultants to discourage the use of “composite” record-identifiers, “back in the day” [i.e., each “section” or “segment” of an identifier would have some kind of “meaning” related to the various classifications and groups with which any given record, in a data-set, might be associated, such as…

…“MAKE-YEAR-COUNTRY-MODEL-BODYTYPE-PLANT-DATE-SHIFT-SERIAL#”]:

Such IDs take up a lot of space; and they imply (but don’t guarantee) the existence of “bookkeeping” that makes sure that there’s no “discrepancy” in existence, between any component of the ID, and:

a) the “dictionaries” that each “component” of the ID draws from; and/or

b) the “rules” that would make an ID invalid, if the “DATE” portion of the ID was a date on which that particular PLANT (which is also a portion of the ID) was closed or not-yet-in-operation.
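The kind of “bookkeeping” that (a) and (b) imply could look something like the Python sketch below. Everything in it is hypothetical: the field order follows the example ID above, the dictionary contents (`MAKES`, `PLANTS`) are invented, and the DATE component is assumed to be written as YYYYMMDD.

```python
from datetime import date, datetime

# Component order taken from the example ID in the post.
FIELDS = ["make", "year", "country", "model", "bodytype",
          "plant", "date", "shift", "serial"]

# Hypothetical "dictionaries" that components of the ID draw from.
MAKES = {"ACME"}
PLANTS = {
    # plant code -> (first day in operation, last day in operation or None)
    "P01": (date(1990, 1, 1), date(2005, 12, 31)),
    "P02": (date(2000, 6, 1), None),
}


def check_id(composite_id):
    """Return a list of discrepancies found in a composite ID."""
    parts = composite_id.split("-")
    if len(parts) != len(FIELDS):
        return [f"expected {len(FIELDS)} components, got {len(parts)}"]
    rec = dict(zip(FIELDS, parts))
    problems = []

    # (a) each component must exist in its dictionary
    if rec["make"] not in MAKES:
        problems.append(f"unknown make {rec['make']}")
    plant = PLANTS.get(rec["plant"])
    if plant is None:
        problems.append(f"unknown plant {rec['plant']}")

    # DATE is assumed to be YYYYMMDD
    try:
        built = datetime.strptime(rec["date"], "%Y%m%d").date()
    except ValueError:
        problems.append(f"bad date {rec['date']}")
        built = None

    # (b) cross-component rule: the plant must have been operating on that date
    if plant is not None and built is not None:
        opened, closed = plant
        if built < opened or (closed is not None and built > closed):
            problems.append(f"plant {rec['plant']} was not operating on {built}")
    return problems
```

Only two of the dictionaries are checked here; a real system would need one lookup per component, which is a fair illustration of how much upkeep such IDs carry.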

Obviously, as far as data-set “Naming Conventions” are concerned, the organization might decide to create a database that contains an indexed/searchable list of all of the “production” data-set names, together with a description of what is supposed to be contained in that data-set. In such a case, the data-set names might be meaningless.

BUT, creating such a database might suggest the existence of an entire application-system that would be the only provider of any kind of access to any of those “production” data-sets, to ANY program that might attempt access.

No typing the data-set name in the JCL, for example; because, the programmer wouldn’t know the data-set name. The programmer would select the data-set, based on its description.

Maybe the description would get inserted into the JCL; and, at run-time, the Interpreter (modified to handle that data-set-name catalog) would go into the data-set name-catalog, and retrieve the data-set’s “real” name/location; and the operating-system would load that data-set.
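As a rough sketch of that lookup step (in Python, purely for illustration): the catalog contents, the function names and the generated DD statement layout below are all assumptions, not a description of any existing z/OS facility.

```python
# Hypothetical catalog: normalized description -> "real" data-set name.
CATALOG = {
    "monthly payroll master": "PROD.PAY.MASTER.M",
    "general ledger extract": "PROD.GL.EXTRACT",
}


def resolve(description):
    """Look up the real data-set name for a human-readable description."""
    key = " ".join(description.lower().split())  # ignore case and extra spaces
    try:
        return CATALOG[key]
    except KeyError:
        raise LookupError(f"no data-set catalogued as {description!r}") from None


def dd_statement(ddname, description):
    """Build a JCL-style DD statement from a description instead of a name."""
    return f"//{ddname:<8} DD DSN={resolve(description)},DISP=SHR"
```

With this, `dd_statement("PAYIN", "Monthly Payroll Master")` produces a DD line pointing at `PROD.PAY.MASTER.M`, and an unknown description raises `LookupError` rather than silently guessing.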

I don’t know. I have to stop, because I’m going on “flights of fancy”, now; and I want to keep my feet on the ground.

It’s probably enough, to just have the computer “flag” any new data-set names and give them to someone who’d check for standardization; and, if there’s an anomaly, that person asks the programmer what’s what. If there’s a reason for the non-standard name, a new standard is created, in “the Manual”; and the programmer gets to choose between a free bear-claw and coffee, for breakfast, the next day, or a free [individual-sized] bag of Doritos and a Dr. Pepper, for “afternoon snack”.
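The “flagging” part could be as small as the Python sketch below. The pattern in `STANDARD` is an invented local convention (an environment prefix, an application code, then one or more further parts), used here only to have something to check against.

```python
import re

# Hypothetical local standard: <ENV>.<APPL>.<one or more further parts>
STANDARD = re.compile(
    r"(PROD|TEST|DEVL)"                  # environment
    r"\.[A-Z][A-Z0-9]{1,7}"              # application code
    r"(\.[A-Z$@#][A-Z0-9$@#]{0,7})+"     # remaining parts
)


def names_to_review(known, observed):
    """Return new data-set names that do not match the standard, sorted."""
    new_names = set(observed) - set(known)
    return sorted(n for n in new_names if not STANDARD.fullmatch(n))
```

Given `known = {"PROD.PAY.MASTER"}` and a run that also saw `PROD.PAY.TRANS.D231006`, `TEST.GL.EXTRACT` and `JSMITH.TEMP.FILE`, only `JSMITH.TEMP.FILE` would be handed to the reviewer.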

Jambock · October 16, 2023, 3:46pm

Thanks for replying!
For sure, all of that is part of the equation!

I believe I understand all your considerations, including the technical and non-technical ones. Those with decades of experience will have tasted similar realities, and no doubt it’s tough. Certainly, when joining a group, it’s wise not to try to change anything, no matter how awkward and dumb it might look, unless explicitly assigned to do so and very, very well supported by those who have the authority.

But looking forward, reaching for above and beyond the clouds, the shall be peacefully waiting for those who, with a huge effort, have managed not to give up despite the “status-quo” insisting it’s worthless (until something catches their attention out of nowhere).
