AxleBase

The future by design.






Advanced Technology

In A Database Manager







Database Nomenclature




                               

__________________________________________________



A few terms of the database jargon are covered here. Many professionals do not understand them as well as they think, but if they are entirely new to you, then you may not understand the nature or significance of the AxleBase project.


Very Large     ( VLT, VLDB )
Database Classification
BLOB
Unstructured Data
Database
Relational Database
Database Manager     ( DBMS )
Database Manager Classification
On-Line Transaction Processing     ( OLTP )
Data Warehouse     ( DSS )
Database Administrator   ( DBA )
Torrent Technology
SQL
Table
Index
Virtual
Distributed Computing
Cloud Computing
Axsys
Concurrency
DASD




__________________________________________________
Nomenclature
Section
Very Large
Database Classification

Although inexact, the term "very large" is important in the database professions and has a special meaning. When it is applied to a table or to a database, all domains (hardware, complexity, budgets, manpower, etc.) suddenly expand far beyond the ordinary. Before the outsourcing craze, very large data entities were managed and administered only by the top people in the professions.

Although the term is nebulous, it is not uncommon for professional publications, such as the AxleBase documentation, to use its acronym as though it has a definitive meaning for a special class of table and database. An understanding of the special nature of the AxleBase project requires an acquaintance with the term.

The concept has been used through the years to mean a larger than ordinary object, but its application specificity has changed as hardware and software changed. That which was a very large table on a mainframe before the PC revolution is now routinely managed in a desktop database manager.

The concept is based upon long and shared experience among those who work with databases and is assumed to exceed the "large" size. Part of its utility has come from its nebulous nature among professionals, but that cannot be tolerated when talking to others, so let's adopt a "ball park" number as a benchmark. This number is generally observed within AxleBase documents and will get little argument from professionals.

Therefore, let us say that if a table has at least fifty billion rows, then it is very large. That is not a fixed value, but is intended to give a non-professional a quick entry into the concept. It will immediately allow you to grasp many facets of database work that are usually known only by database professionals.

Objects outside of tables are usually ignored by the "large" and "very large" concepts. Extraneous objects such as indices, BLOB's, etc. do require storage, and sometimes far more storage than do tables, but their management is trivial and secondary, so they are ignored when assessing the magnitude of a database.

( If you gently insist that the data professionals approximately adhere to your new concept while talking with you, they will seldom object.)

( Now that you have a handy definition, do not allow AxleBase to confuse the issue. The AxleBase project pushes on the very limits of human imagination so that AxleBase is in a class by himself. He has been tested with tables containing trillions of rows and is designed to go far beyond that.)

Two common acronyms :
        A very large table is frequently referred to as a VLT.
        A VLDB is a database containing one or more VLT's.





__________________________________________________
Nomenclature
Section
BLOB

"BLOB" is the abbreviation or acronym for "binary large object". A few examples are photographs, scanned documents, architectural drawings, audio recordings, and geology scans. Most BLOB's contain a lot of bytes and each is stored in its own file. A BLOB can be anything that can be stored by a computer in a file. It is, therefore, not considered data.

Since they are not data, there is no need to store those large BLOB files in tables. Only their locations, called pointers, need to be in a table. When somebody wants the photo of a battleship, he specifies the battleship, the database manager looks it up in the table to find the pointer, and uses the pointer to retrieve the photo from its file outside the database; all far faster than could be done if those BLOB's were in the table.
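
The pointer arrangement described above can be sketched in a few lines. This is only an illustration, assuming Python and its built-in sqlite3 module as a stand-in database manager; the table name, column names, ship name, and file name are all invented for the example and say nothing about how AxleBase itself stores pointers.

```python
import sqlite3
import tempfile
from pathlib import Path

# Store a stand-in "photo" as an ordinary file outside the database.
blob_dir = Path(tempfile.mkdtemp())
photo_path = blob_dir / "missouri.jpg"
photo_path.write_bytes(b"\xff\xd8 pretend JPEG bytes")

# The table holds only a description and the pointer (here, a file path).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ship_photo (ship_name TEXT, file_path TEXT)")
db.execute(
    "INSERT INTO ship_photo VALUES (?, ?)",
    ("USS Missouri", str(photo_path)),
)


def fetch_photo(ship_name):
    """Look up the pointer in the table, then read the BLOB from its file."""
    row = db.execute(
        "SELECT file_path FROM ship_photo WHERE ship_name = ?", (ship_name,)
    ).fetchone()
    return Path(row[0]).read_bytes() if row else None


print(len(fetch_photo("USS Missouri")), "bytes retrieved")
```

The table row stays tiny no matter how large the photograph is, which is the point of keeping BLOB's outside the table.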

AxleBase has been running a BLOB database for years with gigabytes of photos and audio recordings in it, but it is hardly worth mentioning because there are only three or four thousand rows in it despite the seemingly large size. Thus, huge stores of astronomy photographs can be managed in tiny databases by small database managers.

The term "unstructured data" is marketing deceit to camouflage simple BLOB's. A seismograph chart may be "data" to a geologist and a photograph may be "data" to an astronomer, but they are just BLOB's to data professionals.

Simplified Summation :
        A BLOB is a single large binary file such as a photograph, audio recording, etc.





__________________________________________________
Nomenclature
Section
Database

A database is any organized collection of information.

Note the absence from that definition of the word "computer". The database concept was formalized because computers allow very large databases, but databases have been around since the clay tablets of the ancient Sumerians, and will probably be around in some form after the demise of the computer.

Also, note the absence of software from the definition. A database and the software that manages it are distinctly separate, and they are as different as are an egg and an eggbeater.

A database is simple data that is stored in an awesomely complex manner. A helpful concept is the table so that a database can be thought of as a collection of tables, each of which usually contains data about a certain kind of thing. Tables consist of rows that consist of columns.

But you need not know much about the nature of a computer database because you will never encounter one. Even those who administer and work with them professionally do so entirely through database managers such as AxleBase.





__________________________________________________
Nomenclature
Section
Relational Database

Like much of database work, this is a simple concept that you can quickly grasp. But be cautioned that if you try to consider all of its ramifications, you may become overwhelmed by the complexities.

There are types of databases such as relational, object oriented, hierarchical, etc. The types are so different from each other at a fundamental level that each type must have its own type of database manager. The most useful and most widely used type is the relational database. Hence, AxleBase is identified as a relational database manager.

A relational database consists of tables of data. A table has rows and columns that organize the data. The thing that makes it relational is the fact that the data in the various tables are related to each other so that the tables can be conceptually joined.

For example, your personal information is in a row of a government table and your tax records are in rows in another table. Your personal record and your tax records can be easily joined together at will because your government control number is in the rows; i.e., they are related by that number. Notice that the name of the government control number is irrelevant; all that is important is that it makes the tables relational. Since the data is segmented in detail and there are many things relating the government's millions of tables, the relational database is a powerful tool.
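
The government example can be tried on a small scale. The following is a minimal sketch, assuming Python's built-in sqlite3 module as the database manager; the table names, column names, and the sample people are hypothetical. The only thing that matters is that both tables carry the same control number.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE person (control_number TEXT, full_name TEXT);
    CREATE TABLE tax_record (control_number TEXT, tax_year INTEGER, amount_paid REAL);

    INSERT INTO person VALUES ('A-100', 'Pat Doe'), ('B-200', 'Lee Roe');
    INSERT INTO tax_record VALUES
        ('A-100', 2010, 1200.0),
        ('A-100', 2011, 1350.0),
        ('B-200', 2011, 980.0);
""")

# Join the two tables on the column they share.
rows = db.execute("""
    SELECT person.full_name, tax_record.tax_year, tax_record.amount_paid
    FROM person
    JOIN tax_record ON tax_record.control_number = person.control_number
    WHERE person.control_number = 'A-100'
    ORDER BY tax_record.tax_year
""").fetchall()

for row in rows:
    print(row)
```

Renaming control_number to anything else in both tables changes nothing about the result, which is the sense in which the name of the number is irrelevant.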

The data may or may not actually be stored in tabular form. Usually, it is not, for esoteric technical reasons, but that hardly matters because humans seldom ever see the raw data. The database manager is in charge of storing the data and in charge of maintaining and serving the human's tabular concept. It is perfectly valid for us to believe that our data is in tabular form because the database manager handles all of the technical details and is our interface to the database.

(The relational database was conceived by an IBM mathematician and has a succinct and rigorous definition that would be meaningless here. This one is far more practical for our purposes.)

Simplified Summation :
        A relational database is one in which at least one column of each table contains data like that in columns of other tables so the tables can be conceptually joined by software.





__________________________________________________
Nomenclature
Section
Database Manager

A database manager is a tool, such as AxleBase, that manages a database and the data that it contains. ( "Manager" is a bit of a misnomer because it is only a tool that is used by the human manager, but the term is now in general use, so the person who manages a database manager, regardless of the size, is called the administrator.)

Note the absence from that definition of the words "computer" and "software". The database manager of the ancient Sumerians was a team of priests that would rummage through their thousands of clay tablets looking for a bit of information. Never before in all of Mankind's history have such powerful tools been available for the management of data as we now have on computers, but this, too, may be fleeting while the definition remains. But to bring our thoughts in line with those of the masses, let us state that a database manager is the computer software system that manages a database.

The database manager seldom manages a single database. An AxleBase instance can easily manage many large databases. This is one of the many things that separate the database managers from the internet search engines and data warehouses.

The actual tasks performed by database managers such as AxleBase are numerous beyond description and complex beyond the understanding even of their creators. Generally speaking, a database manager is told how we want the data managed and humans seldom need to look at it again. That includes massive operations such as copying millions of records to safety every night, and detailed operations such as finding a person's telephone when you dial the number. It includes maintaining indices so your payment record can be found in a hundredth of a second instead of requiring two centuries. And on and on... The database manager is one of the most complex systems built by Mankind.

Simplified Summation :
        A database manager is a system that manages databases.

Acronyms :
        DBMS is a database management system.
        RDBMS is a relational database management system.
        AxleBase is a DBMS and an RDBMS.





__________________________________________________
Nomenclature
Section
Database Manager Classification

The classification of database managers has little to do with their types. Whereas the types correspond to their methods of managing data, the classes are approximately determined by the amount of data that they are able to control. (Data warehouses, internet search engines, and tubs full of beans are simpler than database managers, so these classifications do not apply to them.)

Data professionals usually divide database managers into two classes :
        1. Desktop systems, which are designed for small local duty by an individual or small department, and
        2. Enterprise systems, which are designed to handle the heavier storage, processing, dependability, and security of an organization.

There is some overlap, so the distinctions are nebulous, but the two classes are very real and the distinctions can be critical.

The creation of AxleBase pushed the envelope out into an entirely new and previously undreamed-of realm. His creation gave us a third class of database manager. The AxleBase class sits far above the enterprise category, wielding quantities so vast that they are usually found only in astrophysics.

The big-name brands have recently shown interest in AxleBase-class systems. Their billions of dollars and visibility have thereby validated the class and verified the need for it. Although they are not yet delivering the complexities of a full featured database manager within the AxleBase class, they seem to be making progress.

There may be no remaining undeveloped level in this classification because of the way that the technology works internally. The published AxleBase limit is artificial. It was restrained years ago because there was fear that the vastness of his limits might seem humorous and therefore not be taken seriously. Since the AxleBase class has been recognized, his restraint can now be removed if needed. His internal technology and engineering can handle nearly any conceivable table size. (Every star in the visible universe? Sure.)

( Please. To protect us from self-pride, let us continually remind each other of him who does these things through men for his own purposes. )

Simplified Class Summation :

Desktop class     Individual use.
Enterprise class     Organizational use.
AxleBase class     Anything bigger.





__________________________________________________
Nomenclature
Section
On-Line Transaction Processing
Data Warehouse

Databases are usually designed either for massive archival storage or for frequent processing of the data. The archival storage has little activity, whereas the active storage may undergo constant data inserts, deletes, and updates. The archival database is frequently referred to as a data warehouse or a Decision Support System, and the active database is known as an On-Line Transaction Processing database. (Decision Support Systems are sometimes referred to as On-Line Analytical Processing.)

The distinction is non-trivial. The two types of storage require very different database designs and administration. More important for us is the fact that each type of database requires a different set of functions in the database manager. That matters because the organization usually must select a database manager before any other database decisions are made.

Each domain is so different from the other that database managers are usually designed for one or the other. Some big name brands straddle the fence by throwing vast fortunes into hardware to compensate for their database manager's inabilities in one domain or the other. Updating a row in a twenty-trillion-row table is extremely complex.

( Per dollar, none can excel the advanced architecture of AxleBase. He supports OLTP in databases larger than any others of any type can even create.)

Acronyms :
        OLTP is an On-Line Transaction Processing system.
        DSS is Decision Support System.

Simplified Summation :
        A data warehouse is a large and nearly static database that is used for studies and decisions.
        An OLTP database is an actively updated and queried data store.
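
The two styles of work can be contrasted in miniature. This is a sketch only, assuming Python's built-in sqlite3 module; the account table and its contents are invented for the example. A real OLTP system and a real data warehouse would be separate databases with very different designs, so a single small table here merely shows the difference in the kind of request.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, region TEXT, balance REAL)")
db.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(1, "east", 500.0), (2, "east", 250.0), (3, "west", 900.0)],
)

# OLTP-style work: a short transaction that touches a few specific rows.
# Both updates are committed together, or neither is.
with db:
    db.execute("UPDATE account SET balance = balance - 100 WHERE id = 1")
    db.execute("UPDATE account SET balance = balance + 100 WHERE id = 2")

# Warehouse-style work: a read-only study that sweeps the whole table.
totals = db.execute(
    "SELECT region, SUM(balance) FROM account GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```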





__________________________________________________
Nomenclature
Section
Database Administrator

The database administrator is the person who is responsible for the database, the software that manages it, and the hardware that houses it. He is responsible for the integrity of the data and for its short term and long term safety. He is responsible for the data always being available when needed. He may also provide training and guidance in accessing data in the database.

Although that very brief definition is accurate, it does not convey the esoteric complexity of some of the professional's duties. The administrator of a large mainframe database may deal in technical abstractions that are so incomprehensible that even if you think you understand a word that he used, you do not. Their work is so rarefied and esoteric that professional certification would be a joke. They are the unsung heroes of our civilization who cannot receive recognition because society cannot understand what they are doing at their terminals at three in the morning, but be certain of this: this civilization would stop without them.

Acronym : DBA





__________________________________________________
Nomenclature
Section
Torrent Technology

Torrent Technology is the ability to digest a steady stream of vast quantities of data. This is becoming more of a problem as new ways of gathering data are found and as new uses are found for data. Data streams have surpassed anything that existed in the past.

The problem currently is not that the data pipes are too small, but that they are able to carry such vast torrents of data. The problem arises at the destination where ordinary software systems and hardware are like a man trying to drink from a fire hose. Some data streams can easily overwhelm all ordinary software and hardware systems.

New technologies are being developed to handle those data streams. For example, AxleBase can be configured to become the facilitating component of large arrays of data gathering applications.





__________________________________________________
Nomenclature
Section
SQL

SQL is a language that is used to access and manipulate data in relational databases. SQL stands for structured query language and most people pronounce it as "seequel". It is so common and easy to use that many people forget that it is a programming language, and even business executives routinely use it. It is so common that the acronym has become the name.

SQL is a powerful tool for its purpose. Although it has practical and theoretical shortcomings as a language, it would be difficult to imagine a substitute. Because the database concept and the relational construct are woven into the foundation of the language, and because database managers such as AxleBase understand SQL, the language gives the ability to manipulate mountains of data with succinct statements. Those who use it routinely forget that they easily perform complex conceptual operations on their database that cannot be done any other way.

Nearly all database managers around the world understand SQL. Therefore, someone who knows how to use SQL can access nearly any relational database anywhere in the world. It is so powerful that most database developers embed SQL within the code of their primary programming language to perform data operations.
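
As a small illustration of both points, succinct statements and SQL embedded in another language, here is a sketch that assumes Python as the primary language and its built-in sqlite3 module as the database manager. The payment table and its values are made up for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE payment (customer_id INTEGER, amount REAL)")
db.executemany(
    "INSERT INTO payment VALUES (?, ?)",
    [(n % 100, 10.0) for n in range(10_000)],
)

# One short statement changes every matching row (five thousand of them).
db.execute("UPDATE payment SET amount = amount * 2 WHERE customer_id < 50")

# One short statement summarizes all ten thousand rows.
row_count, total = db.execute(
    "SELECT COUNT(*), SUM(amount) FROM payment"
).fetchone()
print(row_count, total)
```

The SQL text is just a string inside the Python program, which is the usual way that developers embed it.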

Simplified Summation :
        SQL is the most popular language in the world for reading and updating databases.





__________________________________________________
Nomenclature
Section
Table

A table is a concept and it is the major entity within a relational database. Hence, database professionals frequently refer to them as entities.

A table is a collection of a specified type of data, and the type in each table is determined by the people who design the database. An airline company database might contain such tables as reservation, flight, airplane, and pilot.

Each table has rows. The pilot table might have a row for each pilot. The reservation table might have a row for each reservation. Each row has columns. The pilot row might have a first name column, a last name column, and a hire-date column. The actual row and column types are determined by the people who design the database.
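
The pilot table described above might be declared as follows. This is a sketch under assumptions: Python's built-in sqlite3 module plays the database manager, and the column names and sample pilots are invented by us as the "people who design the database".

```python
import sqlite3

db = sqlite3.connect(":memory:")

# The designers decide what columns a pilot row has.
db.execute("CREATE TABLE pilot (first_name TEXT, last_name TEXT, hire_date TEXT)")

# Each inserted tuple becomes one row in the table.
db.executemany(
    "INSERT INTO pilot VALUES (?, ?, ?)",
    [
        ("Amelia", "Stone", "2009-04-01"),
        ("Carlos", "Vega", "2011-10-15"),
    ],
)

cursor = db.execute("SELECT * FROM pilot ORDER BY last_name")
column_names = [description[0] for description in cursor.description]
pilots = cursor.fetchall()

print(column_names)
for pilot in pilots:
    print(pilot)
```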

Notice the interesting separation of the table from the database manager. Local administrators design their databases as needed and the database manager will manage it. That flexibility is another characteristic of the database manager.





__________________________________________________
Nomenclature
Section
Index

Searching for a piece of information in a table can take more time than we want. If a computer takes a thousandth of a second to find a row, and the table contains a hundred million rows, then the search will take more than a day to find your friend's phone number. You might find a problem with that.
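
The arithmetic behind that claim is easy to check. The sketch below uses only the two figures given above and assumes the worst case, in which every row must be examined before the number is found.

```python
rows_per_second = 1_000      # one row takes a thousandth of a second
row_count = 100_000_000      # a hundred million rows

scan_seconds = row_count / rows_per_second
scan_hours = scan_seconds / 3600

print(f"{scan_seconds:,.0f} seconds, or about {scan_hours:.1f} hours")
```

That comes to 100,000 seconds, or roughly 27.8 hours.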

The solution to the problem is the index. To speed the search, the professional database administrator issues a command to the database manager to create an index for that table. The index points to the information in the table so that a request for your friend's phone number will go directly (almost) to his name in the index to find the pointer to his row in the table. The pointer will be read, the system will follow it directly to his row in the table, and there read his phone number.

Actually, that is a simplistic explanation; the index mechanism is far more complicated than that, but it is far, far faster than reading the table. Because AxleBase is designed for extraordinarily large tables, he provides an obvious example of the difference in speed. One of his test tables is so large that a thousand years would be needed to retrieve a piece of information by reading the table, so it is indexed, which allows retrieval in minutes.

Another interesting thing about indices is that a table usually has more than one. In our phone number example, there might be an index for the first names, one for the last names, and others. Usually the indices for a table occupy more storage than does the table.

Think of a book's index because a database index is conceptually similar. Paging through the book to find something is simple, but very slow. Using the index is more complicated, but much faster.
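
The book-index idea can be sketched with a toy table. This is emphatically not the AxleBase mechanism, nor that of any real database manager; it is a minimal Python illustration in which the "table" is a list, the "pointer" is a row position, and all names and phone numbers are invented.

```python
import bisect

# The "table": rows stored in arrival order, not sorted by name.
table = [
    ("Mills", "555-0142"),
    ("Abbott", "555-0199"),
    ("Zane", "555-0103"),
    ("Keller", "555-0177"),
]

# The "index": names in sorted order, each paired with a pointer
# (here simply the row's position in the table).
index = sorted((name, row_number) for row_number, (name, _) in enumerate(table))
index_keys = [name for name, _ in index]


def find_phone(name):
    """Binary-search the index, then follow the pointer to the row."""
    slot = bisect.bisect_left(index_keys, name)
    if slot == len(index_keys) or index_keys[slot] != name:
        return None
    _, row_number = index[slot]
    return table[row_number][1]


print(find_phone("Keller"))
```

Because the index is sorted, each step of the search discards half of the remaining names, so even a hundred million names need only a few dozen comparisons instead of a hundred million row reads.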

If you have read this far, you may be wondering about the actual structure of the indexer in a database manager; how it does its magic. Each database manager has its own proprietary index mechanism. An entire book might be needed to describe the AxleBase mechanism because it may be the most complex sub-system in that system, and it is being extended even as this is being written. The indexer that converts massive and complex logic paths into speed is part of the hidden magic of computer science that runs your world.

An interesting aspect of the AxleBase indices is that they are designed for very large table searches, meaning that they employ mechanisms that are not needed by ordinary database managers. They search very large tables extremely fast, and cannot search ordinary tables as fast as can ordinary database managers.

Simplified Summation :
        An index is specialized software coupled with a way of storing data that speeds access to the data.

Examples :
        The test page on this web site has some examples of index benefits. Look at its Indexed Retrieval Speed section. For an extreme example, look at its Concatenated Virtual Table section.





__________________________________________________
Nomenclature
Section
Virtual

The word "virtual" in general English usage is the honest man's way of telling a lie. Subtle issues accompany it, but that is its primary usage. If somebody says that something is virtually true, then it is certainly not true.

The same usage is found in computer systems. If X is a virtual object, then you know that X is certainly not one of those objects. It usually bears a resemblance to its namesake with common logical roots and common functional usage.

The best teacher in this case is an example. The database administrator can tell AxleBase to create a virtual table in a database. If you have access to the database, you can query the virtual table and update it as you would any real data table. The data is real, but it is not actually in the virtual table and you may never know that the table is virtual. The data may not even be in that database or on the same continent. That table is a local manifestation of real data table(s); it is a virtual object.

There are many reasons for virtual objects. In the case of AxleBase, the administrator may want to express many remote identical tables as a single table in the local database. He can do that through a virtual table. (The virtual tables of AxleBase are totally unlike those of the big-name brands.)
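
For readers who want something concrete, the nearest everyday analogue is the ordinary SQL view. The sketch below assumes Python's built-in sqlite3 module, with invented table names, and presents two identical tables as a single one. It is only an analogy: it is not how AxleBase builds his virtual tables, both underlying tables sit in the same local database, and a view of this kind in SQLite can be queried but not updated.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Two identical tables standing in for tables kept at two sites.
    CREATE TABLE reading_site_a (sensor TEXT, value REAL);
    CREATE TABLE reading_site_b (sensor TEXT, value REAL);
    INSERT INTO reading_site_a VALUES ('a1', 1.5), ('a2', 2.5);
    INSERT INTO reading_site_b VALUES ('b1', 4.0);

    -- The view holds no data of its own; it only presents the others.
    CREATE VIEW reading AS
        SELECT sensor, value FROM reading_site_a
        UNION ALL
        SELECT sensor, value FROM reading_site_b;
""")

# The person querying "reading" need not know that it is not a real table.
combined = db.execute("SELECT sensor, value FROM reading ORDER BY sensor").fetchall()
print(combined)
```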

Simplified Summation :
        A virtual object does not exist, but is the manifested behavior of something that makes it appear to be one of the imitated objects.





__________________________________________________
Nomenclature
Section
Distributed Computing

A computer is identifiable as a geographically localized entity. Usually, it can be identified as a single cabinet or box such as the ubiquitous personal computer. And less frequently, it may be identifiable as a group of connected boxes, such as a mainframe. That geographical localization is not accidental because it provides important benefits such as increased operation speed, facility management, and system administration.

A distributed system cannot be identified that easily. In fact, the physical identification of a distributed system is nearly impossible.

The most interesting thing about a distributed system, and the thing that makes it the hardest to recognize, is its existence outside of any physical computer. The distribution of a system can be to any degree; even within a single room. But the characteristic that is shared by all true distributed systems is that they can be distributed to any degree without compromising system integrity. In other words, if a system is truly a distributed system, then it can be located across continents without disrupting it.

Distributed computing is done by a distributed software system. A distributed system is a single entity that uses multiple computers just as a corporation may work in many cities. The distributed system is not a loose collection of cooperating systems, but is a single, identifiable, functioning entity.

The fact that a computer has multiple processors may make it faster, but does not make it distributed. The fact that a system runs on a computer with multiple processors does not make it distributed. Distributed computing transcends computers and is, by definition, done by a system that is distributed across multiple computers.

Distributed systems could be developed only after relatively cheap computers came into existence. When only mainframes existed, a bigger problem simply and obviously meant a bigger single mainframe computer. The relatively cheap computers freed software developers from mainframe control to allow them to dream of new ways to solve problems.

Our favorite example, AxleBase, is a true distributable system. His administrator gives him a problem through his control node and all of his nodes work on the problem in concert. It is even possible for him to have multiple or redundant control nodes. Although this is not required by the definition of a distributed system, his nodes are permitted to lose communication intermittently without failure for resilience. This also is not required by the definition, but his distributed topology does not require a fixed number of nodes. His administrator can distribute him in the form of any number of nodes as needed. Perhaps the biggest point made in this paragraph is the flexibility given by the distributed model.

A characteristic of a distributed system is that it requires a communication system for its existence. At this point in history, the largest and most pervasive communication system is the internet, but many large organizations have their own communication systems that sometimes cover entire continents. Even small companies today usually have their own local area networks. Any of those can be used by a distributed system.

Notice that a truly distributed system does not require proprietary connections between its components. This is a fine point, but it prevents marketing deceit from confusing us. Since he was engineered for true distributed operation, AxleBase can run on all popular networks; even the internet.

You may have noticed another point by now; that a distributed system is independent of specific hardware. It needs to run on hardware, but that hardware can be any computers, any telephone lines, any routers, etc. etc. For example, AxleBase can run on any Microsoft Windows computers (back to Windows 95), and is being expanded to also run on Linux and Unix to free it from that name-brand. His administrator can even set him to automatically move his nodes when computers fail.

The distributed model may be chosen for a system for power, resilience, flexibility and other reasons. It was chosen for the AxleBase system mainly to increase his power, but it also gives fault-resilience to him and flexibility to his administrators. AxleBase executes some actions faster in his distributed form than he can in his localized form. By keeping him on the personal computer platform, the increased power was achieved at very low cost, but the distribution increased the reliability on that cheap platform.

Simplified Summation :
        A distributed system is one which works on a problem while being geographically distributed across multiple computers and multiple operating systems.





__________________________________________________
Nomenclature
Section
Cloud Computing

No amount of steam valve improvement can make the steam locomotive a new technology. Cloud computing may actually predate the internet.

When cloud computing was first used, it was called time sharing because the users were sharing time on a remote mainframe. The user had a modem and a simple terminal, such as a teletype machine, that sent commands to a distant mainframe computer that processed the information and returned a response. That is the essence of what is today called cloud computing.

Today's equipment is far faster and cheaper, but the cloud technology remains the same. Cloud computing is a remote data processor to which many terminals are connected. Even the old time sharing concepts are resurfacing. Part of the power of cloud computing derives from its simple client-server nature.

The first terminals were dumb terminals since they had no computing power. Later terminals were smart because they could do simple tasks such as updating the local display. Today's powerful PC is wasted as a terminal, so we may soon see the loss of personal power into the cloud. Cloud computing is a technology throwback to the mainframe and its new rise may cause our technologies to decline just as feudalism fed the dark ages.

Cloud computing and distributed computing are entirely different technologies. AxleBase uses the more powerful distributed computing, which gives his operators ownership of their data and total control of his operations, and which increases his power.

Several factors prompt the resurgence of cloud computing :
        Young people have not yet experienced the mind numbing control of the mainframe environment choking creativity.
        Many are attracted simply to the size of "big iron" projects.
        Others are attracted by the opportunity to hide their personal mediocrity within that project bulk.
        The powerful I.B.M's of the world are pushing us back into their control inside the "big iron" mainframe data center.

Simplified Summation :
        Cloud computing consists of many simple terminals that are connected to a remote processor.

( Merely a personal note :
        An awesome event was sitting at a kitchen table with an acoustic modem and a portable teletype machine in the sixties in St. Louis and working on mainframes rumored to be in California. The technology and sociology of that event was a science fiction adventure and a promise for that young man, so it is saddening to see the world giving up and turning back to the stone age.)





__________________________________________________
Nomenclature
Section
Axsys

An axsys is a distributed database manager. The distributed object is an axsys. An axsys is AxleBase in his distributed form.

( As with any computer object, the axsys creation event is referred to as an instantiation; e.g., "The DBA instantiated an axsys.")





__________________________________________________
Nomenclature
Section
Concurrency

The term "concurrency" as it is used in database operations refers to the simultaneous use of a database by multiple users and/or systems; i.e., concurrent usage. The concurrency problem is complex and heavy for database managers because they must service many clients simultaneously with high speed operations of the infrastructure.
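
A tiny taste of the problem, as a sketch only: eight simulated clients update one shared value at the same time. Python threads and a simple lock are assumed here purely for illustration; the class and function names are invented, and a real database manager coordinates its clients with far more elaborate machinery.

```python
import threading


class Counter:
    """A shared value that many clients update concurrently."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one client at a time may read, add, and write back.
        with self._lock:
            self.value += 1


def client(counter, times):
    for _ in range(times):
        counter.increment()


counter = Counter()
threads = [
    threading.Thread(target=client, args=(counter, 10_000)) for _ in range(8)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter.value)  # 80000
```

Without the lock, two clients could read the same old value and one update could be lost; the lock is what guarantees that all 80,000 updates are counted.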





__________________________________________________
Nomenclature
Section
DASD

DASD is an old mainframe acronym that stands for Direct Access Storage Device. It usually refers to disk drives, but can be any storage device. It is a convenient term for systems such as AxleBase that can use any kind of storage that the infrastructure supports.

Only to facilitate communication, it is usually pronounced as dazdee.

Advanced technology has found a new need for the old DASD term. AxleBase has even been tested with CD's, floppy drives, and USB devices.





                                             





Copyright 2003 - 2012 John Ragan

Web site maintained with Notepad and command line FTP.