Technospire
Posts
The Greatest Revolution in Databases

The Greatest Revolution in Databases

Harsha Matta
July 01, 2024

The Greatest Revolution in Databases

1974 - San Jose, CA

In the heart of Silicon Valley, a group of computer scientists, working for IBM at their research center in San Jose embarked on a groundbreaking project dubbed System R. System R was all about revolutionizing the way the digital era saw data, and focused on a brand new concept: relational databases.

Data in the 70’s was primarily hierarchical. Each dataset would have the Root or the parent node. This would function as the overarching theme of the database. Under each parent, there would be a child node, and each child node could have their children.

This database is easy for humans to understand. it's simply an overarching theme as the title or the parent node, and each child node is an attribute that goes into more depth. However, trying to task a computer to analyze this became increasingly difficult. To query (request) for a specific data point, a computer would have to search through the entire database, going down every single generation to try and find the attribute we were searching for. It wasn’t a sustainable solution.

In 1970, Edgar F Codd proposed a brand new method to analyze data in his paper “A Relational Model of Data for Large Shared Data Banks.”. This new method was called the relational database. The relational database uses a table instead. Each row specifies an item, or an object. Each column specifies its attributes.

Relational databases are so revolutionary because they are designed for computer analysis. Now when you want to query for all products under an attribute, you could query from a column rather than having to search through all descendants and children in a relational database. Manipulation now can be done by simply changing the title of a category. A key new development is data independence. In a hierarchical database, deleting a parent node would cause errors for the children nodes. This means the data isn't independent, it's reliant on the parent node existing. This is no longer the case in relational database models, as they are simply objects with attributes. They do not need other data points to exist, therefore having data independence.

The IBM San Jose Research Laboratory

4 years after this paper was written, IBM decided to look into the feasibility of the project. They called it System R.

Donald Chamberlin and Raymond Boyce were key in System R . The year earlier Chamberlin and Boyce attempted to create a language for relational databases, called SQUARE (Specifying Queries in A Relational Environment), however, this language did not succeed as it was difficult to use the notation, which involved subscript and superscripts.

After joining IBM they invented a new language, SQL, which is pronounced as sequel, because it was the spiritual sequel to SQUARE. Eventually, SQL was assigned as the acronym for Structured Query Language.

Relational Database structures had become accepted by the industry and SQL would go on to be the standard language for databases. 2024, marks the 50-year anniversary of the creation of SQL, and even to this day is the most prominent language in its industry

How does SQL Work?

SQL is known for its self-documenting syntax, making it highly legible and similar to English.

For example, here is how you create a relational database in SQL. You declare the name of the table after the “CREATE TABLE” command

As your parameters in the condition, you list the columns in your Database, and their datatype. For example in this database, we are creating a table with 4 columns, ID, which is an integer. This column was assigned as the primary key, which means that this attribute must be unique for each object/row in the database. It serves as a unique identifier for each object. The next columns (lines 3 and 4) are assigned as Name and position, but the datatype is called VARCHAR. VARCHAR refers to a variable character field, meaning that the amount of character input in this field is not fixed, it is only as long as it needs to be. However, when we say VARCHAR(50), we specify the maximum length that can be used for the attribute. The final column, declared in line 5, is assigned at the salary, as a decimal with 10 digits before the decimal point, and 2 digits after the decimal point.

This table we have created is for storing data about employees, and we declare the name of each column and their datatype. So in this table, we have four columns: id, which has an integer datatype and serves as the primary key, ensuring each row has a unique identifier; name, which uses the VARCHAR datatype meaning it can store strings, allowing up to 100 characters; position, another VARCHAR field allowing up to 50 characters; and salary, which is a decimal with up to 10 digits before and 2 digits after the decimal point.

From this first example, it's very clear how SQL is very easy to understand for a person who has never used SQL.

Inserting DATA into your database is straightforward using the INSERT INTO command. For example, in the program above, in lines 2, 3, and 4, simply by listing the attributes for each object, you can add a unique object to your database.

However, the true efficacy of SQL comes through when you begin to manipulate data.

Here are examples of commands that are the most often used in SQL. The SELECT command is for querying the entire dataset. You can specify certain attributes to have a specific value using the WHERE command. For example, in line 1 we are querying the database to return all the names, from the database of employees, where the salaries are greater than 50000. This would return Michael Johnson and Jane Smith. The UPDATE command allows us to, shockingly, update any object/row in the database. The DELETE command simply deletes any object you want, and once again you specify the target objects.

Here are examples of commonly used SQL commands. The SELECT command queries the entire dataset. The WHERE clause specifies conditions, this is similar to a conditional phrase in other programming languages. For example, to return names from the employee’s table where salaries are greater than 50,000, use SELECT name FROM employees WHERE salary > 50000; This query would return 'Michael Johnson' and 'Jane Smith'. The UPDATE command modifies existing rows, such as UPDATE employees SET salary = 55000 WHERE name = 'John Doe';, this would increase John Doe’s salary to 55000. The DELETE command removes rows, as in DELETE FROM employees WHERE id = 3;, as Micheal Johnson’s id was 3, this would remove his row from the database.

In conclusion, SQL, 50 years after its inception, has still proven itself to be resilient and is the standard language (even certified by the International Organisation of Standards) for database querying.