Introduction
In this blog post, we will compare and contrast two ways of storing and managing data: file systems and database management systems (DBMS). We will explain what they are, how they work, and what their advantages and disadvantages are.
What is a File System?
A file system is a way of organizing and storing files on a storage device, such as a hard disk, a flash drive, or a CD-ROM. A file system consists of different files that are grouped into directories or folders. Each file has a name, a location, and some attributes, such as size, type, permissions, etc. A file system performs basic operations, such as creating, deleting, renaming, copying, moving, and searching files.
A file system can be used to store any kind of data, such as text documents, images, audio, video, etc. However, a file system does not have any knowledge of the structure or meaning of the data inside the files. For example, a file system does not know that a file contains a table of student records or a list of products. Beyond file-level permissions, a file system also does not provide any mechanisms for enforcing the integrity or consistency of the data inside a file, or for coordinating concurrent updates to individual records.
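To make this concrete, here is a minimal Python sketch. The file name student.txt and its contents are assumptions chosen for illustration. The file system can report metadata such as the size and hand back the raw text, but the rows and fields only appear once the program decides to parse that text as comma-separated values.

```python
import csv
import os
import tempfile

# Hypothetical sample file; the name and contents are assumptions.
path = os.path.join(tempfile.mkdtemp(), "student.txt")
with open(path, "w", newline="") as f:
    f.write("roll_no,name,course\n101,Rajesh,MCA\n")

# What the file system knows: metadata such as the size in bytes.
size_in_bytes = os.path.getsize(path)

# What the file system returns: an uninterpreted string of characters.
with open(path, newline="") as f:
    raw_text = f.read()

# The structure (rows with named fields) exists only because the
# application chooses to parse the text as CSV.
with open(path, newline="") as f:
    records = list(csv.DictReader(f))

print(size_in_bytes)
print(repr(raw_text))
print(records)
```

If the program parsed the same file with a different delimiter or field order, the file system would not notice or complain.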
What is a DBMS?
A DBMS is a software application that manages a collection of related data. A DBMS stores data in a structured and organized way, using tables, records, fields, keys, indexes, etc. A DBMS also provides various functions and tools for manipulating, querying, analyzing, and maintaining the data. For example, a DBMS can perform operations such as inserting, updating, deleting, selecting, sorting, filtering, grouping, aggregating, joining, and calculating data.
A DBMS can be used to store any kind of data that has some logical relationships and dependencies. For example, a DBMS can store data about students, courses, grades, teachers, etc. A DBMS also provides mechanisms for ensuring data integrity, security, consistency, and concurrency. For example, a DBMS can enforce rules such as primary keys, foreign keys, unique constraints, check constraints, etc.
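As a rough illustration of these operations, the sketch below uses SQLite through Python's built-in sqlite3 module. The table name grades and the sample rows are hypothetical, and SQLite is just one convenient DBMS to try this with. It shows filtering and sorting in one query, and grouping and aggregating in another.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Hypothetical table and data, invented for this illustration.
conn.execute("CREATE TABLE grades (student TEXT, course TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO grades VALUES (?, ?, ?)",
    [("Asha", "Math", 82), ("Asha", "Physics", 74), ("Ben", "Math", 91)],
)

# Selecting, filtering, and sorting.
math_rows = conn.execute(
    "SELECT student, marks FROM grades "
    "WHERE course = 'Math' ORDER BY marks DESC"
).fetchall()

# Grouping and aggregating.
averages = conn.execute(
    "SELECT student, AVG(marks) FROM grades "
    "GROUP BY student ORDER BY student"
).fetchall()

print(math_rows)
print(averages)
```

The application only states what it wants; how the rows are located, sorted, and averaged is left to the DBMS.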
Comparison between File System and DBMS
The following table summarizes some of the main differences between file systems and DBMSs:
Criteria | File System | DBMS |
Structure | No structure imposed on file contents | Structured (tables, records, keys) |
Data Redundancy | High | Low |
Data Independence | Low | High |
Data Consistency | Low | High |
Data Integrity | Difficult to enforce | Easy to enforce |
Data Security | Low | High |
Data Recovery | No backup or recovery mechanism | Backup and recovery mechanism |
Data Manipulation | No efficient query processing | Efficient query processing |
Data Sharing | Difficult to share data among multiple users or applications | Easy to share data among multiple users or applications |
Data Abstraction | No abstraction of data details | Abstraction of data details |
Complexity | Low | High |
Cost | Low | High |
Example of File System and DBMS
To illustrate the difference between file systems and DBMSs, let us consider an example of storing data about students, subjects, and results.
File System Approach
In the file system approach, we can create three files: student.txt, subject.txt, and result.txt. Each file contains some fields separated by commas. For example,
student.txt:
roll_no,name,course
101,Rajesh,MCA
102,Riya,MBA
103,Amit,B.Tech
subject.txt:
sub_code,name,max_marks
CS101,C Programming,100
CS102,DBMS,100
CS103,OOP,100
result.txt:
roll_no,name,course,sub_code,name,max_marks,obtained_marks
101,Rajesh,MCA,CS101,C Programming,100,85
101,Rajesh,MCA,CS102,DBMS,100,90
101,Rajesh,MCA,CS103,OOP,100,80
102,Riya,MBA,CS101,C Programming,100,75
102,Riya,MBA,CS102,DBMS,100,95
102,Riya,MBA,CS103,OOP,100,70
103,Amit,B.Tech,CS101,C Programming,100,65
103,Amit,B.Tech,CS102,DBMS,100,60
103,Amit,B.Tech,CS103,OOP,100,55
In this approach, we can see that there are some problems:
There is a lot of data redundancy, as some fields are repeated in more than one file. For example, the name and course of each student are repeated in the result file. This wastes storage space and increases the risk of data inconsistency.
There is no data independence, as any change in the file structure or format will affect the applications that use the files. For example, if we want to add a new field or change the order of the fields in the student file, we have to modify all the applications that read or write the student file.
There is no data consistency, as there is no way to ensure that the data in different files are synchronized and valid. For example, there is no way to prevent inserting a record in the result file for a student who does not exist in the student file, or for a subject that does not exist in the subject file.
There is no data integrity, as there is no way to enforce rules or constraints on the data values. For example, there is no way to ensure that the obtained marks are less than or equal to the max marks, or that the roll number is unique for each student across all the files.
There is little data security, as file-level permissions are the only protection and they cannot restrict access to individual records or fields. For example, anyone who is allowed to open the result file can read, change, delete, or copy every student's marks without any further restriction or authentication.
There is no data recovery, as there is no backup or recovery mechanism in case of system failure or data loss. For example, if the system crashes while entering some data in the result file, the content of the file may be corrupted or lost.
There is no efficient data manipulation, as there is no query language or tool for performing complex operations on the data. For example, if we want to find out the average marks of each student or the highest marks in each subject, we have to write a program that reads and processes all the files.
There is no transaction atomicity. If an update has to change several files and a failure occurs partway through, the data may be modified in one file but not in the other, leaving the files inconsistent with each other.
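Two of these problems can be sketched in a few lines of Python. The code below recreates result.txt in a temporary directory, computes the average marks of each student by hand, and then appends a record that should never be accepted. The function name average_marks and the invalid record are assumptions made up for this illustration.

```python
import csv
import os
import tempfile
from collections import defaultdict

RESULT_TXT = """roll_no,name,course,sub_code,name,max_marks,obtained_marks
101,Rajesh,MCA,CS101,C Programming,100,85
101,Rajesh,MCA,CS102,DBMS,100,90
101,Rajesh,MCA,CS103,OOP,100,80
102,Riya,MBA,CS101,C Programming,100,75
102,Riya,MBA,CS102,DBMS,100,95
102,Riya,MBA,CS103,OOP,100,70
103,Amit,B.Tech,CS101,C Programming,100,65
103,Amit,B.Tech,CS102,DBMS,100,60
103,Amit,B.Tech,CS103,OOP,100,55
"""

path = os.path.join(tempfile.mkdtemp(), "result.txt")
with open(path, "w", newline="") as f:
    f.write(RESULT_TXT)

def average_marks(result_path):
    """Average obtained marks per roll number, computed by the application."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    with open(result_path, newline="") as f:
        for record in csv.DictReader(f):
            totals[record["roll_no"]] += int(record["obtained_marks"])
            counts[record["roll_no"]] += 1
    return {roll: totals[roll] / counts[roll] for roll in totals}

averages_before = average_marks(path)

# Nothing stops us from appending a record for a student (999) and a
# subject (XX999) that do not exist, with marks above the maximum.
with open(path, "a", newline="") as f:
    f.write("999,Nobody,None,XX999,No Such Subject,100,250\n")

averages_after = average_marks(path)

print(averages_before)
print(averages_after)
```

Every application that needs averages has to repeat this kind of parsing logic, and every application that writes to the file has to remember to validate the data itself.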
DBMS Approach
In the DBMS approach, we can create three tables: Student, Subject, and Result. Each table has some columns and rows. For example,
Student:
roll_no | name | course |
101 | Rajesh | MCA |
102 | Riya | MBA |
103 | Amit | B.Tech |
Subject:
sub_code | name | max_marks |
CS101 | C Programming | 100 |
CS102 | DBMS | 100 |
CS103 | OOP | 100 |
Result:
roll_no | sub_code | obtained_marks |
101 | CS101 | 85 |
101 | CS102 | 90 |
101 | CS103 | 80 |
102 | CS101 | 75 |
102 | CS102 | 95 |
102 | CS103 | 70 |
103 | CS101 | 65 |
103 | CS102 | 60 |
103 | CS103 | 55 |
In this approach, we can see that there are some advantages:
There is less data redundancy, as some fields are not repeated in more than one table. For example, the name and course of each student are stored only in the Student table, and the name and max marks of each subject are stored only in the Subject table. This saves storage space and reduces the risk of data inconsistency.
There is more data independence, as many changes in the table structure or format do not affect the applications that use the tables. For example, if we add a new column or change the order of the columns in the Student table, applications that refer to columns by name do not have to be modified.
There is more data consistency, as there is a way to ensure that the data in different tables are synchronized and valid. For example, we can use primary keys and foreign keys to link the tables and prevent inserting records that do not match the other tables. For instance, we can make roll_no the primary key of the Student table and sub_code the primary key of the Subject table. We can also make roll_no and sub_code foreign keys of the Result table, referencing the Student and Subject tables respectively. This way, we can ensure that every record in the Result table corresponds to a valid student and a valid subject.
There is more data integrity, as there is a way to enforce rules or constraints on the data values. For example, we can use a primary key or unique constraint to ensure that the roll number is unique for each student, and a check constraint to keep the obtained marks within an allowed range. Comparing the obtained marks against the max marks stored in the Subject table spans two tables, which in most DBMSs requires a trigger rather than a simple check constraint.
There is more data security, as there is a way to protect the tables from unauthorized access or modification. For example, we can use user accounts, passwords, roles, permissions, etc. to control who can read, write, or update the data.
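The following sketch shows how the three tables and their constraints might be declared, using SQLite through Python's sqlite3 module. The column types, the helper function attempt, and the CHECK range of 0 to 100 are assumptions for illustration. The fixed range works here only because every subject has max_marks of 100; comparing against Subject.max_marks directly would typically need a trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Student (
    roll_no INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    course  TEXT NOT NULL
);
CREATE TABLE Subject (
    sub_code  TEXT PRIMARY KEY,
    name      TEXT NOT NULL,
    max_marks INTEGER NOT NULL
);
CREATE TABLE Result (
    roll_no        INTEGER NOT NULL REFERENCES Student(roll_no),
    sub_code       TEXT    NOT NULL REFERENCES Subject(sub_code),
    obtained_marks INTEGER NOT NULL
                   CHECK (obtained_marks BETWEEN 0 AND 100),  -- assumed range
    PRIMARY KEY (roll_no, sub_code)
);
""")

conn.executemany("INSERT INTO Student VALUES (?, ?, ?)",
                 [(101, "Rajesh", "MCA"), (102, "Riya", "MBA"), (103, "Amit", "B.Tech")])
conn.executemany("INSERT INTO Subject VALUES (?, ?, ?)",
                 [("CS101", "C Programming", 100), ("CS102", "DBMS", 100), ("CS103", "OOP", 100)])
conn.executemany("INSERT INTO Result VALUES (?, ?, ?)",
                 [(101, "CS101", 85), (101, "CS102", 90), (101, "CS103", 80),
                  (102, "CS101", 75), (102, "CS102", 95), (102, "CS103", 70),
                  (103, "CS101", 65), (103, "CS102", 60), (103, "CS103", 55)])
conn.commit()

def attempt(sql, params):
    """Hypothetical helper: run one statement, report whether the DBMS allowed it."""
    try:
        conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

# Foreign key: student 999 does not exist.
unknown_student = attempt("INSERT INTO Result VALUES (?, ?, ?)", (999, "CS101", 50))
# Check constraint: 250 is outside the allowed range.
marks_too_high = attempt(
    "UPDATE Result SET obtained_marks = ? WHERE roll_no = ? AND sub_code = ?",
    (250, 101, "CS101"))
# Primary key: this student already has a result for this subject.
duplicate_result = attempt("INSERT INTO Result VALUES (?, ?, ?)", (101, "CS101", 85))

# Average marks per student, expressed as a query instead of a program.
averages = conn.execute(
    "SELECT s.name, AVG(r.obtained_marks) FROM Result r "
    "JOIN Student s ON s.roll_no = r.roll_no "
    "GROUP BY s.roll_no, s.name ORDER BY s.roll_no"
).fetchall()

print(unknown_student, marks_too_high, duplicate_result)
print(averages)
```

All three invalid changes are refused by the DBMS itself, so no application has to carry its own copy of the validation rules.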
Conclusion
File-processing systems have major disadvantages:
i. Data redundancy and inconsistency
ii. Difficulty in accessing data
iii. Data isolation
iv. Integrity problems
v. Atomicity problems
vi. Concurrent-access anomalies
vii. Security problems
This is why we have DBMSs!