Integrity Verification in Cloud Using Blockchain

Swagatika Panda
5 min read · May 15, 2021

What is Cloud?

A cloud service gives users remote storage space to which they can outsource personal as well as enterprise data. It provides a simple interface through which users can deploy or access their stored resources from any part of the globe. The world has witnessed a rapid rise in server virtualization over the last few years, and the efficiency of cloud service models leads many companies to store their proprietary data in the cloud instead of on traditional physical storage systems.

Security In Cloud

Every day, thousands of business enterprises make the transition to a cloud platform, and with this rapid transition the question of security is inevitable. Security is a major aspect of cloud storage. Though cloud service providers relentlessly assure data confidentiality and integrity, there is a constant risk of malicious attacks and other security threats in the cloud.

Importance of Data Integrity in Cloud

Data integrity is the overall accuracy, completeness and consistency of data. Data integrity ensures that the data is not altered or deleted by any unauthorized person. Every company has a set of policies regarding the access and use of its proprietary data, and data integrity in the cloud ensures that those policies are not violated.

The factors that can challenge the data integrity in cloud are:

Human error: When individuals enter information incorrectly, duplicate or delete data, don’t follow the appropriate protocol, or make mistakes during the implementation of procedures meant to safeguard information, data integrity is put in jeopardy.

Transfer errors: When data can’t successfully transfer from one location in a database to another, a transfer error has occurred. Transfer errors happen when a piece of data is present in the destination table, but not in the source table in a relational database.

Bugs and viruses: Spyware, malware, and viruses are pieces of software that can invade a computer and alter, delete, or steal data.

Compromised hardware: Sudden computer or server crashes, and problems with how a computer or other device functions, are examples of significant failures and may be indications that your hardware is compromised. Compromised hardware may render data incorrectly or incompletely, limit or eliminate access to data, or make information hard to use.

How many types of data integrity are there?

1. Physical Integrity

Physical integrity is the protection of data's wholeness and accuracy as it is stored and retrieved. That means the retrieved data must be the same as the data that was stored. Maintaining physical integrity means protecting data from natural disasters, power cuts and similar threats.

2. Logical Integrity

Logical integrity ensures that the data remains unchanged and original as it goes through various database operations.

How to ensure Data Integrity in Cloud?

First of all, it is necessary to understand that data integrity is not the same thing as data security. Data integrity is concerned with keeping the data intact and accurate.

To ensure data integrity, one option is to store data in multiple clouds or cloud databases. The data to be protected from internal or external unauthorized access is divided into chunks, and Shamir's Secret Sharing algorithm is used to generate a polynomial function for each chunk.

Shamir’s Secret Sharing (SSS): Shamir’s Secret Sharing is used to secure a secret in a distributed way, most often to secure other encryption keys. The secret is split into multiple parts, called shares. A minimum number of these shares (the threshold) is needed to reconstruct the original secret; fewer shares than the threshold reveal nothing about it.
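The splitting and reconstruction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a production implementation: the secret is assumed to be an integer smaller than the chosen prime, and the names PRIME, split_secret and recover_secret are made up for this example.

```python
import secrets

# Assumption: all arithmetic is done modulo a fixed prime that is larger
# than any secret we want to share (2**127 - 1 is a Mersenne prime).
PRIME = 2**127 - 1


def split_secret(secret, threshold, num_shares):
    """Split an integer secret into num_shares points on a random polynomial."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in the range [0, PRIME)")
    if not 1 <= threshold <= num_shares:
        raise ValueError("threshold must be between 1 and num_shares")
    # Polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, highest degree first
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # Modular inverse of den via Fermat's little theorem.
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret


shares = split_secret(123456789, threshold=3, num_shares=5)
print(recover_secret(shares[:3]) == 123456789)
```

In a multi-cloud setting, each share could be stored with a different provider, so that no single provider holds enough information to rebuild the chunk on its own.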

Generating Hashes

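Hashes can be used to check whether the retrieved data is the same as the data that was stored. A hash function maps the data to a short, fixed-size value; the value computed at storage time is compared with the value computed at retrieval time, and any difference means the data has changed. Algorithms of the SHA family are efficient for this purpose. MD5 is also fast, but it is no longer considered collision-resistant, so it is only suitable for detecting accidental corruption, not deliberate tampering.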
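As a minimal sketch of this check, the snippet below uses SHA-256 from Python's standard hashlib module. The helper names sha256_hex and is_intact, and the sample data, are assumptions made for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_intact(retrieved: bytes, stored_digest: str) -> bool:
    """Compare the digest of the retrieved data with the one kept at upload time."""
    return sha256_hex(retrieved) == stored_digest


# Digest computed before uploading; it must be kept somewhere the cloud
# provider cannot modify, otherwise the check proves nothing.
original = b"quarterly report v1"
digest = sha256_hex(original)

print(is_intact(b"quarterly report v1", digest))  # unchanged data
print(is_intact(b"quarterly report v2", digest))  # altered data
```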

Use of Trusted Third Parties (TTP)

Trusted Third Parties (TTP) are the supporting vendors that take care of our data transmissions.

The Third Party reviews all critical transaction communications between the parties, based on the ease of creating fraudulent digital content. In TTP models, the relying parties use this trust to secure their own interactions.

Use of Blockchain technology to Ensure Data Integrity

Blockchain, an emerging distributed and decentralized technology, can be used to ensure data integrity in the cloud domain. The distributed ledger technology of blockchain stores data in blocks that are linked by hashes, which makes it practically infeasible to modify the data without being detected.

Blockchain ledgers are immutable, meaning that once a data addition or transaction has been made, it cannot be edited or deleted. It is there and it will be there. In addition, a blockchain is not only a data structure but also a timekeeping mechanism for that data structure, so proof of the history of the data is easy to report and is kept up to date. Organizations facing an audit, regulatory compliance requirements, or legal challenges can use blockchain technology to improve data integrity and potentially reduce the associated costs.
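The immutability described above follows from every block committing to the hash of the block before it. The following is a simplified sketch of that idea, assuming a single in-memory list as the ledger; the names block_hash, append_block and is_valid are illustrative, and real blockchains additionally use timestamps, consensus and digital signatures, which are left out here.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # assumed placeholder for the first block


def block_hash(index, data, prev_hash):
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Add a new block that points at the hash of the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "data": data,
        "prev_hash": prev_hash,
        "hash": block_hash(index, data, prev_hash),
    })


def is_valid(chain):
    """Recompute every hash and check every link back to the first block."""
    prev_hash = GENESIS_PREV_HASH
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, block["data"], prev_hash):
            return False
        prev_hash = block["hash"]
    return True


chain = []
for record in ["file A stored", "file B stored", "file A updated"]:
    append_block(chain, record)

print(is_valid(chain))            # untouched ledger
chain[1]["data"] = "file B deleted"
print(is_valid(chain))            # edit to an old block is detected
```

Even if an attacker recomputes the hash of the edited block, the next block still stores the old hash in its prev_hash field, so the whole chain from that point on would have to be rewritten.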

Merkle Tree

The Merkle tree is an important component of blockchain and is responsible for improved data integrity. A Merkle tree is a data structure that uses a cryptographic hash function to generate a single digital fingerprint of the entire set of transactions.

The Merkle tree works in an iterative way. Every block stores its own hash as well as the hash of the previous block. Within a block, the hashes of child nodes are combined and hashed again to form the parent node, and this continues iteratively until a final, or root, node is reached. This final node acts like a fingerprint for the entire tree: it depends on every transaction in the set, so changing any transaction changes the root.

The most common and simple form of Merkle tree is the binary Merkle tree.

[Figure: binary Merkle tree. Source: Wikipedia]

Here there are four transactions, L1, L2, L3 and L4, in the block. Each transaction is hashed individually, and the resulting hashes are combined in pairs to form two parent nodes. The two parent nodes are further combined into one node called the root node. In a blockchain such as Bitcoin, this root is placed in the block header together with a time-stamp, a nonce and the hash of the previous block header.
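The four-transaction example can be reproduced with a short sketch. It makes some simplifying assumptions: a single round of SHA-256 is used at every node (Bitcoin, for instance, hashes twice), an odd number of nodes on a level is handled by duplicating the last hash, and the function names are chosen for this example only.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Return the raw SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def merkle_root(blocks):
    """Compute the root hash of a binary Merkle tree over the given blocks."""
    if not blocks:
        raise ValueError("at least one data block is required")
    # Leaf level: hash every data block on its own.
    level = [sha256(block) for block in blocks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumed convention: duplicate the last hash on odd-sized levels.
            level.append(level[-1])
        # Parent = hash of the concatenation of its two children.
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


transactions = [b"L1", b"L2", b"L3", b"L4"]
root = merkle_root(transactions)
print(root.hex())

# Changing a single transaction yields a different root.
tampered = [b"L1", b"L2", b"L3 (modified)", b"L4"]
print(merkle_root(tampered) != root)
```

For cloud storage, the data owner would only need to keep the root: recomputing it from the retrieved chunks and comparing it with the stored value shows whether any chunk was altered.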

Merkle trees have proven to be very efficient at ensuring data integrity and can be used in cloud computing to maintain the accuracy of stored data.
