A proper definition
of “big data” is difficult to achieve because projects, vendors, developers,
and business professionals use it quite differently. With these things in mind,
generally speaking, big data is:
large datasets
the category of computing strategies and technologies that are used to handle
large datasets

Technology has taken over every field today, resulting in huge data growth.
All of this data is valuable, and millions of pieces of data are used every
day. One machine can’t store and process such a huge amount of data, hence the
need to understand big data and the methods used to store it. Big data is an
amount of data so large that it can’t be processed by a traditional system (a
single computer system) within a given time frame.
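
The "given time frame" idea can be made concrete with a little arithmetic. The sketch below is only an illustration: the function names, the 10 TB dataset size, and the 200 MB/s read throughput are hypothetical figures chosen for the example, not measurements from any real system.

```python
def processing_time_seconds(size_bytes: int, throughput_bytes_per_s: float) -> float:
    """Time one machine needs to read the data once at the given throughput."""
    return size_bytes / throughput_bytes_per_s


def is_big_for(size_bytes: int, throughput_bytes_per_s: float, deadline_s: float) -> bool:
    """True if a single machine cannot finish within the deadline."""
    return processing_time_seconds(size_bytes, throughput_bytes_per_s) > deadline_s


TB = 10**12
MB = 10**6
HOUR = 3600

# Hypothetical workload: 10 TB read at an assumed 200 MB/s.
dataset = 10 * TB
throughput = 200 * MB

print(processing_time_seconds(dataset, throughput) / HOUR)  # about 13.9 hours
print(is_big_for(dataset, throughput, deadline_s=1 * HOUR))   # True: too slow for a 1-hour window
print(is_big_for(dataset, throughput, deadline_s=24 * HOUR))  # False: fine for a daily batch
```

The same dataset is "big" for a one-hour deadline and manageable for a daily one, which is why the label depends on the system and the time frame rather than on size alone.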
Now how big does this data need to be? There’s a common misconception about
the term big data: that it refers to data measured in gigabytes, terabytes,
petabytes, exabytes, or anything larger. This definition is wrong. There is no
threshold above which data is considered big data. Big data depends purely on
the context in which it is used, so even a relatively small amount of data can
be referred to as big data. For example, many email services won’t let you
attach a 100 MB file, so in the context of email attachments 100 MB is already
big.

Here, “large dataset” means a dataset too large to reasonably process on a
single computer or store with traditional tooling. This means that the common
scale of big datasets is constantly shifting and may vary significantly from
organization to organization.

There are 3 V’s of big data; 4 more have recently been added, making 7 in
total.
Volume refers to the huge amount of data that is created in many places, such
as social networking sites and banks (account, credit, and debit data).
Variety refers to the different types of data being used, as discussed above
(structured, semi-structured, and unstructured).
Velocity refers to the speed at which data arrives. While data is being
processed, more and more keeps coming in, and it has to be processed
efficiently and within the time frame. For example, new videos are uploaded to
YouTube every minute.
Veracity refers to the authenticity of the data. For example, Twitter users
put hashtags and abbreviations in their tweets, and the accuracy of all this
content is checked by Twitter.
Visibility refers to the type of data that is visible.
Validity refers to whether the data is still valid. For example, the kinds of
files in use in 1998 were different from those in use now.
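
The point above about data that keeps arriving while it is being processed (often called velocity) can be sketched with a small streaming example. This is a minimal illustration, not a production design: `running_counts` and the upload categories are hypothetical names made up for the example. Each event is handled as it arrives and only a running summary is kept, so the full history never has to be stored.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def running_counts(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (key, count so far) for each event as soon as it arrives."""
    counts: Counter = Counter()
    for key in events:
        counts[key] += 1          # update the summary in place
        yield key, counts[key]    # emit a result without waiting for the stream to end


# Hypothetical stream of upload categories; in practice this could be any iterator.
uploads = ["music", "news", "music", "sports", "music"]
for category, total in running_counts(uploads):
    print(category, total)
```

Because `running_counts` is a generator, it works the same way on an endless stream as on this short list.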