What is Big Data?

Big data is a collection of data so large that it can't be stored and processed using traditional methods within a given time frame.


So the question arises: how big does data have to be to count as big data?
People generally think that only data measured in GB, TB, or PB is big data, but that's not the case.
Data that is small in size can also be big data, depending on the system that has to handle it.

For example, suppose a 100 MB document has to be sent by email (we generally use Gmail). That is not possible as a normal attachment, because Gmail limits attachments to 25 MB.

That's why a 100 MB document can be called big data for an email service.

Let's understand big data with another example.

Suppose a person is given 1 TB of images that he has to edit and process within a fixed amount of time. For a normal user, that is big data.
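The two examples above can be put into a small sketch. It treats data as "big" for a system when the data is either larger than the system accepts or can't be processed before the deadline. The `System` class, the `is_big_data` function, and all the capacity and speed numbers are made up for illustration; they are not measurements of any real service.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    max_size_mb: float          # largest input the system accepts (assumed)
    throughput_mb_per_s: float  # processing speed (assumed)


def is_big_data(size_mb: float, system: System, deadline_s: float) -> bool:
    """Return True if the system cannot store/process the data in time."""
    if size_mb > system.max_size_mb:
        return True  # too large to even accept
    processing_time_s = size_mb / system.throughput_mb_per_s
    return processing_time_s > deadline_s  # too slow for the deadline


# Hypothetical systems; every number here is an illustrative assumption.
email = System("email service", max_size_mb=25, throughput_mb_per_s=1.0)
laptop = System("single laptop", max_size_mb=2_000_000, throughput_mb_per_s=5.0)
cluster = System("cluster", max_size_mb=1e9, throughput_mb_per_s=5_000.0)

one_day_s = 24 * 60 * 60
print(is_big_data(100, email, deadline_s=3600))           # 100 MB document
print(is_big_data(1_000_000, laptop, deadline_s=one_day_s))   # 1 TB of images
print(is_big_data(1_000_000, cluster, deadline_s=one_day_s))  # same 1 TB
```

The same 1 TB is big data for the laptop but not for the cluster, which is the point of both examples: "big" depends on who has to handle the data.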


Some analysis of data in the real world
  • Facebook - 100 TB/day
  • Twitter - 4400 tweets/day
  • LinkedIn - 10 TB/day
  • Google+ - 10 TB/day
  • YouTube - 48 hours of new video per minute

Now you can see how much data exists today. Managing this data is becoming more crucial day by day.
This is where Hadoop comes into the picture.


The following kinds of data come under big data:

  • Search Engine Data
The data we generate while searching is stored by the search engine provider, who analyzes it to learn more about the user.


  • Stock Exchange Data
Stock market data is stored as it is generated, and it changes constantly over time.
  • Social Media Data
Social media sites like Facebook and YouTube hold a huge amount of data. All the comments, likes, dislikes, images, and videos are stored.
  • Black Box Data
A black box is installed in airplanes, jets, etc.
It stores the data generated during a flight, such as speed, weather, and oxygen level.
This data is used for analysis if something goes wrong with the aircraft.



Thus, big data involves huge volume, high velocity, and a wide variety of data. The data falls into three types:

  • Structured data − Relational data.
  • Semi Structured data − XML data.
  • Unstructured data − Word, PDF, Text, Media Logs.

The V's of Big Data: Veracity, Variety, Volume, Velocity, Value.
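To make the three data types listed above a little more concrete, here is a minimal sketch that reads one tiny sample of each using only Python's standard library. The sample records (names, cities, the review text) are invented for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: every row has the same fixed columns, like a relational table.
structured = "name,city\nAsha,Pune\nRavi,Delhi\n"
rows = list(csv.DictReader(io.StringIO(structured)))
print(rows[0]["city"])

# Semi-structured: XML has tags, but elements may be missing or nested.
semi_structured = (
    "<users>"
    "<user name='Asha'><city>Pune</city></user>"
    "<user name='Ravi'/>"  # no <city> element for this user
    "</users>"
)
root = ET.fromstring(semi_structured)
users = [(u.get("name"), u.findtext("city")) for u in root.findall("user")]
print(users)

# Unstructured: free text with no schema; we have to search inside it.
unstructured = "Asha wrote: loved the new phone, battery could be better!"
mentions_battery = "battery" in unstructured.lower().split()
print(mentions_battery)
```

Notice how the effort grows from top to bottom: the structured data can be addressed by column name, the XML needs a check for missing elements, and the plain text has to be searched word by word.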


Veracity

Veracity is the quality or trustworthiness of the data.  Just how accurate is all this data?  For example, think about all the Twitter posts with hashtags, abbreviations, typos, etc., and the reliability and accuracy of all that content.  Gathering loads and loads of data is of no use if its quality or trustworthiness is poor.  Another good example of this relates to the use of GPS data.  Often the GPS will “drift” off course as you move through an urban area.  Satellite signals are lost as they bounce off tall buildings or other structures.  When this happens, location data has to be fused with another data source, like road data or data from an accelerometer, to provide accurate data.
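The idea of fusing a drifting GPS reading with a second source can be sketched with an inverse-variance weighted average, a simple and common way to combine two noisy measurements (the post itself does not name a specific method, so this is just one possible choice). The function name `fuse` and all the numbers below are hypothetical.

```python
def fuse(value_a: float, var_a: float, value_b: float, var_b: float):
    """Combine two noisy estimates; the less noisy one gets more weight."""
    weight_a = 1.0 / var_a
    weight_b = 1.0 / var_b
    fused_value = (value_a * weight_a + value_b * weight_b) / (weight_a + weight_b)
    fused_var = 1.0 / (weight_a + weight_b)
    return fused_value, fused_var


# Assumed readings: position along a road in meters.
gps_position, gps_variance = 105.0, 25.0   # GPS is drifting, so high variance
accel_position, accel_variance = 100.0, 4.0  # accelerometer-based estimate

position, variance = fuse(gps_position, gps_variance,
                          accel_position, accel_variance)
print(round(position, 2), round(variance, 2))
```

The fused position lands much closer to the more trustworthy source, and the fused variance is smaller than either input, which is why combining sources improves veracity.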

Ignoring Big Data won’t make it go away, and while it may not immediately kill your business, it shouldn’t be ignored for very long.  The results of Big Data can generally be measured directly, making it easy to determine a return on investment.  Big Data is a tool definitely worth looking into.

Variety

Variety is defined as the different types of data we can now use.  Data today looks very different from data in the past.  We no longer just have structured data (name, phone number, address, financials, etc.) that fits neatly into a data table.  Much of today’s data is unstructured.  In fact, it is often estimated that 80% of all the world’s data fits into this category, including photos, video sequences, social media updates, etc.  New and innovative big data technology now allows structured and unstructured data to be harvested, stored, and used simultaneously.

Volume

Volume refers to the incredible amounts of data generated each second from social media, cell phones, cars, credit cards, M2M sensors, photographs, video, etc. The amounts of data have become so large, in fact, that we can no longer store and analyze them using traditional database technology.  We now use distributed systems, where parts of the data are stored in different locations and brought together by software.  On Facebook alone, 10 billion messages are sent, the “like” button is pressed 4.5 billion times, and over 350 million new pictures are uploaded every day.  Collecting and analyzing this data is clearly an engineering challenge of immense proportions.
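Here is a toy sketch of the "stored in different locations and brought together by software" idea. It runs in a single Python process and only simulates the nodes with lists, so it is not how Hadoop or any real distributed system is implemented. The function names and the sample events are assumptions made for illustration.

```python
from collections import Counter


def partition(records, num_nodes):
    """Spread records across nodes in round-robin order."""
    nodes = [[] for _ in range(num_nodes)]
    for i, record in enumerate(records):
        nodes[i % num_nodes].append(record)
    return nodes


def count_local(node_records):
    """Each node counts only the records it stores."""
    return Counter(node_records)


def merge(partial_counts):
    """Bring the per-node results together into one total."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


# Made-up activity log; a real one would be far too large for one machine.
events = ["like", "message", "like", "photo", "like", "message"]

nodes = partition(events, num_nodes=3)
totals = merge(count_local(node) for node in nodes)
print(dict(totals))
```

No single node ever sees all the events, yet the merged totals are the same as if one machine had counted everything. That split-then-combine pattern is the core idea behind processing data that is too large for one place.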

Velocity

Velocity refers to the speed at which vast amounts of data are being generated, collected, and analyzed.  Every day the number of emails, Twitter messages, photos, video clips, etc. increases at lightning speed around the world.  Every second of every day, data is increasing.  Not only must it be analyzed, but the speed of transmission and access to the data must also remain near-instantaneous to allow for real-time access to a website, credit card verification, and instant messaging.  Big data technology now allows us to analyze the data while it is being generated, without ever putting it into databases.
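Analyzing data while it is being generated, without storing it first, can be sketched with a Python generator that keeps only a running count and total. The function name `running_mean` and the sample values are assumptions for illustration; real streaming systems are far more involved.

```python
from typing import Iterable, Iterator


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Yield the average so far after each new value, keeping no history."""
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count


# Pretend these transaction amounts arrive one at a time.
incoming = iter([10, 20, 30])

for average in running_mean(incoming):
    print(average)
```

Only two numbers are kept in memory no matter how many values arrive, so the analysis keeps up with the stream instead of waiting for the data to be stored.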

Value

When we talk about value, we’re referring to the worth of the data being extracted.  Having endless amounts of data is one thing, but unless it can be turned into value it is useless.  While there is a clear link between data and insights, this does not always mean there is value in Big Data.  The most important part of embarking on a big data initiative is to understand the costs and benefits of collecting and analyzing the data to ensure that ultimately the data that is reaped can be monetized. 













