What is Big Data?

Everyone has their own definition of the term “Big Data”.

Here’s my definition 😛

Definition

Technologies [environments and tools] used to handle huge volumes of data.

Characteristics that differentiate Big Data

  • Variety – Data can be structured [DB, Excel], semi-structured [XML files, JSON files] or unstructured [blogs, videos]
  • Velocity – Speed at which data is generated and processed
  • Volume – Amount of data, now measured in zettabytes
  • Veracity – Reliability of data
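
To make the Variety point a bit more concrete, here is a minimal Python sketch of how the three kinds of data are typically read. The sample records, field names and the word-splitting step are made up for illustration and are not from the course.

```python
import csv
import io
import json

# Hypothetical sample data, invented for this sketch.
structured = "id,name,city\n1,Asha,Chennai\n2,Ravi,Pune\n"             # table-like rows (DB, Excel)
semi_structured = '{"id": 3, "name": "Meena", "tags": ["a", "b"]}'      # JSON: flexible, nested fields
unstructured = "Loved the new phone, but the battery drains quickly."    # free text (blog post)

# Structured: every row has the same columns, so it parses into fixed fields.
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing keys, but fields can be nested or missing.
record = json.loads(semi_structured)

# Unstructured: no schema at all, so we have to derive structure ourselves,
# here by simply splitting the text into lowercase words.
words = [w.strip(".,").lower() for w in unstructured.split()]

print(rows[0]["city"])   # Chennai
print(record["tags"])    # ['a', 'b']
print(len(words))        # 9
```

The point is only that each kind of data needs a different amount of work before it can be analysed: none for the table, a little for JSON, and the most for free text.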

6 Key aspects

  1. Integration
  2. Analysis
  3. Visualization
  4. Workload Optimization
  5. Security
  6. Governance

Data Scientist Requirement :
Data Engineering
Scientific Method
Math
Statistics
Advanced Computing
Visualisation
Hacker Mindset
Domain Expertise

What is Big Data? – Course Link

Answers to the questions in the above course :

Big Data :

Module 1 – What is Big Data :

Definitions from different people
Also a basic idea about the 4 V’s

Name one of the drivers of Volume in the Big Data Era?
Scalable Infrastructure

Value from Big Data can be _?
Profits

In the video, 2.5 quintillion bytes of data are equivalent to how many Blu-ray discs?
10 Million

========================================================================================================================

Module 2 – Beyond the Hype :

Bytes – KB – MB – GB – TB – Petabyte – Exabyte – Zettabyte
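
The ladder above can be turned into a small Python calculation. This is only a sketch: it assumes binary steps (each unit is 1024 times the previous one), which matches the quiz answer further down, and the helper name `convert` is my own.

```python
# Units in increasing order; each step is assumed to be a factor of 1024.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def convert(value, from_unit, to_unit):
    """Convert a quantity between two units in UNITS using 1024-based steps."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * (1024 ** steps)

print(convert(1, "EB", "PB"))     # 1024
print(convert(1, "ZB", "TB"))     # 1073741824
print(convert(2048, "KB", "MB"))  # 2.0
```

If you prefer decimal units (1000 per step), replace 1024 with 1000 in `convert`.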

Types :
Human generated data
Machine generated data
Business generated data

Case Study :
Google logic – Indexing the web for search and improving page ranks
Hadoop – developed at Yahoo – handed over to Apache

Questions –

How many petabytes make up an Exabyte?
1024

What is an example of a source of Semi-Structured Big data?
JSON Files

When is it estimated that the data we create and copy will reach around 35 zettabytes?
2020

==================================================================================================================

Module 3 – Big Data and Data Science

Dealing with big data

6 Key aspects :
Integration
Analysis
Visualisation
Workload Optimization
Security
Governance

Applications :
Hadoop
Oozie – Workflow scheduler
Flume
Hive
Pig
Spark
Sqoop
ZooKeeper

Data Scientist Requirement :
Data Engineering
Scientific Method
Math
Statistics
Advanced Computing
Visualisation
Hacker Mindset
Domain Expertise

Case Study :
A conversation between two scientists about big data [climate change prediction, data scientists and MBA courses]

Questions –

What is the process of cleaning and analyzing data to derive insights and value from it?
Data Science

What is the search engine used by Walmart?
Polaris

An example of visualizing Big Data is___________?
Temperature on a map

===============================================================================================================

Module 4 – Big Data Use Cases :

How big data is used
Monitoring traffic
Self-driving cars
Product recommendations at Walmart

Case Study :
Big data and sensors – Zigbee

Questions –

What is the term used to describe a holistic approach that takes into account all available and meaningful information about a customer to drive better engagement, revenue and long-term loyalty?
Enhanced 360 degree View

What can help organizations to find new associations or uncover patterns and facts to significantly improve intelligence, security and law enforcement?
Analyzing data in-motion and at rest

In Operations Analysis, we focus on what type of data?
Machine Data

======================================================================================================================

Module 5 – Processing of Big Data :

Hadoop

Case Study – Scheduling on Hadoop by Oracle and about Lustre

Questions –

What is a method of storing data to support the analysis of originally disparate sources of data?
Data Lakes

Data Warehouses provide online analytic processing: True

What does ‘OLAP’ stand for?
Online Analytical Processing
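
As a rough illustration of what an OLAP-style query does, the Python sketch below rolls a few sales records up along different dimensions. The records, field names and the `roll_up` helper are invented for this example; a real data warehouse would run such queries with its own engine.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, year, sales amount).
sales = [
    ("North", 2015, 100),
    ("North", 2016, 150),
    ("South", 2015, 80),
    ("South", 2016, 120),
]

def roll_up(rows, key_indexes):
    """Sum the sales amount grouped by the chosen dimension columns."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        totals[key] += row[2]
    return dict(totals)

by_region = roll_up(sales, [0])    # aggregate over years
by_year = roll_up(sales, [1])      # aggregate over regions
grand_total = roll_up(sales, [])   # aggregate over everything

print(by_region)     # {('North',): 250, ('South',): 200}
print(by_year)       # {(2015,): 180, (2016,): 270}
print(grand_total)   # {(): 450}
```

Looking at the same numbers by region, by year or as one total is the kind of slicing that the quiz means by analytical processing.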

===================================================================================================================

QUESTION 1 (1 point possible)
In Module 1: What is a common use of big data that is used by companies like Netflix, Spotify, Facebook and Amazon?
Recommendation Engines

QUESTION 2 (1 point possible)
In Module 2: Is one byte binary? False

QUESTION 3 (1 point possible)
In Module 2: What has highly contributed to the launch of the Big Data era?
Cloud Computing

QUESTION 4 (1 point possible)
Module 3: A data scientist is a person who is qualified to derive insights from data by using skills and experience from computer science, business or science, and statistics: True

QUESTION 5 (1 point possible)
Module 3: ‘HDFS’ stands for ________?
Hadoop Distributed File System

QUESTION 6 (1 point possible)
Module 3: Data privacy is a critical part of the big data era. Businesses and individuals must give great thought to how data is _________________.
Collected, retained, used and disclosed

QUESTION 7 (1 point possible)
Module 5: In the Hadoop framework, a rack is a collection of __?
Nodes

QUESTION 8 (1 point possible)
Module 5: What is a method of storing data to support the analysis of originally disparate sources of data?
Data Lake

QUESTION 9 (1 point possible)
Module 5: The Hadoop framework is mostly written in the Java programming language. True

QUESTION 10 (1 point possible)
Module 5: What is the term referring to a database that must be processed by means other than just the SQL Query Language?
NoSQL
