Everyone has their own definition of the term “Big Data”.
Here’s my definition:
Definition
Technologies [environments and tools] used to handle huge volumes of data.
Characteristics that differentiate Big Data
- Variety – Data can be structured [DB, Excel], semi-structured [XML files, JSON files] or unstructured [blogs, videos]
- Velocity – Speed at which data is generated and processed
- Volume – Amount of data, in zettabytes
- Veracity – Reliability of Data
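
To make the Variety point concrete, here is a minimal Python sketch that holds the three kinds of data side by side. All sample values (names, cities, the blog sentence) are made up for illustration and are not from the course.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (like a DB table or Excel sheet).
# The sample rows below are hypothetical.
csv_text = "id,name,city\n1,Asha,Chennai\n2,Ravi,Pune\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing, fields may be nested or differ between records.
json_text = (
    '[{"id": 1, "name": "Asha", "tags": ["prime"]},'
    ' {"id": 2, "name": "Ravi", "address": {"city": "Pune"}}]'
)
records = json.loads(json_text)

# Unstructured: free text with no schema; here we only do a naive word count.
blog_post = "Big data is big. Data is everywhere."
word_count = len(blog_post.split())

print(rows[0]["city"])            # Chennai
print(sorted(records[1].keys()))  # ['address', 'id', 'name']
print(word_count)                 # 7
```

Note how the CSV rows all share the same columns, while the two JSON records carry different fields, which is what makes JSON semi-structured.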

6 Key aspects
- Integration
- Analysis
- Visualization
- Workload Optimization
- Security
- Governance
Data Scientist Requirements :
Data Engineering
Scientific Method
Math
Statistics
Advanced Computing
Visualisation
Hacker Mindset
Domain Expertise
What is Big Data? – Course Link
Answers to the questions in the above course :
Big Data :
Module 1 – What is Big Data :
Definitions from different people
Also some basic ideas about the 4 V’s
Name one of the drivers of Volume in the Big Data Era?
Scalable Infrastructure
Value from Big Data can be _?
Profits
In the video, 2.5 quintillion bytes of data are equivalent to how many Blu-ray DVDs?
10 Million
========================================================================================================================
Module 2 – Beyond the Hype :
Bytes – KB – MB – GB – TB – Petabyte – Exabyte – Zettabyte
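
The unit ladder above can be sketched in Python as follows. This is only a sketch: it assumes each step is a factor of 1024 (the binary convention the course quiz uses), whereas the SI convention uses 1000. The names `UNITS`, `to_bytes` and `convert` are my own.

```python
# Unit ladder from bytes up to zettabytes; each step is assumed to be 1024x.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def to_bytes(value, unit):
    """Convert a value in the given unit to bytes."""
    return value * 1024 ** UNITS.index(unit)

def convert(value, from_unit, to_unit):
    """Convert a value between any two units in the ladder."""
    return to_bytes(value, from_unit) / 1024 ** UNITS.index(to_unit)

print(convert(1, "EB", "PB"))  # 1024.0
```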
Types :
Human generated data
Machine generated data
Business generated data
Case Study :
Google’s approach – Indexing the web for search and improving page ranking
Hadoop – Developed at Yahoo, then handed over to Apache
Questions –
How many petabytes make up an Exabyte?
1024
What is an example of a source of Semi-Structured Big data?
JSON Files
When is it estimated that the data we create and copy will reach around 35 zettabytes?
2020
==================================================================================================================
Module 3 – Big Data and Data Science
Dealing with big data
6 Key aspects :
Integration
Analysis
Visualisation
Workload Optimization
Security
Governance
Applications :
Hadoop
Oozie – Workflow scheduler
Flume
Hive
Pig
Spark
Sqoop
ZooKeeper
Data Scientist Requirements :
Data Engineering
Scientific Method
Math
Statistics
Advanced Computing
Visualisation
Hacker Mindset
Domain Expertise
Case Study :
A conversation between two scientists about big data [climate change prediction, data scientists and MBA courses]
Questions –
What is the process of cleaning and analyzing data to derive insights and value from it?
Data Science
What is the search engine used by Walmart?
Polaris
An example of visualizing Big Data is ___________?
Temperature on a map
===============================================================================================================
Module 4 – Big Data Use Cases :
How big data is used
Monitoring traffic
Self-driving cars
Product recommendations at Walmart
Case Study :
Big data and sensors – Zigbee
Questions –
What is the term used to describe a holistic approach that takes into account all available and meaningful information about a customer to drive better engagement, revenue and long-term loyalty?
Enhanced 360 degree View
What can help organizations to find new associations or uncover patterns and facts to significantly improve intelligence, security and law enforcement?
Analyzing data in-motion and at rest
In Operations Analysis, we focus on what type of data?
Machine Data
======================================================================================================================
Module 5 – Processing of Big Data :
Hadoop
Case Study – Scheduling on Hadoop by Oracle and about Lustre
Questions –
What is a method of storing data to support the analysis of originally disparate sources of data?
Data Lakes
Data Warehouses provide online analytic processing: True
What does “OLAP” stand for?
Online Analytical Processing
===================================================================================================================
QUESTION 1 (1 point possible)
In Module 1: What is a common use of big data that is used by companies like Netflix, Spotify, Facebook and Amazon?
Recommendation Engines
QUESTION 2 (1 point possible)
In Module 2: Is one byte binary? False
QUESTION 3 (1 point possible)
In Module 2: What has highly contributed to the launch of the Big Data era?
Cloud Computing
QUESTION 4 (1 point possible)
Module 3: A data scientist is a person who is qualified to derive insights from data
by using skills and experience from computer science, business or science, and statistics: True
QUESTION 5 (1 point possible)
Module 3: “HDFS” stands for ________?
Hadoop Distributed File System
QUESTION 6 (1 point possible)
Module 3: Data privacy is a critical part of the big data era. Businesses and individuals must give great thought to how data is _________________.
Collected, retained, used and disclosed
QUESTION 7 (1 point possible)
Module 5: In the Hadoop framework, a rack is a collection of __?
Nodes
QUESTION 8 (1 point possible)
Module 5: What is a method of storing data to support the analysis of originally disparate sources of data?
Data Lake
QUESTION 9 (1 point possible)
Module 5: The Hadoop framework is mostly written in the Java programming language. True
QUESTION 10 (1 point possible)
Module 5: What is the term referring to a database that must be processed by means other than just the SQL query language?
NoSQL