Introduction to Data Analytics

Classification of Data Analytics [Difficulty Level –> Easy to Difficult]:

  1. Descriptive Analysis – What happened (analysis of historical data)
  2. Diagnostic Analysis – Why it happened
  3. Predictive Analysis – What will happen
  4. Prescriptive Analysis – Guidance on the best course of action to follow

1. Descriptive Analysis – [Charts]

  • Data about past events
  • Presented as detailed summaries, for example:
    • Data Queries
    • Reports
    • Descriptive Statistics
    • Data Visualization
    • Data Dashboard
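
As a rough illustration of the "Descriptive Statistics" item above, here is a minimal Python sketch. The sales figures and the `describe` helper are made up for this example; they are not from the course.

```python
import statistics

# Hypothetical past monthly sales figures (illustrative data only)
monthly_sales = [120, 135, 150, 110, 160, 145]

def describe(values):
    """Summarise past observations: the core of descriptive analysis."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

summary = describe(monthly_sales)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A report or dashboard is essentially a nicer presentation of numbers like these.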

2. Diagnostic Analysis

  • Data discovery
  • Data Mining
  • Correlation
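
To make the "Correlation" item concrete, here is a small sketch that computes the Pearson correlation coefficient by hand. The ad spend and sales numbers are invented for illustration, and `pearson` is just a helper name chosen here.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: sales rise in step with ad spend
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 45, 65, 85, 105]

r = pearson(ad_spend, sales)
print(f"correlation: {r:.2f}")  # close to 1 means a strong positive relationship
```

A value near +1 or -1 hints at where to dig for the "why"; keep in mind that correlation alone does not prove causation.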

3. Predictive Analysis

  • Historical Data –> Predictive Algorithm –> Model
  • New Data –> Model –> Predictions
    • Linear Regression
    • Time Series Analysis and Forecasting
    • Data Mining
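
The two-step flow above (historical data to model, then new data to predictions) can be sketched with a simple least-squares linear regression. The data and the function names `fit_line` and `predict` are assumptions for this example only.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept

# Step 1: historical data -> predictive algorithm -> model
history_x = [1, 2, 3, 4, 5]        # e.g. month number (hypothetical)
history_y = [12, 14, 16, 18, 20]   # e.g. units sold (hypothetical)
model = fit_line(history_x, history_y)

# Step 2: new data -> model -> prediction
forecast = predict(model, 6)
print(forecast)  # 22.0 for this toy data
```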

4. Prescriptive Analysis

  • Optimization Model
  • Simulation Model
  • Decision Analysis
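
As a toy example of the "Optimization Model" item, the sketch below searches every feasible production plan and picks the most profitable one. All numbers (profits, hours, demand limits) are hypothetical, and a real optimization model would normally use a proper solver rather than brute force.

```python
# Hypothetical problem: two products share a limited pool of machine hours
PROFIT_A, PROFIT_B = 30, 50      # profit per unit
HOURS_A, HOURS_B = 1, 2          # machine hours per unit
AVAILABLE_HOURS = 40
MAX_DEMAND_A, MAX_DEMAND_B = 20, 15

def best_plan():
    """Try every feasible (units_a, units_b) pair and keep the best profit."""
    best = (0, 0, 0)  # (profit, units_a, units_b)
    for units_a in range(MAX_DEMAND_A + 1):
        for units_b in range(MAX_DEMAND_B + 1):
            hours = units_a * HOURS_A + units_b * HOURS_B
            if hours > AVAILABLE_HOURS:
                continue  # plan breaks the capacity constraint
            profit = units_a * PROFIT_A + units_b * PROFIT_B
            if profit > best[0]:
                best = (profit, units_a, units_b)
    return best

profit, units_a, units_b = best_plan()
print(f"Make {units_a} of A and {units_b} of B for a profit of {profit}")
```

The output is a recommendation ("make this many of each"), which is what separates prescriptive analysis from prediction.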

Types of Data

  • Nominal – Gender, Marital Status – No ranking or arithmetic calculations –> Nonparametric test
  • Ordinal – Ratings, Designation, Student Grades – Can be ranked, but no arithmetic operations –> Nonparametric test
  • Interval Scale – Year, Temperature (in °C or °F) – Addition and subtraction can be done; multiplication/division makes no sense –> Parametric test
  • Ratio Scale – Weight, Age, Salary – All arithmetic operations –> Parametric test
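
A quick way to see why multiplication and division make no sense on an interval scale: the sketch below compares two temperatures in Celsius (interval scale) and in Kelvin (ratio scale, with a true zero). The two readings are arbitrary example values.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15

a_c, b_c = 10.0, 20.0  # two example readings in Celsius
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

# Differences are meaningful on an interval scale: same gap in both units
diff_c = b_c - a_c
diff_k = b_k - a_k

# Ratios are not: "20 °C is twice 10 °C" disappears once the zero point moves
ratio_c = b_c / a_c
ratio_k = b_k / a_k

print(f"difference: {diff_c:.2f} °C vs {diff_k:.2f} K")
print(f"ratio: {ratio_c:.3f} in Celsius vs {ratio_k:.3f} in Kelvin")
```

The difference survives the change of units, but the ratio does not, which is why ratios are only trusted on a ratio scale.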

What is Big Data?

Everyone has their own definition of the term “Big Data”.

So here’s mine 😛

Definition

Technologies [environments and tools] used to handle huge volumes of data.

Characteristics that differentiate Big Data

  • Variety – Data can be structured [DB, Excel], semi-structured [XML files, JSON files] or unstructured [blogs, videos]
  • Velocity – Speed at which data is generated and processed
  • Volume – Amount of data, now measured in zettabytes
  • Veracity – Reliability of data
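
To illustrate what "semi-structured" means under Variety, here is a small sketch that parses JSON records whose fields differ from one record to the next. The records themselves are made up for the example.

```python
import json

# Hypothetical semi-structured data: each record carries its own set of fields
raw = """
[
  {"id": 1, "name": "Asha", "tags": ["new"]},
  {"id": 2, "name": "Ravi", "address": {"city": "Pune"}}
]
"""

records = json.loads(raw)

# Unlike a database table, there is no fixed schema: collect the fields we actually see
all_fields = sorted({key for record in records for key in record})
print(all_fields)

# Missing fields have to be handled explicitly
for record in records:
    city = record.get("address", {}).get("city", "unknown")
    print(record["name"], city)
```

Structured data would have the same columns for every row; here each record decides which fields it has.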

6 Key aspects

  1. Integration
  2. Analysis
  3. Visualization
  4. Workload Optimization
  5. Security
  6. Governance

Data Scientist Requirements :
Data Engineering
Scientific Method
Math
Statistics
Advanced Computing
Visualisation
Hacker Mindset
Domain Expertise

What is Big Data? – Course Link

Answers to the questions in the above course :

Big Data :

Module 1 – What is Big Data :

Definitions of Big Data from different people
Also some basic ideas about the 4 V’s

Name one of the drivers of Volume in the Big Data Era?
Scalable Infrastructure

Value from Big Data can be _?
Profits

In the video, 2.5 Quintillion Bytes of data are equivalent to how many Blu-ray DVDs?
10 Million

========================================================================================================================

Module 2 – Beyond the Hype :

Bytes – KB – MB – GB – TB – Petabyte – Exabyte – Zettabyte
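
The unit ladder above can be sketched as a small converter. This assumes the binary convention, where each step up is a factor of 1024 (the same convention as the quiz answer in this module); the SI convention uses 1000 instead. The names `UNITS`, `to_bytes` and `convert` are just for this example.

```python
# Units in increasing order; each step is assumed to be a factor of 1024
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def to_bytes(value, unit):
    """Convert a value in the given unit to bytes."""
    return value * 1024 ** UNITS.index(unit)

def convert(value, from_unit, to_unit):
    """Convert a value between any two units in UNITS."""
    return to_bytes(value, from_unit) / 1024 ** UNITS.index(to_unit)

print(convert(1, "EB", "PB"))  # petabytes in one exabyte
print(convert(1, "ZB", "TB"))  # terabytes in one zettabyte
```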

Types :
Human generated data
Machine generated data
Business generated data

Case Study :
Google logic – Indexing the web for searching and increasing the page ranks
Hadoop – Developed at Yahoo – Handed over to Apache

Questions –

How many petabytes make up an Exabyte?
1024

What is an example of a source of Semi-Structured Big data?
JSON Files

When is it estimated that the data we create and copy will reach around 35 zettabytes?
2020

==================================================================================================================

Module 3 – Big Data and Data Science

Dealing with big data

6 Key aspects :
Integration
Analysis
Visualisation
Workload Optimization
Security
Governance

Applications :
Hadoop
Oozie – Workflow scheduler
Flume
Hive
Pig
Spark
Sqoop
ZooKeeper

Data Scientist Requirements :
Data Engineering
Scientific Method
Math
Statistics
Advanced Computing
Visualisation
Hacker Mindset
Domain Expertise

Case Study :
A conversation between two scientists about big data [climate change prediction, data scientists and MBA courses]

Questions –

What is the process of cleaning and analyzing data to derive insights and value from it?
Data Science

What is the search engine used by Walmart?
Polaris

An example of visualizing Big Data is___________?
Temperature on a map

===============================================================================================================

Module 4 – Big Data Use Cases :

How big data is used
Monitoring traffic
Self-driving cars
Product recommendations at Walmart

Case Study :
Big data and sensors – Zigbee

Questions –

What is the term used to describe a holistic approach that takes into account all available and meaningful information about a customer to drive better engagement, revenue and long-term loyalty?
Enhanced 360 degree View

What can help organizations to find new associations or uncover patterns and facts to significantly improve intelligence, security and law enforcement?
Analyzing data in-motion and at rest

In Operations Analysis, we focus on what type of data?
Machine Data

======================================================================================================================

Module 5 – Processing of Big Data :

Hadoop

Case Study – Scheduling on Hadoop by Oracle and about Lustre

Questions –

What is a method of storing data to support the analysis of originally disparate sources of data?
Data Lakes

Data Warehouses provide online analytic processing: True

What does ‘OLAP’ stand for?
Online Analytical Processing

===================================================================================================================

QUESTION 1 (1 point possible)
In Module 1: What is a common use of big data that is used by companies like Netflix, Spotify, Facebook and Amazon?
Recommendation Engines

QUESTION 2 (1 point possible)
In Module 2: Is one byte binary? False

QUESTION 3 (1 point possible)
In Module 2: What has highly contributed to the launch of the Big Data era?
Cloud Computing

QUESTION 4 (1 point possible)
Module 3: A data scientist is a person who is qualified to derive insights from data by using skills and experience from computer science, business or science, and statistics: True

QUESTION 5 (1 point possible)
Module 3: ‘HDFS’ stands for ________?
Hadoop Distributed File System

QUESTION 6 (1 point possible)
Module 3: Data privacy is a critical part of the big data era. Businesses and individuals must give great thought to how data is _________________.
Collected, retained, used and disclosed

QUESTION 7 (1 point possible)
Module 5: In the Hadoop framework, a rack is a collection of __?
Nodes

QUESTION 8 (1 point possible)
Module 5: What is a method of storing data to support the analysis of originally disparate sources of data?
Data Lake

QUESTION 9 (1 point possible)
Module 5: The Hadoop framework is mostly written in the Java programming language. True

QUESTION 10 (1 point possible)
Module 5: What is the term referring to a database that must be processed by means other than just the SQL Query Language?
NoSQL
