 # R Programming for Data Science

## Online & Offline

INR 9000

## COURSE DESCRIPTION

This course starts your journey into the beautiful, exciting, artistic, logical and rewarding world of data science. To explore data science, we will use the R programming language. R is open source software for mathematical and statistical modeling. Learning R and statistical modeling will, on the one hand, improve your data science problem-solving skills and, on the other, increase your employability. This is a very good course for moving from beginner to intermediate level. It is a weekend course offered both online and offline: offline classes are held in Bangalore, while online classes are conducted on Webex. The course runs for 24 hours over 4 days of 6 hours each, i.e. classes are held on 2 consecutive weekends.

## OBJECTIVE

Basic Introduction to R

1. Introduction to R
2. Drawbacks of using R

Getting help

1. help()
2. Mailing List
3. R Web Page
4. ? Operator
5. ?? Operator
6. Hands on Exercise

Structure of a program in R

1. Using R console
2. Scripting in R

Packages:

1. Types of packages
2. Introduction to R base packages
3. Introduction to user-created packages
4. Brief introduction to some user-created packages
5. Package Installation
6. Hands on Exercise

Basic data types

1. Integer
2. Numeric
3. Character
4. Logical
5. Complex
6. Special data type

Data structures

1. Vector
2. List
3. Matrices
4. Array
5. Table
6. Data Frame
7. Naming rows and columns of a data frame and matrix
8. Hands on Exercise

Loops and conditionals

1. Use of loops and conditionals
2. Structure of conditionals
3. if statement
4. if, else statement
5. if, else if, else statement
6. while loop
7. for loop
8. repeat loop
9. Hands on Exercise

Apply family of functions and plyr

1. apply()
2. sapply()
3. lapply()
4. tapply()
5. by()
6. plyr package
7. xxply functions
8. Hands on exercise
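
As a rough illustration of the functions listed above, the sketch below runs `apply()`, `lapply()`, `sapply()` and `tapply()` on small made-up inputs. The variable names and data are assumptions chosen for this example only and are not course material.

```r
# A small integer matrix: 2 rows, 3 columns, filled column by column.
m <- matrix(1:6, nrow = 2)

# apply(): run a function over rows (MARGIN = 1) or columns (MARGIN = 2).
row_sums  <- apply(m, 1, sum)
col_means <- apply(m, 2, mean)

# lapply(): always returns a list, one element per input element.
squares_list <- lapply(1:3, function(x) x^2)

# sapply(): like lapply(), but simplifies the result to a vector when it can.
squares_vec <- sapply(1:3, function(x) x^2)

# tapply(): apply a function to groups of a vector defined by a grouping vector.
scores <- c(10, 20, 30, 40)      # hypothetical values
group  <- c("a", "b", "a", "b")  # hypothetical group labels
group_means <- tapply(scores, group, mean)

print(row_sums)
print(col_means)
print(squares_vec)
print(group_means)
```

The main practical difference is the return type: `lapply()` gives a list, `sapply()` tries to simplify to a vector or matrix, and `tapply()` returns one value per group.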

Text Manipulation in R

1. Basics of strings
2. Regular Expression in R
• sub()
• gsub()
• grep()
• substr()
• strsplit()
• regexpr()
• gregexpr()

3. Hands on Exercise
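
The string functions listed above can be tried on a tiny character vector, as in the sketch below. The vector `x` and the patterns are made up for this example; all functions are from base R.

```r
# Hypothetical input strings for illustration.
x <- c("apple pie", "banana split", "cherry tart")

first_replaced <- sub("a", "A", x)    # replace only the first match in each string
all_replaced   <- gsub("a", "A", x)   # replace every match in each string

match_index <- grep("an", x)                 # indices of elements that match
match_value <- grep("an", x, value = TRUE)   # the matching elements themselves

prefix <- substr(x, 1, 3)      # characters 1 to 3 of each string
words  <- strsplit(x, " ")     # list with one character vector per string

first_pos <- regexpr("p", x)         # position of the first match, -1 if none
all_pos   <- gregexpr("p", x)[[1]]   # positions of all matches in x[1]

print(first_replaced)
print(all_replaced)
print(match_value)
print(prefix)
print(words[[2]])
print(as.integer(first_pos))
print(as.integer(all_pos))
```

Note that `regexpr()` and `gregexpr()` return match positions (with match lengths as attributes) rather than the matched text; `regmatches()` can be used to extract the text itself.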

Functions in R

1. Introduction to functions in R
2. Structure of a function
3. Returning a value from a function
4. Returning complex data types from a function
5. Recursion
6. Hands on exercise

Some mathematical functions

1. Finding minimum and maximum
2. Trigonometric functions
3. Exponential function
4. Logarithm calculation
5. Finding absolute value
6. Factorial function
7. Cumulative mathematical functions
8. Set operations
9. pmin()
10. pmax()
• round()
• floor()
• ceiling()
• sqrt()

Assignment Solving

Data aggregation and joining in R:

1. Introduction to data aggregation.
2. Introduction to data joining
3. Aggregation functions.
4. Data joining functions in R
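
A minimal sketch of aggregation followed by a join, using base R's `aggregate()` and `merge()`. The two data frames, their column names and their values are hypothetical and exist only for this example; the course may use other functions or packages.

```r
# Hypothetical sales records: one row per transaction.
sales <- data.frame(
  store  = c("S1", "S1", "S2", "S2"),
  amount = c(100, 150, 200, 75)
)

# Hypothetical lookup table: one row per store.
stores <- data.frame(
  store = c("S1", "S2"),
  city  = c("Bangalore", "Chennai")
)

# Aggregation: total amount per store.
totals <- aggregate(amount ~ store, data = sales, FUN = sum)

# Joining: attach the city to each store total (inner join on "store").
joined <- merge(totals, stores, by = "store")

print(joined)
```

By default `merge()` performs an inner join; the `all`, `all.x` and `all.y` arguments switch it to a full, left or right outer join.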

Graphics in R:

1. Use of graphs and charts
2. Basic elements of a graph
3. Graphics in R base package
• par()
• plot()

4. Basic elements of graph generation

5. ggplot2 package

6. Grammar of graphics

7. Layered structure of ggplot2

8. Basic elements of ggplot2

• qplot()
• ggplot()

9. Some chart use and creation with Base R and ggplot2 package

• Bar chart
• Stacked Bar Chart
• Histogram
• Scatter plot
• Bubble chart
• Pie chart
• Quantile-quantile (Q-Q) plot
• Box Plot
• Area Plot
• Multiple plots
• Line graph (Time Series Plotting)

10. Writing plot to files

11. Hands on Exercise

Connecting R to a database

1. Introduction to RDBMS
2. Introduction to MySQL
3. R packages to connect to database
4. Data analysis of data from database
5. Hands on Exercise

Debugging in R

1. Introduction to Debugging
2. Some useful functions for debugging
3. browser()
4. debug()
5. undebug()
6. debugonce()
7. trace()
8. untrace()
9. setBreakpoint()
10. Hands On Exercise

A simple project based on data joining, charting and aggregation.

Data Preprocessing in R

1. Authenticity of Data.
2. Data Filtering.
3. Missing data imputation.
4. Data Merging
5. Data aggregation.
6. Data transformation.

Characteristics of Statistical Problems:

1. Importance of mean.
2. Population and Sample.
3. Hypothesis and use cases.

Introduction to probability:

1. Introduction to classical probability.
2. Rules of classical probability.
3. Probability distributions
4. Discrete and continuous Probability distributions.
5. Generation of Random numbers.
6. Fundamentals of population and samples.
7. Characteristics of Samples.

Confidence Interval:

1. Introduction to confidence Interval.
2. Introduction to significance level.
3. Calculation of confidence interval for mean
4. Calculation of confidence interval for variance.

Hypothesis testing:

1. Introduction to Hypothesis Testing and confidence interval.
2. Introduction to Null and Alternative hypothesis.
3. One sided and two sided hypothesis.
4. Introduction to critical region.
5. Introduction to significance level.
6. Introduction to P-values.
7. Steps in hypothesis testing.
8. Hypothesis testing of one sample mean:
    1. z-test
    2. t-test
9. Hypothesis testing for two sample means:
    1. z-test
    2. t-test
    3. Paired t-test
10. Hypothesis testing of one sample variance
11. Hypothesis testing of two sample variances
12. Chi-square test for:
    1. Independence
    2. Goodness of fit
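
To give a flavour of the tests listed above, here is a minimal sketch using base R's `t.test()` and `chisq.test()`. All data are simulated or made up for this example, and the hypothesised mean of 5 is an arbitrary assumption; this is not the course's own exercise.

```r
set.seed(42)  # make the simulated sample reproducible

# One-sample t-test. H0: the population mean is 5 (two-sided alternative).
x <- rnorm(30, mean = 5.5, sd = 1)   # simulated sample
t_res <- t.test(x, mu = 5)

# Chi-square test of independence on a hypothetical 2x2 contingency table.
# For 2x2 tables chisq.test() applies Yates' continuity correction by default.
tab <- matrix(c(20, 15, 30, 35), nrow = 2,
              dimnames = list(group = c("A", "B"), outcome = c("yes", "no")))
indep_res <- chisq.test(tab)

# Chi-square goodness-of-fit test. H0: three categories are equally likely.
observed <- c(18, 22, 20)            # hypothetical counts
gof_res <- chisq.test(observed, p = rep(1 / 3, 3))

print(t_res$p.value)     # reject H0 if below the chosen significance level
print(t_res$conf.int)    # confidence interval for the mean
print(indep_res$p.value)
print(gof_res$statistic)
```

Each call returns an object of class `htest`, whose `statistic`, `parameter` (degrees of freedom) and `p.value` components correspond to the steps in hypothesis testing listed above.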

A simple project on hypothesis testing.

This course will be a prerequisite for many upcoming courses such as SparkR, Machine Learning with R, Bayesian Network Analysis in R, Deep Learning with R and many more.

Why learn R?

Data science has emerged and is set to shape the world for many years. Given the speed at which data science is percolating into different segments of business, academia and research, I am of the view that at some point everyone will be involved in some sort of data analysis using some tool.

R is a very good tool for any sort of analysis. It has more than 8000 packages for performing different kinds of analysis. The major Big Data frameworks have interfaces to R: Hadoop has RHadoop, and Apache Spark ships with SparkR.

The best part is that R is an open source platform. Just download it, install it (it is, of course, very easy to install) and start learning. Very soon you might be hired by a data analysis company, be pursuing higher education somewhere, or at least be using your organization's data to come up with insights that amaze everyone. So why wait?

Projects:

There will be three projects, each of which runs end to end.

Project 1: Given sales data, participants have to implement day-to-day data science operations such as filtering, aggregation, and date and time manipulation, and apply charts to understand sales patterns across different stores.

Project 2: Given MovieLens data, participants have to implement data joining, aggregation and charting to find meaningful patterns and information.

Project 3: Given Kaggle Titanic data, participants have to implement data preprocessing and data cleaning. After that, participants are required to perform hypothesis testing on the data to validate their hypotheses.