Intended Audience:
Who can be a Data Scientist?
- IT professionals or software developers looking for a career in data science and business analytics
- Professionals who are currently working in data and business analytics
- Computer Science Graduates
- Financial and Business Analysts
- University and college graduates or students looking for a career as a data scientist
- Anyone with a statistics background
- Research Associates
If you have intellectual curiosity, you can become a Data Scientist!
Overview (Insights of Data Science with R Programming, and Machine Learning & Analytical Tools of Data Science):
- Understanding Data Science
- Getting tools ready
- R programming basics
- Exploratory data analysis
- Statistical inference
- Unsupervised Learning
- Supervised Learning
- Technology and Tools for Unstructured Data
Live Projects: Capstone Project (duration: 1 month)
Course Objectives:
The aim of Data Science is to turn data into information and information into insight. It is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from data in various forms. This course serves as an introduction to the Data Science principles required to tackle real-world, data-rich problems across businesses and industries worldwide.
Prerequisites:
- Knowledge of a statistical, mathematical, or technical field
- Understanding of programming concepts
Insights of Data Science with R
Session 1: A walk-through in the world of Data Science
- A Handshake with Big Data and Data Science
- Data Analytical Life Cycle
- A use-case discussion
Session 2: Introduction to R Language
- R Basics & Functions
- Statistical Analysis in R
- Jupyter Notebooks
- Introduction to Statistics – The R Environment
- Variables in R, Vectors and Matrices in R
- Graphs for Categorical Data – Graphs for Quantitative Data – Lists and Arrays in R
- Data Frames in R
- Measures of Center & Measures of Variability
- Numerical Measures for Quantitative Bivariate Data – Functions in R
- Probability and Probability Distributions – Control Statements and Loops in R
- Discrete Distributions in R
- Plotting tools in R
- The Normal Probability Distribution – Applications in R
- Sampling Distributions – Central Limit Theorem
- Pearson Correlation & Visualizing Correlation in R
Session 3: Exploratory Analysis in R
- Cleaning dirty data
- Dealing with missing values
- Advanced data handling
- Basic Plotting of Data & Outlier Detection
- Large-Sample Tests of Hypotheses
- Inference from Small Samples Lab
- Applications of t-tests & Chi-Squared tests in R
- One-way analysis of variance & Two-way analysis of variance Lab
- Applications of ANOVA in R
- The Wilcoxon rank-sum test & the Wilcoxon signed-rank test
- Kruskal-Wallis test and Friedman test
- Spearman rank correlation coefficient
- Dimensionality reduction
- Cross-validation
- Labs and assignments
Machine Learning and Analytical Tools
Session 1: Advanced Analytical Machine Learning Methods – Unsupervised Learning
- Overview of Clustering
- Understanding the k-means algorithm
- Using K-means in R
- Use cases
- Lab work and assignments
Session 2: Association Rule Mining
- Apriori Algorithm
- Evaluation of Candidate Rules
- Applications of Association Rules
- Use Case: Transactions in a Grocery Store
- Rule Generation and Visualization
- Validation and Testing
- Lab and assignments
Session 3: Supervised Learning
- Understanding Supervised Learning – Building Decision Trees in R
- Text Analysis, Text Classification
- Lab exercise
Session 4: Advanced Analytics – Technology and Tools
Analytics for unstructured data:
- Use case: MapReduce
- Apache Hadoop
- Spark
- Pig
- Hive
- NoSQL