Objectives & Outcomes
By the end of the course, participants will be able to:
- Use Python to wrangle, analyse and visualise data
- Understand and apply key statistical concepts to carry out predictive modelling projects
Agenda
Session 1: Overview of Python and Basic Code (4 hours)
- Purpose: Introduction to the Python programming language, which is one of the most widely used tools for statistics and data science
- Content: (1) Overview of Python and the Python IDE, (2) Installing and loading packages, (3) Working Directory, (4) Creating a project, (5) Basic code
Session 2: Data Wrangling (8 hours)
- Purpose: It is often said that up to 80% of the time in data analysis is spent cleaning and preparing data. This session guides learners through the data-wrangling process and gives them a solid foundation in the basics of working with data
- Content:
(1) Dealing with numbers:
+ Definition of integer and floating-point (double-precision) numbers
+ Generating sequences of non-random and random numbers
+ Setting the seed, rounding numbers
(2) Dealing with strings: string basics, string manipulation
(3) Dealing with factors (categorical variables): creating, converting and inspecting, ordering levels, revaluing levels
(4) Dealing with dates:
+ Converting strings to dates
+ Creating date sequences
(5) Managing data frames, missing values and outliers; importing and loading data
(6) Reshaping data:
+ Converting Long to Wide and vice versa
+ Splitting a single column into multiple columns and vice versa
(7) Transforming data: Select, Filter, Group, Summarise (summary statistics), Arrange, Join
(8) Report
(9) Exercise
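As a rough illustration of items (1) and (4) above, the sketch below uses only the Python standard library so that it stays self-contained. That choice is an assumption made for the example; the course itself may well rely on packages such as NumPy or pandas for the same tasks. The sample values and the start date are invented.

```python
import random
from datetime import date, datetime, timedelta

# (1) Numbers: integer vs floating-point results
whole = 7 // 2        # floor division of two ints gives an int
fraction = 7 / 2      # true division always gives a float

# Non-random sequence of even numbers below 10
evens = list(range(0, 10, 2))

# Setting the seed makes random draws reproducible
random.seed(42)
first_draw = [random.random() for _ in range(3)]
random.seed(42)
second_draw = [random.random() for _ in range(3)]

# Rounding to two decimal places
rounded = round(3.14159, 2)

# (4) Dates: convert a string to a date, then build a daily sequence
start = datetime.strptime("2024-01-29", "%Y-%m-%d").date()
week = [start + timedelta(days=i) for i in range(7)]
```

Resetting the seed before the second draw is what makes `first_draw` and `second_draw` identical; without it the two lists would differ.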
Session 3: Data Visualisation (8 hours)
- Purpose: This session shows learners how to create the most common types of graphics and how to judge whether a graph meets the need. Most of the recipes use the Matplotlib library, a powerful and flexible way to make graphs in Python
- Content: (1) Layer & Grammar of Graphics, (2) Bar Graphs, (3) Line Graphs, (4) Scatter Graphs, (5) Summarised data distribution, (6) Annotation, (7) Axes, (8) Controlling the overall appearance of graphs, (9) Legend, (10) Facet, (11) Colour, (12) Advanced Graphs, (13) Exercise
Session 4: Statistical Learning Overview (4 hours)
- Purpose: This session gives learners an overview of statistical learning, including its assumptions and theory
- Content: (1) Introduction to statistical learning: Trade-off between accuracy and model interpretability, (2) Assessing model accuracy: Bias-Variance trade-off, Measuring, (3) Linear Regression Theory and Exercise
Session 5: Logistic Regression (4 hours)
- Purpose: This session shows learners how to apply Logistic Regression in the real world, and introduces the concept of resampling data
- Content: (1) Logistic Regression Theory and Exercise, (2) Resampling methods: Cross-Validation & Bootstrap
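The two resampling ideas listed for this session can be sketched in a few lines. The helper names `k_fold_indices` and `bootstrap_means` are hypothetical and written with the standard library only; in practice a course like this would probably use a package such as scikit-learn instead.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)           # shuffle once, reproducibly
    folds = [idx[i::k] for i in range(k)]      # k folds of nearly equal size
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def bootstrap_means(values, n_resamples=1000, seed=0):
    """Return the mean of each bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(values)
    return [sum(rng.choices(values, k=n)) / n for _ in range(n_resamples)]

# Illustrative data only
splits = list(k_fold_indices(n=10, k=5))
means = bootstrap_means([2.0, 4.0, 6.0, 8.0], n_resamples=200)
```

Each observation appears in exactly one test fold, so every data point is used for validation once. The spread of `means` gives a rough picture of how much the sample mean varies from resample to resample.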
Session 6: Tree-Based Methods (8 hours)
- Purpose: This session shows learners how to apply tree-based methods to predict a target variable in a business setting
- Content: (1) Basics of Decision Trees, (2) Bagging, Random Forest, Boosting Theory and Exercise
Session 7: Unsupervised Learning (8 hours)
- Purpose: This session introduces learners to the concept of Unsupervised Learning and its applications in banking
- Content: (1) PCA, K-Means and Hierarchical Clustering Theory and Exercise, (2) Market Basket Analysis Theory and Exercise, (3) RFM Analysis Theory and Exercise
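For orientation, RFM analysis summarises each customer by Recency (days since the last purchase), Frequency (number of purchases) and Monetary value (total spend). The sketch below computes these three measures for a few made-up transactions. The data, the snapshot date and the helper name `rfm_table` are all hypothetical, and scoring customers into segments is left out.

```python
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount)
transactions = [
    ("A", date(2024, 1, 5), 120.0),
    ("A", date(2024, 3, 1), 80.0),
    ("B", date(2023, 11, 20), 300.0),
    ("C", date(2024, 2, 14), 40.0),
    ("C", date(2024, 2, 28), 60.0),
    ("C", date(2024, 3, 10), 50.0),
]
snapshot = date(2024, 3, 31)  # assumed reference date for recency

def rfm_table(rows, as_of):
    """Aggregate transactions into recency, frequency and monetary per customer."""
    table = {}
    for customer, day, amount in rows:
        rec = table.setdefault(
            customer, {"last": day, "frequency": 0, "monetary": 0.0}
        )
        rec["last"] = max(rec["last"], day)   # most recent purchase date
        rec["frequency"] += 1                 # count of purchases
        rec["monetary"] += amount             # total spend
    return {
        customer: {
            "recency": (as_of - rec["last"]).days,
            "frequency": rec["frequency"],
            "monetary": rec["monetary"],
        }
        for customer, rec in table.items()
    }

rfm = rfm_table(transactions, snapshot)
```

A lower recency together with higher frequency and monetary values usually marks a more engaged customer, which is the basis for the segment scores built in the exercise.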
Final Project Completion (4 hours)
- Purpose: In the final session, learners work through a complete project
- Content: Competition - Build a predictive modelling project from scratch