Principles+ 2

Foundations in Data Science

AGES 17 - 19
  • Students should have completed the Python Intensive course with us.
  • Students with prior Python experience can contact us to be assessed before signing up for this course.
Course Overview:
Data Science involves using computers to extract insights from data, enabling us to make better decisions in every imaginable situation. When done right, data science is the foundation for many of the cutting-edge technologies emerging around us today, such as self-driving cars, chatbots, robo-trading systems and chess AIs that beat world champions.
As a discipline, Data Science lies at the intersection of Computer Science and Statistics. Our offering features small class sizes with an instructor-to-student ratio of 1:5. Instructors hold Master's and PhD degrees in engineering and statistics from world-class universities including Imperial College, Stanford and UC Berkeley, and have worked in quantitative, data-driven roles at top research institutions and Fortune 500 companies. Instructors are full-time at SG Code Campus and have extensive experience teaching coding in a small classroom setting: each individually taught coding to more than 100 unique Secondary School and Pre-U students in 2018, delivering a range of theoretical and applied courses in C++, Java, JavaScript and Python.
This course is perfect for post-O-level, post-A-level, Integrated Programme (IP) or International Baccalaureate (IB) students who have Python programming experience and want to learn how to gain insights from data, make data-driven decisions, and understand the math and code behind artificial intelligence technologies.
  1. Summary Statistics with the NumPy package
    • Introduction to data science, data analysis and machine learning
    • Introduction to the different categories of data
    • Process data using NumPy - a scientific computation library in Python
  2. Data wrangling with the pandas package
    • Introduction to the pandas library in Python - a set of tools for fast data transformation
    • Techniques to clean and process raw data into a form that is suitable for data analysis
    • Introduction to Matplotlib and Seaborn - a set of plotting tools to convert data into a more visual, intuitive form
  3. Regression and Classification with the scikit-learn package
    • Introduction to statistical modelling with Linear and Logistic Regression
    • Use cases for linear and logistic regression
    • Evaluation of prediction results with confidence intervals
  4. Hyper-parameter tuning
    • The bias-variance tradeoff
    • Parameter tuning with a simple holdout set
    • Explore different metrics for tuning parameters in varying contexts: regression vs. classification problems
  5. Model Selection
    • Alternative models for classification and regression: k-Nearest Neighbours, Regularised Linear Models, CART, Random Forests, Boosted Decision Trees, Naive Bayes, Support Vector Machines, Splines
  6. Practical Data Science
    • Explore real-life machine learning and data science examples using Kaggle
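
To give a flavour of the holdout-set tuning listed under topic 4, here is a minimal sketch. It is not taken from the course materials. It uses only the Python standard library rather than the packages named in the outline, and the toy data, the helper names (make_data, knn_predict, accuracy) and the candidate values of k are all assumptions made for illustration.

```python
import random


def make_data(n, seed):
    """Toy 1-D data (an assumption): label is 1 when x > 0.5, with ~15% of labels flipped."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        label = 1 if x > 0.5 else 0
        if rng.random() < 0.15:  # inject label noise
            label = 1 - label
        data.append((x, label))
    return data


def knn_predict(train, x, k):
    """Predict by majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0  # k is odd, so no ties


def accuracy(train, points, k):
    """Fraction of points whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in points)
    return correct / len(points)


# Split once: fit on the training part, compare settings on the holdout part.
data = make_data(200, seed=0)
train, holdout = data[:150], data[150:]

# k is the hyper-parameter being tuned; these candidates are illustrative.
candidate_ks = [1, 3, 5, 9, 15, 25]
scores = {k: accuracy(train, holdout, k) for k in candidate_ks}
best_k = max(scores, key=scores.get)

for k in candidate_ks:
    print(f"k={k:2d}  holdout accuracy={scores[k]:.2f}")
print("chosen k:", best_k)
```

The model is never scored on the points it was fitted to, so the holdout accuracy is a fairer guide to how each value of k would perform on unseen data. In practice a library such as scikit-learn would typically replace the hand-written helpers.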