Data Science Team & Tech Lead

Blog

  • Onto Kubernetes – Part 4

    Onto Kubernetes – Part 4

    With Prometheus, Loki, and Grafana in place, the observability stack on the Kubernetes cluster is now fully deployed. Operational metrics gathered by Prometheus and logs aggregated by Loki are finally fed into Grafana for easier visualisation.

    I covered Prometheus (https://prometheus.io/) in my last post, so I will not repeat myself here.

    In terms of Loki (https://grafana.com/oss/loki/), it was easier to set up than I originally thought, given the number of components that make up Loki (12 in total). Roughly half of these are core components that Loki needs to function (e.g. distributor, ingester), while the other half are optional supporting components that can be safely turned off (e.g. query scheduler, table manager).

    Most of the heavy lifting is done by Promtail, which automatically discovers target logs to be scraped and pushes them to Loki. In contrast, a separate exporter needs to be set up per pod/service to make metrics visible to Prometheus, which involves a lot more effort. That being said, Promtail has now been deprecated (https://grafana.com/docs/loki/latest/send-data/promtail/).
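
    To give a flavour of why Promtail feels lighter, here is a hedged sketch of a minimal Promtail configuration. The URL and labels are illustrative, not my exact config, and the real helm chart generates far more relabelling rules than shown here.

```yaml
# Illustrative Promtail sketch: discover pod logs via the Kubernetes API
# and push them to Loki. Values are placeholders.
server:
  http_listen_port: 9080
positions:
  filename: /tmp/positions.yaml   # remembers how far each file has been read
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```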

    As for Grafana (https://grafana.com/), it seems like a simple dashboard solution built mostly for observability purposes. But it has impressed me in a few ways.

    Grafana is very efficient with resources relative to its performance. The entire instance can run in less than 200 MB of memory while live-refreshing a data-heavy dashboard. Other dashboard solutions I have worked with could not cope with the same amount of data at the same refresh speed without significant configuration effort (setting up caching, etc.).

    Since I am working with the Grafana operator (https://github.com/grafana/grafana-operator), configuring data sources and dashboards in Grafana is very easy. I just have to define GrafanaDashboard and GrafanaDataSource CRDs, and Grafana picks them up automatically.
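
    For illustration, a GrafanaDashboard resource can look roughly like the sketch below. The apiVersion and field names vary between operator versions, and the name and JSON payload are placeholders, so treat this as a shape rather than a copy-paste manifest.

```yaml
# Hypothetical example; check your grafana-operator version for exact fields.
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: my-dashboard
spec:
  json: |
    {
      "title": "My Dashboard",
      "panels": []
    }
```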

    Defining Grafana dashboards in JSON format is interesting as well: it is easy to version control and can be modified as raw text. The only complaint I have is that the public gallery of Grafana dashboards (https://grafana.com/grafana/dashboards/) is relatively limited in selection. In my anecdotal experience, most dashboard submissions there are outdated, so they are not deployable out of the box.

    Besides observability for metrics and logs, another common observability implementation is monitoring internal network traffic using a service mesh like Linkerd (https://linkerd.io/). I will leave this for another day, purely due to a lack of time.

    The last piece of core supporting Kubernetes infrastructure I will set up is backup of Kubernetes resources and volumes using Velero (https://velero.io/). While most of my Kubernetes deployments are stateless, I do run a few databases as well, and losing their data would be a disaster without a backup.

  • Onto Kubernetes – Part 3

    Onto Kubernetes – Part 3

    I am still working on the Kubernetes stack behind my personal website whenever I have some free time.

    The goal is still the same – to build my own personal Kubernetes-powered data science/machine learning production deployment stack (and yes, I know about Kubeflow/AWS SageMaker/Databricks/etc.).

    However, my key objective now lies not in finding out whether Kubernetes saves maintenance effort (short answer – not much at a small scale), but in seeing what a best-practice end-state Kubernetes stack looks like and the effort needed to get there.

    So what have I been up to? Some of my time in this period has been spent on fixing minor issues that were not noticed during the initial deployments.

    Example 1: my WordPress pod was losing my custom theme every time the pod restarted. Why? Because the persistent volume seems to get overwritten each time by the Bitnami WordPress helm chart that I am using. The solution? I implemented a custom init container that repopulates the WordPress root directory by pulling a backup from S3.
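
    The init container pattern looks roughly like this. It is a sketch with a placeholder bucket, image, and paths rather than my exact manifest, but it shows the idea: restore the content before the main container starts.

```yaml
# Sketch: restore the WordPress root from an S3 backup before the main
# container starts. Bucket name and paths are placeholders.
initContainers:
  - name: restore-wordpress
    image: amazon/aws-cli
    command: ["sh", "-c"]
    args:
      - >
        aws s3 cp s3://my-backup-bucket/wordpress-backup.tar.gz /tmp/ &&
        tar -xzf /tmp/wordpress-backup.tar.gz -C /bitnami/wordpress
    volumeMounts:
      - name: wordpress-data
        mountPath: /bitnami/wordpress
```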

    Example 2: a subset of my pods had been crashing regularly due to a node becoming unhealthy. Why? Because my custom Airflow and Dash containers seem to have unknown memory leaks, leading to resource starvation on the node and pods being evicted. The solution? I set up custom resource requests and limits for all Kubernetes containers after monitoring their typical utilisation. (I had been putting this off, thinking I could get by fine, but this incident proved me wrong.)
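
    For reference, requests and limits are set per container in the pod spec; the numbers below are placeholders meant to be replaced with whatever utilisation you actually observe.

```yaml
# Requests guide the scheduler; limits cap usage so a leaking container
# gets killed before it starves the whole node.
resources:
  requests:
    memory: "256Mi"
    cpu: "100m"
  limits:
    memory: "512Mi"
    cpu: "500m"
```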

    The majority of my time has been spent setting up proper (1) secret management (using HashiCorp Vault + External Secrets Operator) and (2) monitoring (using Prometheus + Grafana) stacks.

    On secret management: HashiCorp Vault + External Secrets Operator have been relatively easy to use, with well-constructed and well-documented helm charts.

    The concept behind HashiCorp Vault is relatively easy to understand (think of it as a password manager). The key trap for any beginner is the sealing/unsealing part. The vault needs to be unsealed with a threshold number of unseal keys before it is functional. But if the vault instance ever gets restarted, it becomes sealed again and no one can read the secrets (i.e. passwords) stored in it.

    A sealed vault needs to be manually unsealed unless you have auto-unseal implemented. However, implementing auto-unseal needs another secure key/secret management platform, which turns this into a chicken-and-egg problem. This is one area that I feel is better solved with a managed solution (which unfortunately DigitalOcean does not offer at the moment).

    External Secrets Operator (ESO) works great, but it does take some time to understand the underlying concept. In short: Vault <- SecretStore <- ExternalSecret <- Secret. To get a secret automatically created, one needs to specify an ExternalSecret (which tells ESO which secret to retrieve and create) and a SecretStore (which tells ESO where and how to access the vault). The key beginner trap here is the creation and deletion policy: if not set properly, secrets may be automatically deleted by garbage collection, taking down services in Kubernetes (since most services rely on secrets in one form or another).
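
    A minimal sketch of the two resources is below. Names, paths, and the auth method are placeholders, and the field layout follows the ESO v1beta1 API as I understand it, so verify against your installed version.

```yaml
# Sketch: SecretStore tells ESO how to reach Vault; ExternalSecret tells it
# which key to sync into a Kubernetes Secret. All names are placeholders.
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: vault-store
spec:
  provider:
    vault:
      server: http://vault:8200
      path: secret
      version: v2
      auth:
        tokenSecretRef:
          name: vault-token
          key: token
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
spec:
  secretStoreRef:
    name: vault-store
    kind: SecretStore
  target:
    name: db-credentials    # the Kubernetes Secret that ESO will create
    creationPolicy: Owner   # the beginner trap: policy choices decide whether
                            # the Secret survives garbage collection
  data:
    - secretKey: password
      remoteRef:
        key: db
        property: password
```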

    On monitoring: Prometheus is a very well-established and well-documented tool, so setting it up with a helm chart is a breeze (in fact there are so many Prometheus helm charts that you can definitely pick one that suits your needs). In short, one way the chain works is: Prometheus -> Prometheus operator -> ServiceMonitor -> Service -> Exporter -> Pod/Container to be monitored. The key beginner trap is thinking of Prometheus as just another service, when it is in fact a stack of services. The first time the Prometheus pods spun up after installation, my nodes filled up completely, leaving two Prometheus pods unschedulable.

    The complexity of Prometheus comes from the sheer number of services: the main Prometheus, the Prometheus operator, the alert manager, and many, many different types of exporters. While most services/helm charts have great support for Prometheus (i.e. they already expose metrics in Prometheus format), the challenge lies in getting those metrics to Prometheus, as more often than not you need an exporter. The exporter can run centrally (e.g. the kube-state-metrics exporter), on each node (e.g. the node exporter), or, most often, as a sidecar within the pod (e.g. the apache exporter for WordPress, the flask exporter for Flask, the postgres exporter for Postgres). Configuring all these exporters to make metrics visible in Prometheus is not hard, but it is definitely laborious.
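
    As an illustration, a ServiceMonitor tying a Service's metrics port to the Prometheus operator can look roughly like this. The labels and port name are placeholders; in particular, the label the operator actually watches for depends on how your chart configured its serviceMonitorSelector.

```yaml
# Sketch: point the Prometheus operator at a Service's metrics endpoint.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: wordpress
  labels:
    release: prometheus     # must match the operator's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: wordpress        # matches the Service in front of the exporter
  endpoints:
    - port: metrics         # named port exposed by the exporter sidecar
      interval: 30s
```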

    For now, I have managed to get all metrics fed into Prometheus, except for Dash, for which I could not find a pre-built exporter. The next steps are to spin up Grafana so that I can better visualise the metrics, and to set up some key alerting rules. With this, hopefully I can avoid having my Dash instances stuck in a crash loop for a month due to missing secrets without me knowing about it.

    After getting Prometheus + Grafana up and running, Loki, the log aggregation system, will be next. However, the number of services that come with Loki does scare me as well.

  • Owning your career as a data scientist

    Owning your career as a data scientist

    It is the time of the year again when people start to reflect on the past year and plan for the year ahead.

    Besides planning out the next holiday or the typical I-will-hit-the-gym-more resolution, it may be worth taking some time to reflect on your career.

    It is often said that we should own our own careers, but most people do not offer relevant or actionable advice for data scientists, especially since data science is a relatively new field.

    I set out to seek this information the old-school way – by reading books.

    After reading a few books (including one that is specifically written for data scientists), I was mostly disappointed by them, until I chanced upon the Software Engineer’s Guidebook by Gergely Orosz.

    You can tell from the name that the book is written for software engineers, but I find that most (if not all) of the advice in it applies to data scientists as well.

    The tone of the book is direct and easily digestible, without the obfuscating language you typically encounter in a corporate environment.

    Amongst many other topics, it talks about how to navigate performance reviews & promotions, the importance of soft skills & longer tenure as you rise in seniority, and how to thrive in different companies.

    The most important takeaway for me is (in my own words) – A successful career path is not a linear journey, and it is not about drawing the highest paycheck or having the fanciest job title. Do not overlook the seemingly small things along the way that will help you grow towards where you want to be over the long term.

    The book did not tell me the destination of my career path, but it did give me tools to help me get there sooner.

  • Data Science – Profit Centre vs Cost Centre

    Have you ever wondered how we can justify the business value created by data science? Is there any difference between a profit-centre setup (i.e. client/stakeholder funded) for data science and a cost-centre setup (i.e. centrally funded)?

    For any business function in a company, it is important to be able to justify the value it brings to the company.

    This is no different for the data science function in a company.

    For a data science function that operates as a profit centre, such as in data science consulting or tech product development, the business value is relatively easy to justify by looking at the share of revenue/profit attributable to data science outputs.

    E.g. a consulting project that lasted 6 months with 3 billed FTEs (one of whom is a data scientist) brought in an EBIT of $300k, so we could attribute $100k of that value to data science.

    In most conventional companies, the data science function operates as a cost centre. The business value provided by a cost centre can be indirectly justified by the value of the business processes that it supports.

    However, data science as a cost centre differs from most other cost centres, because data science is a new field whose purpose is (almost) entirely to improve the efficiency of existing business processes owned by other functions. This means a data science function can only justify its business value by helping other business functions justify theirs more effectively.

    E.g. a data science team created a tool that automatically optimises the scheduling of worker shifts, reducing the time the planning team needs to schedule shifts manually from 10 hours per week to 1 hour per week. Assuming an FTE costs $50k per year (~$24 per hour over 2,080 working hours), the 9 hours saved per week add up to ~468 hours, or ~$11k of cost savings per year, contributed by a data science solution.

    Regardless of whether it operates as a profit centre or a cost centre, the need to justify the business value of data science is only going to increase, especially when the AI hype wave recedes.

  • Brief Review on GitHub Copilot

    Brief Review on GitHub Copilot

    Time really does fly. It is now almost the end of 2024.

    To close off 2024, I will be writing a post on a different topic each week until the new year arrives.

    My first post is about GitHub Copilot.

    I’m rather late to the game in terms of adopting GitHub Copilot for my personal projects.

    But it has really blown me away so far.

    Copilot helped me navigate the complex territory of Kubernetes/Helm YAML manifests, but was less helpful when I was working with polars.

    Some quick pros and cons are listed below.

    Pros:

    ➕ Amazing context search ability based on currently opened files.

    When asked a question, it will automatically search for relevant parts in opened files in VS Code to help produce a more relevant answer. This means it can suggest functions/methods from libraries that you are using and variable/column names that follow your convention.

    ➕ Great at explaining hard-to-search technical terms (e.g. special characters in Bash, regex).

    In the olden days before LLMs, it was really hard to search for special characters on Google, especially if you did not know what they were called in English. But Copilot has no problem breaking down a string of special characters and explaining them one by one. In fact, Copilot taught me about heredocs in Bash.
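
    For anyone who has not met one, here is a small self-contained example of the heredoc syntax (the file path is arbitrary):

```shell
#!/bin/sh
# A heredoc feeds the lines between <<EOF and the closing EOF delimiter
# to a command's stdin. EOF is conventional; any word works.
cat <<EOF > /tmp/heredoc_demo.txt
Hello from a heredoc.
It spans multiple lines without any quoting gymnastics.
EOF

# Quoting the delimiter (<<'EOF') would disable variable expansion inside.
cat /tmp/heredoc_demo.txt
```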

    Cons:

    ➖ Not useful on newer or rapidly changing libraries (e.g. polars).

    Copilot does suggest wrong syntax from time to time, but it suffers the most when asked to work with newer or rapidly changing libraries. With polars, it kept suggesting older APIs, e.g. with_column and groupby, instead of with_columns and group_by.

    ➖ Can suggest convoluted solutions when simpler ones exist.

    To illustrate with a recent example: when asked how to access a single value in a polars DataFrame, Copilot suggested selecting a column and converting it into a Series before accessing the value by index, when in reality the value can be accessed directly with square brackets or item().

  • Onto Kubernetes – Part 2

    Onto Kubernetes – Part 2

    About 2 months ago, I started migrating my entire personal stack onto Kubernetes from regular virtual servers.

    So what has happened in the meantime? Have I freed up operational maintenance time to do more interesting data science development work yet?

    Unfortunately the answer is no, at least for now.

    It turns out that migrating Airflow and MLflow onto Kubernetes was harder than I thought, because both tools require multiple backend services to run smoothly, including a relational database (PostgreSQL in my case) and an in-memory database (Redis).

    Previously, to speed up my development progress, I had been using managed PostgreSQL and Redis instances offered by DigitalOcean. They are extremely easy to set up, and I was able to start using them within minutes.

    However, I eventually ran into weird runtime issues in Airflow and MLflow that ultimately boiled down to specific configuration issues in PostgreSQL and Redis. While managed services are easier to start with, debugging and customising them is typically harder due to restricted access to certain logs and backend configurations.

    So I told myself: if I can work with managed PostgreSQL and Redis, how hard can it be to self-host them directly in Kubernetes, which would give me the freedom to customise them to work with Airflow and MLflow as needed?

    Or so I thought.

    I spent the next few days properly exposing the PostgreSQL and Redis ports via ingress-nginx, then another few days setting up PgBouncer connection pooling for PostgreSQL, then another few days setting up the Airflow environment to work with a custom DAG package, then another few days making sure all services interact correctly with the new self-hosted PostgreSQL and Redis instances.
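
    For context on the first step: ingress-nginx only routes HTTP by default, so raw TCP ports like PostgreSQL's and Redis's go through its tcp-services ConfigMap, roughly as sketched below. The namespaces and service names are placeholders, and the controller's own Service also needs the ports opened.

```yaml
# Sketch: ingress-nginx forwards raw TCP by mapping an external port to
# namespace/service:port in the tcp-services ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: ingress-nginx
data:
  "5432": "databases/postgresql:5432"
  "6379": "databases/redis-master:6379"
```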

    After many “few more days” than I expected, my entire personal stack is finally fully migrated onto Kubernetes (components as shown in the attached diagram).

    So what’s next you asked? Is the platform all set and ready to go?

    Not yet, unfortunately. To make sure the Kubernetes-based platform can survive longer with minimal maintenance, I will be setting up proper secret management, a monitoring solution, and CI/CD integration next.

    Another “few more days” to go eh?

  • Technical Debt vs. Mortgage: A Data Science Homeowner’s Guide

    (I used chatGPT to help make the written content more “engaging” and “LinkedIn-like”, so I am keeping the 2 versions below for comparison.)


    [ChatGPT rewritten version]

    Building a minimal viable product (MVP) in data science is like buying your first home with the maximum mortgage.

    It’s often necessary to move quickly and show business value (aka “get a place to live in”), but in doing so, we often accumulate a mountain of technical debt—just like a hefty mortgage.

    But here’s the thing: While you’re using the data science product (or living comfortably in your home), don’t forget to pay down that technical debt—just like you wouldn’t skip your mortgage payments!

    Sure, you might get by without addressing it for a while, but trust me, no one wants to be hit with a foreclosure notice or an unmanageable pile of tech debt later on.

    The key takeaway? Keep building, but always have a plan to pay it down. Your future self will thank you!


    [Original version]

    Building a minimal viable product in data science is like buying your first home using maximum mortgage.

    It is often a necessity to do this to show business values (get a place to live in) fast, which means accumulating a huge amount of technical debt (mortgage) along the way.

    However we should not forget that while using the data science product (or living in your home), it is important to pay down the technical debt (mortgage) periodically.

    While it may be possible to get away without paying down the technical debt for quite some time, I would definitely not recommend anyone skip their mortgage payments!

  • Onto Kubernetes

    Onto Kubernetes

    I have always been told that using Kubernetes is too complex and overkill for most purposes.

    That has put me off for years, before I finally decided to take the plunge into the Kubernetes world 2 months back, embarking on a mission to migrate my entire personal stack onto Kubernetes.

    The tipping point came when it became increasingly hard to manage the 4 virtual machines, 7 applications, and 10+ containers. Manual management of infrastructure and resources took up all my free time, leaving little for actual development.

    Heeding the warnings of others, I approached Kubernetes cautiously, spending the first month reading a book on the basics (Kubernetes in Action by Marko Luksa).

    By the end of the first month, I thought I was ready, as I had experience with container technology and all my applications were dockerised. So I spun up my first-ever Kubernetes cluster (managed service obviously) to begin my migration.

    I ended up spending another 2 weeks fighting with helm and helmfile (as I had sworn to work off manifests only, without relying on ad-hoc commands for everything).

    And another 2 weeks getting my web services accessible from the outside (e.g. load balancer, TLS – why are some Kubernetes settings done via annotations?).

    Maybe I was initially too optimistic, but at least I have now managed to get my key services running smoothly on Kubernetes.

    So what is my take on Kubernetes for now?

    The complexity seems to be manageable, as long as you have some knowledge of system admin and container technology. Without that knowledge though, I can see how hard it will be to debug any deployment that goes wrong, trying to dig through layers upon layers of abstraction provided by Kubernetes.

    In terms of cloud computing costs, they were almost exactly the same pre- and post-migration, despite using a managed Kubernetes service.

    Hopefully, this will not become my famous last words down the road.

  • 3 Micro Learnings Over the Weekend

    3 Micro Learnings Over the Weekend

    3 micro learnings over this weekend:

    (1) cloudpickle works better than pickle for storing trained sklearn models

    Have you ever proudly saved a trained sklearn model to be used for serving elsewhere, only for it to complain of missing imports or classes when you try to load it?

    Other than making the imports or classes available in the model inference environment, I realised cloudpickle lets me store the necessary model classes together with the trained model.

    cloudpickle.register_pickle_by_value to the rescue.
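
    A sketch of the idea, where the feature_lib module is a made-up stand-in for whatever custom code your model depends on:

```python
import pickle
import sys
import types

import cloudpickle

# Simulate a local helper module ("feature_lib" is hypothetical) that the
# serving environment will not have installed.
feature_lib = types.ModuleType("feature_lib")
exec("def scale(x):\n    return x * 2", feature_lib.__dict__)
sys.modules["feature_lib"] = feature_lib

# Tell cloudpickle to serialise this module's objects by value, so the
# function's code travels inside the pickle instead of being looked up
# by import path at load time.
cloudpickle.register_pickle_by_value(feature_lib)
payload = cloudpickle.dumps(feature_lib.scale)

# Even after the module disappears (as it would on a serving host),
# the function can still be loaded and run.
del sys.modules["feature_lib"]
restored = pickle.loads(payload)
```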

    (2) the purpose of using SQLAlchemy is to avoid writing raw SQL

    I have been using SQLAlchemy with pandas to interact with various databases for years.

    However, for reasons unknown even to me, I never fully realised that SQLAlchemy is an ORM (object-relational mapper) that abstracts SQL operations into Python code regardless of the underlying SQL dialect.

    And I had been defining SQL tables manually without relying on SQLAlchemy’s MetaData and Table constructs.
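
    A small sketch of those constructs, using an in-memory SQLite database so it is self-contained (the table and its columns are made up):

```python
from sqlalchemy import (
    Column, Integer, MetaData, String, Table, create_engine, insert, select,
)

# In-memory SQLite keeps the sketch self-contained; swapping the engine URL
# is all it takes to target another dialect.
engine = create_engine("sqlite://")
metadata = MetaData()

experiments = Table(
    "experiments",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)
metadata.create_all(engine)  # emits dialect-appropriate CREATE TABLE

with engine.begin() as conn:
    conn.execute(insert(experiments).values(name="baseline"))

with engine.connect() as conn:
    names = conn.execute(select(experiments.c.name)).scalars().all()
```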

    (3) chatGPT is amazing at writing boilerplate code

    I had to write tests for our local Airflow dev instance.

    Instead of digging through tutorials to find out how to instantiate an Airflow DAG for testing purposes, I asked chatGPT to write the tests for me.

    Granted, I needed to make minor modifications to the tests chatGPT wrote, but it saved me at least 30 minutes of googling for the boilerplate code.

  • New additions to family – Traefik and Airflow

    New additions to family – Traefik and Airflow

    Added Traefik and Airflow to the family of services behind my personal websites.

    Traefik – an amazing modern reverse proxy that integrates extremely well with docker containers, saving me a lot of trouble in manual configuration (looking at you, nginx).

    Traefik makes it trivially simple to redirect internet traffic to multiple Dash docker containers, just by adding labels to docker compose services.
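
    The pattern looks roughly like this in a compose file; the hostname, image, and port are placeholders, and Traefik itself must be running with its docker provider enabled.

```yaml
# Sketch: Traefik routes to a Dash container purely through labels.
services:
  dash-app:
    image: my-dash-app:latest
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.dash.rule=Host(`dash.example.com`)"
      - "traefik.http.services.dash.loadbalancer.server.port=8050"
```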

    Despite the flexibility offered by Traefik, I feel more comfortable using nginx as the first-layer reverse proxy, as it has worked very well for me for a long time.

    Airflow – an industry standard tool to schedule workflows. I finally have a proper tool to run and schedule long-running tasks, without having to resort to manual executions.

    Setting up DAGs to run on Airflow is relatively easy. What I did not expect was the complexity of setting up the Airflow infrastructure: in essence, Airflow is not a single service but multiple services that talk to one another.

    Configuring them took some time, but the official Airflow docker image greatly simplified the process. That being said, standing up Airflow almost doubled my shoestring cloud budget.

    Now, time to write and get some DAGs running!