Junjie Zhang (张俊杰)

junjie.zhang@sri.utoronto.ca

I am currently a Data Scientist and Manager at TD Bank, working with a team of talented and energetic business insights analysts and engineers to build machine learning and big data analytics into business processes and transform the future of banking.

Before TD, I was a research fellow at Sunnybrook Health Research Centre, Toronto, ON, where I carried out research on radiomics/quantitative imaging and led the effort to build machine learning systems for computer-aided diagnosis of prostate cancer. I was also a postdoctoral fellow at the Dept. of Medical Imaging, University of Toronto. Previously, I was a project research scientist at Industrial SkyWorks Inc., building computer vision and machine learning systems for automatic anomaly detection from UAV-based thermal imaging.

I obtained my Ph.D. in Geomatics Engineering at York University, advised by Prof. Gunho Sohn. During my graduate studies, I worked and interned in both academic and industrial environments. I obtained an M.Sc. degree in Geoinformatics from the University of Twente, the Netherlands. I also hold a B.Sc. and an M.Sc. degree from Wuhan University, China.

I host several open-source projects on GitHub as a hobby. I authored SLICO#, a SLICO superpixel segmentation algorithm in C#, and MatIO.NET, a .NET library for reading and writing MATLAB MAT-files.

I do have a LinkedIn profile.

Research

I work on building machine learning systems and data analytics applications in a number of quantitative fields, including medical imaging, health informatics, computer vision, and geomatics engineering. I also have a background in probabilistic models, stochastic processes, image/signal processing, and geoinformation science.

I am passionate about applying my knowledge and skills in the fields of deep learning, natural language processing (NLP), quantitative finance and data science.

My current research topics include:

  • Machine intelligence for computer-aided diagnosis (CAD) of cancers.
  • Computational diffusion magnetic resonance imaging and quantitative radiomics.
  • Statistical predictive models for cancer screening, risk stratification, and prognosis.
  • Deep learning in medical imaging and health informatics.

(Most recent projects to be added)

Selected Projects

Sparse Correlated Diffusion Imaging: A New Computational Diffusion MRI Modality for Prostate Cancer Detection
J. Zhang, F. Khalvati, M.A. Haider, A. Wong. [PDF] [Slide]
Research Themes: Computational Diffusion MRI, Quantitative Imaging, Numerical Optimization

Computational diffusion MRI (CD-MRI) aims to leverage computational means to generate imagery from diffusion signals that is easier for human experts to interpret. We introduced a new CD-MRI modality called sparse correlated diffusion imaging (sCDI), which can improve the separability of cancerous and healthy tissue in the prostate gland, and thus the diagnostic accuracy for prostate cancer.

ProCanVAS: A Comprehensive Platform for Prostate Cancer Visualization and Analysis
J. Zhang, F. Khalvati, M.A. Haider. [Resources coming soon]
Research Themes: Computer-Aided Diagnosis (CAD), Machine Learning, Computer Vision, Radiomics

I led the development of ProCanVAS at Sunnybrook Research Institute, Toronto, ON, Canada. The platform is a complete clinical decision support system. It integrates a range of image processing, computer vision, and machine learning algorithms and provides modules for computational diffusion MRI, image contouring, radiomics feature extraction, and prostate cancer detection.

Predictive/Prognostic Models Based on Quantitative Radiomics and Clinical Data for Lung and Prostate Cancer
J. Zhang, Y. Zhang, F. Khalvati, M.A. Haider. [Resources coming soon]
Research Themes: Radiomics, Predictive Model, Feature Selection, Classification, Data Analysis

Radiomic features have been shown to provide prognostic value in predicting clinical outcomes in several studies. However, feature redundancy, unbalanced outcomes, and small sample sizes have led to relatively low predictive accuracy. The goal of the study is to explore different strategies for overcoming these challenges and improving the predictive performance of radiomics-based prognosis analysis for non-small cell lung cancer (NSCLC) and prostate cancer.

Stochastic Models for Object Detection from Remotely Sensed Data
J. Zhang, G. Sohn. [Slide] [PDF] [Demo]
Research Themes: Stochastic Processes, Probabilistic Model, Markov Chain Monte Carlo (MCMC), Parameter Estimation, Model Optimization

It is remarkable that humans can easily detect objects of varying appearance in complex scenes, a task at which many existing machine vision systems fail. In the demo, we presented a stochastic model to detect single trees in forests from airborne laser scanning (ALS) data. The model integrates low-level image processing techniques into a high-level probabilistic framework. The configuration containing the best possible set of trees is estimated by Markov Chain Monte Carlo (MCMC) dynamics coupled with simulated annealing.
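
To illustrate the general idea of MCMC-style sampling coupled with simulated annealing, here is a minimal sketch in Python. It is not the tree-detection model itself: it assumes a toy problem with a single real-valued parameter, a hypothetical energy function supplied by the caller, Gaussian proposals, and a geometric cooling schedule, all chosen only for illustration.

```python
import math
import random


def simulated_annealing(energy, x0, n_steps=5000, t0=1.0,
                        cooling=0.999, step=0.5, seed=0):
    """Minimize `energy` over one real parameter (toy example).

    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t0
    for _ in range(n_steps):
        # Propose a random perturbation of the current state.
        cand = x + rng.gauss(0.0, step)
        cand_e = energy(cand)
        # Metropolis rule: always accept improvements; accept worse
        # states with probability exp(-(increase in energy) / t).
        if cand_e <= e or rng.random() < math.exp((e - cand_e) / t):
            x, e = cand, cand_e
            if e < best_e:
                best_x, best_e = x, e
        # Geometric cooling: worse moves become rarer over time.
        t *= cooling
    return best_x, best_e


# Toy energy with its minimum at x = 3.
best_x, best_e = simulated_annealing(lambda x: (x - 3.0) ** 2, x0=0.0)
print(best_x, best_e)
```

At high temperature the chain explores freely; as the temperature drops, it settles into low-energy states. In an object-detection setting the state would be a whole configuration of objects rather than a single number.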

Derivation of forest vertical and horizontal information from large-footprint spaceborne LiDAR waveforms
J. Zhang, Y. Xing, A. de Gier. [PDF] [Slide] [Demo1] [Demo2]
Research Themes: Full-Waveform Analysis, Feature Engineering, Regression Models, Support Vector Machine (SVM), ICESat, Spaceborne LiDAR, Forest

ICESat/GLAS is the first spaceborne laser-ranging (lidar) instrument for continuous global observations of Earth. We further explored its ability to derive forest vertical and horizontal information in cool temperate forests. In this research, we were the first to conceptualize a method to derive forest type information from large-footprint lidar waveform data.

Selected Publications [Full List]

Links to: [Google Scholar] [ResearchGate]

Bag of Bags: Nested Multi-Instance Classification for Prostate Cancer Detection
F. Khalvati, J. Zhang, A. Wong, M.A. Haider. IEEE ICMLA 2016. [PDF]

In cancer screening, the foremost problem to solve is whether a patient has cancer, regardless of the location of cancerous regions in the organ. In machine learning, this problem has been formulated as multi-instance learning (MIL) where bags of instances are classified rather than the individual instances. In this paper, we propose a bag of bags (BoB) nested MIL algorithm to first detect which patients have cancer and consequently, which slices in the 3D volume imaging data of the detected patients contain cancerous regions.
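
The nested bag structure can be sketched in a few lines of Python. This is only an illustration under the standard MIL assumption (a bag is positive if at least one of its instances is positive), not the BoB algorithm from the paper: the region-level scores, the max-pooling aggregation, and the threshold are all hypothetical.

```python
from typing import Dict, List

Slice = List[float]    # a slice is a bag of region-level cancer scores
Patient = List[Slice]  # a patient is a bag of slices (a "bag of bags")


def slice_score(regions: Slice) -> float:
    # Standard MIL assumption: a bag is as positive as its most
    # positive instance.
    return max(regions)


def patient_score(slices: Patient) -> float:
    # Apply the same rule one level up: patient = bag of slice bags.
    return max(slice_score(s) for s in slices)


def detect(patients: List[Patient],
           threshold: float = 0.5) -> Dict[int, List[int]]:
    """Map each detected patient index to its positive slice indices."""
    results = {}
    for i, slices in enumerate(patients):
        # Step 1: decide at the patient level.
        if patient_score(slices) >= threshold:
            # Step 2: only for detected patients, flag individual slices.
            results[i] = [j for j, s in enumerate(slices)
                          if slice_score(s) >= threshold]
    return results


# Hypothetical scores: patient 0 is negative, patient 1 is positive.
patients = [
    [[0.1, 0.2], [0.05, 0.3]],
    [[0.2, 0.9], [0.1, 0.1], [0.7, 0.4]],
]
print(detect(patients))  # {1: [0, 2]}
```

In practice the bag-level decision would come from a trained classifier rather than a fixed threshold on hand-written scores; the sketch only shows the two-level, patient-then-slice order of the decisions.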

A Local ROI-specific Atlas-based Segmentation of Prostate Gland and Transitional Zone in Diffusion MRI.
J. Zhang, S. Baig, A. Wong, M.A. Haider, F. Khalvati. CVIS 2016 [PDF] [Poster]

We propose a semi-automatic local ROI-specific atlas-based segmentation (LABS) method to segment the prostate gland and transitional zone in diffusion magnetic resonance images. The algorithm has been implemented in ProCanVAS.

Superpixel-based Prostate Cancer Detection from Diffusion Magnetic Resonance Imaging
J. Zhang, F. Khalvati, A. Wong, M.A. Haider. CVIS 2015 [PDF] [Slide]

This paper presents a superpixel-based approach to detect prostate cancers from diffusion magnetic resonance imaging (dMRI). In this approach, superpixel-generated candidate regions are incorporated into a quantitative radiomics model to detect prostate cancers from dMRI modalities.

Thermal Infrared Inspection of Roof Insulation Using Unmanned Aerial Vehicles
J. Zhang, J. Jung, G. Sohn, M. Cohen. UAV-g 2015 [PDF] [Calibration Result]

We presented in this study a relative thermographic calibration algorithm and a superpixel Markov Random Field model to address problems in thermal infrared inspection of roof insulation using UAVs.

Full waveform-based analysis for forest type information derivation from large footprint spaceborne lidar data
J. Zhang, A. de Gier, G. Sohn, Y. Xing. PE & RS, 77:3, 281-290, 2011 [PDF] [Slide]

We are the first to conceptualize a new method to derive forest type information from large-footprint lidar data based on full waveform analysis.

An improved method for estimating forest canopy height using ICESat-GLAS full waveform data over sloping terrain
Y. Xing, A. de Gier, J. Zhang, L. Wang. [PDF] [Demo]

We presented an improved model which reduces the mixed effects caused by both sloping terrain and rough land surface, and makes a significant improvement for accurately estimating maximum canopy height over sloping terrain from ICESat-GLAS full waveform data.

Software

SLICO#: SLICO Superpixel Segmentation in C# (.NET)

SLICO# is a C# implementation of the SLICO superpixel segmentation algorithm by Achanta et al. [1]. It is a standalone program and can be compiled into a .NET library. More information about SLICO superpixel segmentation can be found here.
[1] Radhakrishna Achanta, Appu Shaji, Kevin Smith, Aurelien Lucchi, Pascal Fua, and Sabine Susstrunk. "SLIC Superpixels Compared to State-of-the-art Superpixel Methods", IEEE TPAMI, November 2012.

MatIO.NET: A .NET Library for Reading and Writing Matlab MAT-Files

The MatIO.NET software contains a library for reading and writing MATLAB MAT-files. The library (MatNETIO) is the primary interface for creating and opening MAT-files and for reading and writing variables. More information about the MATLAB MAT-file format can be found here.