A minimalist histopathology image analysis dataset (MHIST)

MNIST is one of the most popular benchmarking datasets for machine learning practitioners. Until now, for a variety of reasons, no comparable standardized dataset has existed for histopathological images.

On January 29th, 2021, researchers from Dartmouth College and Dartmouth-Hitchcock Medical Center posted a paper to the arXiv.org preprint repository describing a new dataset, MHIST, along with basic benchmarking data. They made the MHIST dataset freely available to the ML community on their GitHub site.

MHIST is a binary classification dataset of 3,152 fixed-size (224 x 224 pixels) images of colorectal polyps, each with a…


UPDATE (12/12/20): RTX 2080Ti is still faster for larger datasets and models!


The two most popular deep-learning frameworks are TensorFlow and PyTorch. Both support NVIDIA GPU acceleration via the CUDA toolkit. Because Apple doesn’t support NVIDIA GPUs, Apple users have until now been limited to CPU-only machine learning (ML), which markedly slowed the training of ML models.

With Macs powered by the new M1 chip, and the ML Compute framework available in macOS Big Sur, neural networks can now be trained directly on a Mac with a massive performance improvement.

According to the recent Apple blog:


No coding machine learning in practice, part 2

Lung and colon adenocarcinoma are some of the most common cancers affecting numerous patients throughout the world. They frequently spread to other sites of the body. It is not uncommon for a pathologist to face the question of whether a biopsy showing adenocarcinoma originated from a lung or a colon primary site. Pathologists are often forced to use special stains to help them make this determination.

I used two different machine learning libraries (fastai and Keras) to determine the origin of metastatic adenocarcinoma in one post and used the Lobe application in another. …


No coding machine learning in practice, part 1

Machine learning (ML) has the potential for numerous applications in the health care field. One promising application is in the area of anatomic pathology. ML allows representative images to be used to train a computer to recognize patterns from labeled photographs. Based on a set of images selected to represent a specific tissue or disease process, the computer can be trained to evaluate and recognize new and unique images from patients and render a diagnosis.

Lung and colon adenocarcinoma are some of the most common cancers affecting numerous patients throughout the world…



Breast cancer is the most common cancer in the United States and the second leading cause of cancer death after lung cancer (2017 data). Accurate diagnosis is imperative for proper breast cancer management, including treatment selection. Pathologists evaluate the lesion’s overall architecture and morphological features such as the nucleus-to-cytoplasm ratio and various cytoplasmic and nuclear features. The morphological features of nuclei are essential in making the cancer diagnosis.

Thanks to the researchers from the University of Wisconsin, we have a great dataset of nuclear features from benign and malignant cells. The breast cancer dataset is included in the sklearn.datasets library…
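As a rough illustration of what that looks like in practice, the sketch below loads the dataset with scikit-learn's load_breast_cancer function and inspects its shape and labels. It assumes scikit-learn is installed; the variable names are only illustrative and are not taken from the original post.

```python
from sklearn.datasets import load_breast_cancer

# Load the Wisconsin breast cancer dataset bundled with scikit-learn.
data = load_breast_cancer()

# X holds the nuclear feature measurements, y holds the class labels.
X, y = data.data, data.target

# Collect a few basic facts about the dataset for inspection.
n_samples, n_features = X.shape
class_names = list(data.target_names)
first_features = list(data.feature_names[:3])

print(n_samples, n_features)   # number of cases and features per case
print(class_names)             # label names, indexed by the values in y
print(first_features)          # a sample of the nuclear feature names
```

In this dataset, label 0 corresponds to malignant and label 1 to benign, so it is worth checking data.target_names before interpreting any model output.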


The easy way to XGBoost parameter optimization


I predicted heart failure using Random Forest, XGBoost, Neural Network, and an ensemble of models in my previous article. In this post, I would like to go over XGBoost parameter optimization to increase the model’s accuracy.

According to the official XGBoost website, XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the gradient boosting framework. XGBoost provides parallel tree boosting (also known as GBDT, GBM) that solves many data science problems in a fast and accurate way.


Random Forest vs XGBoost vs fastai Neural Network


Every year, cardiovascular diseases kill millions of people worldwide. They are disorders of the heart and blood vessels and are divided into heart attacks (caused by occlusion of heart vessels), strokes (caused by occlusion or rupture of brain vessels), and heart failure (caused by the inability of the heart to pump enough blood to the body). Since severe heart failure can lead to patient death, it is very important to predict it in advance based on the patient’s clinical and laboratory data.

To assess if I can make such predictions, I used the…


ML Model Deployment with fastai and Binder

Centipedes and millipedes are two creatures that one can encounter in the garden. Centipedes are beneficial because they feed on mites, insect larvae, insects, baby snails, and slugs. Millipedes, on the other hand, can be harmful because they feed on plant roots, germinating seeds, and seedlings. It is therefore useful for any gardener to be able to distinguish between the two.

Inspired by the beautiful book “Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD” by Jeremy Howard and Sylvain Gugger, I decided to make a simple…


Keras vs fastai

Machine learning (ML) has the potential for numerous applications in the health care field. One promising application is in the area of anatomic pathology. ML allows representative images to be used to train a computer to recognize patterns from labeled photographs. Based on a set of images selected to represent a specific tissue or disease process, the computer can be trained to evaluate and recognize new and unique images from patients and render a diagnosis.

Lung and colon adenocarcinoma are some of the most common cancers affecting numerous patients throughout the world. They frequently spread to other sites of the…


Introduction

The COVID-19 pandemic, caused by the novel coronavirus SARS-CoV-2, is rapidly spreading across the world and creating serious health and economic issues. Although the gold standard for diagnosing COVID-19 is a polymerase chain reaction (PCR) assay, a chest x-ray may be a valuable addition to the diagnostic toolbox.
Using the COVID-Net Open Source Initiative dataset, I trained three machine learning models to differentiate between chest x-ray findings from COVID-19, other non-COVID-19 pneumonia, and healthy lung. The Jupyter notebooks for this project are available on my GitHub page.

Dataset

103 images of COVID-19, 500 images of non-COVID-19…

Andrew A Borkowski

Pathologist and Deep Learning Enthusiast
