Who would believe that so small a space could contain the images of all the universe? O mighty process… what talent can avail to penetrate a nature such as this? What tongue will it be that can unfold so great a wonder? Truly none!
Leonardo da Vinci broke into this rhapsodic praise for the human eye after having copiously documented his experiments on the substance, shape, and function of the eye. The Renaissance artist’s memorable observation from his theory of vision, ‘How the eye has no share in the creation of the colours of the rainbow’, illuminated many a scholarly work that followed. Johann Wolfgang von Goethe’s Theory of Colour and, much later, Ewald Hering’s Theory of Colour Vision leaned on da Vinci. His observations are even considered a precursor to Newton, whose theory of the spectrum of white light postulated that light is made up of seven colours.
Before da Vinci, philosophers like Plato and Aristotle, and prior to them, Alcmaeon and Democritus, all studied the eye and proposed advanced theories of vision. The eye, popularly a window to the soul for poets and writers, was clearly a window to scientific exploration as well.
More than five centuries later, the human eye has once again become a window, this time for artificial intelligence (AI), with neural networks seeking out patterns when fed with data. Such networks have been in use for decades, but lately, as if by a global conspiracy of breakthroughs, they have led to an explosion of applications: image recognition (Facebook recognises faces almost as well as humans), self-driving cars, Go-playing bots, surveillance and many more. AI is on a roll. And since eyes have always been amenable to photography, their images have become data troves for algorithms looking for patterns and signs of diseases.
In April, the American drug regulator approved the first ever AI-based device, IDx-DR, for the automatic diagnosis of diabetic retinopathy. It’s the first significant application of AI in healthcare, having crossed all regulatory hurdles to reach the clinic. It can diagnose more than a mild level of the disease, which affects over 125 million people worldwide. Several more eye diagnostics are in the works, from startups, large tech companies, academics, and care providers.
All this action has shifted the discourse from the front of the eye to the back of the eye, from the more visible and high-volume burden of cataract to peering into the retina. That’s because the orange orb that lies at the back of the eye is the only place in the body where one can look at blood vessels in their naked form. Naturally, many are looking.
So, what is it that algorithm-aided cameras can now tell us?
What’s wrong with your eye, heart, blood pressure, kidney, foot, brain. They can tell your age and gender. Even your body mass index. Whether you are a smoker or not. Or whether you are optimistic, curious, agreeable, conscientious, extroverted, or maybe even neurotic.
Sensors integrated with AI study pupil dilation and predict facial (micro) expressions. It’s the end of poker face, you could say.
All this is true and wades into pop science. More seriously, in eye care, federalised AI on smartphones looks very real, much like Google Maps, which people have come to use even offline.
Still, so far, says Kaushal Solanki, founder and chief executive of Eyenuk Inc., big companies haven’t moved close to regulatory approvals. But, he agrees, their presence certainly evangelised AI in eye care. They made it popular; you can hear the buzz, he says, referring to Microsoft, Google and IBM’s initiatives.
Deep learning in the retina
“Retina, in some respects, is an extension of the human brain,” says Sameer Trikha, founder and chief medical officer of Visulytix Ltd, a London startup building AI tools for eye care. Nerve tissue is visible at the back of the eye and algorithms can identify changes here years before diseases such as, say, Alzheimer’s, Parkinson’s or multiple sclerosis manifest elsewhere.
Can we identify patients who are at risk of these diseases?
Yes, we can.
“The best thing about AI is that you don’t need to know relationships, it finds those relationships,” says Trikha. As datasets get cross-linked, from genetic information to epidemiology trends, algorithms find associations, a feature that is getting better with each passing year.
The practical use of AI didn’t really take off until the early 2010s. It was the time an annual computer vision competition, the ImageNet Challenge, was gaining traction. Under it, a vast visual dataset was thrown open to developers to see which algorithms could identify objects in the dataset images with the lowest error rate. Two years into the competition, in 2012, its final results stunned the AI world. Winners from the University of Toronto had built a neural network architecture called AlexNet which beat the field by a whopping margin, with an error rate more than 10 percentage points lower than the runner-up’s. (Since then AlexNet has become a popular neural network to categorise photographs, to tell whether they are of cats, dogs, objects, or people.)
In no time, ImageNet, with millions of categorised images, had become a benchmark to evaluate image classification.
More interestingly, people discovered that their algorithms worked better if they trained them on ImageNet datasets. Naturally, people across the world began using it to jumpstart other recognition tasks, and AI startups proliferated.
The ImageNet Challenge ended in 2017. Its founder Fei-Fei Li, a Stanford University professor, is now the chief scientist at Google Cloud. In just seven years, the accuracy of ImageNet-winning algorithms at classifying objects in the dataset rose from 71.8% to 97.3%, surpassing the human accuracy rate of 94.9%. The competition convincingly showed that bigger datasets indeed led to better and more accurate decisions in the real world.
In 2017, just when ImageNet ended, Google threw open its library for serving machine learning models. Its library has served not only its own TensorFlow framework but third party models as well. “Google has been a trailblazer,” says Trikha. It is a giant, he says, which has the ability to build infrastructure and popularise AI.
Ophthalmology and other crazy words
As a speciality, ophthalmology is data and image dependent. And as a disease burden, most eye disorders impact hundreds of millions. Google and others chose to focus on diabetic retinopathy for obvious reasons. It is the most common cause of vision loss among diabetics—again, a highly prevalent lifestyle disease—as high levels of blood sugar lead to damage in retinal blood vessels.
Diabetic retinopathy prevalence may vary—18% in urban India, 10% in rural India, about 33% in the US—but once the disease strikes, it manifests in exactly the same way. It doesn’t matter whether the person is a Caucasian or Indian, says Dr Rajiv Raman, a consultant surgeon at Sankara Nethralaya. This is not exactly the case in cancer or cardiology.
Machine learning has been used for automated classification of eye diseases for more than a decade. Dr Raman, who worked with IIT Madras on this, says the earlier approach was piecemeal—looking for a red lesion here, a yellow lesion there. It was called ‘feature learning’, which computed explicit features as specified by experts.
At Aravind Eye Hospitals, its chief medical officer Dr R Kim says, the hospital group had developed their own software for diabetic retinopathy which would grade the disease based on the doctors’ inputs. “Because retinopathy cases are identified only late stage, our idea was in the doctor’s clinic, the patient could get one more test done, of the eye. We called it ‘opportunistic screening’,” says Dr Kim. Out of 100 diabetics, about 20 will have some problem in the eye, of which five would be referred to an eye doctor for urgent care and the remaining 15 would need close follow-up. Doctors don’t know who of these 100 would need urgent care, so screening is important. “At least you catch early. We call this preventing ‘needless blindness’.”
Noble intentions, no doubt, but the specificity of the software in such places wasn’t great. The WHO guidelines for using AI tools for screening require sensitivity (how often the tool correctly detects the disease in patients who have it) and specificity (how often it correctly identifies patients without the disease) to be greater than 80%. Hence, none of the early devices, including Aravind’s, could be used.
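To make the two screening metrics concrete, here is a minimal Python sketch. The counts are invented for illustration (they borrow only the rough ratio of 20 affected patients per 100 diabetics mentioned earlier), and the class and function names are hypothetical, not taken from any real screening product.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """Outcome counts for a screening tool (illustrative numbers only)."""
    true_pos: int   # patients with the disease whom the tool flagged
    false_neg: int  # patients with the disease whom the tool missed
    true_neg: int   # healthy patients whom the tool cleared
    false_pos: int  # healthy patients whom the tool flagged anyway


def sensitivity(c: ScreeningCounts) -> float:
    """Share of diseased patients that the tool correctly detects."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ScreeningCounts) -> float:
    """Share of healthy patients that the tool correctly clears."""
    return c.true_neg / (c.true_neg + c.false_pos)


def clears_threshold(c: ScreeningCounts, threshold: float = 0.80) -> bool:
    """True only if both metrics exceed the threshold (80% in the text)."""
    return sensitivity(c) > threshold and specificity(c) > threshold


# Hypothetical clinic: 100 diabetics, 20 of whom have some retinopathy.
example = ScreeningCounts(true_pos=18, false_neg=2, true_neg=56, false_pos=24)

print(sensitivity(example))       # 0.9
print(specificity(example))       # 0.7
print(clears_threshold(example))  # False
```

In this made-up case the tool catches most diseased eyes but flags too many healthy ones, which is the kind of weak specificity described above.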
Ever since deep learning algorithms arrived, things have sped up. Both Dr Raman and Dr Kim collaborate with Google as part of their respective institutional tie-ups with the tech giant. And one of the things they, like practitioners elsewhere, are happy about is the consistency that AI brings. “If you show me an image today and then five days later, I, as a doctor, could give you different readings. An error can creep in. That kind of error is unlikely in AI. The more you train, the better it gets,” says Dr Kim.
Ophthalmologists were farsighted
Back in the 1960s, a group of doctors met in Virginia, USA, to discuss diabetic retinopathy. One of the outcomes of that meeting was a standard classification of the disease. Over the next few years, the classification evolved as large clinical studies were done. Grading centres emerged. These would objectively grade images of the fundus—the part of the inner eye opposite the pupil. Using stereo photographs, 3D-images that is, image-based grading became the gold standard for developing treatments, for many, many years. Large datasets like EyePACS, which store images graded by retinal specialists, became available for product development.
So, when people began looking for AI applications, diabetic retinopathy happened to be at the shallow end of the (deep) pool.
“The data was clean, and grading of the disease was universal,” says Lily Peng, a product manager with Google’s deep learning AI research team, Google Brain.
By just looking at images of the fundus one can detect the exact stage of the disease. In addition, training and control data was in abundance. And the potential to impact the world was, well, eye-popping, since at least a fifth of all diabetics need careful follow-up for this disease. Yet the ball was lobbed into the ophthalmologists’ court because few diabetologists or general physicians were willing to invest in expensive retinal cameras.
“At Aravind, we developed our software when there was no AI or machine learning; we were looking for something which could lead to conversions [from screening to medical intervention],” says Dr Kim.
When Google came knocking in India, doctors were ready. In eye care, the standard of care is good. Thanks to visionaries like Dr SS Badrinath (founder of Sankara Nethralaya), Dr GN Rao (founder of LV Prasad Eye Institute) and Dr G Venkataswamy (Aravind Eye Care System), there is nothing that is not done in India and for which you have to go abroad, says Dr Raman. Dr Raman initially saw the project with Google as “yet another machine learning effort”. But a year later, when he saw that the algorithm had detected gender in 90% of cases and the disease in more than 95% of cases, he was astounded. “As a doctor, I can’t figure out the gender looking at the images. That’s what deep learning does; it finds associations which clinicians can’t.”
Is diabetic retinopathy, then, the low hanging prize which got taken, with other retinal disorders harder to crack?
“Not necessarily,” says Peng. “If you look at diabetic retinopathy, one of the biggest reasons AI is successful there is because it requires universal screening. In glaucoma, there’ll be [similar] AI offerings.” Peng, who co-founded a medical device startup before coming to Google, believes their neural net framework can be retrained for different diseases. “The model is optimised for one task now, but it can be optimised for different tasks.”
That’s a big draw. One of the reasons Trikha, a medical doctor for 14 years, set up Visulytix is that scalable models are not being built in healthcare. In just about two years, close to 200 million people will have macular degeneration, 80 million will suffer from glaucoma, and 125 million will have diabetes-related eye problems, he says, reeling off statistics. Whenever doctors are doing a high volume, repetitive task, there’s a risk that the quality of eye care could suffer. “We, as humans, are fallible. By using AI, we reduce the bias and clinical variability.”
Initially, AI system learning is slow. Trikha was surprised by how long it took Visulytix’s algorithm to identify the left eye from the right eye. “As humans, we don’t need to look at a thousand images to figure that out. But now, I am surprised the other way—how quickly AI deals with complexity. It can pick up subtle signs and uncover hidden relationships, for example, the gender of a patient or glycaemic control, which a clinician looking at the retina would never know.”
The eye has always been seen as a potential source of biomarkers for systemic conditions. Biomarkers are quantitative indicators that can be extracted from retinal images with minimal or zero user intervention and are strongly indicative of diseases. It’s another matter altogether that biomarker discovery requires a large collection of cross-linked data.
Window to the body
In the retina, says Solanki of Eyenuk, you can see the central nervous system, its smallest blood vessels. “And once you see anything going wrong in there, you can see things going wrong in the heart, kidney, foot…you go beyond the vision loss.” Solanki’s Eye-Art was one of the first deep learning based AI eye products when Eyenuk unveiled it at the EASDec conference in Europe in 2015, with over 91% sensitivity and specificity, and 98.5% sensitivity for vision-threatening conditions.
In March, Peng and team at Google published a paper showing they could predict cardiovascular risk factors by looking at photographs of the retinal fundus. The algorithm could tell, fairly reliably, the age, gender, BMI, blood sugar level, and smoking status. Elsewhere, researchers are discovering personality types from retinal images.
Heaven only knows what algorithms will be able to detect in the future.
Do you worry about where we are headed? I asked Peng.
“At the end of it, these are important parameters, and there’s a large amount of data that is processed to get there. But by themselves they are not useful or interesting for us,” she says. As a trained physician-scientist (MD-PhD as they are called), Peng has her perspective on the information overload and wants to have as little to do with it as possible. “As clinicians, we train for years and years and deal with so much information. Everything is not useful… Now, we have to be careful how we use that information.”
Is that why deep learning is still some years away from making a meaningful intervention in disorders other than that of the eye?
In cardiology, or even neurology, doctors make big decisions through years and years of experience, says Trikha. If there are cancer lesions in a chest CT, how will you validate an algorithm against a radiology test? These are not easy decisions; AI will be supervised by a radiologist, he believes. “From a payer point of view, cardiology, cancer and others are difficult to penetrate. In the real world, all AI systems will have to provide compelling clinical and economic evidence about the impact and money saved. Furthermore, there will be a cultural barrier to overcome in many specialities.”
Take a retinal selfie. Upload the image on Google Play (or any other app store). Go see a doctor if the app classifies you as ‘referable’.
Here’s a scenario that collapses the distinction between humans and machines, and could easily be a reality in the not-too-distant future. However, a few things must fall into place before such tech-enabled population screening comes to pass.
The hardware is still a question.
“Though the prices have fallen [from Rs 11-20 lakh ($16,700-$30,000) to Rs 3-4 lakh ($4,500-$6,000)], it needs to fall further for the diagnosis to move from an ophthalmologist to a physician,” says Dr Raman. The costliest machine a general physician today uses is an ECG which costs Rs 20,000 ($291). The ideal retinal camera, he believes, should fall in this price range and be a smartphone accessory.
Retinal cameras have escaped true disruption, until recently. Their design and optics have largely remained the same for nearly 40 years, says Anand Sivaraman, founder of Remidio, a medical device company in Bengaluru that has built an iPhone-based retinal camera. He doesn’t think retinal camera prices would come down to the price level of an ECG, but he agrees, due to advancements in AI, there’s certainly a market for converting existing ophthalmologists (~20,000) in India to retinal specialists (currently at ~1,500). And then democratise it further to diabetologists, neurologists (retinal images are helpful in stroke management), and others.
“To move the camera to these new contexts, they have to be designed from the ground up,” says Sivaraman. “The beauty is, someone at Aravind Eye Hospitals in a village has the same need as the nurse practitioner in the US who has never taken a retinal image.” Sivaraman has a few patents around new optical designs for his portable camera, called Remidio Fundus on Phone.
If you get past hardware, biases stand tall. They get induced in AI algorithms in a variety of ways. One way to reduce them is to standardise the way data is collected. For instance, how is the image taken? What is the orientation of the eye? The hardware itself introduces biases. While the eye orientation may be the same, images can be big or small. Today the camera field varies from 20 degrees to 200 degrees. Though the standard is 45 degrees, says Sivaraman, newer cameras are going as wide as 200 degrees. (With wider field images, more lesions in the eye are visible.)
To avoid biases, says Trikha, introduce as much heterogeneity in the dataset as possible. Today, for some research or commercial groups, datasets often come from one institution. So that data will be specific to that institution, and the AI tool will work well in that institutional setting.
Looking at some of the global AI studies, Solanki claims there are statistical biases in them. “What is the gold standard? You are not going to go to 20-30 ophthalmologists, give them 30 seconds to decide and then take a majority vote for your diagnosis,” he quips. Gold standards should be set in clinical settings. “Datasets by itself are not sufficient. The standards also need to be set in algorithm development.”
As a company, Solanki’s Eyenuk has been building its AI-based diabetic retinopathy screening product since long before the field became popular. Solanki believes many algorithms under development today do not have independent validations. Some may even have been reverse engineered, with developers having access to all the images. But that’s not the right way to test. “You can write your algorithm for the best performance, but in the real world, they may perform very differently.”
Solanki speaks from a vantage point. He’s been working with the UK’s National Health Service for a few years, and his Eye-Art has undergone 100,000 consecutive tests across 28,000 patients. In the UK, three algorithms were tested on the same dataset and Eye-Art was the only one not impacted by the type of cameras, race, age, and so on, he claims. The product is undergoing a prospective trial in the US.
Most of the AI tools today are for diabetic retinopathy. Since January, Aravind Eye has begun using the tool it developed with Google as a decision support system. “We are going to expand it to larger groups, but my concern will be what if a person has other conditions in the retina (apart from diabetic retinopathy), then the algorithm will miss that,” says Dr Kim.
Perhaps for that, there are platforms like Pegasus from Trikha’s Visulytix. It screens for five eye diseases and is currently awaiting European regulatory approval.
So, between screening for five diseases and one, what is the trade-off?
“There is no trade-off in performance. There is only one trade-off—time. Algorithm reading for one condition versus it reading for five conditions,” says Trikha.
Finally, after high hardware cost and biases, there’s this inscrutability of deep learning.
Looking into the darkness
AI systems have come under flak for being a black box. Even those who build them, with layers of deep learning algorithms, rarely understand how they work. Which is why the European Union recently proposed to introduce “a right to explanation” which lets citizens demand transparency for algorithmic decisions. It is another matter, though, that the legislators haven’t defined transparency yet.
Solanki says his company is opening up this black box. “By pinpointing exact areas in the images that are triggering the network, we show that networks are being triggered due to actual lesions and not because of any part of the retina. That is one part. On the other hand, we are showing that the algorithm really works well with hundreds of thousands of patients. Together these should give clinicians [and regulators] sufficient confidence in using this,” he says.
At Google, Peng says explainability is built into their project that seeks to predict cardiovascular risk from images of the retina. “Our model shows what parts of the image most contributed to the prediction. At ~9:45 in this video, I talk about the particular way we’ve applied this to retinal fundus images,” she says.
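One generic way to show which parts of an image contributed most to a prediction is occlusion sensitivity: blank out one patch at a time and record how much the model’s score drops. The Python sketch below illustrates only that idea. It is not the method used by Eyenuk or Google, neither of which is detailed here; the scoring function is a toy stand-in for a trained network, and every name in it is hypothetical.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel values in [0, 1]


def toy_score(image: Image) -> float:
    """Hypothetical stand-in for a trained network: fraction of bright pixels."""
    pixels = [p for row in image for p in row]
    return sum(p > 0.8 for p in pixels) / len(pixels)


def occlusion_map(image: Image,
                  score_fn: Callable[[Image], float],
                  patch: int = 4,
                  fill: float = 0.0) -> List[List[float]]:
    """Return, for each patch, how much the score drops when it is blanked."""
    base = score_fn(image)
    n_rows, n_cols = len(image) // patch, len(image[0]) // patch
    heat = []
    for i in range(n_rows):
        heat_row = []
        for j in range(n_cols):
            masked = [row[:] for row in image]  # copy so the input is untouched
            for r in range(i * patch, (i + 1) * patch):
                for c in range(j * patch, (j + 1) * patch):
                    masked[r][c] = fill
            heat_row.append(base - score_fn(masked))
        heat.append(heat_row)
    return heat


# Synthetic 16x16 "retina": dim background with one bright 4x4 "lesion".
retina = [[0.2] * 16 for _ in range(16)]
for r in range(4, 8):
    for c in range(8, 12):
        retina[r][c] = 1.0

heat = occlusion_map(retina, toy_score)
hottest = max(((i, j) for i in range(4) for j in range(4)),
              key=lambda ij: heat[ij[0]][ij[1]])
print(hottest)  # (1, 2): the patch covering the synthetic lesion
```

A clinician shown such a map could check that the hottest patch sits on an actual lesion and not on an unrelated part of the image, which is the kind of reassurance described above.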
Trikha, too, says his screening platform provides for a lot of granularity so that the user can use it, question it, override it, learn from it, and teach people using it. “We believe transparency will allow trust.” Pegasus is awaiting approval in the EU; US FDA approval is some time away.
In healthcare, as few outside the industry realise, regulatory approval is a big deal. It’s not easy to validate such technologies in clinical trials.
What if Google makes its algorithms open-source?
That does not change a thing, believe some. Clinics are not going to download an open source software and start using it. It’s free versus well-developed and maintained software. “As an extreme example, there is free water, but bottled water sells very well,” says Solanki.
In software, we always had Linux or Unix for free, but Microsoft was very successful with its operating system. In healthcare, the hiccup is legal and regulatory. Who is going to spend the time and money to do regulatory approval? And if they do, why will they give it away for free? If you read open source licenses, they come with extensive liabilities. All that said, even Microsoft is championing open source now. (In early June, it acquired GitHub, the open-source code repository, for $7.5 billion.) Google is working with camera maker Nikon and its sister company Verily Life Sciences to develop hardware and software products in eye care.
It’s very unlikely that AI solutions will ever be like building the flavour-of-the-month smartphone app. Nor will they be like content, given for free with money made off advertising.
“You give people these amazing tools, and you know some business will grow out of these amazing tools,” says Peng, when asked how much of Google’s AI will be proprietary, and how much of it will be given away free. “You can look at us as the sauce company which makes the sauce which goes well with pizzas and burgers, but we don’t want to make all the pizzas [or burgers] in the world.”
Over time, monetisation models will emerge. The learnings and tools are already spilling over into cardiology and dermatology. From screening, AI will move into predictability—predict how cataract will develop or how, post-laser surgery, the refractive error of the eye will change for a person.
Da Vinci said, “The human being, creature of eyes, needs the image”.
Now, the intelligent machine, creature of data, needs the image too. Human eyes provide pretty good ones.