Like chess, Big Data is a combination of science, art and play; Gregory Piatetsky-Shapiro of KDnuggets helps data devotees discover winning moves.
For timely information concerning developments in data science, data mining and business analytics, KDnuggets is widely regarded as a leading outlet in the field. Created in 1993 by founder, editor and president Gregory Piatetsky-Shapiro, it is frequently cited as one of the top sources of data science news and influence by various industry watchers.
We talked with Dr. Piatetsky-Shapiro (a noted data scientist and chess enthusiast) to obtain his insights on how data science may evolve, the impact of automation, how organizations can leverage data to their benefit, and the future role of humans in predictive management.
ANSWERS: What are some use cases of data science that you find to be particularly valuable to organizations in this age of Big Data?
GREGORY PIATETSKY-SHAPIRO: Probably not surprisingly, the areas where people typically apply data science are customer relationship management (CRM) and consumer analytics. Data science allows you to predict consumer behavior better; the improvements in prediction are usually incremental, but those incremental improvements can translate to significant revenue. In the last couple of years, thanks to revolutionary advances in deep learning, we have seen amazing advances in new areas connected to image and speech understanding, such as radiology. It was reported recently that deep learning systems have exceeded the accuracy of human radiologists in diagnosing cancer. Generally speaking, in areas where there is a very large amount of labeled data, deep learning has already been achieving human or superhuman levels of performance.
We are also seeing advances in applications in cybersecurity and speech recognition, particularly with smartphones – now, when we talk to our smartphones, they frequently understand us better than people on the telephone do. Smart speakers have made their way into roughly a quarter of all homes in the United States, and they have an increasingly accurate understanding of speech. Machine translation has become amazingly better in many areas. So these applications of data science and machine learning are growing at a very fast rate. Basically, in any area where you have a lot of data, you can benefit from the use of data science and machine learning.
ANSWERS: If we become more heavily reliant on AIs to perform predictive behaviors, does that leave a role for humans in terms of predictive management?
PIATETSKY-SHAPIRO: I see different developments in the near term and the long term. In the near term, I can see people working together with AI; one example would be radiologists reviewing and approving the results of medical tests with an AI. But I’d like to use chess as an illustration.
I’m a chess player and follow the game closely. In 1997, IBM’s Deep Blue program (developed after several years of effort) defeated then-World Champion chess player Garry Kasparov. After that, people organized human-computer teams, and there were tournaments where human-computer teams played versus unassisted humans and versus computers. For some period of time afterwards, the human-computer teams were better than either humans or AIs alone.
However, AI programs improve in chess much faster than humans do. The human-computer teams were soon inferior to pure computer teams – there was no advantage in adding a human grandmaster to a computer.
In 1997, it took IBM several years to develop algorithms and special software to defeat Kasparov. Last year, Google DeepMind developed a program called AlphaZero (so called because it started learning gameplay in chess and Go and another game using zero human knowledge). It just played games against itself and used a method called reinforcement learning to improve. This program took only four hours of self-play to reach and exceed the world champion level in chess. The world champion level now means not a human but a computer program. It took four hours for chess; for Go (which is a more difficult game and very popular in Asia), it took about three days.
Games like Go and chess are easier to master than what happens in the real world because they have well-defined rules and limited, finite boards, but we see similar developments in other areas; I already mentioned that AIs have exceeded medical doctors in radiology. In other domains it will probably take longer, but if there is sufficient data, an AI can learn to perform at a superhuman level.
If there is insufficient data, then there are methods like reinforcement learning that allow agents to actively experiment in the world and learn from their own experiments; they’re sort of behaving like children. I have a one-year-old granddaughter, and I enjoy watching how she explores everything around her. AI reinforcement learning behaves in this way, except it can learn much faster than children and can transfer its learning – to learn something in one domain and apply that knowledge to other domains.
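As a rough illustration of learning by experimenting, the sketch below runs an epsilon-greedy agent on a multi-armed bandit, one of the simplest reinforcement learning settings. It is not the method AlphaZero uses, and everything in it (the function name `run_bandit`, the reward probabilities, the exploration rate) is a made-up assumption chosen only to show the idea: the agent starts with no knowledge, tries actions, and gradually favors whichever one has paid off best.

```python
import random


def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a bandit with hypothetical reward probabilities.

    Returns the agent's estimated value for each action and how often
    each action was tried.
    """
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms      # how many times each action was tried
    values = [0.0] * n_arms    # running average reward per action

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action, like a child poking at things.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the action that looks best so far.
            arm = max(range(n_arms), key=values.__getitem__)

        # The "world" responds with a reward the agent cannot see in advance.
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0

        # Incremental update of the running average for that action.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

    return values, counts


# Illustrative payoff probabilities; the agent is never told these.
values, counts = run_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(values)), key=values.__getitem__)
print("estimated values:", [round(v, 2) for v in values])
print("pulls per action:", counts)
print("action the agent prefers:", best_arm)
```

The small `epsilon` keeps the agent experimenting occasionally even after it has a favorite, which is what lets it discover a better action it initially overlooked.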
I can see a role for humans in managing AIs in the short term, but in the long term it’ll go towards full automation, because humans will not be able to perform at the same level as an AI. As an example, Google developed a self-driving car that had a steering wheel. The idea was that the human backup driver would take over at some point if there was a problem. However, Google saw in testing that the human in the car could not react fast enough in an emergency. As a result, Google removed the steering wheels in those self-driving cars and replaced them with big “stop” buttons.
Even asking a human to push a stop button in an emergency may not always work. We saw a tragic confirmation of this recently when a self-driving Uber car struck a pedestrian who was walking a bicycle. The car was confused by the bicycle, and there was not enough time for the human to take over.
I think that’s essentially what will happen long term with the human role in predictive management: humans will need a large “stop” button for when they don’t understand something, and even the stop button may not be enough if the AI system is sufficiently autonomous. Humans will not have the ability to manage it long term. Of course, what is “long term” may differ – for some, it may be two years from now, for others it may be 50 or 100 years – and there may be variable roles for humans to play in the meantime.
ANSWERS: How do you see the impact of automation developing, and what do you see its impact being on data science?
PIATETSKY-SHAPIRO: Data science can be regarded as automation of data activities, and data science itself is also being automated – a kind of automation of automation. We did a poll of KDnuggets readers a couple of years ago and asked the question, “When do you expect most data science activities will get automated?” and the median answer was around 2025.
Currently, I think there is a golden age of data science with great tools and high demand for data scientists. Around 2012, “data scientist” was proclaimed the sexiest profession of the 21st century. But data science is data-driven and has many well-defined rules, which makes it also subject to automation.
There are now companies like DataRobot, H2O.ai and others that are working on automating data science – you just plug in the data and the computer gives you the needed results or predictive models.
I believe that data scientists with lower expertise are more likely to have their roles automated. I think the data scientists whose roles are less likely to be automated, or will take much longer to be automated, will be those who help define new paradigms and help ask better questions.
ANSWERS: How do you see data science and digital identity evolving in the near future?
PIATETSKY-SHAPIRO: We leave more and more digital footprints in today’s world. The younger generation is very active on social media. Even if people are not using Facebook or other social media platforms, they can be analyzed if their friends upload a photograph of them. People overall are spending more time in the digital world.
I see this trend only increasing, as there is more and more information about us thanks to all the smartphones and devices that we use in our daily lives. The companies that are most able to leverage this trend are those that have already been very effective thus far, such as Facebook, Google and Apple.
Let me say, however, that data science is not only for personal behavior. We should remember that data science can be applied to other areas, not necessarily just for predicting human behavior. For instance, it could be applied to things like radiology, medical diagnostics and predicting the behavior of drugs. It is not limited to human behavior.
Human behavior is a separate phenomenon because it’s the most interesting to us and that’s where we have the most complicated ethical and legal issues. Not surprisingly, with more data about human behavior, computers are becoming better at predicting human behavior.
A few years ago there was a lot of noise in the media about a story that a large retailer was sending advertising for pregnancy products to a particular teenage girl. Supposedly the girl’s father called the retailer and asked why it was sending baby products and advertisements to his teenage daughter, who was not pregnant. Some time later, when the retailer called back, the father apologized, saying that at the time the family didn’t know but had since discovered that their daughter was in fact pregnant. It turned out that the retailer’s learning system was able to predict that that particular girl was actually pregnant based on her purchases.
What this shows is that large companies can learn what pregnant women are buying before their babies are born. Based on that, and because there are many examples, they can build a model that will predict that when a woman starts to buy a particular set of products, she is pregnant. This approach could be applied to many other areas.
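To make the idea of learning from many labeled examples concrete, here is a minimal sketch of a naive Bayes classifier over shopping baskets. It is not the retailer's actual system, which the story does not describe; the product names, the labels, and the helper functions `train` and `predict_proba` are all invented for illustration. The model simply learns which products show up more often in baskets from the labeled group and scores a new basket accordingly.

```python
import math
from collections import Counter

# Made-up training data: each basket is a list of products, and each label
# says whether that shopper later turned out to be in the group of interest.
BASKETS = [
    ["unscented lotion", "vitamins", "cotton balls"],
    ["unscented lotion", "vitamins"],
    ["vitamins", "cotton balls", "bread"],
    ["bread", "coffee"],
    ["coffee", "chips"],
    ["bread", "chips", "vitamins"],
]
LABELS = [1, 1, 1, 0, 0, 0]


def train(baskets, labels, smoothing=1.0):
    """Fit a tiny Bernoulli naive Bayes model with Laplace smoothing."""
    vocab = sorted({item for basket in baskets for item in basket})
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_counts, neg_counts = Counter(), Counter()
    for basket, label in zip(baskets, labels):
        (pos_counts if label else neg_counts).update(set(basket))

    model = {
        # Log prior odds of the positive class.
        "prior": math.log((n_pos + smoothing) / (n_neg + smoothing)),
        "items": {},
    }
    for item in vocab:
        # Smoothed probability that a basket in each class contains the item.
        p_pos = (pos_counts[item] + smoothing) / (n_pos + 2 * smoothing)
        p_neg = (neg_counts[item] + smoothing) / (n_neg + 2 * smoothing)
        model["items"][item] = (p_pos, p_neg)
    return model


def predict_proba(model, basket):
    """Return the model's probability that a basket belongs to the positive class."""
    basket = set(basket)
    log_odds = model["prior"]
    for item, (p_pos, p_neg) in model["items"].items():
        if item in basket:
            log_odds += math.log(p_pos / p_neg)
        else:
            log_odds += math.log((1 - p_pos) / (1 - p_neg))
    return 1 / (1 + math.exp(-log_odds))


model = train(BASKETS, LABELS)
print(predict_proba(model, ["unscented lotion", "vitamins"]))
print(predict_proba(model, ["coffee", "chips"]))
```

With only six toy baskets the probabilities are not meaningful in themselves; the point is that no single product decides the outcome, and the signal comes from the combination of purchases.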
I see this trend only growing as new, more powerful machine learning methods make it possible to do this with more accuracy. What happens if consumers don’t like it? There are new privacy regulations in Europe, like GDPR, that give people some rights to control their data and how it is used, but it’s not yet clear how this will all play out. What is clear is that when there is more data about people available, companies can use that data to predict behavior with more accuracy.
In our new series, AI Experts, we interview thought leaders from a variety of disciplines — including technology executives, academics, robotics experts and policymakers — on what we might expect as the days race forward towards our AI tomorrow.