But while machine learning technology is being implemented across a slew of markets, a famous maxim comes to mind: during a gold rush, sell shovels. Namely, as machine learning becomes ubiquitous, an acute need for supporting infrastructure presents itself. As is often the case, Israel’s startup ecosystem has been responsible for a disproportionate number of new companies in the space. Below is a market landscape of Israeli startups operating in the machine learning infrastructure market, and some accompanying thoughts.
The number of companies that are included above may come as a surprise – it definitely did to me. Upon closer examination, however, it becomes clear why so many startups are attempting to serve this industry. While the machine learning revolution seeks to unlock incredible value for businesses and consumers, there are four critical issues preventing machine learning technology from reaching maturity, and a number of Israeli startups are working day and night to solve them.
Compute Intensive: Hardware + Software
The majority of machine learning tasks, specifically those involving deep learning (a subset of machine learning), are incredibly compute-intensive. In practice, this means that traditional processors (CPUs) are simply not powerful enough to support a machine learning operation at scale. As such, developers have searched for alternatives.
The first shovel maker to benefit from this was Nvidia, which realized that its GPUs were well equipped to deal with computationally intensive machine learning tasks. On the heels of Nvidia’s success, and having identified the same opportunity, several other players went to the lab to produce their own AI-dedicated chipsets. Google recently announced its third-generation TPU, Microsoft revealed its ambitious Project Brainwave, Alibaba joined the race with its Ali-NPU, and Facebook and Amazon are both rumored to be working on their own AI chips. And let’s not forget the semiconductor pioneers: Intel, which recently announced its Nervana Neural Network Processor, as well as AMD and ARM, which have their own projects.
Beyond these behemoths, a bevy of startups have raised significant rounds of financing in an effort to bring their customized hardware to market, including Israeli startups Habana and Hailo, each of which raised a large Series A over the past year.
With that in mind, can new entrants compete in this market? Judging by the capital required to get a first chip to market and the lack of potential for a first-mover advantage in 2018, it seems like an uphill battle.
There exists a parallel line of thought, however: that the compute-intensive issue need not be solved by creating new hardware, but rather through the development of software that learns how to best optimize existing hardware for machine learning tasks. Here too, the tech giants have made their mark, with Facebook releasing a popular research paper on the subject last year and Uber open sourcing Horovod. Also notable is Google’s AutoML, whose capabilities remain a relative mystery. That said, we expect to see much more algorithmic innovation addressing this issue in the coming months and years, and believe that there is an opportunity for startups to strike gold.
Experiments, Experiments, Experiments
Another issue that most machine learning engineers face is the devops work associated with machine learning tasks. A key step in the workflow of data scientists is running hundreds, if not thousands, of experiments using different parameters and hyper-parameters in order to reach the optimal result. While referred to as data science, this is oftentimes more of an art, or at the very least a combination of an art and an intensely empirical science, which requires a lot of trial and error to verify hypotheses. Furthermore, most data scientists do not have strong backgrounds in devops practices, and thus do not work in a scalable fashion. Israeli startups such as Missinglink, Allegro, cnvrg.io and Comet have each developed interesting products that aim to solve this issue.
Beyond the startups addressing this issue, making the lives of data scientists easier has drawn the attention of several tech giants. While not falling under the umbrella of devops directly, Google (TensorFlow), Facebook (PyTorch, Caffe2), Microsoft (CNTK) and Amazon (MXNet, indirectly) have each made their mark on advancements in deep learning by open sourcing deep learning frameworks, the software necessary for efficient development of deep learning models. As can be seen from the image below, TensorFlow is leading the way.
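To make the experiment loop described above concrete, here is a minimal sketch of a random search over two hyper-parameters with a log of every run. It is an illustration only: the validation_loss function is a hypothetical stand-in for a real training job, the parameter ranges are assumptions, and none of this reflects how the startups mentioned actually build their products.

```python
import random


def validation_loss(learning_rate, batch_size):
    """Hypothetical stand-in for a real training run.

    Returns a made-up loss that is lowest near learning_rate=0.01 and
    batch_size=64; a real experiment would train and evaluate a model here.
    """
    return (learning_rate - 0.01) ** 2 * 1e4 + abs(batch_size - 64) / 64


def random_search(num_trials, seed=0):
    """Run num_trials experiments and log every configuration that was tried."""
    rng = random.Random(seed)  # fixed seed so the sweep can be reproduced
    log = []
    for trial in range(num_trials):
        config = {
            "learning_rate": 10 ** rng.uniform(-4, -1),  # sampled on a log scale
            "batch_size": rng.choice([16, 32, 64, 128, 256]),
        }
        loss = validation_loss(**config)
        log.append({"trial": trial, "config": config, "loss": loss})
    # The best run is simply the logged record with the lowest loss.
    best = min(log, key=lambda record: record["loss"])
    return best, log


best, log = random_search(num_trials=200)
print(best["config"], round(best["loss"], 4))
```

Even in this toy form, the bookkeeping (seeds, configurations, results) is the part that tends to get lost when experiments are run by hand, which is the gap experiment-management tools aim to fill.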
Interestingly, Amazon, Microsoft, Intel, Facebook, Nvidia, Baidu, Huawei and others have joined forces to support ONNX, seemingly in an effort to unseat Google as the de facto market leader.
These companies, especially the cloud providers, have a direct incentive to provide supporting tools for the machine learning community, as many view machine learning as one of the key drivers of growth in cloud usage. And while competition from large technology incumbents is a point of concern in virtually every industry, it is particularly concerning in this industry as machine learning is so central to their future. As such, it’s necessary to ask whether a community that is used to open source tools will begin paying for software. SageMaker may be the first indication that we are moving in that direction, although this too might have troubling implications for aspiring startups.
It is our belief that while the next few years will indeed see a combination of open source and paid software tools being used, certain areas will be dominated and commoditized by the cloud providers, namely development environments and devops, as these are the first barrier to entry for most data scientists and machine learning engineers.
Can You Tag That For Me?
Related to the devops issues mentioned above, another key step in the machine learning workflow is the annotation, or tagging, of data. Most data that is collected, whether it be images or text, needs to be correctly labeled in order for the model to learn effectively. This is particularly critical for a specific type of machine learning called supervised learning, which is the most applicable type of machine learning as of today (while a full explanation of supervised learning and other learning methods is beyond the scope of this post, this microsite published by Andreessen Horowitz gives a great overview of the subject). The problem, though, is that companies use hundreds of thousands or even millions of data points to train their models, meaning that data annotation is often a bottleneck in the machine learning workflow. The most popular method for solving this problem today is finding cheap labor (e.g. Mechanical Turk); however, because this approach depends on having a human in the loop, this seems to be an area that is ripe for some automation. Multiple startups have been founded to meet this need (see Dataloop and DataGen here in Israel) and provide a more automated and scalable approach to both tagging data and generating proprietary, pre-tagged data. But they are not alone. One of the focal points of Google’s aforementioned AutoML offering is a data labelling service. It will be interesting to see how this problem is addressed in the coming years.
An intriguing attempt to bypass this issue is the rise of unsupervised learning, which does not require labeled data. Instead, unsupervised learning takes the input set and tries to find patterns in the data, for instance by organizing data points into groups (clustering) or finding outliers (anomaly detection). One of the most exciting examples of unsupervised learning is an open-source, community-run attempt to replicate AlphaZero for chess. And within unsupervised learning comes one of the most fascinating developments in AI, called Generative Adversarial Networks (GANs). In essence, two networks battle each other: one network, called the generator, is tasked with creating data to trick the other network, called the discriminator, which tries to tell generated data from real data. For a more detailed explanation of GANs, see this post.
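As a small illustration of the clustering idea mentioned above, the following sketch groups unlabeled one-dimensional points with a bare-bones k-means loop. The data, the starting centers and the kmeans_1d helper are all made up for this example; real workloads would use a library implementation and far more data.

```python
from statistics import mean


def kmeans_1d(points, centers, iterations=10):
    """Group unlabeled 1-D points around the nearest of the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (a center with an empty cluster stays where it is).
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# No labels are supplied; the two groups emerge from the data alone.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers, clusters)
```

Nobody tells the algorithm which points belong together, which is exactly why approaches like this sidestep the annotation bottleneck, at the cost of answering a narrower set of questions than supervised models can.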
Data Scientists Wanted
Lastly, due to the relatively quick rise of machine learning, there are simply not enough trained data scientists to meet the needs of companies looking to incorporate machine learning into their business. This has created an expensive arms race between startups, established companies and tech giants. According to Glassdoor, the average base salary for data scientists is $120,000 per year. Translation: hiring a team of data scientists is simply too expensive for young startups, and impractical for companies in which machine learning doesn’t represent a core focus. But as was pointed out in the opening of this post, machine learning can be beneficial for a wide audience. A local grocery chain could use machine learning to predict consumer demand in order to have the right items in stock, and a gym could use machine learning to forecast customer churn and make better use of its marketing budget.
This has led to the rise of another group of companies that provide machine learning services on demand. Some of the more established companies in this domain include Palantir and DataRobot (in Israel, SparkBeyond, Razor Labs and Pita are very intriguing). But these are high-end services for companies with large budgets, and so there remains an interesting opportunity for companies to provide machine learning services at affordable costs. There are, however, several questions about the potential success of such companies. First and foremost, data privacy and security are of utmost importance, and many companies are reluctant to trust a third party to handle such important assets. Secondly, the compute costs of such companies are sure to be quite high, which presumably explains the expensive rates of DataRobot and others. And lastly, these companies will need to solve for themselves all of the other issues mentioned above (training optimization, devops and data annotation) in order to offer these services effectively.
A recent article by OpenAI shows that there has been a 300,000x increase in the amount of compute used in the largest AI training runs since 2012, suggesting a 3.5-month doubling time. As this trend continues, we expect to see more and more startups trying to solve the plethora of infrastructure issues that exist in developing machine learning applications. Training optimization is a specific area of interest for us, as the ability to train new models as well as re-train existing models with newly acquired data will likely become the technological standard for data-driven companies, especially as this can lead to data-driven network effects.
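The two figures quoted above can be checked against each other with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the numbers cited in the text; the variable names are ours.

```python
import math

growth_factor = 300_000      # increase in compute, as cited from the OpenAI article
doubling_time_months = 3.5   # doubling time, as cited from the same article

# Number of doublings needed to reach the cited growth factor.
doublings = math.log2(growth_factor)

# Time those doublings would take at the cited doubling rate.
implied_months = doublings * doubling_time_months

print(f"{doublings:.1f} doublings over about {implied_months / 12:.1f} years")
# prints "18.2 doublings over about 5.3 years"
```

A 300,000x increase corresponds to roughly 18 doublings, and at 3.5 months each that spans a little over five years, which is consistent with a measurement window that starts in 2012.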
As this gold rush continues, we’re looking to partner with world-class teams building solutions to the pressing issues outlined above. Feel free to contact me at Yonatan[at]tlv.partners to learn more about our view on this topic.
Thanks to Rona Segev, Shahar Tzafrir, Omri Geller, Ronen Dar, Eran Shlomo, Omer Spillinger, Idan Bassuk and Max Marine for their invaluable feedback while writing this piece.