Why the Deployment of AI Tells Us Little about the Technology
Background Paper No. 2
By Michael Depp
Michael Depp is Program Coordinator with the Cyberspace Cooperation Initiative at the Observer Research Foundation America.
Artificial intelligence (AI) has become a buzzword in technology in both civilian and military contexts. With interest comes a radical increase in extravagant promises, wild speculation, and over-the-top fantasies, coupled with funding to attempt to make them all possible. In spite of this fervor, AI technology must overcome several hurdles: it is costly, susceptible to data poisoning and bad design, difficult for humans to understand, and tailored for specific problems. No amount of money has eradicated these challenges, yet companies and governments have plunged headlong into developing and adopting AI wherever possible. This has bred a desire to determine who is “ahead” in the AI “race,” often by examining who is deploying or planning to deploy an AI system. But given the many problems AI faces as a technology, its deployment is less a clue about its quality and more a snapshot of the culture and worldview of the deployer. The AI race is therefore best measured not by looking at AI deployment but by taking a broader view of the underlying scientific capacity to produce it in the future.
AI Basics: The Minds We Create
AI is both a futuristic fantasy and an omnipresent aspect of modern life. Artificial intelligence is a broad term that encompasses anything that simulates human intelligence. It ranges from the narrow AI already present in our day-to-day lives, which focuses on one specific problem (chess-playing programs, email spam filters, and Roombas), to the general artificial intelligence that is the subject of science fiction (Rachael from Blade Runner, R2-D2 in Star Wars, and HAL 9000 in 2001: A Space Odyssey). Even the narrow form that we currently have, and continually improve, can have significant consequences for the world by compressing time scales for decisions, automating repetitive menial tasks, sorting through large masses of data, and optimizing human behavior. The dream of general artificial intelligence has long been deferred and is likely to remain elusive, if not impossible, and most progress remains with narrow AI. As early as the 1950s, researchers were conceptualizing thinking machines and developed rudimentary versions of them that evolved into “simple” everyday programs, like computer opponents in video games.
Machine learning followed quickly, but underwent a renaissance in the early 21st century when it became the most common method of developing AI programs, to the extent that it is now nearly synonymous with AI. Machine learning creates algorithms that allow computers to improve by consuming large amounts of data and using past “experience” to guide current and future actions. This can be done through supervised learning, where humans provide correct answers to teach the computer; unsupervised learning, where the machine is given unlabeled data and finds its own patterns; and reinforcement learning, where the program uses trial and error to solve problems and is rewarded or penalized based on its decisions. Machine learning has produced many of the startling advances in AI over the last decade, such as drastic improvements to facial recognition and self-driving cars, and has given birth to a method that draws on biology to create systems that process data in a way loosely modeled on the brain: deep learning. Deep learning is characterized by artificial neural networks, in which data is broken down and examined by “neurons” that each handle a specific question (e.g., whether an object in a picture is red) and report how confident they are in their assessment, and the network compiles these answers into a final judgment.
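To make the neuron-and-confidence description above concrete, the toy Python sketch below hand-codes two “neurons” (a redness check and a roundness check, both invented for illustration) and combines their confidence scores into a single verdict. This is only a caricature: in a real deep-learning system, the questions, thresholds, and weights would be learned from thousands of labeled examples rather than written by hand.

```python
# A toy illustration (not a real neural network) of the idea described above:
# individual "neurons" each answer one narrow question with a confidence score,
# and the network combines those answers into an overall assessment.
# Every feature check and weight here is invented for illustration only.

def redness(pixel_fraction_red: float) -> float:
    """Confidence that the object is red, based on the share of red pixels."""
    return min(1.0, pixel_fraction_red * 1.5)

def roundness(aspect_ratio: float) -> float:
    """Confidence that the object is roughly round (aspect ratio near 1)."""
    return max(0.0, 1.0 - abs(1.0 - aspect_ratio))

def is_apple(pixel_fraction_red: float, aspect_ratio: float) -> float:
    """Combine the per-question confidences into one overall score.

    In an actual deep-learning system these weights would be learned from
    labeled data (supervised learning) rather than hand-picked.
    """
    weights = {"red": 0.6, "round": 0.4}
    return (weights["red"] * redness(pixel_fraction_red)
            + weights["round"] * roundness(aspect_ratio))

# Prints a confidence score between 0 and 1 for a hypothetical input image.
print(f"Apple confidence: {is_apple(0.55, 1.1):.2f}")
```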
But despite the advances that AI has undergone since the machine learning renaissance and its nearly limitless theoretical applications, it remains opaque, fragile, and difficult to develop.
Challenges: The Human Element
The way AI systems are developed naturally creates doubts about their ability to function in untested environments: they require large amounts of input data, that data must be nearly perfect, and they absorb the preconceived notions of their creators. First, missing or erroneous data is one of the largest challenges, especially when relying on machine learning techniques. To teach a computer to recognize a bird, it must be fed thousands of pictures to “learn” a bird’s distinguishing features, which naturally limits use in fields with few examples. Additionally, if even a tiny portion of the data is incorrect (as little as 3%), the system may develop incorrect assumptions or suffer drastic decreases in performance. Finally, the system may also recreate assumptions and prejudices—racist, sexist, elitist, or otherwise—from extant data that already contains inherent biases, such as resume archives or police records. These could also be coded in as programmers inadvertently impart their own cognitive biases into the machine learning algorithms they design.
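As a rough illustration of the data-quality point, the sketch below (assuming scikit-learn and NumPy are available; the synthetic dataset, simple model, and noise levels are arbitrary choices, not the source of the 3% figure) flips a fraction of training labels at random and reports how a basic classifier’s test accuracy changes. Randomly flipped labels are a mild form of corruption; deliberately crafted poisoned data is typically far more damaging, and the exact degradation depends heavily on the model and the task.

```python
# Toy experiment: corrupt a fraction of training labels and observe the effect
# on test accuracy. Dataset and model choices are illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for noise in (0.0, 0.03, 0.10, 0.25):
    y_noisy = y_train.copy()
    flip = rng.random(len(y_noisy)) < noise      # choose a fraction of labels
    y_noisy[flip] = 1 - y_noisy[flip]            # flip 0 <-> 1 on those labels
    model = LogisticRegression(max_iter=1000).fit(X_train, y_noisy)
    print(f"label noise {noise:4.0%} -> test accuracy {model.score(X_test, y_test):.3f}")
```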
This propensity for deep-seated decision-making problems, which may only become evident well after development, will prove problematic for those who want to rely heavily on AI, especially concerning issues of national security. Because of the inherent danger of ceding critical functions to untested machines, plans to deploy AI programs should be seen not primarily as a reflection of the programs’ quality but of an organization’s culture, risk tolerance, and goals.
The acceptability of some degree of uncertainty also exacerbates the difficulties of integrating AI with human overseers. One option is a human-in-the-loop system, where human overseers are integrated throughout the decision process. Another is a human-on-the-loop system, where the AI remains nearly autonomous with only minor human oversight. In other words, organizations must decide whether to give humans the ability to override a machine’s possibly better decision that they cannot understand, or to cede the human oversight that might prevent disasters obvious to organic minds. Naturally, the choice will depend on the stakes: militaries are much more likely to let a machine control leave schedules without human guidance than anti-missile defenses.
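The difference between the two oversight patterns can be sketched in pseudocode-like Python. Everything here is a hypothetical placeholder (the “model,” the approval prompt, and the example tasks); the point is only the position of the human relative to the decision.

```python
# Schematic sketch of the two oversight patterns described above.
# The "model", the approval prompt, and the tasks are invented placeholders.

def model_recommendation(task: str) -> str:
    return f"automated action for {task}"        # stand-in for an AI system

def execute(action: str) -> None:
    print(f"EXECUTING: {action}")

def human_in_the_loop(task: str) -> None:
    """A person must approve each recommendation before it is carried out."""
    action = model_recommendation(task)
    if input(f"Approve '{action}'? [y/n] ").lower() == "y":
        execute(action)
    else:
        print("Held for human decision.")

def human_on_the_loop(task: str) -> None:
    """The system acts on its own; a person monitors and can intervene later."""
    action = model_recommendation(task)
    execute(action)                              # no prior approval required
    print(f"Logged for human review: {action}")  # oversight happens after the fact

human_on_the_loop("leave scheduling")            # low stakes: autonomy tolerable
human_in_the_loop("anti-missile engagement")     # high stakes: keep a person in the loop
```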
Again, as with doubt about decision integrity, the manner in which an organization integrates AI into the decision-making process can tell us a great deal. Having a human-in-the-loop system signals that an organization would like to improve the efficiency of a system considered mostly acceptable as is. A human-on-the-loop system signals greater risk tolerance, but also betrays a desire to exert more effort to catch up to, or surpass, the state of the art in the field.
The Global AI Race: Measuring the Unmeasurable
Research and development funding is a key component of scientific advances in the modern world, and is often relied on as a metric to chart progress in AI. The connection is often specious, however; the scientific process is filled with dead ends, ruined hypotheses, and narrow research questions with no broader significance. This last point is particularly salient for artificial intelligence because of the tailored nature of specific AI applications, each of which requires a different design for the problem it tackles. AI that directs traffic, for example, is completely worthless at driving cars. For especially challenging questions (e.g., planning nuclear strategy), development is an open-ended financial commitment with no promise of results.
It becomes difficult, therefore, to accurately assess achievement by simply using the amount spent on a project as a proxy for progress. Perhaps money is being spent on dead ends, on an incorrect hypothesis, or even on efforts to fool others into thinking that progress is being made. Instead, we should see money as a reflection of what the spender values. Project spending, then, is not an effective metric of the progress of AI development but of how important a research question is to the one asking it.
But that importance still offers value for analysis, even if it cannot measure the AI race: the decision-making process can speak volumes about the deployer’s priorities, culture, risk tolerance, and vision. Ironically, the manner in which AI is deployed says far more about the political, economic, and social nature of the group deploying it than it does about technological capability or maturity. In that way, deployment plans offer useful information for others. This is particularly true of government plans. Examinations of such plans have produced insights such as using Chinese AI documents to deduce where Beijing sees weakness in its own IT economy, finding that banks overstate their use of chatbots to appear convenient to customers, or noting that European documents attempt to create a distinctive European approach to the development of AI in both style and substance. It is here that examinations of AI deployment plans offer their real value.
There are, instead, much better ways to measure progress in AI. While technology rapidly changes, traditional metrics of scientific capacity provide a more nuanced baseline from which to measure AI and are harder to manipulate, which makes them more effective than measuring the outputs of AI projects. The most relevant include: scientists as a proportion of the population, papers produced and number of citations, research and development spending generally (as opposed to a focus on specific projects), and the number of universities and STEM students. Measuring any scientific process is naturally fraught with peril because of the potential for dead-end research, but taken broadly these metrics give a far better picture of the ability of a state or organization to innovate in AI technology. Multiple metrics should always be used, however; a focus on any single metric (e.g., research spending) makes it just as easy to game the system as relying on AI deployment does. Such a narrow focus also distorts the view of the AI landscape. Consider, for example, the intense insecurity over the position of the United States despite its continuing leadership in talent, papers cited, and quality of universities.
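To illustrate the “multiple metrics” point rather than prescribe a methodology, the sketch below normalizes several capacity indicators and averages them into a composite score. The countries are anonymized placeholders and every number is invented; the aim is only to show mechanically why no single column should dominate the assessment.

```python
# Minimal sketch: normalize several scientific-capacity indicators and average
# them into a composite score. All figures below are made up for illustration;
# this is not an actual ranking or a recommended weighting.

INDICATORS = {  # country -> (researchers per 1k people, citation index,
                #             R&D spending as % of GDP, STEM graduates index)
    "Country A": (9.0, 85.0, 3.1, 70.0),
    "Country B": (2.4, 92.0, 2.4, 95.0),
    "Country C": (7.5, 40.0, 1.8, 55.0),
}

def composite_scores(indicators):
    """Min-max normalize each indicator, then average across indicators."""
    columns = list(zip(*indicators.values()))   # one tuple per indicator

    def norm(value, col):
        lo, hi = min(col), max(col)
        return (value - lo) / (hi - lo) if hi > lo else 0.0

    return {
        country: sum(norm(v, columns[i]) for i, v in enumerate(vals)) / len(vals)
        for country, vals in indicators.items()
    }

for country, score in sorted(composite_scores(INDICATORS).items(),
                             key=lambda kv: kv[1], reverse=True):
    print(f"{country}: {score:.2f}")
```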
Recharging the Scientific Base
The U.S. National Security Commission on AI draft report notes, “The nation with the most resilient and productive economic base will be best positioned to seize the mantle of world leadership.” This statement encapsulates the nature of the AI race, and naturally, measuring it. If a government or a company wishes to take a leadership position in the race, the goal should be to stimulate the base that will produce it, not actively promote a specific project, division, or objective. This involves tried and true (but oft neglected) policies like promoting STEM education, training new researchers internally, attracting foreign talent with incentives, providing funding for research and development (especially if it forms a baseline for future work such as computer security or resilience), and ensuring that researchers have access to the IT hardware that they need through adequate manufacturing and procurement processes.
These suggestions are often neglected in the United States in particular because of the intense politicization of domestic priorities such as education policy (affecting universities), immigration policy (affecting the attraction of foreign talent), and economic policy (affecting manufacturing and procurement). At the same time, it is not only about providing more funding but also about streamlining the processes that enable scientific capacity. For example, the system for receiving scientific research grants is byzantine, time-consuming, and stifling, with different government agencies having overlapping funding responsibilities. Efforts should be made to ensure that applying for grants is not only easier but also promotes broader scientific inquiry. By solving problems like these, leaders invest in the components that will create a winning position in the AI race, and observers can determine who is making strides to lead now, as well as in the future.
In the information age, the deployment of new technologies and their level of advancement have become key metrics for measuring power and effectiveness, but they are often flawed ones. Particularly for AI projects, research budgets, task assignments, and roles relative to humans demonstrate little about the state of the technology itself. Given the many fundamental problems with deploying AI, risk tolerance and strategic culture play a much larger role in determining how deployment is carried out: the more risk tolerant an organization is, and the more it feels challenged by competitors, the more likely it is to adopt AI for critical functions. Rather than examining AI deployment plans to see which country or organization is “ahead,” we should use them to study the deployer’s worldview and strategic outlook, and rely on overall scientific capacity to determine pole position in the AI race.
Michael Depp is a program coordinator at the Observer Research Foundation America where he focuses on the future of technology. His research interests include the effects of emerging technology on international competition and the role that digital technology plays in military conflict. He is grateful to Lora Saalman and Andreas Kuehn for their comments on an earlier draft of this paper.