Generative AI: The Macro Opportunity in Harnessing AI and Automation
#1 While GenAI adoption is now on the fast track, widespread adoption will take time. For perspective, it took six years for 50% of US households to adopt mobile internet; we expect a comparable level of widespread AI adoption to take 3+ years. This is consistent with the halving time scales of technology adoption we’ve seen with the PC era, the desktop era, and the mobile internet era.
- The drivers of adoption hinge on three main factors: ROI (productivity growth versus cost), friction reduction, and budget alignment. Encouragingly, ROI and budget allocations show positive trends. Reducing friction requires developing robust data organization and privacy standards, which could limit the pace of adoption. On the other hand, digitally native employees have already begun readily embracing Generative AI and are enhancing workforce productivity.
#2 The swift uptake of GenAI introduces new cybersecurity challenges. For instance, the introduction of GenAI systems like ChatGPT amplifies the risk of phishing attacks. A shortage of cybersecurity talent compounds this challenge, elevating it to a critical concern for enterprises.
- The macro-opportunity in harnessing AI and automation, despite gradual adoption, is undeniable. In terms of opportunity size, GenAI’s cybersecurity potential represents a $34 billion Total Addressable Market (TAM), with productivity gains acting as a driving force. It is important for organizations to proactively address the implications and maintain a strong focus on AI cybersecurity.
Securing the Future: Demystifying LLMs and Threats
#3 There are three broad areas of LLM threats worth noting: prompt injection, data poisoning, and data leakage (not from the LLMs but from agents and vector databases).
- Prompt injection can be compared to confusing the model, e.g., instructing it to behave as someone else to access information it wouldn’t provide otherwise. This tactic is not new in cybersecurity. The reason it works lies in the machine’s inability to distinguish between the control plane and the data plane. In simpler terms, it can’t differentiate between system code and user input. Prompt injection can occur through various means, including images or text, and it can prompt actions by agents, making it a more potent threat (e.g., a bad actor can inject an action into an email to “delete all” of an inbox). Potential risks include damage to an entity’s brand, data losses and financial losses.
- Data poisoning involves intentionally manipulating data to undermine a model’s behavior. This can occur in different forms, depending on where the intrusion takes place within the tech stack. The two primary forms are:
- Input Poisoning (most common): Adversaries alter trusted data sources used by LLMs to skew their learning (e.g., Wikipedia, or expired domains).
- Model Editing: This form of data poisoning involves modifying LLMs to spread misinformation. For example, adversaries might tweak facts within an LLM and upload the altered model to a public repository like Hugging Face. From there, LLM builders integrate these manipulated models into their solutions, thus spreading false information to end users.
#4 Looking ahead, we anticipate that data poisoning will evolve to become more advanced, and possibly the hardest area to address with modern cybersecurity.
As LLMs increasingly incorporate diverse data sources, data poisoning becomes more sophisticated and practical, thereby expanding the potential attack surface. This evolution is fueled by an expanding array of data sources, shorter training cycles, and a shift towards smaller, specialized models. As a result, data poisoning is likely to become an increasingly prominent threat.
- Data leakage – Enterprises are understandably concerned about employees sending sensitive data to LLMs, as malicious actors could exploit this information. However, it’s important to recognize that LLMs are not data stores; they’re data generators so this threat is a bit over hyped. Extracting data from LLMs requires several prerequisites:
- Sufficient References: To extract data, there must be a substantial number of references to it, enabling memorization.
- Knowledge of the Secret: Adversaries need to possess enough knowledge about the secret or its format to generate it accurately (e.g., first 4 digits of SSN).
- Verification of Accuracy: The attacker must verify that the response is accurate and not a hallucination.
However, data leakage emerges as a deeper concern when data is extracted not from the LLM itself, but from agents or vector databases. This highlights the critical importance of access control rules to safeguard information security.
Unveiling Generative AI: Navigating Risks and Opportunities
#5 Prioritize use cases to reinforce trust in LLM deployments from the very beginning.
- A staggering number of opportunities exist across industries and enterprise functions to leverage GenAI, but use cases need to be prioritized based on expected productivity gains and anticipated risks. One approach is to prioritize use cases with lower risk profiles. This can be facilitated through a framework that considers the criticality of the desired outcome and the level of accuracy required for the task. Many internal use cases may fall into the “low-low” risk category, such as coding assistance. For riskier or externally focused cases, like underwriting in financial services, human oversight becomes essential. However, it’s vital not to discount potential high impact but higher risk initiatives and give them proper consideration.
#6 Ensure that the right guardrails are in place to navigate the risks associated with deploying GenAI technology.
- This includes compliance with regulations like GDPR, the California Consumer Privacy Act, and the EU AI Act, amongst others, and being thoughtful about how enterprises handle data and inform users about its use. Adopting the NIST AI Risk Management Framework can also play an important role if enterprises do not already have a robust framework in place. If an enterprise decides to build a private LLM, it becomes a team’s responsibility to embed ethics and moral principles that align with the organization’s values.
#7 Engage cross-functional teams on GenAI deployment because this is a team sport that will often require that different stakeholders like the CTO, CIO, CISO, product managers, HR, and legal are all involved.
- The CEO is a starting point whose buy-in of the potential gains and understanding of GenAI technology is crucial. But getting to scale and efficiency requires broad collaboration and communication. This is often more important amongst those executing the vision than those seeding it.
Armor for AI: Implementing AI Security While Enabling AI Quality
#8 The cybersecurity talent gap is wide and continues to widen. As the barrier to deploying AI applications lowers, the difficulty in properly securing these systems grows disproportionately.
- Many organizations lack the necessary resources and data to train their own models, making external sources an attractive option. Outsourcing AI models, however, often results in reduced control and limited understanding of how they function. Consequently, many organizations are adopting a “build and buy” strategy, which is traditional in cybersecurity, as opposed to the “build vs. buy” approach.
#9 There is rapidly growing demand for anyone selling solutions. In the startup space, AI Security (AISec) has been one of the strongest categories for fundraising this year.
- According to the Ethical AI Database (EAIDB), 75% of active AISec startups raised funding in 2023.
#10 Innovation in AI does not discriminate between defender and attacker. As GenAI models become stronger, they also become more dangerous tools for malicious intent.
- “ML to attack ML” (which has been around for a long time now) has already evolved into “AI to attack AI.” Whose responsibility is it to ensure that new foundational models, open-source models, etc. are not used maliciously?
Table Stakes: Exploring Guardrails for Large Language Models
#11 There is simply no way to prevent bad actors from using Gen AI for advanced phishing attacks, this is in their DNA and motivation.
#12 Having a clear LLM Data Privacy policy in place and an advanced AI roadmap with corresponding safety measures is essential.
- Enterprises can start with limited access controls, small private models, and involve humans in the loop to ensure protection against risks. However, merely formulating a policy isn’t enough; collaboration, governance and a cultural shift are required to fully implement and enforce the right measures for success.
#13 To effectively establish data privacy guardrails for LLMs, it’s crucial to approach it as a process rather than a static product.
- If an enterprise is building a private model, you can implement various guardrails at different stages, including training, fine-tuning, and prompt crafting. Numerous existing technologies can assist in this process. For instance, sensitive data can be replaced during the training stage.
#14 While defining guardrail policies for LLMs constitutes a strong first step, the point of failure often lands at enforcement. Several best practices are emerging to ensure data privacy. These include output scanning techniques such as anonymization, data minimization, differential privacy, encryption and federated learning.
The Hierarchy of Unknowns and What Do We Do Now
#15 GenAI is a technology breakthrough beyond precedent; for the first time, humans will need to reckon with the eventual existence of super-intelligence.
- This brings a hierarchy of unknowns, of questions that span all facets of human life as we know it, from the technology implications to corporate, social, legislative, political, national, environmental, philosophical and existential factors.
- As we consider the many productivity and efficiency gains from GenAI, how can humankind prevent its weaponization and institute the necessary protections to preserve humanity and human-in-the-loop controls? We are at the beginning of asking and addressing these fundamental questions.
Communities: Embracing the complexities of an AI driven Future and charting a course that is Ethical
#16 The impact of communities is accelerating:
- EAIGG– an AI-practitioner community with 1,800+ global members- released its 2023 Annual Report at the event. In this ever-evolving landscape, opinions on the opportunities and risks presented by AI are as diverse as they are passionate. It’s easy to get swayed by the cacophony of voices, each asserting its version of the truth. EAIGG has made a conscious effort not to lean too heavily in one direction. Instead, this annual report presents a collection of thought-provoking articles that aims to elevate the discourse, offering insights that can illuminate the path forward and foster a platform for meaningful dialogue.
- The Ethical AI Database– developed in partnership with the EAIGG- remains the only publicly available, vetted database of AI startups providing ethical services. The database is updated live, with market maps and reports published semiannually.
- The Markkula Center of Applied Ethics at Santa Clara University promoted its framework for ethical decision-making enterprise Boards and C level executives at the event.
Conclusion
Enterprises have begun to recognize the productivity opportunities associated with GenAI, but they must be ready to innovate without being paralyzed by fear because this is where the future is headed. Technology breakthroughs have always produced equal levels of optimism and pessimism, excitement and anxiety. A few frameworks and strategies can help enterprises navigate the path. Prioritizing the right use cases and putting guardrails in place early are crucial steps. Guardrails include protecting against threats such as prompt injection, data poisoning, data leakage as well as ensuring compliance with regulations while engaging cross-functional teams on deployment to achieve the benefits of scale and efficiency. Innovation from AISec startups will play a strong role to secure AI as enterprises may lack the resources to invest in this themselves and there are simply no ways to completely prevent bad actors from exploiting GenAI to launch advanced phishing attacks. Finally, while defining guardrail policies for LLMs represent a good first step, enforcement is often the Achilles heel, so leveraging emerging best practices around output scanning is of critical importance to ensuring data privacy, and a secure information environment for the enterprise. Perhaps the ultimate conclusion is that we are still in the early days of GenAI and we realize that more collective discussion and cross-industry collaboration are vital to ensure we are Securing AI while advancing innovation and growth.
Many thanks to our honored guests and speakers who shared their expertise and perspective with us:
- Andres Andreu, CISSP-ISSAP, QTE, 2U, Inc.
- Abhinav Raghunathan, Ethical AI Database
- Betsy Greytok, IBM
- Caleb Sima, Cloud Security Alliance
- Carolyn Crandall, Marticulate
- Hailey Buckingham, HiddenLayer
- Hamza Fodderwala, Morgan Stanley
- Ilya Katsov, Grid Dynamics
- Joe Levy, Sophos
- Katharina Koerner, Tech Diplomacy Network
- Patrick Trinkler, CYSEC SA
- Sandeep Mehta, The Hartford
- Shayak Sen, Truera
- Tobias Yergin, Major Global Retailer
- Tom Kelly, Markkula Center for Applied Ethics