By Mark Katz, CTO of Financial Services, Hitachi Vantara
When ChatGPT took the world by storm, line-of-business (LOB) teams at banks wanted to use generative AI to power virtual assistants on their websites and mobile apps. Internal teams asked for AI-based recommendation engines to help identify the best bank credit cards and retirement products for individual customers, and to assist in portfolio optimization and trading strategy. And IT leaders sought to use GenAI to make their IT teams more operationally efficient.
But GenAI is not just another Windows application; it is an entirely new class of workload. GenAI requires fast, scalable storage, compute, and networking, and IT teams are now on the hunt for that infrastructure. However, it’s not just about bits and bytes. There’s also a significant risk element.
Beware of data infringement and the black box
AI models make mistakes, and banks may be liable in the event of AI hallucinations – the phenomenon wherein a large language model (LLM) perceives patterns or objects that are nonexistent or imperceptible to human observers, creating outputs that are nonsensical or altogether inaccurate.
Enormous quantities of data go into training AI models, and the indiscriminate use of data can lead to infringement lawsuits. The New York Times has alleged that millions of its copyrighted works were used to create the LLMs of Microsoft’s Copilot and OpenAI’s ChatGPT. This is one of more than a dozen AI infringement lawsuits that have surfaced.
Also, regulators have made clear that they want to maintain the ability to audit the things they typically audit. But they are concerned about the potential of AI to disintermediate human touchpoints in trading, IT, business and other processes. They don’t want to run into situations in which businesses can’t provide answers because automated processes were involved.
The Blueprint for an AI Bill of Rights out of the White House Office of Science and Technology Policy says: “You should know how and why an outcome impacting you was determined by an automated system, including when the automated system is not the sole input determining the outcome.” And a joint statement from heads of the Consumer Financial Protection Bureau, Justice Department’s Civil Rights Division, Equal Employment Opportunity Commission and Federal Trade Commission notes that “existing legal authorities apply to the use of automated systems and innovative new technologies just as they apply to other practices.”
But as the same joint statement later acknowledges: “Many automated systems are ‘black boxes’ whose internal workings are not clear to most people and, in some cases, even the developer of the tool. This lack of transparency often makes it all the more difficult for developers, businesses, and individuals to know whether an automated system is fair.”
Consider using small rather than large language models
Businesses can advance their GenAI efforts while containing risk by using smaller language models. AI hallucinations stem in part from the fact that the data sets on which large models are trained are vast and generic, and the questions those models are expected to answer are essentially unlimited. But if you train an AI model on a relatively narrow slice of data, and you limit the question set to which your AI needs to reply, you also limit the chances of AI hallucinations.
Small language models work for most business applications because companies typically don’t need GenAI to answer a broad set of questions. For a virtual assistant, for example, you may just need to respond to the most common customer questions. With small language models, you can meet that requirement and decrease the risk of hallucinations in the process.
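One way to put that constraint into practice is to route customer questions against a curated, pre-approved answer set and hand anything out of scope to a human rather than letting a model improvise. The Python snippet below is a minimal sketch of that routing logic; the FAQ entries, similarity threshold and fallback message are illustrative placeholders, and a small language model would typically sit behind this layer to rephrase the vetted answer conversationally.

```python
# Minimal sketch: constrain a virtual assistant to a curated FAQ set so the
# assistant only answers questions the bank has already vetted. Anything that
# falls outside the curated set is routed to a human instead of the model.
from difflib import SequenceMatcher

# Curated, pre-approved question/answer pairs (illustrative placeholders).
FAQ = {
    "how do i reset my online banking password": "Go to Settings > Security and choose 'Reset password'.",
    "what is the daily atm withdrawal limit": "The standard limit is $500; contact us to adjust it.",
    "how do i report a lost or stolen card": "Call the number on our website or freeze the card in the app.",
}

SIMILARITY_THRESHOLD = 0.6  # tune on real traffic before deployment


def answer(question: str) -> str:
    """Return a vetted answer, or defer to a human for out-of-scope questions."""
    q = question.lower().strip()
    best_answer, best_score = None, 0.0
    for known_q, known_a in FAQ.items():
        score = SequenceMatcher(None, q, known_q).ratio()
        if score > best_score:
            best_answer, best_score = known_a, score
    if best_score >= SIMILARITY_THRESHOLD:
        return best_answer  # comes from the approved set, not free generation
    return "Let me connect you with a representative."  # out of scope: no model guess


if __name__ == "__main__":
    print(answer("How can I reset my password for online banking?"))
    print(answer("Should I buy cryptocurrency?"))  # deliberately out of scope
```

The point of the sketch is the guardrail, not the matching technique: by keeping the answerable question set small and explicit, you shrink the space in which a model can hallucinate.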
Leverage data grooming as an opportunity to derisk
Whether you use LLMs or small language models, it’s become increasingly clear that you can’t just train AI models on whatever exists on the public internet or your social media feeds. That’s how the original ChatGPT model was trained, but it’s a very risky proposition for your business. Understand what data sets you will be using for training.
Before raw data goes into an AI model, unstructured data must be groomed into a format the model can use. This data grooming process is an opportunity to introduce guardrails that let you understand and track specific data sets. Take that opportunity: it keeps you cognizant of personally identifiable information (PII) or other sensitive data being ingested into your AI model. Without the proper guardrails, your application could potentially spit out sensitive data to other customers.
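As a rough illustration of what such a guardrail might look like, the sketch below redacts a few obvious PII patterns and tallies what was removed before a document is allowed into the training corpus. The regexes are deliberately simplistic and not exhaustive; a production pipeline would use dedicated PII-detection tooling, but the shape of the step is the same.

```python
# Minimal sketch of a data-grooming guardrail: redact obvious PII patterns and
# record what was removed before a document enters the training corpus.
import re

PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def groom(text: str) -> tuple[str, dict[str, int]]:
    """Redact known PII patterns and return the cleaned text plus a tally."""
    findings = {}
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED-{label.upper()}]", text)
        if count:
            findings[label] = count
    return text, findings


if __name__ == "__main__":
    raw = "Contact jane.doe@example.com, SSN 123-45-6789, card 4111 1111 1111 1111."
    cleaned, report = groom(raw)
    print(cleaned)
    print(report)  # e.g. {'ssn': 1, 'email': 1, 'card': 1} -> feeds your audit trail
```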
During the grooming process, you can also produce a training artifact to supply on demand to auditors, your CISO or other IT leaders if they inquire about your data set. Having such artifacts handy can provide welcome relief when concerned parties come calling.
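What that artifact looks like will vary, but one simple form is a manifest of exactly which groomed files fed a given training run, with hashes and timestamps, so you can answer “what was this model trained on?” long after the fact. The sketch below is illustrative; the directory name, run identifier and fields are assumptions, not a prescribed schema.

```python
# Minimal sketch of a training artifact: a manifest recording which groomed
# files went into a training run, with content hashes and a timestamp, so the
# data set can be described to auditors or a CISO on demand.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def build_manifest(data_dir: str, run_id: str) -> dict:
    """Hash every file under data_dir and record it against the training run."""
    entries = []
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            entries.append({"file": str(path), "sha256": digest, "bytes": path.stat().st_size})
    return {
        "run_id": run_id,
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "file_count": len(entries),
        "files": entries,
    }


if __name__ == "__main__":
    manifest = build_manifest("groomed_data", run_id="virtual-assistant-2024-06")
    Path("training_manifest.json").write_text(json.dumps(manifest, indent=2))
```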
Understand you don’t need to wait to innovate
Derisking AI in these ways is important because various laws and regulations already govern things like how organizations handle consumers’ personal data. European Union policymakers recently agreed to the sweeping AI Act, and moves by the U.S. government to harness and support AI innovation – such as the National Artificial Intelligence Research Resource Task Force’s AI implementation plan, the NIST AI Risk Management Framework and the AI Executive Order – could signal that the U.S. government may be on the path to AI regulation.
The companies providing GenAI tools are highly aware of AI challenges and related regulatory requirements, and future releases of tools like Bard, ChatGPT and Copilot may address them. In fact, researchers at AI company Anthropic recently revealed that they have found clues about the inner workings of LLMs that might help prevent their misuse and curb potential threats.
While the world waits for these new releases, continue to move forward with AI and GenAI, which are tremendous engines of innovation. But do so in a way that derisks your efforts.
Get the help you need to advance and safeguard your business
Since GenAI is brand new, very few subject matter experts and no incumbent suppliers exist in this arena, so you won’t be teaming up with the usual IT suppliers. You will likely use Nvidia on the compute side, and solutions from smaller companies for much of your other infrastructure.
But small suppliers may not have the support you need everywhere you need it, and while they may have valuable pieces of a solution, they won’t bring all the pieces together.
Seek out a partner that brings together best-of-breed solutions from the storage, data grooming and training layers into a ready-to-go bundle. Make sure that the supplier has the resources and organization to provide the needed support. Explore pretrained business-outcome-specific models, which you can customize with a few of your own datasets and get up and running fast.
Last year was the year of data privacy. This year the focus is on cyber resilience. And next year regulators are likely to focus more of their attention on AI, GenAI and LLMs.
We’re all learning about GenAI together. But with the right partner and approach, your financial services firm can limit risk, grow more competitive and be ready for whatever comes next.