Why you shouldn’t turn to ChatGPT for financial advice just yet


As artificial intelligence continues to make headlines, one pressing question looms: Could AI chatbots like ChatGPT assist or potentially replace financial professionals? A new study by Washington State University and Clemson University researchers, analyzing more than 10,000 AI responses to financial exam questions, provides some sobering answers.

“It’s far too early to be worried about ChatGPT taking finance jobs completely,” says study author DJ Fairhurst of WSU’s Carson College of Business in a statement. “For broad concepts where there have been good explanations on the internet for a long time, ChatGPT can do a very good job at synthesizing those concepts. If it’s a specific, idiosyncratic issue, it’s really going to struggle.”

The research, published in the Financial Analysts Journal, addresses a significant industry concern. Goldman Sachs estimates that 15% to 35% of finance jobs could potentially be automated by AI, while KPMG suggests that generative AI may revolutionize how asset and wealth managers operate. However, these projections rely on a critical assumption – that AI systems possess an adequate understanding of finance.

“Passing certification exams is not enough. We really need to dig deeper to get to what these models can really do,” notes Fairhurst.

The researchers assembled a comprehensive dataset of 1,083 multiple-choice questions drawn from various financial licensing exams, including the Securities Industry Essentials (SIE) exam and Series 7, 6, 65, and 66 exams. These are the same tests that human financial professionals must pass to become licensed. Currently, about 42,000 people become registered representatives annually, with more than 600,000 working in the securities industry.

Using this question bank, the study tested four different AI models: Google’s Bard, Meta’s LLaMA, and two versions of OpenAI’s ChatGPT (versions 3.5 and 4). The researchers evaluated not just answer accuracy but also used natural language processing techniques to measure how closely each system’s explanations of its reasoning matched expert-written explanations.
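The paper’s exact pipeline isn’t spelled out in the article, but the general shape of such an evaluation is straightforward. Below is a minimal Python sketch of one plausible setup: grade each model’s multiple-choice pick against the answer key, then compare its free-text explanation with the expert-written one using TF-IDF cosine similarity. The similarity measure is a stand-in for whatever the authors actually used, and the question records and model outputs are invented for illustration.

```python
# Hypothetical sketch: grade multiple-choice answers and compare
# explanations to expert-written ones. The study's actual similarity
# measure is not specified; TF-IDF cosine similarity stands in here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins for the exam question bank and one model's responses.
questions = [
    {
        "answer": "B",
        "expert_explanation": "Municipal bond interest is generally "
                              "exempt from federal income tax.",
    },
]
model_outputs = [
    {
        "answer": "B",
        "explanation": "Interest on municipal bonds is typically "
                       "exempt from federal taxation.",
    },
]

# Accuracy: fraction of questions where the model picked the keyed choice.
correct = sum(
    q["answer"] == o["answer"] for q, o in zip(questions, model_outputs)
)
accuracy = correct / len(questions)

# Explanation quality: cosine similarity between each model explanation
# and the corresponding expert explanation in TF-IDF space.
vectorizer = TfidfVectorizer()
similarities = []
for q, o in zip(questions, model_outputs):
    pair = vectorizer.fit_transform(
        [q["expert_explanation"], o["explanation"]]
    )
    similarities.append(cosine_similarity(pair[0], pair[1])[0, 0])

print(f"accuracy: {accuracy:.1%}")
print(f"mean explanation similarity: {sum(similarities) / len(similarities):.2f}")
```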

The results revealed distinct tradeoffs among the AI models. ChatGPT 4 emerged as the clear leader, with accuracy rates 18 to 28 percentage points higher than the other models. The picture shifted, however, when researchers fine-tuned the earlier, free version, ChatGPT 3.5, by feeding it examples of correct responses and explanations. After this tuning, it nearly matched ChatGPT 4’s accuracy and even surpassed it at producing answers that resembled those of human professionals.
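The article doesn’t detail how the fine-tuning was done, but OpenAI’s fine-tuning API for the GPT-3.5 family accepts training examples as JSON Lines of chat messages. The hypothetical Python sketch below shows how graded exam items might be packaged into that format; the `exam_items` records, the prompt wording, and the file name are all invented for illustration.

```python
# Hypothetical sketch of preparing fine-tuning data in the JSONL
# chat-message format that OpenAI's fine-tuning API expects for
# GPT-3.5 models. The study's actual training examples are not public.
import json

exam_items = [  # invented example records
    {
        "question": "Which account type permits margin trading? ...",
        "correct_answer": "C",
        "expert_explanation": "Margin trading requires a margin account "
                              "approved under FINRA rules.",
    },
]

with open("finance_exam_finetune.jsonl", "w") as f:
    for item in exam_items:
        example = {
            "messages": [
                {"role": "system",
                 "content": "You are a licensed securities professional. "
                            "Answer the exam question and explain why."},
                {"role": "user", "content": item["question"]},
                {"role": "assistant",
                 "content": f"{item['correct_answer']}. "
                            f"{item['expert_explanation']}"},
            ]
        }
        f.write(json.dumps(example) + "\n")
```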

Even so, both models showed significant limitations. They performed well on questions about trading, customer accounts, and prohibited activities (73.4% accuracy), but accuracy dropped to 56.6% on questions about evaluating client financial profiles and investment objectives. The models were even less reliable in specialized situations, such as determining clients’ insurance coverage and tax status.

The research team isn’t stopping with exam questions. They’re now exploring other ways to test ChatGPT’s capabilities, including a project that asks it to evaluate potential merger deals. Taking advantage of ChatGPT’s initial training cutoff date of September 2021, they’re testing it against known outcomes of deals made after that date. Preliminary findings suggest the AI model struggles with this more complex task.

These limitations have important implications for the finance industry, particularly regarding entry-level positions.

“The practice of bringing a bunch of people on as junior analysts, letting them compete and keeping the winners – that becomes a lot more costly,” explains Fairhurst. “So it may mean a downturn in those types of jobs, but it’s not because ChatGPT is better than the analysts, it’s because we’ve been asking junior analysts to do tasks that are more menial.”

Based on these findings, AI’s immediate future in finance appears to be collaboration rather than replacement. While these systems demonstrate impressive capabilities in summarizing information and handling routine analytical tasks, their error rates – particularly in complex, client-facing situations – indicate that human oversight remains essential in an industry where mistakes can have serious financial and legal consequences.

Source: https://studyfinds.org/chatgpt-financial-advice/
