Highlights from the NAACL 2024 conference
Capital One’s AI Research team recaps the NAACL 2024 conference, including data set curation and reliable LLM evaluation.
Interest in building AI systems and improving LLMs is growing worldwide, and the papers and experts featured at the NAACL 2024 conference captured it. The conference placed strong emphasis on AI dataset curation, fine-grained control of LLM outputs, efficient and low-resource training, and reliable LLM evaluation.
We are amazed at the breadth and depth of this year’s papers and, most importantly, at how industry and academic researchers came together to advance high-impact research in machine learning and AI.
Top themes from the NAACL conference and papers we liked
Factual consistency and hallucination reduction in generated outputs
At Capital One, we care deeply about the trustworthiness of LLMs, including factual consistency and hallucination reduction in generated outputs. Many papers at NAACL this year focused on this crucial topic. For example:
- The paper “Mitigating Hallucination in Abstractive Summarization with Domain-Conditional Mutual Information” proposed a PMI-based decoding method that penalizes the model for generating text inconsistent with the source, so the generated output contains less hallucination.
- The “LIM-RA” paper proposed a method to detect factual consistency between generated text and its sources, and showed better results than the widely adopted AlignScore.
- And this great paper, “Reducing hallucination in structured outputs via Retrieval-Augmented Generation (RAG)”, studied whether implementing RAG significantly reduces hallucination and allows the LLM to generalize to out-of-domain settings.
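The retrieval-augmented pattern behind that last paper can be sketched in a few lines. This is a toy illustration of RAG in general, not the paper's system: a word-overlap retriever and the prompt-construction helper (`retrieve`, `build_grounded_prompt`) are our own simplifications, where a real system would use a learned retriever and an LLM call.

```python
# Toy RAG sketch: retrieve source passages relevant to the query, then
# condition generation on them so the output stays grounded in evidence.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank passages by simple word overlap with the query (toy retriever)."""
    q_words = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_grounded_prompt(query: str, passages: list[str]) -> str:
    """Instruct the model to answer only from the retrieved context."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using ONLY the context below; say 'unknown' otherwise.\n"
        f"Context:\n{context}\nQuestion: {query}\nAnswer:"
    )

corpus = [
    "The API returns JSON with fields id, name, and status.",
    "Rate limits are 100 requests per minute per key.",
    "The service was launched in 2020.",
]
query = "What are the rate limits of the API?"
prompt = build_grounded_prompt(query, retrieve(query, corpus))
```

Constraining the prompt to retrieved context is what gives RAG its hallucination-reducing effect: claims without supporting passages have nothing to condition on.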
Efficient and reproducible LLM evaluation
Another key direction at the conference was efficient and reproducible LLM evaluation. One of the papers that received the Best Paper Award showed that human evaluation guidelines are a major challenge for reliable LLM evaluation, yet only 29% of recent papers at top conferences published their evaluation guidelines. One interesting solution the authors recommend is using LLMs to generate evaluation guidelines. To yield more robust and trustworthy benchmarks, researchers have also proposed reducing the number of examples per dataset while increasing the diversity of the datasets selected.
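That benchmark-design idea can be made concrete with a small sketch. The function name and toy datasets below are our own illustration of the principle (fewer examples per dataset, more datasets), not code from any paper:

```python
import random

def build_benchmark(
    datasets: dict[str, list[str]], per_dataset: int, seed: int = 0
) -> list[tuple[str, str]]:
    """Sample a few examples from every dataset to form a diverse suite."""
    rng = random.Random(seed)  # fixed seed for a reproducible benchmark
    suite = []
    for name, examples in sorted(datasets.items()):
        k = min(per_dataset, len(examples))
        for ex in rng.sample(examples, k):
            suite.append((name, ex))
    return suite

datasets = {
    "qa": [f"qa-{i}" for i in range(100)],
    "summarization": [f"sum-{i}" for i in range(100)],
    "dialogue": [f"dlg-{i}" for i in range(100)],
}
# 30 examples spanning three task types, instead of 100 from a single task.
suite = build_benchmark(datasets, per_dataset=10)
```

The same evaluation budget then covers more task types, which makes aggregate scores less sensitive to quirks of any single dataset.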
Controllability and dataset curation for pre-training and fine-tuning LLMs
In the realms of pre-training and fine-tuning LLMs, the main theme of this year is controllability and dataset curation. We love the paper “A Pretrainer’s Guide to Training Data”, which gives a great overview of data curation and data choices in pre-training. For fine-tuning dataset curation, the GOLD method detects out-of-distribution (OOD) samples and provides feedback to the LLM during generation. “R-Tuning”, a paper that is at the center of industry AI adoption and also won the Best Paper Award at NAACL this year, showed a method to train LLMs to refrain from answering unknown questions.
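The core idea of refusal tuning can be sketched simply. This is a hedged simplification of the R-Tuning recipe as we understand it, with helper names (`build_refusal_tuning_data`, `model_answer`) that are ours, not the paper's: label each training question by whether the base model already answers it correctly, then fine-tune it to decline the ones it misses.

```python
# Split Q/A pairs into "answer" vs. "refuse" fine-tuning targets, based on
# whether the base model already gets each question right.

def build_refusal_tuning_data(qa_pairs, model_answer):
    """Return prompt/completion pairs for refusal-aware fine-tuning."""
    tuning_data = []
    for question, gold in qa_pairs:
        if model_answer(question) == gold:
            target = gold  # model already knows this: keep the answer
        else:
            # unknown to the model: train it to decline instead of guess
            target = "I am not sure I can answer that reliably."
        tuning_data.append({"prompt": question, "completion": target})
    return tuning_data

# Toy stand-in for querying the base model.
known = {"2+2": "4"}
model_answer = lambda q: known.get(q, "")
data = build_refusal_tuning_data(
    [("2+2", "4"), ("capital of Atlantis", "?")], model_answer
)
```

Fine-tuning on such data teaches the model that refusal is an acceptable output, rather than forcing a confident guess on questions outside its knowledge.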
Capital One’s involvement in NAACL 2024
Our associates contributed to the success of NAACL 2024 by holding key roles in industry track peer review and organizing workshops:
- Anoop Kumar, Sr. Distinguished Applied Researcher, served as chair of the industry track and co-organized the TrustNLP workshop.
- Daben Liu, VP of Applied Research, served on the committee that selected the best industry track paper.
- Chenyang Zhu, Principal Associate, served as a session chair in the industry track and as a reviewer for the TrustNLP workshop.
We are energized by the incredible research at NAACL and look forward to continuing to engage with the AI research community. Learn more about the types of AI research we are interested in at Capital One.
Explore Capital One's AI efforts and career opportunities
Interested in joining a world-class team that is bringing state-of-the-art AI research to finance to change banking for good? Explore Applied Research jobs at Capital One.