Building a GPT-4 Powered Google Docs Extension for News Quiz Generation

Nikita Roy
Generative AI in the Newsroom
7 min read · Jun 8, 2023


News quizzes offer an engaging way to connect with our audience and repurpose our news content. However, creating these quizzes requires a significant investment of time and resources to ensure that each question is carefully crafted and relevant. That’s where we saw the potential of generative AI to help automate this workflow. The NRI Nation has integrated generative AI, specifically GPT-4, to streamline the process of crafting news quizzes, enabling us to deliver our news to our audience in an engaging format that extends our reach.

Developed in-house, ‘AI Assist’ is a Google Docs extension that gives our team access to the power of GPT-4 directly within their workflow. The tool is currently being tested for multiple uses, such as creating headlines, summaries, SEO metadata, social media posts, and news quiz questions. As with simpler tools like spell check, we see this as just another tool to help our newsroom produce news. For the Generative AI in the Newsroom Challenge, we specifically focused on creating an effective prompt to generate quiz questions, which was incorporated into our Google Docs extension.

A demo of how our custom Google Docs extension AI Assist works to create quiz questions
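
For context on the plumbing: Google Docs extensions like this are typically built with Google Apps Script. The sketch below shows the general pattern, not our production code; the function names, the template constant, and the overall structure are illustrative assumptions (the multiple-choice prompt template itself appears in footnote [1]).

```typescript
// Minimal Apps Script sketch (TypeScript via clasp); names are hypothetical.
// MC_PROMPT_TEMPLATE stands in for the multiple-choice prompt in footnote [1].
declare const MC_PROMPT_TEMPLATE: string;

function onOpen(): void {
  // Add an "AI Assist" menu to the Google Docs UI when the document opens.
  DocumentApp.getUi()
    .createMenu('AI Assist')
    .addItem('Generate quiz question', 'generateQuizQuestion')
    .addToUi();
}

function generateQuizQuestion(): void {
  const doc = DocumentApp.getActiveDocument();
  const article = doc.getBody().getText(); // use the article text in the open doc
  const prompt = MC_PROMPT_TEMPLATE.replace('${article}', article);
  const question = callGpt4(prompt); // hypothetical helper, sketched further below
  doc.getBody().appendParagraph(question); // append the draft question for editing
}
```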

For our news quizzes, we decided to focus on creating multiple-choice and multiple-response questions. Before we began testing prompts, we clearly defined a set of criteria to ensure the quality of the generated questions. We used the following criteria to judge each question:

  • Relevance: Is the question directly relevant to the main objective of the article?
  • Clarity: Is the question clearly written and easy to understand? Is it free of ambiguity, jargon, and misleading phrasing?
  • Answerability: Are the incorrect answer choices plausible and relevant? Is the correct answer clearly distinguishable from the rest of the choices based on the information in the article?

The quiz questions generated were categorized into three groups: publishable as is, publishable with minor edits, and not publishable. We also recorded which criteria each question missed to help refine the prompts.
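
To keep these reviews consistent, each judgment can be logged as a structured record. A minimal sketch of what such an evaluation record could look like (the field names are illustrative, not our actual tracking setup):

```typescript
// Illustrative record of one reviewed question; names are our invention.
type Criterion = 'relevance' | 'clarity' | 'answerability';

interface QuestionEvaluation {
  articleId: string;
  questionType: 'multiple-choice' | 'multiple-response';
  verdict: 'publishable as is' | 'publishable with minor edits' | 'not publishable';
  missedCriteria: Criterion[]; // empty when the question is publishable as is
}
```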

We began systematically experimenting with different prompts in ChatGPT to produce the desired output, deciding to focus on multiple-choice questions first. Initially, we started with a general prompt that asked ChatGPT to generate a multiple-choice question based on a news article. But the generated questions were often not publishable, as ChatGPT frequently missed the article’s main point and focused instead on an obscure detail.

We then revised our strategy by specifically giving ChatGPT the criteria mentioned above as part of the prompt. While the results improved, there were still many instances where the AI would generate questions based on obscure details in the article.

The breakthrough came when we asked ChatGPT to generate a prompt for itself to create a multiple-choice question for the news article (for example, a meta-prompt along the lines of: “Write a prompt I can give you that will generate a high-quality multiple-choice quiz question from a news article”). This approach consistently produced high-quality multiple-choice questions. Comparing the ChatGPT-generated prompt to our previous ones, the key improvement was that it explicitly provided a one-line summary focused on the main objective of the article. This made us realize the importance of being specific when prompting ChatGPT, in order to guide it toward more accurate outputs. Even so, the questions occasionally still missed the relevance criterion or had slightly unclear answer options.

Based on guidance received as part of the Generative AI in the Newsroom Challenge, we then transitioned to using GPT-4 directly in the OpenAI Playground and added a system prompt that said “You are a precise journalist and editor”. The reasoning was that assigning this role would guide GPT toward creating more accurate questions.

A breakdown of the prompt used to generate multiple choice questions
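
In API terms, a system prompt is simply the first message in the request, with its role set to system. Here is how the hypothetical callGpt4 helper from the earlier sketch might issue the request; the endpoint and payload shape follow OpenAI’s chat completions API, but the code itself is an illustrative assumption:

```typescript
// Hypothetical helper: sends an assembled prompt to GPT-4 from Apps Script.
function callGpt4(prompt: string): string {
  // The API key is stored in script properties rather than in the source.
  const apiKey =
    PropertiesService.getScriptProperties().getProperty('OPENAI_API_KEY') ?? '';
  const response = UrlFetchApp.fetch('https://api.openai.com/v1/chat/completions', {
    method: 'post',
    contentType: 'application/json',
    headers: { Authorization: 'Bearer ' + apiKey },
    payload: JSON.stringify({
      model: 'gpt-4',
      messages: [
        // The system prompt steers every request toward precise, editorial output.
        { role: 'system', content: 'You are a precise journalist and editor' },
        { role: 'user', content: prompt }, // the filled-in template from footnote [1] or [2]
      ],
    }),
  });
  return JSON.parse(response.getContentText()).choices[0].message.content;
}
```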

The majority of the questions generated using this prompt [1] were good enough to be published as is. A few still required slight revisions for clarity or relevance to the article’s objective. One notable case involved a question about a newly introduced bill: the article had multiple focus points, but the question covered only the bill’s final point, so it did not work well as a multiple-choice question. This showed that, in some cases, multiple-response questions are needed.

Our attention then shifted to multiple-response questions. We reused our successful multiple-choice prompt, simply replacing “multiple-choice” with “multiple response”. However, these attempts were unsuccessful: GPT often generated questions in which every answer choice was correct. This led us to further modify the prompt, specifying that at least one answer choice must be incorrect [2]. As a result, the quality of the generated multiple-response questions improved.

To further streamline question generation, we attempted to automate the selection between multiple-choice and multiple-response questions by asking GPT to decide which type suited the article. This attempt was unsuccessful: GPT-4 showed a clear bias toward multiple-choice questions, generating them for all ten articles we tested. This highlighted the continuing need for human editorial oversight in the final decision-making process.

In response, we concluded that our AI Assist tool would generate both types of quiz questions, leaving the editor to review the options and choose the most suitable one. We also automated the generation of the one-line article summary that the prompts require (a sketch of the resulting flow follows the process diagram below).

Process Diagram of the Google Docs Extension Workflow for Generating Quiz Questions
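
Concretely, this means the extension makes three model calls per article: one for the one-line summary and one for each question type. A sketch of that orchestration, reusing the hypothetical helpers above (MR_PROMPT_TEMPLATE stands in for the multiple-response prompt in footnote [2], and the summary prompt wording here is an assumption, not our exact text):

```typescript
// Hypothetical orchestration: summarize once, then draft both question types
// so the editor can review them and pick the better fit.
declare const MR_PROMPT_TEMPLATE: string; // the multiple-response prompt, footnote [2]

function generateQuizOptions(article: string): { multipleChoice: string; multipleResponse: string } {
  // Automated one-line summary, interpolated as ${summary} in both templates.
  // This summary prompt is illustrative, not our exact wording.
  const summary = callGpt4(
    'Summarize the main objective of the following news article in one line. ### ' + article
  );
  const fill = (template: string) =>
    template.replace('${summary}', summary).replace('${article}', article);
  return {
    multipleChoice: callGpt4(fill(MC_PROMPT_TEMPLATE)),   // footnote [1]
    multipleResponse: callGpt4(fill(MR_PROMPT_TEMPLATE)), // footnote [2]
  };
}
```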

We tested our first AI Assist-generated quiz questions on ten articles. In this experiment, the multiple-choice questions outperformed the multiple-response questions in terms of quality. The multiple-response questions still struggled to focus on the exact objective of the article, suggesting that not every news article is suited to a multiple-response question.

Evaluating the quality of generated questions based on question type for 10 news articles

Based on our experiments with prompt engineering for news quizzes, we had a few takeaways:

  • Have an iterative process: Continually refine and adjust prompts until you consistently obtain desirable results.
  • Leverage GPT’s own power: If the prompts are not yielding the expected results, consider requesting ChatGPT to create a prompt for itself. This can inspire new ideas and further refine the prompt.
  • Precision is key: Ensure that the prompts are specific and clearly define the article’s objective, the quality criteria for the question, and the number of incorrect answers required. A detailed prompt is more likely to generate the expected output.
  • Optimize with a system prompt: Assign a specific role to GPT via the system prompt as that tends to help create more accurate responses.

The integration of generative AI into the newsroom shows promise for automating several traditionally manual tasks, including the creation of news quizzes. Our experiments highlight the significance of a systematic approach to prompt engineering in achieving high-quality output from LLMs.

But even though LLMs such as GPT-4 demonstrate extraordinary capabilities in content generation, we find that their general training tends to fall short in producing outcomes that align precisely with our publication’s unique tone, voice, and editorial standards. As a result, our team has adopted a two-pronged approach: refining prompts for a diverse range of use cases using GPT-4 to enhance our AI Assist tool, and simultaneously, fine-tuning open-source LLMs for specific tasks within our newsroom.

By training LLMs on data that is representative of our newsroom, we aim to generate outputs that align more closely with our publication’s standards. We anticipate that a fine-tuned LLM will progress beyond mere contextual comprehension to gain a profound understanding of our target audience, the unique nature of our content, and our journalistic principles and guidelines.
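
As a rough illustration of what “data that is representative of our newsroom” means in practice, instruction-style fine-tuning data is typically a set of prompt/response pairs drawn from past editorial work. The shape of one training record might look like the following (the fields follow a common instruction-tuning convention; the comments describe intent, not our actual dataset):

```typescript
// Illustrative shape of one instruction-tuning record; the content of each
// field would come from an archive of articles and editor-approved outputs.
interface FineTuneRecord {
  instruction: string; // the task, e.g. a quiz-question or headline prompt
  input: string;       // a published article from the archive
  output: string;      // the editor-approved result, in the publication's house style
}
```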

One of our next major steps is to experiment with creating a customized, fine-tuned model for our newsroom, designed specifically to boost the LLM’s performance on the tasks we need.

[1] Prompt to create the multiple choice question: “Create a multiple-choice news quiz question based on the following news article about ${summary}. Ensure that the question is directly relevant to the main takeaway of the article, clearly written, and easy to understand. Provide four answer choices, with the correct answer clearly distinguishable from the rest based on the information in the article. Specify what the correct answer is and provide a brief explanation why the corresponding option is correct by referencing facts from the article. Write the explanation as if you were a journalist explaining the news to the public. ### ${article}”

[2] Prompt to create the multiple response question: “Create a multiple responses news quiz question based on the following news article about ${summary}. Ensure that the question is directly relevant to the main takeaway of the article, clearly written, and easy to understand. Provide four answer choices. There should be more than one correct answer and at least one incorrect answer. The correct answers should be clearly distinguishable from the incorrect one based on the information in the article. Specify what the correct answer is and provide a brief explanation why the corresponding option is correct by referencing facts from the article. Write the explanation as if you were a journalist explaining the news to the public. ### ${article}”


Host of Newsroom Robots podcast | Founder of The NRI Nation | Lead of the Indo-Pacific founder community at Harvard Innovation Labs