Decoding Bureaucracy

Generative AI for Accessible Government Policies

--

Starting last year, The Marshall Project requested, examined, and analyzed state corrections departments’ publication policies across the United States. These policies establish rules and procedures governing what published materials people who are incarcerated are permitted to read, including dictating which books are banned.

This post details our work on this project, expanding on how we leveraged generative AI to produce summaries of the policy documents. We lay out our motivation for using AI, our technical and editorial processes, and elaborate on our ethical considerations and larger takeaways.

Motivating the Need for AI

Our goal with this project is to publish stories and design news products for people affected by prison and jail book bans. We hope to emphasize the challenges incarcerated people face in accessing and holding onto books, promote accountability, and address information needs.

So far we have published a searchable database with banned book lists from 18 states, policy summaries from over 30 states generated using ChatGPT (version 3.5), takeaways from our policy analysis, and most recently, a story based on a tip we received from someone who was formerly incarcerated about the Ohio prison system’s confusing book review process.

A key approach in the project is to put the people affected by the system at the heart of our reporting, design, and development processes, ensuring that our stories and our products stem from conversations in which real needs, wants, and harms (actual and potential) are surfaced by the people closest to the issue. From this, a highly collaborative journalistic design practice has emerged, one that combines the rigor, fairness, and independence of traditional journalism with newer approaches to our craft: co-design, design justice, product thinking, community engagement, and computational journalism.

When we started hosting community listening sessions earlier this year with prison educators, carceral librarians, people who were formerly incarcerated, and books-to-prison programs, we were told that there was a need for comprehensive policy summaries that facilitate comparisons between states and clearly underscore gaps in rules and procedures.

So a question began to form in our minds:

Could ChatGPT do the dreary, under-resourced work of writing policy summaries? If it could, that would free us up to think about fact-checking, editing, and reporting more deeply based on what the summaries showed us.

We found that it worked well and we learned that:

  • ChatGPT excels at turning complex, bureaucratic text into simple summaries as a public service.
  • ChatGPT can perform textual analysis to classify common topical features and themes in a body of text.
  • ChatGPT will take a data dictionary and use it to group relevant text based on the definitions it contains.
  • A human-in-the-loop approach to generative AI encourages accurate, reliable, and vetted ground truth datasets, and outputs that are journalistically viable.
  • A machine-human hybrid approach opens up new reporting possibilities without compromising editorial integrity. It helps already strapped newsrooms overcome resource constraints, and it lets reporters, designers, and product teams prioritize by revealing what is a must-have now, what is a need-to-add later, and what shouldn’t be greenlit at all.
  • Speaking to the people with the most experience around an issue area is essential to ensuring that new technology doesn’t oversimplify complex social problems.
  • Involving historically marginalized groups, and the people who serve them, in the design process to solicit feedback and suggestions on tools and information helps ensure that your stories and products are responsive to real needs.

In the next sections, we delve deeper into the technical, editorial, ethical, and design approaches behind this work.

Technical Approach

After we gathered policies from every state, we reached out to various stakeholders, including books-to-prison programs, prison education groups, freedom of information nonprofits, librarians, and carceral librarians and heard that they wanted a way to understand and compare publication policies.

The question was whether ChatGPT could help us ingest the policies and generate summaries with uniform sub-heads that made it easy to compare one state to another, while highlighting any gaps, like whether a state had an appeals process.

After we considered the problem, we decided on a human-in-the-loop approach, involving the following four steps:

1. Ground Truth (human)

A reporter familiar with these policies read every one of the first 14 we collected and extracted the sections relevant to publications. Some publication policies are bundled with the mail policy, and vice versa, so we could not pass in the whole document to ChatGPT.

We did not want to trust ChatGPT or any Natural Language Processing method to correctly capture key parts of the policy. We inserted human judgment at two key points in the process: establishing a benchmark for accurate, reliable, and vetted information to pass into the model, and validating the outputs of the model. It was crucial to the success of this workflow that we be certain we had extracted exactly what we wanted and needed to summarize.

Once we had the extracts in a spreadsheet, this became our ground truth dataset.

2. Generate Policy Sections and Definitions (machine)

We then ran all of the extracts from the policies through ChatGPT (using version 4) in a single chat. We were essentially feeding in our ground truth data and asking it to identify common aspects of the policies to then be able to generate uniform sub-heads with definitions.

While we lost the initial prompts we used for this task (keep a prompt record!), we did manage to generate a useful list of generalized sub-heads along with definitions, which we used in the subsequent step to guide the summarization. Here’s the list of sub-heads and definitions:

  • Publication Sources: This includes information about authorized publishers, distributors, retailers, and other sources of books and publications.
  • Publication Specifications: This covers all details about the physical condition, content requirements, and prohibited features of publications.
  • Review and Approval System: This includes details about the process of reviewing publications, the entities involved in review and approval, and the criteria for approval.
  • Delivery and Receipt of Publications: This covers all details about book delivery timelines, receipt requirements, book handling procedures, and limitations on book ownership.
  • Prohibited Publications and Restrictions: This covers the list of prohibited books, restrictions on personal correspondence, and content that can lead to rejection of publications.
  • Appeals Process and Notifications: This would include details on how rejected books can be appealed, timelines for appeals, details on the entities handling the appeals, and how and when notifications are sent to publishers, inmates, and other relevant parties.
  • Record Keeping: This includes details on how long records of denials are kept.
  • Special Considerations: This includes any unique rules or considerations such as handling of publications for inmates in disciplinary detention, special rules for non-English publications, and conditions under which publications can be redacted or content removed.
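In scripted workflows, the sub-heads and definitions above can be treated as a small data dictionary. Here is a minimal sketch of that idea; the dictionary literal and the `render_subheads` helper are our own illustration, not code from the project, and only a few entries are shown:

```python
# The sub-heads "data dictionary" as a plain mapping, with a helper that
# renders it into the bulleted block of prompt text handed to the model.
# Only three of the eight sub-heads are shown here for brevity.
SUBHEADS = {
    "Publication Sources": (
        "Authorized publishers, distributors, retailers, and other "
        "sources of books and publications."
    ),
    "Review and Approval System": (
        "The process of reviewing publications, the entities involved "
        "in review and approval, and the criteria for approval."
    ),
    "Appeals Process and Notifications": (
        "How rejected books can be appealed, timelines for appeals, and "
        "how notifications are sent to publishers, inmates, and others."
    ),
}

def render_subheads(subheads: dict[str, str]) -> str:
    """Format the data dictionary as one bullet line per sub-head."""
    return "\n".join(
        f"- {name}: {definition}" for name, definition in subheads.items()
    )
```

Keeping the definitions in one structure like this makes it easy to reuse the exact same wording across every chat session, which matters when the goal is uniform, comparable summaries.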

3. Prompt & Output (machine)

With the definitions in hand, we re-processed the extracts for each policy. For each set of policy extracts, we started a fresh chat session, because we noticed that compounding tasks in a single chat polluted the outputs.

We told ChatGPT to parse and then group the policy extracts using the sub-heads and definitions generated in the earlier step in a single, chained prompt. We also instructed the model to write a default sentence whenever it did not find information relevant to the sub-head in the policy extracts, since highlighting these gaps was a product priority.

Here’s the main prompt we used:

you are an expert in reading excerpts of prison policies and summarizing them into predefined sub-heads with definitions and using the “important instructions.”

I will provide you with a list of sub-heads with definitions. When you summarize, you will group them under the subheads based on relevance.

Important instructions:

1. When summarizing the notes into the sub-heads, if you do not find information relevant to the sub-head in the notes, include a default sentence “There is no information relevant to this sub-head in the policy.”

2. You will ignore any special or text formatting in the notes. You will also ignore bullet points of any kind in the notes.

3. When you are ready to receive the sub-heads with definitions, say, “ready for sub-heads”

4. When you are ready for the notes, say “ready for notes”

Arriving at this prompt was an iterative process. Initially we had a version without the special instructions, and it worked. But with certain extracts, we noticed that bullet points or special formatting would alter the output. So we rewrote the prompt to account for these edge cases as we went along.

To ensure consistency, whenever we added special instructions, we went back and reprocessed earlier extracts. In retrospect, this probably wasn’t necessary, because the new and old outputs varied only slightly. But it let the human in the loop confirm that they hadn’t inadvertently altered the model’s output in non-edge cases.

4. Validate (human)

Once we had the summaries, we went back to our ground truth extracts and compared the output to the actual policy extracts, searching for hallucinations or other errors.

Next, the summaries were shared with our styles & standards editor, who ensured that they complied with The Marshall Project’s rigorous standards, especially those around people-first language. Lastly, the data editor read them once more.

By the end, at least four people had read, edited and verified each output for accuracy and style.
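Lightweight tooling can help triage this comparison before the human read. As an illustration only (this was not part of the project’s actual workflow), a fuzzy-matching pass can flag summary sentences that have no close counterpart in the ground-truth extract, so reviewers know where to look hardest for hallucinations:

```python
import re
from difflib import SequenceMatcher

def flag_unsupported(summary: str, source: str, threshold: float = 0.6) -> list[str]:
    """Return summary sentences whose best fuzzy match against any
    source sentence falls below `threshold` -- candidates for review,
    not proof of hallucination."""
    def sentences(text: str) -> list[str]:
        # Naive sentence split on terminal punctuation followed by space.
        return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    source_sents = sentences(source)
    flagged = []
    for sent in sentences(summary):
        best = max(
            (SequenceMatcher(None, sent.lower(), src.lower()).ratio()
             for src in source_sents),
            default=0.0,
        )
        if best < threshold:
            flagged.append(sent)
    return flagged
```

A check like this only narrows the search; paraphrased-but-accurate sentences will also score low, so every flag still needs a human judgment call against the original policy text.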

Editorial Value Proposition

In a previous iteration of this workflow, we read through each policy and took notes on aspects we thought were noteworthy, then wrote those notes up into simple, one-paragraph summaries. Reading through the 50 policies and taking notes took about two weeks. When we started writing up the notes, the going was so slow that our reporter grew concerned about time to publication and burnout.

That was when the idea of using ChatGPT came up. In about 45 minutes, the reporter fed our notes to the model (version 3.5 at the time), one policy at a time, and generated simple, one-paragraph summaries that we then fact-checked twice and copy-edited.

Those are currently on The Marshall Project’s website.

In this current, more robust iteration of our workflow, it took us about a week and a half to manually extract the relevant parts of the policies and generate new summaries for 14 of them. We are still checking and editing those summaries, so we don’t yet have a time estimate for that part of the process. The biggest limitation on our progress was ChatGPT-4’s cap of 25 messages every three hours. We petitioned OpenAI to expand our access, but moving forward, we are also considering using the OpenAI API to process the remaining 36 policies, since it doesn’t have the same limitation; it does, however, have a different cost structure, billed by the number of tokens (both input and output) rather than the number of requests.
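As a sketch of what that API-based route could look like, the following uses only the Python standard library against OpenAI’s chat completions REST endpoint. The endpoint URL and response shape follow OpenAI’s published API; the prompt contents here are illustrative stand-ins, not our exact prompts:

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_payload(subheads: str, extract: str, model: str = "gpt-4") -> dict:
    """Assemble one chat-completion request for a single policy extract."""
    return {
        "model": model,
        "temperature": 0,  # favor reproducible outputs for fact-checking
        "messages": [
            {"role": "system",
             "content": ("You are an expert in reading excerpts of prison "
                         "policies and summarizing them under predefined "
                         "sub-heads with definitions.")},
            {"role": "user", "content": f"Sub-heads with definitions:\n{subheads}"},
            {"role": "user", "content": f"Policy extract:\n{extract}"},
        ],
    }

def summarize(subheads: str, extract: str) -> str:
    """POST one request per extract; billing is per token, with no
    25-messages-per-3-hours cap. Requires OPENAI_API_KEY in the env."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(subheads, extract)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the API is stateless, each extract gets a clean context by default, which mirrors the fresh-chat-per-policy discipline described above.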

Here’s an example of what the new summaries look like:

Publication Sources:

Inmates can receive books, periodicals, and newspapers accepted for distribution by the USPS. A departmentally approved vendor is any publisher, bookstore, or book distributor that does mail order business. Books, periodicals, or other publications that are mailed from a religious organization bookstore are considered as coming from an authorized vendor. Personal correspondents cannot mail books, periodicals, or publications directly to inmates.

Publication Specifications:

Inmates may possess a reasonable number of publications, including books, magazines, and newspapers, as directed by the Authorized Personal Property Schedule in DOM 54030.17. Publications addressed to inmates shall be processed in accordance with CCR 3134.1. All incoming paperback and hardback books, and any enclosures within them, must be inspected prior to being altered and/or issued.

Review and Approval System:

All non-confidential inmate mail, incoming or outgoing, is subject to being read in its entirety by designated staff. All incoming mail shall be inspected for contraband prior to issuance. Mail shall only be disallowed if it violates CCR Sections 3006, 3135, any other applicable regulations, or DOM Sections 54010.13 and 54010.14.

Delivery and Receipt of Publications:

All incoming books, magazines, or newspapers must be inspected before being issued. Delivery by staff shall be completed as soon as possible, but not later than 15 calendar days after the institution receives the book, except during the holiday season and during modified programs of affected inmates.

Prohibited Publications and Restrictions:

Prison authorities may exclude obscene publications or writings, any manner of contraband as described in CCR 3006, and any matter concerning gambling or a lottery. Material is considered obscene when it appeals to deviant sexual groups and portrays explicit sexual content, non-consensual behavior, or violent conduct. The CDCR shall distribute a centralized list of disapproved publications that are prohibited as contraband.

Appeals Process and Notifications:

Disapproved material shall be referred to staff not below the level of Correctional/Facility Captain for determination and appropriate action. The CDCR Form 1819, Notification of Disapproval-Mail/Packages/Publications, shall be utilized by each institution/facility when incoming or outgoing mail/packages/publications addressed to or being sent by an inmate are withheld or disallowed. The CDCR Form 1819 informs the inmate of the reason, disposition, name of official disallowing the mail/package/publication, and the name of the official to whom an appeal can be directed.

Record Keeping:

A copy of the CDCR Form 1819 and the supporting document(s) shall be retained by each institution/facility for a minimum of seven years for litigation purposes. After seven years if the material is not needed it shall be destroyed. If a lawsuit has been filed as a result of mail being disapproved, the CDCR Form 1819 and the supporting document(s) will be retained for two years from the conclusion of the suit.

Special Considerations:

Correspondence in a language other than English to or from an inmate is subject to the same regulations governing all other mail and may be subject to a delay for translation of its contents by staff. When such delay exceeds normal mail processing by five business days, the inmate shall be notified in writing of the delay, the reason for the delay, and subsequent determinations and actions regarding that item of mail. If staff is unable to translate the letter and its contents within 20 business days of notice to the inmate, then the letter shall be delivered to the inmate untranslated.

The generative workflow saved us considerable time. It also shifted the editorial balance toward fact-checking, editing, and verification that meet The Marshall Project’s rigorous standards. And it sparked collaboration across teams within the newsroom, starting a conversation about our use of generative AI and the standards we should adopt as an organization when designing reporting protocols with this technology.

We have been documenting our work with ChatGPT internally in a GitHub project, including prompts that worked and prompts that failed, lessons learned, and tips for others in the newsroom. This documentation creates a place to discuss the work while making it easier to explain to people outside the newsroom. It can also nourish broader industry conversations that center reporters in the machine process as fundamental to accurate and reliable generative information gathering and distribution.

Ethical & Design Considerations

It is important to acknowledge the limitations and ethical considerations of AI deployment in newsrooms. The need for human oversight, the potential for biases in AI tasks, and the responsible use of AI technologies remain crucial topics, even in simple use cases.

Our thinking here is that by distrusting ChatGPT enough to leave nothing to chance, we designed a workflow that devotes the human side of the equation to sourcing, fact-checking, and revising the machine’s outputs, while the processing of a large amount of data gets handled by AI, leveraging its tirelessness and its ability to take commands and iterate on them rapidly.

In the end, through this hybrid approach that combines AI capabilities with journalist expertise, we unlocked complex bureaucracy, making information accessible and actionable for a wider readership, which we believe is a foundational public service.

We are also motivated by a design practice that attempts to center historically marginalized people in the design, development, and deployment processes of news products. Community listening has been an important tool for us to get the feedback that we need to do this work. Some in the design space call this user research. At The Marshall Project, we prefer to think about the people we cover as part of a community, rather than as “users”. This helps move us away from the consumer-oriented language of traditional design and product methods.

Lessons for other newsrooms

Our experience has shed light on several insights and considerations that would be valuable to other journalists in the field.

First, crafting a well-defined prompt and refining categories through iterative interactions is crucial for generating accurate and relevant definitions (ultimately, a data dictionary). Integrating AI technologies into newsrooms can streamline tasks like summarization, enabling journalists to focus more on vetting and fact-checking and thus enhancing the overall quality of reporting. Our experience also highlights the importance of documenting your process and working in public as part of using generative technologies ethically and responsibly.

Second, the ability to process and render complex government documents intelligible and actionable is a valuable service newsrooms can provide to their readers. By adopting a hands-on, hybrid process that leverages generative AI tools like ChatGPT, newsrooms can effectively summarize intricate policies, making them more accessible to a wider audience.

From a public service standpoint, this approach underscores the potential of generative AI in democratizing access to complex information and empowering communities historically excluded from intricate bureaucratic processes by breaking down barriers to understanding formal government documents. Newsrooms can serve as trustworthy intermediaries, ensuring the public’s comprehension of critical policies, fostering transparency and enabling communities to demand accountability.

Moving forward, we plan to think more about how to use ChatGPT to make sense of complex bureaucracies. One idea that we are actively experimenting with is to model a real-world system, like the judges in a court system, using a relational or graph database. We hope to have reporters fill in information about judges and judicial candidates, like their terms in office, election history, roles they’ve occupied on the bench, campaign finance, and job history. We then want to ask ChatGPT to write sentences for different reading levels and languages about each judge, based on judgments we’ll draw from our reporting. For example, perhaps we learn that competitive races are notable and rare. If a judge has never run opposed, that’s worth a small mention. But if they’ve had recent, competitive races, then that should be a significant feature in the summary ChatGPT helps us write.

There are many ways in which ideas like this can fail, and we need to deeply consider them before we make anything public. The criminal justice system is a high stakes environment; people already experience trauma of all sorts and they sometimes rely on our stories and products to understand their experience. The harms even of something as simple as summarizing policies can range from inaccuracies to reflecting biases of the training set that run counter to our own reporting and values.

We believe that embracing generative AI responsibly means developing a practice that involves reporters in critical parts of the machine learning workflow, while engaging people with lived experience in the design process, to better serve them and ease their access to information resources.

This journalistic-design approach to generative tech also provides a counterpoint to all of the emergent market-oriented AI products that ultimately seek profit rather than public service, a democratic value that journalism is uniquely poised to promote.

--

Computational Journalist at The Marshall Project + Adjunct Professor at The New School + Project & Product Designer at J&D Lab