
Reliable Zero-Shot Classification with the Trustworthy Language Model


In zero-shot classification, we use a foundation model to classify input data into predefined categories (a.k.a. classes), without having to train this model on a dataset manually annotated with these categories. This leverages the pre-trained model’s world knowledge to accomplish tasks that would otherwise require much more work, such as training a classical machine learning model from scratch. The problem with zero-shot classification of text with LLMs is that we don’t know which LLM classifications we can trust. Most LLMs are prone to hallucination and will often predict a category even when their world knowledge does not suffice to justify this prediction.

This tutorial demonstrates how you can easily replace any LLM with Cleanlab’s Trustworthy Language Model (TLM) to gauge the trustworthiness of each zero-shot classification. Use the TLM to ensure reliable classification by learning which model predictions cannot be trusted. Before this tutorial, we recommend completing the TLM quickstart tutorial.

Setup

Using TLM requires a Cleanlab account. Sign up for one here if you haven’t yet. If you’ve already signed up, check your email for a personal login link.

The Python client package can be installed using pip:

%pip install cleanlab-studio
import re
import pandas as pd
from tqdm import tqdm
from difflib import SequenceMatcher

from cleanlab_studio import Studio

In Python, launch your Cleanlab Studio client using your API key.

# Get your API key from https://app.cleanlab.ai/account after creating an account.
studio = Studio("<insert your API key>")

Let’s load an example classification dataset. Here we consider legal documents from the “US” Jurisdiction of the Multi_Legal_Pile, a large-scale multilingual legal dataset that spans over 24 languages. We aim to classify each document into one of three categories: [caselaw, contracts, legislation]. We’ll prompt our TLM to categorize each document and record its response and associated trustworthiness score. You can use the ideas from this tutorial to improve LLMs for any other text classification task!

First download our example dataset and then load it into a DataFrame.

wget -nc 'https://cleanlab-public.s3.amazonaws.com/Datasets/zero_shot_classification.csv'
df = pd.read_csv('zero_shot_classification.csv')
df.head(2)
   index  text
0      0  Probl2B\n0/NV Form\nRev. June 2014\n\n\n\n ...
1      1  UNITED STATES DI...
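The download step above relies on the `wget` command-line tool. If `wget` isn’t available in your environment (for example, on some Windows setups), here is a minimal standard-library sketch that mirrors `wget -nc` by skipping the download when the file already exists. The helper name `download_if_missing` is our own, hypothetical choice; only the URL comes from this tutorial.

```python
import os
import urllib.request

DATA_URL = "https://cleanlab-public.s3.amazonaws.com/Datasets/zero_shot_classification.csv"


def download_if_missing(url: str, path: str) -> bool:
    """Download `url` to `path` unless `path` already exists (like `wget -nc`).

    Returns True if a download happened, False if the existing file was kept.
    """
    if os.path.exists(path):
        return False  # no-clobber: keep the existing local copy
    urllib.request.urlretrieve(url, path)
    return True


# Usage (only touches the network when the CSV is not already present):
# download_if_missing(DATA_URL, "zero_shot_classification.csv")
```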

Perform Zero-Shot Classification with TLM

Let’s initialize a TLM object. Here we use the default TLM settings, but check out the TLM quickstart tutorial for configuration options that can produce better results.

tlm = studio.TLM()

Next, let’s define a prompt template to instruct the TLM on how to classify each document’s text. Write your prompt just as you would with any other LLM when adapting it for zero-shot classification. A good prompt template might list all the possible categories a document can be classified as, along with formatting instructions for the LLM response. Of course, it must also include the text of the document itself.

'You are an expert Legal Document Auditor. Classify the following document into a single category that best represents it. The categories are: {categories}. In your response, first provide a brief explanation as to why the document belongs to a specific category and then on a new line write "Category: <category document belongs to>". \nDocument: {document}'
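The `{categories}` and `{document}` placeholders in this template are filled in with Python’s `str.format`. As a minimal sketch of that step in isolation (the function name `build_prompt` is a hypothetical helper of ours), the snippet below renders the category list without quotes, the same way `str(categories).replace("'", "")` does in the code that follows, and inserts a document snippet.

```python
ZERO_SHOT_TEMPLATE = (
    "You are an expert Legal Document Auditor. Classify the following document "
    "into a single category that best represents it. The categories are: {categories}. "
    "In your response, first provide a brief explanation as to why the document "
    "belongs to a specific category and then on a new line write "
    '"Category: <category document belongs to>". \nDocument: {document}'
)


def build_prompt(document: str, categories: list) -> str:
    """Fill the template with the category list and the document text."""
    # Render the list as "[caselaw, contracts, legislation]" (no quotes)
    category_str = "[" + ", ".join(categories) + "]"
    return ZERO_SHOT_TEMPLATE.format(categories=category_str, document=document)


prompt = build_prompt("UNITED STATES DISTRICT COURT ...", ["caselaw", "contracts", "legislation"])
print(prompt)
```

Only the template string itself is parsed by `str.format`, so braces that happen to appear inside a document’s text are inserted verbatim and need no escaping.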

If you have a couple of labeled examples from different classes, you may be able to get better LLM predictions via few-shot prompting (where these examples and their classes are embedded within the prompt). Here we’ll stick with zero-shot classification for simplicity, but note that TLM can also be used for few-shot classification just like any other LLM.
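A minimal sketch of one way to build such a few-shot prompt, assuming you have a handful of labeled (text, category) pairs. The helper `build_few_shot_prompt` is hypothetical, and the layout (instructions, then examples, then the new document) is a design choice rather than anything TLM requires. The two example snippets are excerpts of documents shown later in this tutorial, labeled as the TLM classified them there.

```python
def build_few_shot_prompt(document: str, categories: list, examples: list) -> str:
    """Build a few-shot prompt: instructions, labeled examples, then the new document.

    `examples` is a list of (text, category) pairs that you have labeled yourself.
    """
    category_str = "[" + ", ".join(categories) + "]"
    instructions = (
        "You are an expert Legal Document Auditor. Classify the following document "
        "into a single category that best represents it. "
        f"The categories are: {category_str}. "
        "In your response, first provide a brief explanation as to why the document "
        "belongs to a specific category and then on a new line write "
        '"Category: <category document belongs to>".'
    )
    lines = [instructions, "", "Here are some labeled examples:", ""]
    for text, label in examples:
        # Show each example in the same "Category: ..." format we ask the LLM to use
        lines += [f"Document: {text}", f"Category: {label}", ""]
    lines.append(f"Document: {document}")
    return "\n".join(lines)


examples = [
    ("This Amendment to Loan Agreement (the “Amendment”) is entered into on October 11, 2017 ...", "contracts"),
    ("DEPARTMENT OF TRANSPORTATION National Highway Traffic Safety Administration 49 CFR Parts 555, 571, and 591 ...", "legislation"),
]
print(build_few_shot_prompt("<document text to classify>", ["caselaw", "contracts", "legislation"], examples))
```

Keeping the examples in exactly the response format you later parse makes it more likely the LLM’s final line stays machine-readable.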

Let’s apply the above prompt template to all documents in our dataset and form the list of prompts we want to run. For one arbitrary document, we print below the actual prompt fed into the TLM.

zero_shot_prompt_template = 'You are an expert Legal Document Auditor. Classify the following document into a single category that best represents it. The categories are: {categories}. In your response, first provide a brief explanation as to why the document belongs to a specific category and then on a new line write "Category: <category document belongs to>". \nDocument: {document}'
categories = ['caselaw', 'contracts', 'legislation']
# Render the list as "[caselaw, contracts, legislation]" (without quotes)
string_categories = str(categories).replace('\'', '')

# Create a DataFrame to store results and apply the prompt template to all examples
results_df = df.copy()
results_df['prompt'] = results_df['text'].apply(lambda x: zero_shot_prompt_template.format(categories=string_categories, document=x))

print(results_df.at[7, 'prompt'])
    You are an expert Legal Document Auditor. Classify the following document into a single category that best represents it. The categories are: [caselaw, contracts, legislation]. In your response, first provide a brief explanation as to why the document belongs to a specific category and then on a new line write "Category: <category document belongs to>". 
Document: UNITED STATES DISTRICT COURT
SOUTHERN DISTRICT OF NEW YORK

UNITED STATES OF AMERICA,

v. ORDER

JOSE DELEON, 14 Cr. 28 (PGG)

Defendant.


PAUL G. GARDEPHE, U.S.D.J.:

It is hereby ORDERED that the violation of supervised release hearing currently

scheduled for January 8, 2020 is adjourned to January 15, 2020 at 3:30 p.m. in Courtroom 705

of the Thurgood Marshall United States Courthouse, 40 Foley Square, New York, New York.

Dated: New York, New York
January 8, 2020

Now we prompt the TLM and save the output responses and their associated trustworthiness scores for all examples. We recommend the try_prompt() method to run TLM over datasets with many examples.

outputs = tlm.try_prompt(results_df['prompt'].to_list())

results_df[["response","trustworthiness_score"]] = pd.DataFrame(outputs)
    Querying TLM... 100%|██████████|
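Before building the DataFrame, it can help to check what `try_prompt()` actually returned. Our understanding (please verify against the TLM API reference for your client version) is that it returns one dict per prompt with `response` and `trustworthiness_score` keys, and `None` in place of any prompt that failed, e.g. due to an error or timeout. Under that assumption, this sketch turns the outputs into uniform rows that stay aligned with the prompts and records which positions failed so they can be re-run. Both helper names are hypothetical.

```python
def outputs_to_rows(outputs: list) -> list:
    """Turn TLM outputs (dicts, or None for failures) into uniform row dicts.

    Assumes each successful output is a dict with "response" and
    "trustworthiness_score" keys and that failed queries come back as None.
    """
    rows = []
    for out in outputs:
        if out is None:
            # Keep a placeholder row so rows stay aligned with the prompts
            rows.append({"response": None, "trustworthiness_score": None})
        else:
            rows.append({
                "response": out.get("response"),
                "trustworthiness_score": out.get("trustworthiness_score"),
            })
    return rows


def failed_indices(outputs: list) -> list:
    """Positions of prompts whose query failed (None), e.g. to re-run them."""
    return [i for i, out in enumerate(outputs) if out is None]
```

With these, the assignment above could be written as `results_df[["response", "trustworthiness_score"]] = pd.DataFrame(outputs_to_rows(outputs))`, and the prompts at `failed_indices(outputs)` could be passed to `try_prompt()` again.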

Parse raw LLM Responses into Category Predictions

Our prompt template asks the LLM to explain its predictions, which can boost their accuracy. We now parse out the classification prediction, which should be exactly one of the categories for each document. Because LLMs don’t always follow output formatting instructions perfectly, we define a function that parses out only the expected categories. If none of the possible categories is directly mentioned in the response, the category with the greatest string similarity to the response is returned (along with a warning).

Note: If there are no close matches between the LLM response and any of the possible categories, then the last entry of the categories list is returned. We therefore append an “other” category to the end of the list to account for bad responses that are hard to parse into a specific category.
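To see how the similarity fallback behaves, here is a small standalone sketch using `difflib.SequenceMatcher`, the same standard-library tool the helper below uses. `closest_category` is a hypothetical simplification of that helper’s fallback branch only (no regex matching, no warnings), with the same 0.5 similarity cutoff.

```python
from difflib import SequenceMatcher

categories_with_other = ["caselaw", "contracts", "legislation", "other"]


def closest_category(response: str, categories: list, min_ratio: float = 0.5) -> str:
    """Return the category most similar to `response`, or the last entry if none is close."""
    scores = {c: SequenceMatcher(None, response, c).ratio() for c in categories}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_ratio else categories[-1]


print(closest_category("contract", categories_with_other))  # contracts
print(closest_category("I cannot determine this.", categories_with_other))  # other
```

Because `ratio()` compares the whole response against a short category name, long free-form responses score low against every category and tend to fall through to “other”; the fallback mainly rescues near-misses such as a singular form or a small misspelling.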

categories_with_bad_parse = categories + ["other"]
categories_with_bad_parse
Optional: Define helper methods to parse categories and better display results.

import warnings


def parse_category(
    response: str,
    categories: list,
    disable_warnings: bool = False,
) -> str:
    """Extracts one of the provided categories from the response using regex patterns. Returns the last extracted category if multiple exist.
    If no category out of the possible `categories` is directly mentioned in the response, the category with greatest string similarity to the response is returned (along with a warning).
    If there are no close matches between the LLM response and any of the possible `categories`, then the last entry of the `categories` list is returned.

    Params
    ------
    response: Response from the LLM
    categories: List of expected categories, the last value of this list should be considered the default/baseline value (eg. “other”),
        that value will be returned if there are no close matches.
    disable_warnings: If True, warnings are suppressed
    """

    response_str = str(response)

    # Build a regex alternation that matches any of the allowed categories
    escaped_categories = [re.escape(category) for category in categories]
    categories_pattern = "(" + "|".join(escaped_categories) + ")"

    # Parse category if LLM response is properly formatted
    exact_matches = re.findall(categories_pattern, response_str, re.IGNORECASE)
    if len(exact_matches) > 0:
        # Map the case-insensitive match back to its canonical spelling in `categories`
        last_match = exact_matches[-1].lower()
        return next(c for c in categories if c.lower() == last_match)

    # If there are no exact matches to a specific category, return the closest category based on string similarity.
    best_match = max(
        categories, key=lambda x: SequenceMatcher(None, response_str, x).ratio()
    )
    similarity_score = SequenceMatcher(None, response_str, best_match).ratio()

    if similarity_score < 0.5:
        warning_message = (
            f"None of the categories remotely match raw LLM output: {response_str}.\n"
            + "Returning the last entry in the categories list."
        )
        best_match = categories[-1]

    else:
        warning_message = f"None of the categories match raw LLM output: {response_str}"

    if not disable_warnings:
        warnings.warn(warning_message)

    return best_match


def display_result(results_df: pd.DataFrame, index: int):
    """Displays the TLM result for the example at row position `index` of `results_df` (uses iloc, so it follows the current sort order)."""

    print(f"TLM predicted category: {results_df.iloc[index].predicted_category}")
    print(f"TLM trustworthiness score: {results_df.iloc[index].trustworthiness_score}\n")
    print(results_df.iloc[index].text)

results_df['predicted_category'] = results_df['response'].apply(lambda x: parse_category(x, categories_with_bad_parse))

Analyze Classification Results

Let’s first inspect the most trustworthy predictions from our model. We sort the TLM outputs over our documents to see which predictions received the highest trustworthiness scores.

results_df = results_df.sort_values(by='trustworthiness_score', ascending=False)
display_result(results_df, index=0)
    TLM predicted category: legislation
TLM trustworthiness score: 0.9555185831020572




DEPARTMENT OF TRANSPORTATION
National Highway Traffic Safety Administration
49 CFR Parts 555, 571, and 591
[Docket No. NHTSA-2018-0092]
RIN 2127-AL99
Pilot Program for Collaborative Research on Motor Vehicles With High or Full Driving Automation; Extension of Comment Period

AGENCY:
National Highway Traffic Safety Administration (NHTSA), Department of Transportation (DOT).


ACTION:
Advance notice of proposed rulemaking (ANPRM); extension of comment period.


SUMMARY:
In response to a request from the public, NHTSA is announcing a two-week extension of the comment period on the ANPRM on a Pilot Program for Collaborative Research on Motor Vehicles with High or Full Driving Automation. The comment period for the ANPRM was originally scheduled to end on November 26, 2018. It will now end on December 10, 2018.


DATES:
The comment period for the ANPRM published on October 10, 2018 at 83 FR 50872 is extended. Written comments on the ANPRM must be received on or before December 10, 2018 in order to be considered timely.


ADDRESSES:
Comments must be submitted by one of the following methods:
• Federal eRulemaking Portal: Go to http://www.regulations.gov. Follow the online instructions for submitting comments.
• Mail: Docket Management Facility, M-30, U.S. Department of Transportation, West Building, Ground Floor, Room W12-140, 1200 New Jersey Avenue SE, Washington, DC 20590.
• Hand Delivery or Courier: U.S. Department of Transportation, West Building, Ground Floor, Room W12-140, 1200 New Jersey Avenue SE, Washington, DC, between 9 a.m. and 5 p.m. Eastern time, Monday through Friday, except Federal holidays.
• Fax: 202-493-2251.
Regardless of how you submit your comments, they must include the docket number identified in the heading of this notice.

Note that all comments received, including any personal information provided, will be posted without change to http://www.regulations.gov. Please see the “Privacy Act” heading below.
You may call the Docket Management Facility at 202-366-9324.

Docket: For access to the docket to read background documents or comments received, go to http://www.regulations.gov or the street address listed above. We will continue to file relevant information in the docket as it becomes available.

Privacy Act: In accordance with 5 U.S.C. 553(c), DOT solicits comments from the public to better inform its decision-making process. DOT posts these comments, without edit, including any personal information the commenter provides, to http://www.regulations.gov, as described in the system of records notice (DOT/ALL-14 FDMS), which can be reviewed at https://www.transportation.gov/privacy. Anyone can search the electronic form of all comments received into any of our dockets by the name of the individual submitting the comment (or signing the comment, if submitted on behalf of an association, business, labor union, etc.).


FOR FURTHER INFORMATION CONTACT:

For research and pilot program issues: Dee Williams, Office of Vehicle Safety Research, 202-366-8537, Dee.Williams@dot.gov, National Highway Traffic Safety Administration, 1200 New Jersey Avenue SE, Washington, DC 20590-0001.

For legal issues: Stephen Wood, Assistant Chief Counsel, Vehicle Rulemaking and Harmonization, Office of Chief Counsel, 202-366-2992, Steve.Wood@dot.gov, at the same address.



SUPPLEMENTARY INFORMATION:
On October 10, 2018, NHTSA published an ANPRM to obtain public comments on the factors and structure that are appropriate for the Agency to consider in designing a national pilot program that will enable the Agency to facilitate, monitor and learn from the testing and development of the emerging advanced vehicle safety technologies and to assure the safety of those activities. The ANPRM stated that the closing date for comments is November 26, 2018.

On November 16, 2018, NHTSA received a request from the Uber Technologies, Inc. for a two-week extension of the comment period. The request can be found in the docket for the ANPRM listed above under ADDRESSES. NHTSA has considered this request and believes that a 14-day extension beyond the original due date is desirable to provide additional time for the public to comment on the complex and novel questions in the ANPRM. This is to notify the public that NHTSA is extending the comment period on the ANPRM, and allowing it to remain open until December 10, 2018.

Issued in Washington, DC, pursuant to authority delegated in 49 CFR 1.81 and 1.95.
Heidi Renate King,
Deputy Administrator.


[FR Doc. 2018-25532 Filed 11-19-18; 4:15 pm]
BILLING CODE 4910-59-P


This document from the “DEPARTMENT OF TRANSPORTATION National Highway Traffic Safety Administration” clearly concerns a legislative measure, so it makes sense that the TLM classifies it into the “legislation” category with a high trustworthiness score.

display_result(results_df, index=1)
    TLM predicted category: legislation
TLM trustworthiness score: 0.953157765658964



ENVIRONMENTAL PROTECTION AGENCY
40 CFR Part 300
[FRL-7034-8]
National Oil and Hazardous Substances Pollution Contingency Plan; National Priorities List

AGENCY:
Environmental Protection Agency.


ACTION:
Notice of intent to delete the V&M/Albaladejo Superfund Site from the National Priorities List.


SUMMARY:

The Environmental Protection Agency (EPA) Region II is issuing a notice of intent to delete the V&M/Albaladejo Superfund Site (Site), located in the Almirante Norte Ward of the municipality of Vega Baja, Puerto Rico, from the National Priorities List (NPL) and requests public comment on this action. The NPL is Appendix B of the National Oil and Hazardous Substances Pollution Contingency Plan (NCP), 40 CFR part 300, which EPA promulgated pursuant to Section 105 of the Comprehensive Environmental Response, Compensation, and Liability Act (CERCLA) of 1980, as amended. The EPA and the Commonwealth of Puerto Rico, through the Puerto Rico Environmental Quality Board, have determined that all appropriate response actions under CERCLA have been completed and that the Site poses no significant threat to public health or the environment. In the “Rules and Regulations” Section of today's Federal Register, we are publishing a direct final notice of deletion of the V&M/Albaladejo Superfund Site without prior notice of this action because we view this as a noncontroversial revision and anticipate no significant adverse comment. We have explained our reasons for this deletion in the preamble to the direct final deletion. If we receive no significant adverse comment(s) on this action, we will not take further action on this notice of intent to delete. If we receive significant adverse comment(s), we will withdraw the direct final notice of deletion and it will not take effect. We will, as appropriate, address all public comments. If, after evaluating public comments, EPA decides to proceed with deletion, we will do so in a subsequent final deletion notice based on this notice of intent to delete. We will not institute a second comment period on this notice of intent to delete. Any parties interested in commenting must do so at this time. For additional information, see the direct final notice of deletion which is located in the Rules section of this Federal Register.


DATES:
Comments concerning this Site must be received by September 20, 2001.


ADDRESSES:
Written comments should be addressed to: Caroline Kwan, Remedial Project Manager, Emergency and Remedial Response Division, U.S. Environmental Protection Agency, Region II, 290 Broadway, 20th Floor, New York, New York 10007-1866.


FOR FURTHER INFORMATION CONTACT:
Ms. Caroline Kwan at the address provided above, or by telephone at (212) 637-4275, by Fax at (212) 637-4284 or via e-mail at Kwan.Caroline@EPA.GOV.



SUPPLEMENTARY INFORMATION:

For additional information, see the Direct Final Notice of Deletion which is located in the Rules section of this Federal Register.

Authority:
33 U.S.C. 1321(c)(2); 42 U.S.C. 9601-9675; E.O. 12777, 56 FR 54757, 3 CFR, 1991 Comp.; p. 351; E.O. 12580, 52 FR 2923, 3 CFR, 1987 Comp.; p. 193.


Dated: August 2, 2001.
William J. Muszynski,
Acting EPA Regional Administrator, U.S. EPA, Region II.


[FR Doc. 01-20891 Filed 8-20-01; 8:45 am]
BILLING CODE 6560-50-P


Another document, titled “National Oil and Hazardous Substances Pollution Contingency Plan; National Priorities List”, also clearly concerns a legislative measure, so it makes sense that the TLM classifies it into the “legislation” category with a high trustworthiness score.

display_result(results_df, index=2)
    TLM predicted category: contracts
TLM trustworthiness score: 0.9529378220184592

 

Exhibit 10.68

 

Amendment to Loan Agreement

 

This Amendment to Loan Agreement (the “Amendment”) is entered into on October
11, 2017 (the “Effective Date”) by and between Law Insurance Broker Co., Ltd.,
(“Party A”) and Action Holdings Financial Limited, a corporation duly organized
and existing under the laws of British Virgin Islands (“Party B”). For the
purposes of this Agreement, the parties may individually be referred to as
“Party” or collectively be referred to as “Parties”, as case may be.

 

WHEREAS, Party A and Party B are parties to a loan agreement with the effective
date of October 11, 2016 with certain loan amount at NTD70,000,000 (“Loan
Agreement”); and

 

WHEREAS, the Parties would like to amend the terms and conditions contained in
the Loan Agreement through this Amendment.

 

NOW THEREFORE, the Parties agree to amend the Loan Agreement as follows:

 

1.Term for the Loan Agreement shall be extended from October 11, 2017 to October
10, 2018 (the “Extended Term”)

 

2.The fixed interest rate shall be increased from 2.0% to 2.5% for such Extended
Term.

 

3.The accrued interest for the term of Loan Agreement (from October 11, 2016 to
October 10, 2017) shall be made by Party B by December 31, 2017.

 

4.The principal amount of the Loan Agreement together with the accrued interest
for the Extended Term shall be paid in one lump sum before October 10, 2018.

 

IN WITNESS WHEREOF, the Parties have duly executed this Amendment, or have
caused this Amendment to be duly executed on their behalf, as of the date first
above written. This Agreement is executed in duplicate, with each Party holding
one original.

 

Party A (Lender):

For and on behalf of

Law Insurance Broker Co., Ltd. (seal)

/s/ Shu-Fen Lee

Authorized representative: Shu-Fen Lee

No: 86300857

 

Party B (Borrower):

For and on behalf of

Action Holdings Financial Limited (seal)

/s/ Mao Yi-Hsiao

Authorized representative: Mao Yi-Hsiao

No. 53675377

 

 

 

 


This document, an “Amendment to Loan Agreement”, is clearly a contract, so it makes sense that the TLM classifies it into the “contracts” category with a high trustworthiness score.

Least Trustworthy Predictions

Now let’s see which classifications predicted by the model are least trustworthy. We sort the data by trustworthiness score in ascending order to see which predictions received the lowest scores. Observe how model classifications with the lowest trustworthiness scores are often incorrect, corresponding to examples with vague or irrelevant text, or documents possibly belonging to more than one category.
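One practical way to act on these scores is to auto-accept confident predictions and route the rest to human review. Below is a minimal sketch of that split over plain row dicts; the helper `split_by_trust` and the 0.6 cutoff are hypothetical and should be tuned on your own data. The three example scores are rounded from results shown in this tutorial.

```python
def split_by_trust(rows: list, threshold: float):
    """Split predictions into auto-accepted and flagged-for-review lists.

    `rows` are dicts with "predicted_category" and "trustworthiness_score".
    """
    accepted, needs_review = [], []
    for row in rows:
        score = row.get("trustworthiness_score")
        # Treat missing scores (e.g. failed queries) as untrusted
        if score is not None and score >= threshold:
            accepted.append(row)
        else:
            needs_review.append(row)
    return accepted, needs_review


predictions = [
    {"predicted_category": "legislation", "trustworthiness_score": 0.9555},
    {"predicted_category": "contracts", "trustworthiness_score": 0.5026},
    {"predicted_category": "contracts", "trustworthiness_score": 0.1801},
]
accepted, needs_review = split_by_trust(predictions, threshold=0.6)
print([r["predicted_category"] for r in accepted])  # ['legislation']
print(len(needs_review))  # 2
```

With the DataFrame used in this tutorial, the same split is a boolean mask such as `results_df["trustworthiness_score"] >= threshold`.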

results_df = results_df.sort_values(by='trustworthiness_score')
display_result(results_df, index=0)
    TLM predicted category: contracts
TLM trustworthiness score: 0.18005438243377658

Probl2B
0/NV Form
Rev. June 2014



United States District Court
for
the District of Nevada

REQUEST FOR EARLY TERMINATION FROM SUPERVISED RELEASE
Probation Form 35 (Termination of Supervised Release/Probation) is Attached
January 2, 2020

Name of Offender: Sebastian M. Paulin

Case Number: 2:11CR00381

Name of Sentencing Judicial Officer: Honorable James C. Mahan

Date of Original Sentence: December 14, 2015

Original Offense: Distribution of a Controlled Substance Schedule II and Structuring
Transactions to Evade Reporting Requirements

Original Sentence: 24 Months prison, followed by 36 Months TSR.

Date Supervision Commenced: December 12, 2017


PETITIONING THE COURT


✓ : To terminate the term of supervision.

CAUSE

The purpose of this report is to request an early termination of Sebastian M. Paulin's supervised
release. By way of case history, Paulin was sentenced to 24 months imprisonment, followed by
60 months supervised release for the offense Conspiracy to Distribute a Controlled Substance.
Paulin commenced supervised release in the District of Nevada on December 12, 2017. Paulin is
expected to complete supervised release on December 11, 2020.

Pursuant to 18 U.S.C. § 3583(e)(1), the Court may, after considering the factors set forth in
section 3553(a)(1), (a)(2)(B), (a)(2)(C), (a)(2 )(D), (a)(4),( a)(5), (a)(6), and (a)(7) terminate a
term of supervised release and discharge the defendant released at any time after the expiration
of one year of supervised release, if the Court is satisfied that such action is warranted by the
conduct of the person under supervision and is in the interest of justice.
RE: Sebastian M. Paulin
Probl2B
D/NV Form
Rev. June 2014

The Guide to Judiciary Policy, Volume 8E, Chapter 3, has endorsed criteria for assessing
whether an offender who satisfies the minimal statutory factors should be recommended for early
termination: At 18 months, there is a presumption in favor of recommending early termination
for persons who meet the following criteria: (I) The person does not meet the criteria of a career
drug offender or career criminal (as described in 28 U.S.C. § 944(h)) or has not committed a sex
offense or engaged in terrorism; (2) The person presents no identified risk of harm to the public
or victims; (3) The person is free from any court-reported violations over a 12-month period; (4)
The person demonstrates the ability to lawfully self-manage beyond the period of supervision;
(5) The person is in substantial compliance with all conditions of supervision; and (6) The person
engages in appropriate prosocial activities and receives sufficient prosocial suppo11 to remain
lawful well beyond the period of supervision.

While on supervised release, Paulin has not incurred any arrests and/or documented negative
contact with law enforcement. Paulin has a stable residence and is currently retired. Moreover, as
previously reported to the Court, Paulin suffers from a debilitating medical condition. Paulin has
satisfied his conditions of supervised release. Paulin appears to meet the criteria for early
termination as endorsed by Guide to Judiciary Policy, Volume SE, Chapter 3.

The United States Attorney's Office was notified of this recommendation via a memorandum
dated December 23, 2019 and was given IO days to file any written objections to early
termination with our office. Assistant United States Attorney Susan Cushman responded and
concurred with our recommendation.

To date, Paulin has served 24 months of the 36 months' supervised release ordered. The
Probation Office has no reason to believe that he would return to criminal activity if terminated
from supervision and respectfully recommends early termination be granted in this matter. If the
Court agrees with our recommendation, we have attached a Probation form 35 for your signature.
If your Honor has any additional questions or concerns regarding this matter, please feel free to
contact the undersigned officer at (702) 527-7263.



Respectfully submitted,
Digitally signed by
Kevin Rivera
Date: 2020.01.06
12:04:41 -08'00'
Kevin Rivera
United States Probation Officer Assistant
C
RE: Sebastian M. Paulin
Probl2B
D/NV Form
Rev. June 2014




Approved:
Todd J. Fredlund
2020.01.06
11:33:41 -08'00'
Todd J. Fredlund
Supervisory United States Probation Officer



THE COURT ORDERS


□ Terminate supervised release.

□ Other (please include Judicial Officer instructions below):




Signature of Judicial Officer

January 8, 2020
Date
PROBATION FORM NO. 35
Report and Order Terminating Probation/
(1/92)
Supervised Release
Prior to Original Expiration Date



United States District Court
FOR THE DISTRICT OF

NEVADA




UNITED STATES OF AMERICA

V. Crim No. 2: I I CR0038 I
Sebastian M. Paul in



On December 12, 2017 the above named was placed on supervised release for a period of 3 years. He has complied

with the rules and regulations of supervised release and is no longer in need of supervised release. It is accordingly

recommended that he/she be discharged from supervised release.


Respectfully submitted,




Kevin Rivera
United States Probation Officer Assistant


ORDER OF THE COURT

Pursuant to the above report, it is ordered that the defendant is discharged from supervised release and that

the proceedings in the case be terminated.



8th
Dated this ___ January ,20_
day of ____ 20




James C. Mahan
Senior United States District Judge

This is clearly not a contract but a caselaw document (a court filing with a case number), so it’s good to see that the TLM gives this incorrect prediction a very low trustworthiness score.

display_result(results_df, index=1)
    TLM predicted category: contracts
TLM trustworthiness score: 0.5026082091951473

Case 1:20-cv-00069-JPH-MPB Document 1-1 Filed 01/08/20 Page 1 of 9 PageID #: 9




(;+,%,7$




  
Case 1:20-cv-00069-JPH-MPB Document 1-1 Filed 01/08/20 Page 2 of 9 PageID
49D07-1912-CT-051592 #: 12/12/2019
Filed: 10 9:15 AM
Clerk
Marion Superior Court, Civil Division 7 Marion County, Indiana




STATE OF INDIANA ) IN THE MARION COURT
COUNTY OF MARION ) CAUSE NO.
JEFFREY SCHWARTZ and
CARI SCHWARTZ, Individually and
as the Parents and Natural Guardians of
JOSEPHINE SCHWARTZ, a Minor,

Plaintiffs;


v.


ANTHEM INSURANCE COMPANIES, INC.,
vvvvvvvvvvvvvvvvvv




D/B/A ANTHEM BLUE CROSS AND
BLUE SHIELD,
ACCREDO HEALTH GROUP, INC.,
EXPRESS SCRIPTS, INC.,
KROGER PRESCRIPTION PLANS, INC., and
KROGER SPECIALTY PHARMACY, INC.;

Defendants.


PLAINTIFFS’ COMPLAINT FOR DAMAGES
Come now the Plaintiffs, Jeffrey Schwartz and Cari Schwartz, Individually and as the


Parents and Natural Guardians of Josephine Schwartz, by counsel, and for their cause 0f action


against the above-named Defendants state:


1. The Plaintiffs, Jeffrey Schwartz (“Jeff”) and Cari Schwartz (“Cari”), are residents

of the State of Indiana.


2. Jeff and Cari are the parents of Josephine Schwartz (“Josephine”), a minor


3. Defendant Anthem Insurance Companies, Inc. (“Anthem”) is an Indiana domestic

insurance company with its headquarters in Indianapolis, Indiana.


4. Anthem owns and does business as Anthem Blue Cross Blue Shield.
Case 1:20-cv-00069-JPH-MPB Document 1-1 Filed 01/08/20 Page 3 of 9 PageID #: 11




5. Defendants Kroger Prescription Plans, Inc. (“KPP”), Kroger Specialty


Pharmacy, Inc. (“KSP”), Accredo Health Group, Inc. (“Accredo”), and Express Scripts, Inc.


(“Express Scripts”) are foreign corporations authorized to d0 business in the State of Indiana.


6. Accredo is a subsidiary of Express Scripts and handles the specialty pharmacy


aspects of Anthem’s pharmacy network.

7. On February 14, 2017, Josephine was born to Jeff and Cari at 23 weeks gestation

and weighing just 1.5 pounds.


8. As a micro-premature baby, Josephine defied all odds. By the time she was nine

months old, Josephine had graduated from most of her specialty physicians, including


neonatologists, pulmonologists, ophthalmologists, hematologists, gastroenterologists, and other


specialists. Her doctors indicated she was on the verge 0f overcoming the health problems

associated with being a micro-premature baby.


9. In August 2017, Josephine’s doctors identified her as a candidate for Synagis, an


antibody used to immunize children against respiratory syncytial virus (RSV). Synagis is a


potentially life-saving medication given once per month during RSV season (November to April).

It highly effective in preventing RSV if given as recommended.

10. Synagis requires prior authorization for use. “Prior authorization” is a health plan


requirement that a prescription drug be authorized for payment by the health plan (insurer) before


the prescription drug is provided to a particular covered individual. Ind. Code § 27—1-37.4—3. A
health plan (insurer) shall accept and respond to a request for “prior authorization” delivered to the


health plan (insurer) by a covered individual’s: (1) prescribing health care provider; or (2)


.” Ind. Code
dispensing pharmacist. .
§ 27-1-37.4-4.
Case 1:20-cv-00069-JPH-MPB Document 1-1 Filed 01/08/20 Page 4 of 9 PageID #: 12




11. Without “prior authorization” from an insurer, a single monthly dose of Synagis

costs more than $3,000.

12. Jeff is employed as a licensed pharmacist at KPP.

13. Jeff and Cari’s primary health insurance coverage for Josephine is through KPP,


and their secondary health insurance coverage for Josephine is through Anthem.

14. KPP generally uses KSP as its specialty pharmacy.


15. Anthem generally uses Accredo as its specialty pharmacy.


16. Express Scripts acts as an agent for Anthem in its role as a Pharmacy Benefit

Manager (PBM).

17. In November 2017, Dr. John Wrasse (“Dr. Wrasse”), Josephine’s pediatrician,


prescribed Synagis to immunize her against RSV, a disease known to have devastating and

possibly deadly consequences for premature babies compromised immune systems. At Dr.


Wrasse’s request, KPP and Anthem authorized Josephine’s Synagis prescription to be filled at


KSP.

18. On November 9, 2017, KSP filled Josephine’s prescription for her first dose of


Synagis.


19. On November 21, 2017, Dr. Wrasse administered Josephine’s first monthly dose of


Synagis.


20. On December 15, 2017, KSP filled Josephine’s prescription for her second dose of

Synagis.


21. On December 20, 2017, Dr. Wrasse administered Josephine’s second monthly dose


of Synagis.
Case 1:20-cv-00069-JPH-MPB Document 1-1 Filed 01/08/20 Page 5 of 9 PageID #: 13




22. On January 18, 2018, ahead of Josephine’s third dose of Synagis, KSP called Jeff


and Cari to inform them Anthem would not authorize KSP to dispense Josephine’s third dose of


Synagis.


23. Over the next several days, Jeff, Cari, and representatives of KSP made numerous

phone calls to Anthem, Express Scripts, and/or Accredo. Each time, representatives of Anthem,

Express Scripts, and Accredo assured Jeff, Cari, and KSP they would authorize KSP to dispense


Josephine’s third dose 0f Synagis. However, the prior authorizations were never provided.


24. Due to Anthem and Express Scripts’ failure to authorize Josephine’s third dose of

Synagis, KSP made additional efforts to obtain prior authorization from Anthem and Express

Scripts. Representatives from KSP explained that KPP, Anthem, and Express Scripts would not


give prior authorization to dispense Josephine’s third dose of Synagis.


25. On or about January 25, 201 8, afier several phone calls with the Defendants

informing them ofthe necessity ofJosephine receiving her third dose of Synagis, Anthem indicated


it had finally provided authorization. However, when KSP attempted to fill the prescription, it



continued to get the same rejection due to Anthem’s failure to provide authorization.


26. On January 26, 201 8, a KSP representative spoke to an Anthem representative and

was informed that Josephine’s third dose of Synagis was “locked out” to Accredo with no overrides

for any other specialty pharmacy to fill the prescription.


27. From January 26, 2018 until the middle of February of 2018, KSP and Accredo

continued to transfer the prescription request for Josephine’s third dose of Synagis back and forth,


with each telling the other it could not fill the prescription. During this time, the effective window

for Josephine to take her third dose of Synagis passed.
Case 1:20-cv-00069-JPH-MPB Document 1-1 Filed 01/08/20 Page 6 of 9 PageID #: 14




28. Anthem claims that Dr. Wrasse’s prescription for Josephine’s third dose of Synagis


was rejected because KSP failed t0 properly submit the secondary claims for Josephine’s third


dose 0f Synagis.


29. KSP denies Anthem’s claim it failed to properly submit the secondary claims for


Josephine’s third dose of Synagis.


30. Anthem, Express Scripts, Accredo, KPP, and KSP are sophisticated health care


entities acutely aware a delay in an insured receiving their medications can have life-threatening


and debilitating effects on their insureds.


3 1. Anthem, Express Scripts, Accredo, KPP, and KSP failed to use reasonable care to


authorize Josephine to receive her third dose of Synagis promptly to properly immunize her against

RSV.

32. On February 21, 2018, Josephine presented to Dr. Wrasse’s office with a fever,


coughing, and wheezing. Dr. Wrasse indicated Josephine was not taking Synagis because Anthem

would not approve KPP’s specialty pharmacy, and Jeffand Cari could not afford Josephine’s shots


without insurance. He noted: “It is a shame that none of her insurance would cover [S]ynagis.


Obviously the insurance companies are more worried about their shareholders than their patients.”

33. On April 11, 2018, Jeff and Cari took Josephine to Dr. Wrasse’s office due t0


respiratory distress. Josephine, as a result of not completing her Synagis immunization, tested


positive for RSV and was immediately referred to IU North Hospital.


34. While hospitalized at IU North Hospital, Josephine went into full respiratory failure


as the RSV attacked her lungs and caused secondary pneumonia.

35. Because of her RSV and secondary pneumonia, Josephine was placed on life



support for 17 days and remained hospitalized until her discharge 0n May 1, 201 8. Josephine was
Case 1:20-cv-00069-JPH-MPB Document 1-1 Filed 01/08/20 Page 7 of 9 PageID #: 15




placed in a medically-induced coma, required countless IV’s, required surgery to place a central


jugular line to administer medication, required mechanical ventilation and continuous positive


airway pressure (CPAP) to assist in breathing, and required a feeding tube.

36. The medical expenses for Josephine’s hospitalization and medical treatment for her

RSV and secondary pneumonia from April 11, 2018 until May 1, 2018, are several hundred


thousand dollars.


37. Following her discharge from Riley Children’s Hospital, Josephine continues to


require medical care and treatment due to the resulting damage to her lungs and respiratory system.

38. The Defendants, and each ofthem, owed the Plaintiffs a duty to use reasonable care


in handling and fulfilling the prescription for Josephine’s third dose of Synagis.


39. The Defendants were negligent and failed to use reasonable care in handling and

fillfilling the prescription for Josephine’s third dose of Synagis.


40. As a direct and proximate result of the Defendants’ negligence in handling and


failing to fulfill the prescription for Josephine’s third dose of Synagis, Josephine developed RSV

in April 2018, suffered a secondary pneumonia, required hospitalization and Indiana University


Hospital and Riley Children’s Hospital, suffered damage to her lungs and respiratory system,

incurred medical expenses, suffered a reduction in her ability to function as a whole person, and


suffered other compensable damages under Indiana law.

41. As a direct and proximate result of the Defendants’ negligence in handling and


failing to fulfill the prescription for Josephine’s third dose of Synagis, Jeff and Cari suffered the

negligent inflection of emotional distress in witnessing their young daughter’s from effects RSV

and the resulting hospitalization, have incurred medical expenses, have incurred unnecessary out-

of-pocket charges, and other compensable damages under Indiana law.
Case 1:20-cv-00069-JPH-MPB Document 1-1 Filed 01/08/20 Page 8 of 9 PageID #: 16




42. The Defendants’ actions constitute gross negligence and a reckless disregard of the

consequences t0 the life 0r property of others that justifies the imposition of punitive damages in


any amount sufficient to punish the Defendants and deter the Defendants and others from like


conduct.


WHEREFORE, the Plaintiffs, Jeffrey Schwartz and Cari Schwartz, Individually and as the


Parents and Natural Guardians of Josephine Schwartz, by counsel, respectfully request they be


awarded damages in an amount sufficient to fully and fairly compensate them for the injuries and

damages proven, for punitive damages, for costs, and for all other relief proper in the premises.


Respectfully submitted,


HOVDE DAssow + DEBTS, LLC



/
Nicholas C. Deets, # 1 7293-53
Tyler J. Zipes, #3508 1 -49
10201 N. Illinois Street, Suite 500
Indianapolis, IN 46260
Telephone: (3 1 7) 8 1 8-3 100
Facsimile: (3 1 7) 81 8—31 11
Attorneysfor Plaintiffs
Case 1:20-cv-00069-JPH-MPB Document 1-1 Filed 01/08/20 Page 9 of 9 PageID #: 17




PLAINTIFFS’ REQUEST FOR TRIAL BY JURY
The Plaintiffs, Jeffrey Schwartz and Cari Schwartz, Individually and as the Parents and

Natural Guardians of Josephine Schwartz, by counsel, respectfully request trial by jury.



Respectfully submitted,
I




HOVDE DAssow + DEBTS, LC



By:
Nicholas c. Deiets, #17293—53
Tyler J. Zipes, #35081 -49
10201 N. Illinois Street, Suite 500
Indianapolis, IN 46260
Telephone: (3 1 7) 8 1 8-3 1 00
Facsimile: (3 1 7) 8 1 8-3 1 11
Attorneysfor Plaintiff



HOVDE DAssow + DEETS, LLC
10201 N. Illinois Street, Suite 500
Indianapolis, IN 46260
Telephone: (317) 818-3100
Facsimile: (317) 818-3 1 11

This document is also clearly caselaw, but the model predicted it to be contracts. Fortunately, the TLM gives this misclassification a very low trustworthiness score.

display_result(results_df, index=3)
    TLM predicted category: contracts
TLM trustworthiness score: 0.7897118777453902

 

[exaa_001.jpg]
[exaa_002.jpg]
[exaa_003.jpg]
[exaa_004.jpg]
[exaa_005.jpg]
[exaa_006.jpg]
[exaa_007.jpg]
[exaa_008.jpg]
[exaa_009.jpg]
[exaa_010.jpg]
[exaa_011.jpg]
[exaa_012.jpg]
[exaa_013.jpg]
[exaa_014.jpg]
[exaa_015.jpg]
[exaa_016.jpg]
[exaa_017.jpg]
[exaa_018.jpg]

This document clearly does not belong in any of the three categories, since it is just a series of image titles. It makes sense that the TLM gives it a low trustworthiness score.
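Rather than inspecting documents one at a time, you can list the least trustworthy classifications directly and review those first. The sketch below assumes a DataFrame shaped like the tutorial's results_df, with predicted_category and trustworthiness_score columns; the helper name least_trustworthy and the toy rows are hypothetical.

```python
import pandas as pd

def least_trustworthy(results: pd.DataFrame, k: int = 5) -> pd.DataFrame:
    """Return the k rows with the lowest trustworthiness scores, lowest first."""
    lowest = results.nsmallest(k, 'trustworthiness_score')
    return lowest[['predicted_category', 'trustworthiness_score']]

# Hypothetical toy results; the tutorial's results_df also holds text, prompt, and response columns.
toy_results = pd.DataFrame({
    'predicted_category': ['caselaw', 'contracts', 'legislation', 'contracts'],
    'trustworthiness_score': [0.93, 0.79, 0.94, 0.88],
})

print(least_trustworthy(toy_results, k=2))
```

On the tutorial's own data, least_trustworthy(results_df) would give a short review queue of the documents the TLM is least sure about.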

How to Use Trustworthiness Scores

If you have the time and resources, your team can manually review the LLM classifications that received low trustworthiness scores and replace them with better human classifications. If not, you can choose a trustworthiness threshold below which responses are too unreliable to use, and have the model abstain from predicting in those cases (i.e., output “I don’t know” instead).
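A minimal sketch of the abstain-below-a-threshold idea, assuming predictions and trustworthiness scores are held in pandas Series. The helper classify_or_abstain, the ABSTAIN label, and the 0.8 threshold are hypothetical placeholders, not values recommended by Cleanlab.

```python
import pandas as pd

ABSTAIN = "I don't know"  # hypothetical label emitted when the model abstains

def classify_or_abstain(predictions: pd.Series, scores: pd.Series, threshold: float) -> pd.Series:
    """Keep predictions whose trustworthiness score is at least `threshold`; abstain on the rest."""
    # Series.where keeps values where the condition holds and substitutes ABSTAIN elsewhere.
    return predictions.where(scores >= threshold, ABSTAIN)

preds = pd.Series(['caselaw', 'contracts', 'legislation'])
scores = pd.Series([0.93, 0.79, 0.94])

final = classify_or_abstain(preds, scores, threshold=0.8)  # placeholder threshold
print(final.tolist())  # ['caselaw', "I don't know", 'legislation']
```

Rows where final equals ABSTAIN are the natural candidates to route to human reviewers.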

The overall magnitude and range of trustworthiness scores may differ between datasets, so we recommend choosing thresholds specific to your application. Start by comparing the relative trustworthiness of different data points, and only then consider the absolute magnitude of the score for any individual data point.
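One way to act on relative rather than absolute scores is to derive the cutoff from the score distribution itself, for example by flagging a fixed fraction of the lowest-scoring documents for review. The helper flag_lowest_fraction, the 25% fraction, and the toy scores below are hypothetical.

```python
import pandas as pd

def flag_lowest_fraction(scores: pd.Series, fraction: float) -> pd.Series:
    """Flag roughly the `fraction` of items with the lowest scores.

    The cutoff is a quantile of the observed scores, so it adapts to each
    dataset's score range instead of relying on a fixed absolute value.
    """
    cutoff = scores.quantile(fraction)
    return scores <= cutoff

scores = pd.Series([0.93, 0.79, 0.94, 0.88, 0.87, 0.95, 0.91, 0.90])
flags = flag_lowest_fraction(scores, fraction=0.25)
print(scores[flags].tolist())

# Shifting every score by the same amount changes the magnitudes but not which items are flagged.
shifted_flags = flag_lowest_fraction(scores - 0.3, fraction=0.25)
print(shifted_flags.equals(flags))
```

Because the cutoff moves with the data, the same fraction keeps its meaning across datasets whose scores sit in different ranges.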

Measuring Classification Accuracy with Ground Truth Labels

Our example dataset happens to have ground-truth labels for each document, so we can load them to assess the accuracy of our model predictions. We’ll then study how accuracy changes as we abstain from predicting on examples with lower trustworthiness scores.

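# Download the ground-truth labels (the leading "!" runs this as a shell command in a notebook)
!wget -nc 'https://cleanlab-public.s3.amazonaws.com/Datasets/zero_shot_classification_labels.csv'
df_ground_truth = pd.read_csv('zero_shot_classification_labels.csv')

# Join the TLM results with the labels on the shared 'index' column
df = pd.merge(results_df, df_ground_truth, on=['index'], how='outer')

# A prediction is correct when it matches the ground-truth 'type' label
df['is_correct'] = df['type'] == df['predicted_category']

df.head()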
index text prompt response trustworthiness_score predicted_category type is_correct
0 0 Probl2B\n0/NV Form\nRev. June 2014\n\n\n\n ... You are an expert Legal Document Auditor. Clas... The document is a formal request for early ter... 0.874957 caselaw caselaw True
1 1 UNITED STATES DI... You are an expert Legal Document Auditor. Clas... The document is a court order from a United St... 0.935663 caselaw caselaw True
2 2 \n \n FEDERAL COMMUNICATIONS COMMI... You are an expert Legal Document Auditor. Clas... The document is a Notice of Proposed Rule Maki... 0.938619 legislation legislation True
3 3 \n \n DEPARTMENT OF COMMERCE\n ... You are an expert Legal Document Auditor. Clas... The document is a notice from the National Oce... 0.927012 legislation legislation True
4 4 EXHIBIT 10.14\n\nAMENDMENT NO. 1 TO\n\nCHANGE ... You are an expert Legal Document Auditor. Clas... The document is an amendment to a severance ag... 0.934622 contracts contracts True
print('TLM zero-shot classification accuracy over all documents: ', df['is_correct'].sum() / df.shape[0])
    TLM zero-shot classification accuracy over all documents:  0.9342105263157895

Next, suppose we instead abstain from predicting on the 50% of documents with the lowest trustworthiness scores (e.g., by having experts manually categorize those documents instead).

quantile = 0.5  # Play with value to observe the accuracy vs. number of abstained examples tradeoff

filtered_df = df[df['trustworthiness_score'] > df['trustworthiness_score'].quantile(quantile)]
acc = filtered_df['is_correct'].sum() / filtered_df.shape[0]
print(f'TLM zero-shot classification accuracy over the documents within the top-{(1-quantile) * 100}% of trustworthiness scores: {acc}')
    TLM zero-shot classification accuracy over the documents within the top-50.0% of trustworthiness scores: 0.9605263157894737

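Accuracy rises from about 93.4% over all documents to about 96.1% over the most trustworthy half. This shows the benefit of using the TLM’s trustworthiness scores for zero-shot classification instead of relying solely on the predictions of a standard LLM.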
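To see the whole accuracy vs. abstention tradeoff instead of a single point, you can sweep several quantiles. This sketch applies the same rule as the filtering code earlier in this section (keep rows scoring strictly above the chosen quantile) to a hypothetical toy frame with trustworthiness_score and is_correct columns; on the tutorial's data you would pass the merged df instead. The helper name accuracy_vs_abstention is hypothetical.

```python
import pandas as pd

def accuracy_vs_abstention(df: pd.DataFrame, quantiles) -> pd.DataFrame:
    """For each quantile q, report accuracy on rows scoring strictly above the q-th score quantile."""
    rows = []
    for q in quantiles:
        cutoff = df['trustworthiness_score'].quantile(q)
        kept = df[df['trustworthiness_score'] > cutoff]
        rows.append({
            'abstain_quantile': q,
            'num_kept': len(kept),
            # Mean of a boolean column is the fraction of correct predictions.
            'accuracy': kept['is_correct'].mean() if len(kept) else float('nan'),
        })
    return pd.DataFrame(rows)

# Hypothetical toy data: the incorrect predictions sit among the lower scores.
toy = pd.DataFrame({
    'trustworthiness_score': [0.96, 0.95, 0.94, 0.93, 0.91, 0.88, 0.85, 0.79],
    'is_correct': [True, True, True, True, True, False, True, False],
})
print(accuracy_vs_abstention(toy, [0.25, 0.5]))
```

Plotting accuracy against abstain_quantile over a finer grid of quantiles makes it easier to pick an abstention rate that suits your review budget.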