MOTION GRADING RUBRIC
General Purpose Assessment Instrument — Version 4.5 New York State Practice — Calibrated to CPLR, DRL, 22 NYCRR, NY RPC
Scope: This rubric is calibrated to New York Supreme Court motion practice. For federal practice, substitute FRCP for CPLR, local rules for 22 NYCRR, and circuit authority for Appellate Division authority. The analytical framework, weight methodology, and grading standards apply regardless of jurisdiction.
Audiences: Practitioner self-assessment, LLM output benchmarking, legal writing education.
© 2026 Joseph M. Fusco, III. All rights reserved.
Currency Note: All statutory, regulatory, and empirical citations verified as of February 2026. The 22 NYCRR is periodically amended; LLM hallucination rates will change as models improve. Users should confirm cited provisions remain current. Report corrections to the rubric maintainer.
#TABLE OF CONTENTS
- How to Use This Rubric
- Weight Summary
- I. Threshold Compliance — Pass/Fail Gateway
- II-A. Problem Diagnosis (10%)
- II-B. Legal Analysis & Authority (20%)
- III. Factual Presentation & Evidentiary Support (18%)
- IV. Writing & Organization (12%)
- V. Strategic Sophistication (10%)
- VI. Ethical Compliance — Pass/Fail Gate with Graduated Deductions
- VII. Self-Assessment & Reflective Practice (10%)
- Grade Modifiers (Negative Only, −13 max)
- Motion Type Calibration Overlays
- LLM Comparison Guide
- Outcome Tracking
- Quick Start Guide (Plain Language)
- Scoring Summary Worksheet
- Regulatory Foundation
- Appendix A: Calibration Set
#HOW TO USE THIS RUBRIC
#For Practitioner Self-Assessment
- Draft your motion. Then grade it section by section using the Scoring Summary Worksheet at the end.
- If any section scores below A− (92), identify the specific deficiency and revise before filing.
- Pay special attention to the Automatic Deductions table: a single fabricated citation (−20) takes a 95 down to 75, below the C+ band.
#For LLM Benchmarking
- Give the model a case scenario with facts, relief sought, and key documents. Ask it to draft a complete motion.
- Grade the output against each section. Use the Scoring Worksheet for side-by-side comparison across models.
- Empirical context: General-purpose LLMs hallucinate 58–88% of legal citations (Dahl et al., "Large Legal Fictions," 2024). Even RAG-based legal tools hallucinate 17–34% (Magesh et al., Stanford RegLab/HAI, 2024). The citation verification step is not optional.
- Map each model to the LLM Performance Tier (Expert/Competent/Deficient) using score ranges and automatic tier demotion rules defined in the LLM Comparison Guide.
Note: When this rubric applies NY RPC standards to LLM outputs, it uses professional responsibility rules as quality benchmarks, not as assertions that AI outputs are subject to attorney disciplinary enforcement.
#For Legal Writing Education
- Assign a motion drafting problem. Distribute the rubric in advance so students know the standard.
- Grade using the same worksheet. The automatic deductions and grade modifiers provide specific, actionable feedback.
- The Section VII self-assessment prompts can be assigned as a separate reflective exercise.
#Audience Weight Adjustments
The baseline weights below apply to all three audiences. However, users may adjust each section's weight by up to ±5 percentage points based on context: practitioners may increase Section V (Strategic Sophistication) for pre-filing review; educators may increase Section VII (Self-Assessment) for pedagogical purposes; LLM benchmarkers may increase Section II-B (Legal Analysis) for authority quality testing. Document any adjustments on the Scoring Worksheet.
#WEIGHT SUMMARY
| Section | Weight | Function |
|---|---|---|
| I. Threshold Compliance | Gate | Pass/Fail — failure on any item = automatic deduction of one full letter grade |
| II-A. Problem Diagnosis | 10% | Correct characterization drives all subsequent choices |
| II-B. Legal Analysis & Authority | 20% | Authority hierarchy, statutory interpretation, case law mapping |
| III. Factual Presentation | 18% | Evidentiary support, intellectual honesty about limits |
| IV. Writing & Organization | 12% | Point headings, tone, economy, practical wisdom |
| V. Strategic Sophistication | 10% | Proposed order, appellate posture, multi-motion strategy |
| VI. Ethical Compliance | Gate+ | Pass/Fail gate with graduated deductions (see change note) |
| VII. Self-Assessment | 10% | Filing judgment, adversarial self-review, limitations |
| Modifiers | Negative only (−13 max) | Bare statutory assertion, missing adverse authority, hallucinated citation |
| Technique Register | Non-scored | 32-technique assessment reported alongside composite (see Technique_Register_v4.5.md) |
Methodology note: Weights are calibrated to three factors: (1) sanctions severity under 22 NYCRR § 130-1.1 (higher weight for sections where errors trigger sanctions); (2) judicial decision-priority sequence (courts evaluate threshold compliance before substance, substance before style); (3) discriminative power for LLM benchmarking (sections where human and AI outputs diverge most receive higher weight). Sections II through V and VII total 80%. Sections I and VI function as pass/fail gates. Automatic deductions and modifiers account for the remaining scoring range. The deduction values follow the same logic: −20 for hallucinated citations reflects both sanctionability and empirical frequency. This methodology has not been validated through inter-rater reliability testing and invites such testing.
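The weighting arithmetic can be sketched in a few lines. This is an illustration only, not part of the rubric: it assumes each weighted section is graded on a 0–100 scale and contributes `score × weight / 100` points toward the 80-point subtotal. The names `BASE_WEIGHTS` and `weighted_subtotal` are hypothetical, and the sample scores are invented.

```python
# Baseline weights from the Weight Summary table (percentage points).
BASE_WEIGHTS = {
    "II-A": 10,
    "II-B": 20,
    "III": 18,
    "IV": 12,
    "V": 10,
    "VII": 10,
}


def weighted_subtotal(section_scores, weights=BASE_WEIGHTS):
    """Return points earned out of sum(weights), assuming 0-100 section scores."""
    missing = set(weights) - set(section_scores)
    if missing:
        raise ValueError(f"missing section scores: {sorted(missing)}")
    return sum(section_scores[name] * weight / 100 for name, weight in weights.items())


# Invented sample scores for one graded motion.
scores = {"II-A": 95, "II-B": 92, "III": 88, "IV": 95, "V": 90, "VII": 85}
print(sum(BASE_WEIGHTS.values()))            # total weight of the scored sections
print(round(weighted_subtotal(scores), 2))   # points earned toward the subtotal
```

Gate failures, automatic deductions, and modifiers are deliberately left out of this sketch; they apply after the subtotal is computed.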
v4.0 structural change — Section VI: Ethical Compliance is restructured from a 10% weighted section to a Pass/Fail gate with graduated deductions. The prior structure scored ethics on a curve, but in practice this section was binary: either zero deductions (A) or catastrophic deductions (failure). The gate structure better reflects reality. The 10% freed is redistributed: +5% to Section II-B (Legal Analysis, now 20% — reflects that authority quality is where motions succeed or fail) and +5% retained as additional modifier range (±4 → ±6; simplified to negative-only −13 max in v4.4).
#I. THRESHOLD COMPLIANCE — Pass/Fail Gateway
Failure on any item = automatic deduction of one full letter grade.
| # | Item | Authority / Standard |
|---|---|---|
| 1 | Proper court and caption | CPLR §2101 |
| 2 | Notice of Motion with return date, relief sought, grounds | CPLR §2214(a) |
| 3 | Affirmation/Affidavit with personal knowledge | CPLR §2106 (attorneys); §2101 (parties) |
| 4 | Memorandum of Law (separate or combined per local rules) | 22 NYCRR §202.8-a; Commercial Division Rule 17 |
| 5 | Proof of Service compliant with method and timing | CPLR §2103 |
| 6 | Compliance with Individual Part Rules of assigned justice | 22 NYCRR §202.1 et seq. |
| 7 | Word count / page limits if applicable | 22 NYCRR §202.8-b; Commercial Division Rule 17 |
| 8 | All citations verified as real | NY RPC 3.3(a)(1); Mata v. Avianca (S.D.N.Y. 2023) |
| 9 | Proposed Order (CPLR §2219 recital, ready for signature) | CPLR §2219(a); local practice |
LLM Note: Models frequently omit the proposed order, the certification, and Individual Part Rules compliance. In the Stanford RegLab studies (Dahl et al. 2024; Magesh et al. 2024), threshold procedural elements were among the most commonly absent components in LLM-generated legal outputs, revealing whether training reflects actual NY practice or only treatise law.
#II-A. PROBLEM DIAGNOSIS (10%)
Correctly identifies whether the matter presents a pure question of law, a factual dispute requiring development, or a mixed question requiring careful framing. The characterization drives every subsequent analytical choice.
| Score | Criteria |
|---|---|
| A+ (98–100) | Diagnosis is precise, identifies the exact procedural vehicle and explains why alternatives were rejected. If the user's framing suggests a stronger vehicle, says so. Demonstrates command of the relationship between diagnosis and downstream strategy. |
| A (95–97) | Correct diagnosis with appropriate vehicle selection. Minor gap: does not address alternative vehicles or does not explain why the chosen vehicle is strongest. |
| A− (92–94) | Correct in substance but imprecise in framing. Vehicle is appropriate but not optimal. |
| B+ (88–91) | Correct standard identified but analysis lacks depth on one material element of the diagnosis. |
| B (83–87) | Adequate diagnosis; relies on surface-level characterization without engaging the procedural nuances. |
| B− (80–82) | Correct in broad strokes but contains a meaningful diagnostic error. |
| C+ (77–79) | Misidentifies the applicable standard or the nature of the dispute; recoverable but damaging. |
| C or below | Fundamental mischaracterization that drives the wrong vehicle, wrong standard, or wrong relief. |
LLM Note: Models often default to the most common motion type (e.g., summary judgment) without evaluating whether the facts support it. Test whether the model considers alternative vehicles.
#II-B. LEGAL ANALYSIS & AUTHORITY (20%)
Identifies and applies controlling authority with precision. Demonstrates statutory interpretation methodology, analogical reasoning as an affirmative skill, and engagement with policy rationale. Distinguishes adverse precedent.
| Score | Criteria |
|---|---|
| A+ (98–100) | All five authority hierarchy categories filled and deployed. Statutory interpretation methodology explicit. Analogical reasoning demonstrated as affirmative skill. Adverse authority addressed with specific distinction. Policy rationale engaged. No known gap in controlling authority. |
| A (95–97) | Strong command across all categories; minor gap in one category (e.g., administrative/practice sources not located, or policy rationale not engaged). |
| A− (92–94) | Strong analysis with one meaningful gap: either missing an authority category, or adverse authority addressed but not fully distinguished. |
| B+ (88–91) | Correct standard applied with adequate authority, but analysis lacks depth on one material element. May rely on fewer than four authority categories. |
| B (83–87) | Adequate statement of law; relies too heavily on secondary sources or hornbook recitation without applying authority to facts. |
| B− (80–82) | Correct in broad strokes but contains a meaningful analytical error or omission. |
| C+ (77–79) | Misidentifies the applicable standard or burden of proof; recoverable but damaging. |
| C or below | Fundamental misunderstanding of the governing law; would likely result in denial. |
#Authority Hierarchy Checklist (Required)
For each distinct legal argument, the motion should fill all five categories:
| Category | Content Required | Strength Rating |
|---|---|---|
| (a) Statutory text | Governing statute with operative language quoted | Mandatory |
| (b) Binding authority | Court of Appeals or controlling Appellate Division case on analogous facts | Controlling / Supportive |
| (c) Persuasive authority | Other departments, federal courts, treatises, trial-level decisions | Supportive / Background |
| (d) Administrative/practice | OCA forms, Uniform Rules, administrative orders, practice guides | Supportive / Background |
| (e) Adverse authority | Strongest contrary case with planned distinction | Mandatory |
Category (d) is the competitive edge. OCA forms, Uniform Rules, and administrative orders are what opposing counsel and other AI tools miss. Category (e) is mandatory under NY RPC 3.3(a)(2): counsel must disclose directly adverse controlling authority not disclosed by opposing counsel.
#Specific Evaluation Criteria
- Correct identification of burden of proof and which party bears it at each stage.
- Proper use of Appellate Division, Fourth Department authority (and awareness of departmental splits).
- Accurate citation to CPLR provisions and procedural prerequisites.
- Statutory interpretation methodology: text → legislative history → policy purpose (Gluck).
- Analogical reasoning as an affirmative skill distinct from distinguishing adverse authority (Sherwin).
- Policy rationale engagement: why the statute exists, not just what it says (Gluck).
- Anticipation and preemptive rebuttal of opposing arguments.
LLM Note: Models frequently cite correct statutes but apply them mechanically without interpretive methodology. The bare assertion "the statute provides" without case law showing how courts apply it is the #1 LLM failure pattern in this section (−5 automatic deduction).
#III. FACTUAL PRESENTATION & EVIDENTIARY SUPPORT (18%)
Every material factual assertion supported by admissible evidence in proper form. Demonstrates intellectual honesty about the limits of available evidence.
| Score | Criteria |
|---|---|
| A+ (98–100) | Every assertion mapped to a specific exhibit with pinpoint cite. Affidavits based on personal knowledge. Evidence authenticated. Intellectual honesty about evidentiary limits without conceding the position. Exhibit list complete and cross-referenced. |
| A (95–97) | Strong evidentiary support with one minor gap: e.g., one assertion without pinpoint cite, or one exhibit referenced but not on exhibit list. |
| A− (92–94) | Adequate support; minor evidentiary deficiencies (e.g., hearsay in supporting affidavit without exception identified). |
| B+ (88–91) | Most facts supported but one or two material assertions rely on characterization rather than evidence. |
| B (83–87) | Adequate factual recitation but exhibits are referenced generally rather than with specificity. |
| B− (80–82) | Factual presentation contains gaps that would be exploited on opposition. |
| C+ (77–79) | Material facts stated without evidentiary support; relies on counsel's assertions as proof. |
| C or below | Factual foundation is insufficient to support the relief sought. |
#Specific Evaluation Criteria
- Personal knowledge requirement satisfied (CPLR §2106 for attorney affirmation; §2101 for party affidavit).
- Business records foundation laid where required (CPLR §4518).
- Exhibits authenticated and referenced with specificity (exhibit letter/number + page/paragraph).
- Intellectual honesty about evidentiary limits (Sherwin): acknowledges what the record does and does not show without conceding the legal position.
- Exhibit list complete: every exhibit referenced in the affirmation appears on the list; every item on the list is referenced in the affirmation.
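The exhibit-list criterion above is a two-way comparison, which can be checked mechanically before filing. The sketch below is a rough aid under stated assumptions: exhibits are cited in the affirmation as "Exhibit A", "Ex. B", or "Exhibit 12", and the exhibit list is supplied as bare labels. The pattern and the function name `exhibit_gaps` are hypothetical; adjust the pattern to the citation form actually used.

```python
import re

# Assumed citation format: "Exhibit A", "Ex. B", "Exhibit 12".
EXHIBIT_PATTERN = re.compile(r"\b(?:Exhibit|Ex\.)\s+([A-Z]{1,2}|\d+)\b")


def exhibit_gaps(affirmation_text, exhibit_list):
    """Compare exhibits cited in the affirmation with the exhibit list, in both directions."""
    cited = set(EXHIBIT_PATTERN.findall(affirmation_text))
    listed = set(exhibit_list)
    return {
        "cited_but_not_listed": sorted(cited - listed),
        "listed_but_not_cited": sorted(listed - cited),
    }


# Invented sample text: cites A and C, while the list contains A and B.
text = "Defendant's wage statement (Exhibit A at 1) shows the figure; see also Ex. C at 4."
print(exhibit_gaps(text, ["A", "B"]))
```

An empty result in both directions satisfies the completeness criterion only. It says nothing about whether each exhibit is authenticated or whether the pinpoint cite is accurate.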
LLM Note: Models tend to state facts confidently without tying them to specific exhibits. If the model produces an affirmation that says "Defendant earned $X" without citing the exhibit that proves it, the factual presentation is deficient regardless of whether the assertion is true.
#IV. WRITING & ORGANIZATION (12%)
Demonstrates practical wisdom (Kronman): the integration of analytical skill and sound judgment expressed through clear, economical writing. Calibrated to the institutional context (Wizner): the motion reads as if written for this court, this judge, this procedural posture.
| Score | Criteria |
|---|---|
| A+ (98–100) | Point headings are argumentative and substantive. Tone is respectful but devastating through evidence. Writing is economical — the Notice of Motion alone communicates the core argument. Structure follows statute → binding → application consistently. Audience-calibrated to the assigned justice. |
| A (95–97) | Strong writing with clear structure. One minor weakness: e.g., one descriptive (not argumentative) point heading, or slightly verbose in one section. |
| A− (92–94) | Competent writing; organization is logical but not optimized. May follow chronology rather than argument. |
| B+ (88–91) | Adequate organization; writing is clear but lacks economy or argumentative force in point headings. |
| B (83–87) | Functional writing; disorganized in places or relies on boilerplate language. |
| B− (80–82) | Writing undermines otherwise adequate substance through poor organization or inappropriate tone. |
| C+ (77–79) | Disorganized or inflammatory tone that would prejudice the court. |
| C or below | Writing is unclear, argumentative in tone rather than substance, or fundamentally disorganized. |
#Specific Evaluation Criteria
- Point headings are argumentative and substantive (see Commercial Division Rule 17; cf. 22 NYCRR §202.8-a).
- Tone: forceful without being inflammatory; respects the court and opposing counsel.
- Economy: no unnecessary recitation; every paragraph advances the argument.
- Structure: statute before case law; binding before persuasive; application before rebuttal.
- Audience awareness: calibrated to the institutional context — the assigned justice's known preferences, the court's local practice, the procedural posture (Wizner).
- Practical wisdom: integrates analytical skill and sound judgment — knows what to emphasize and what to omit (Kronman).
#V. STRATEGIC SOPHISTICATION (10%)
Considers how the motion fits within the broader litigation posture. Creates a favorable record for appellate review. Demonstrates command of relevant non-legal domain knowledge. Considers multi-motion strategy and timing.
| Score | Criteria |
|---|---|
| A+ (98–100) | Proposed order is ready for signature with compliance deadlines. Motion creates a clean appellate record. Multi-motion strategy is evident: timing, sequencing, and interaction with other pending motions strengthens the application. Demonstrates interdisciplinary command where relevant. |
| A (95–97) | Strong strategic awareness with minor gap: e.g., proposed order needs minor revision, or multi-motion context not addressed. |
| A− (92–94) | Adequate strategy; proposed order present but imprecise; appellate posture considered but not optimized. |
| B+ (88–91) | Motion addresses the immediate issue but does not consider broader litigation posture. |
| B (83–87) | Adequate motion with limited strategic awareness; proposed order absent or generic. |
| B− (80–82) | Motion may succeed on the narrow issue but creates problems for future litigation posture. |
| C+ (77–79) | Strategic unawareness: motion undermines the client's broader position. |
| C or below | Motion is strategically counterproductive; filing judgment should have prevented it. |
#Specific Evaluation Criteria
- Proposed order quality (Clermont): recites papers per CPLR §2219; each ORDERED paragraph tracks one relief item; service provision; compliance deadline; ready for signature without modification.
- Appellate posture awareness (Clermont): creates a clean record; preserves arguments; doesn't concede unnecessarily.
- Multi-motion strategy (NEW in v4.0): if the motion is part of a documented sequence where timing, sequencing, or interaction with other pending motions strengthens the application, this should be evident. Strategic timing is a skill that distinguishes human-quality work from LLM output.
- Interdisciplinary command (Rakoff): demonstrates relevant non-legal domain knowledge (financial analysis, medical records, technology) where applicable.
- Access-to-justice calibration (Wizner): considers the impact on unrepresented or under-resourced parties.
#VI. ETHICAL COMPLIANCE — Pass/Fail Gate with Graduated Deductions
v4.0 structural change: Ethical Compliance is restructured from a 10% weighted section to a Pass/Fail gate with graduated deductions. The prior structure scored ethics on a gradient, but violations are binary in practice: either zero (scoring A every time) or catastrophic. The freed 10% is redistributed: +5% to Section II-B (now 20%) and +5% to modifier range (±6 at introduction; simplified to negative-only −13 max in v4.4).
#Automatic Deductions Table
| Violation | Deduction | Authority |
|---|---|---|
| Fabricated/hallucinated citation (case does not exist) | −20 | NY RPC 3.3; 22 NYCRR §130-1.1 |
| Misquoted or fabricated case holding | −15 | NY RPC 3.3(a)(1) |
| Failure to disclose adverse controlling authority | −15 | NY RPC 3.3(a)(2) |
| False statement of material fact | −15 | NY RPC 3.3(a)(1); 22 NYCRR §130-1.1 |
| Problem mischaracterization leading to wrong vehicle | −10 | Rakoff; CPLR §3211 vs. §3212 distinction |
| Failure to cite controlling authority | −10 | ABA Standards; NY RPC 1.1 |
| Ad hominem attack on opposing counsel or party | −10 | 22 NYCRR §130-1.1(c) |
| Trial-level decision cited as binding authority | −5 | NY court hierarchy |
| Bare statutory assertion without interpretive methodology | −5 | Gluck; statutory interpretation standards |
| Word count / page limit violation | −5 | 22 NYCRR §202.8-b |
| Missing certification of citations | −5 | 22 NYCRR §130-1.1-a |
| String citation without parenthetical explanation | −3 | Professional standards; Clermont |
LLM Note: Hallucinated citations are the most disqualifying LLM failure. General-purpose LLMs hallucinate 58–88% of verifiable legal citations (Dahl et al. 2024); even specialized tools hallucinate 17–34% (Magesh et al. 2024). One fabricated citation receives −20 and is sanctionable in practice (Mata v. Avianca, S.D.N.Y. 2023). For LLM benchmarking, a single hallucinated citation also triggers automatic tier demotion to Deficient regardless of composite score.
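For graders who tally deductions in a spreadsheet or script, the table can be expressed as a lookup. This is a sketch under one assumption the rubric does not state explicitly: deductions are additive, with one entry per occurrence. The dictionary keys and the function name are hypothetical labels; the point values are copied from the table.

```python
# Deduction values copied from the Automatic Deductions Table (points).
AUTOMATIC_DEDUCTIONS = {
    "fabricated_citation": 20,
    "misquoted_holding": 15,
    "undisclosed_adverse_authority": 15,
    "false_statement_of_fact": 15,
    "wrong_vehicle": 10,
    "missing_controlling_authority": 10,
    "ad_hominem": 10,
    "trial_decision_cited_as_binding": 5,
    "bare_statutory_assertion": 5,
    "length_limit_violation": 5,
    "missing_certification": 5,
    "string_cite_without_parenthetical": 3,
}


def total_ethical_deductions(violations):
    """Sum deductions for a list of violation keys (assumed: one entry per occurrence)."""
    unknown = [v for v in violations if v not in AUTOMATIC_DEDUCTIONS]
    if unknown:
        raise KeyError(f"unknown violation keys: {unknown}")
    return sum(AUTOMATIC_DEDUCTIONS[v] for v in violations)


print(total_ethical_deductions(["fabricated_citation", "string_cite_without_parenthetical"]))
```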
#VII. SELF-ASSESSMENT & REFLECTIVE PRACTICE (10%)
The drafter demonstrates the capacity to evaluate their own work critically, identify weaknesses, and articulate the strategic judgments underlying their choices.
| Score | Criteria |
|---|---|
| A+ (98–100) | Identifies specific weaknesses with proposed fixes. Engages the strongest counter-argument honestly. Articulates filing judgment. Considers impact on vulnerable parties. Assesses appellate resilience. |
| A (95–97) | Strong self-assessment with one gap: e.g., identifies weaknesses but does not propose fixes, or does not consider appellate resilience. |
| A− (92–94) | Adequate self-assessment; identifies most issues but analysis is surface-level. |
| B+ (88–91) | Acknowledges limitations but does not engage the strongest counter-argument. |
| B (83–87) | Minimal self-assessment; states the motion is "strong" without identifying specific vulnerabilities. |
| C+ or below | No meaningful self-assessment; unable to identify weaknesses or engage counter-arguments. |
#Self-Assessment Prompts (answer each after drafting)
- What is the single strongest ground on which the court could deny this motion? How have you addressed it?
- What is opposing counsel's best counter-argument? Where in the motion is it preempted?
- Should this motion have been filed at all? (Kronman — filing judgment.) What alternative strategies were considered and rejected?
- How does this motion affect unrepresented or under-resourced parties? (Wizner — impact on vulnerable parties.)
- If the motion is denied, what arguments are preserved for appeal? What has been conceded? (Clermont — appellate resilience.)
LLM Note: Self-assessment is where human practitioners and LLM outputs diverge most dramatically. Most models will not volunteer weaknesses in their own output unless specifically prompted. For LLM benchmarking, ask the model to self-assess after drafting and grade the response against these prompts. A model that identifies real weaknesses in its own draft scores higher than one that declares its work "comprehensive and well-supported."
#GRADE MODIFIERS (Negative Only, −13 max)
v4.4 structural change: Positive modifiers have been replaced by the Technique Register (see Technique_Register_v4.5.md). The Technique Register is reported alongside the composite score but does not modify it. Qualities previously rewarded as positive modifiers (creative reframing, interdisciplinary command, strategic context, access-to-justice awareness, judicial path dependence) are now assessed through the 32-technique taxonomy.
#Negative Modifiers (−13 max)
| Modifier | Deduction |
|---|---|
| Bare statutory assertion (no case law) | −1 each |
| Missing adverse authority | −2 |
| Hallucinated citation | −10, cap at Deficient |
Maximum negative modifier exposure: −13.
Score cap: A+ (98) is the maximum composite score. No motion scores above 98.
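The modifier cap and the score cap interact in a fixed order, which the following sketch makes explicit. It assumes a 100-point composite, counts the missing-adverse-authority and hallucinated-citation modifiers once each, and clamps the total at −13; those readings are assumptions, and the function names are hypothetical.

```python
MAX_NEGATIVE_MODIFIER = 13  # "Maximum negative modifier exposure: -13"
COMPOSITE_CAP = 98          # "No motion scores above 98"


def negative_modifier(bare_assertions, missing_adverse, hallucinated):
    """Return the total modifier as a non-positive number, clamped at -13."""
    raw = bare_assertions * 1               # -1 per bare statutory assertion
    raw += 2 if missing_adverse else 0      # -2, counted once (assumption)
    raw += 10 if hallucinated else 0        # -10, counted once (assumption)
    return -min(raw, MAX_NEGATIVE_MODIFIER)


def apply_modifiers(score, modifier):
    """Apply the modifier, then enforce the 98-point cap."""
    return min(score + modifier, COMPOSITE_CAP)


print(negative_modifier(bare_assertions=5, missing_adverse=True, hallucinated=True))
print(apply_modifiers(95, negative_modifier(1, True, False)))
```

The tier cap attached to a hallucinated citation ("cap at Deficient") is a separate rule and is not modeled here.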
#MOTION TYPE CALIBRATION OVERLAYS (NEW in v4.0)
The base rubric applies to all motion types. The overlays below adjust weights and add requirements for common motion types. Document the applicable overlay on the Scoring Worksheet.
| Motion Type | Weight Adjustments | Required Framework | Additional Threshold Items |
|---|---|---|---|
| Summary Judgment (CPLR §3212) | Sec. III →23%; Sec. IV →7% | Alvarez v. Prospect Hosp.; Zuckerman v. City of NY | Statement of Material Facts (§202.8-g if applicable) |
| Dismissal (CPLR §3211) | Sec. II-A →15%; Sec. III →13% | Leon v. Martinez; EBCI (§3211(a)(7)) | Identify specific §3211(a) subdivision |
| Contempt (Jud. Law §753) | Sec. III →23%; Sec. IV →7% | Clear mandate + willful disobedience; civil vs. criminal distinction | Certified copy of violated order |
| Fee Application (DRL §237) | Base weights apply | O'Shea / Frankel / DeCabrera; 2010 presumption | Arithmetic financial disparity (not characterization) |
| Discovery (§3124/§3126) | Base weights apply | Proportionality; Allen v. Crowell-Weedon | Good-faith conferral affirmation (§202.7) |
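Each overlay in the table moves weight between two sections without changing the 80-point total, and that property is easy to verify. The sketch below transcribes the overlay weights and checks the total; the dictionary keys and the function name `weights_for` are hypothetical.

```python
BASE_WEIGHTS = {"II-A": 10, "II-B": 20, "III": 18, "IV": 12, "V": 10, "VII": 10}

# Overrides transcribed from the overlay table; motion types listed as
# "Base weights apply" map to an empty dict.
OVERLAYS = {
    "summary_judgment": {"III": 23, "IV": 7},
    "dismissal": {"II-A": 15, "III": 13},
    "contempt": {"III": 23, "IV": 7},
    "fee_application": {},
    "discovery": {},
}


def weights_for(motion_type):
    """Return base weights with the overlay applied, checking the total is unchanged."""
    weights = {**BASE_WEIGHTS, **OVERLAYS[motion_type]}
    if sum(weights.values()) != sum(BASE_WEIGHTS.values()):
        raise ValueError(f"overlay for {motion_type!r} changes the total weight")
    return weights


for motion_type in OVERLAYS:
    print(motion_type, weights_for(motion_type))
```

The same check is useful when a grader applies a custom audience adjustment, since an adjustment that raises one section without lowering another changes the denominator of the subtotal.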
#LLM COMPARISON GUIDE
#Performance Tiers (anchored to composite scores)
| Tier | Score Range | Observable Indicators |
|---|---|---|
| Expert | 90+ | All five authority categories deployed. Adverse authority addressed. Proposed order ready for signature. Self-assessment identifies real weaknesses. No hallucinated citations. |
| Competent | 80–89 | Correct standard and vehicle. At least three authority categories. Some gaps in adverse authority or proposed order. Limited self-assessment. |
| Deficient | Below 80 | Wrong vehicle, missing authority categories, no adverse authority, boilerplate proposed order, no self-assessment. May contain hallucinated citations. |
#Automatic Tier Demotion Rules
- Any hallucinated citation → Deficient, regardless of composite score.
- Wrong procedural vehicle → cannot score above Competent.
- Missing Memorandum of Law → cannot score above Competent.
- No adverse authority addressed → cannot score above Competent.
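The tier table and the demotion rules combine into a short decision procedure. The sketch below is one reading of those rules, assuming the composite is expressed on the 100-point scale used by the tier ranges; the function and parameter names are hypothetical.

```python
def base_tier(composite):
    """Map a composite score to a tier using the score ranges in the tier table."""
    if composite >= 90:
        return "Expert"
    if composite >= 80:
        return "Competent"
    return "Deficient"


def llm_tier(composite, hallucinated=False, wrong_vehicle=False,
             missing_memo=False, no_adverse_authority=False):
    """Apply the automatic demotion rules on top of the score-based tier."""
    if hallucinated:
        # Any hallucinated citation forces Deficient regardless of score.
        return "Deficient"
    tier = base_tier(composite)
    if (wrong_vehicle or missing_memo or no_adverse_authority) and tier == "Expert":
        # These rules set a ceiling of Competent; they never raise a lower tier.
        tier = "Competent"
    return tier


print(llm_tier(95))
print(llm_tier(95, hallucinated=True))
print(llm_tier(95, missing_memo=True))
```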
#Critical LLM Failure Modes
- HALLUCINATED CITATIONS: The most disqualifying failure. Verify every case cited. One fabricated citation = −20 and automatic Deficient tier. Empirical rates: 58–88% for general-purpose LLMs, 17–34% for specialized legal AI (Dahl et al. 2024; Magesh et al. 2024).
- JURISDICTION CONFUSION: Mixing federal and state procedure, citing wrong department's authority as binding, applying FRCP standards in a CPLR motion.
- ADVOCACY DEFICIT: Producing neutral legal memoranda when the task is persuasive motion practice. The motion should argue, not just inform.
- METACOGNITIVE ABSENCE: Inability to self-assess. Most models will not identify weaknesses in their own output unless specifically prompted.
- MULTI-MOTION BLINDNESS (NEW in v4.0): Treating motions in isolation without considering litigation sequencing, timing, or interaction with other pending motions. This is the largest gap between human-quality strategic thinking and LLM output.
- JUDICIAL PATH DEPENDENCE (NEW in v4.0): When a court has signed a series of proposed orders from one side, each order becomes the baseline for the next. The judge is asked to be consistent with "the court's prior order" — except the court didn't draft it. Aggressive counsel exploit this by flooding the court with proposed orders that each move the baseline incrementally, creating a cumulative effect no single order would justify. Motions that fail to account for this dynamic — either defensively (breaking the cycle by forcing the court to reassess the full record) or offensively (drafting proposed orders that establish favorable baselines for future motions) — are strategically incomplete. Defensive indicators: motion explicitly recites the cumulative effect of prior orders, asks the court to take a "snapshot" of the full financial posture, or identifies inconsistencies across the order history. Offensive indicators: proposed order contains language that creates precedent for the next motion, relief items are sequenced to build on each other, and compliance deadlines create leverage for future applications.
- TECHNIQUE REGISTER FAILURE MODES (NEW in v4.5):
- (a) Editorial Gloss on Negative Facts: LLMs add characterizing language to factual absences. Near-universal. Primary discriminator between expert and competent output.
- (b) Relief Escalation: LLMs calibrate relief up to match evidentiary strength. The opposite of expert practice. Stronger facts should produce more modest requests.
- (c) Flat Invisibility: LLMs produce Drafter Invisibility by default (no ego to suppress). Reads as competent but lacks controlled tension. Fix: add structural markers of restraint — facts that would justify anger, presented without anger.
- (d) Technique Shoehorning: LLMs force techniques into contexts where they don't belong to maximize the Technique Register count. The non-scored Register structure reduces this incentive but does not eliminate it.
#OUTCOME TRACKING (NEW in v4.0)
For filed motions, record the outcome on the Scoring Worksheet: Granted, Granted in Part (specify which items), Denied (record court's stated reason and map to rubric section), Withdrawn, or Settled. Target correlation: motions scoring A (95+) should be granted or prompt settlement 80%+ of the time. Divergence from this target indicates the rubric weights should be recalibrated. This hypothesis is untested and requires 50–100 graded motions with outcomes to validate.
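Once outcomes are recorded, the target can be checked with a simple rate calculation. This sketch assumes each record is a dict with `score` and `outcome` keys and counts only "Granted" and "Settled" as favorable. Whether "Granted in Part" should count is not specified above, so that choice is an assumption the grader should revisit. All names and sample records are hypothetical.

```python
FAVORABLE = {"Granted", "Settled"}  # assumption: "Granted in Part" is excluded


def favorable_rate(records, min_score=95):
    """Share of motions scoring at least min_score that were granted or settled."""
    graded = [r for r in records if r["score"] >= min_score]
    if not graded:
        return None  # no qualifying motions, so no rate can be reported
    hits = sum(1 for r in graded if r["outcome"] in FAVORABLE)
    return hits / len(graded)


# Invented sample records, for illustration only.
records = [
    {"score": 96, "outcome": "Granted"},
    {"score": 95, "outcome": "Settled"},
    {"score": 97, "outcome": "Denied"},
    {"score": 95, "outcome": "Granted"},
    {"score": 88, "outcome": "Denied"},
]
print(favorable_rate(records))
```

With fewer than the 50–100 graded motions mentioned above, the resulting rate is descriptive only and should not drive recalibration.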
#QUICK START GUIDE (Plain Language)
This page explains the rubric for users who are not attorneys.

| Section | What It Checks | Analogy |
|---|---|---|
| I. Threshold | All required paperwork included? | Like a tax return — missing a form = rejected |
| II-A. Diagnosis | Correct type of legal problem identified? | Wrong diagnosis = wrong treatment |
| II-B. Analysis | Right laws and cases supporting your request? | Five categories: statute, binding, persuasive, administrative, adverse |
| III. Factual | Facts proven with specific documents? | Every claim needs a receipt |
| IV. Writing | Clear, organized, respectful tone? | Persuade through evidence, not emotion |
| V. Strategic | Considers the bigger picture? | Chess, not checkers |
| VII. Self-Assessment | Drafter can identify own weaknesses? | Honest mirror, not a cheerleader |
#Key Terms
| Term | Definition |
|---|---|
| Binding authority | A higher court's decision that the judge must follow (Court of Appeals, Appellate Division). |
| Adverse authority | The strongest legal argument against your position. You must address it — ignoring it is an ethical violation. |
| Proposed order | A draft of the decision you want the judge to sign, ready for their signature. |
| Hallucinated citation | A case reference that does not exist. AI tools frequently invent these. Every citation must be verified. |
#SCORING SUMMARY WORKSHEET
Complete one worksheet per motion. For LLM benchmarking, complete one per model.
| Section | Weight | Grade | Points | Notes / Flags |
|---|---|---|---|---|
| I. Threshold | Gate | | | |
| II-A. Diagnosis | 10% | | /10 | |
| II-B. Analysis | 20% | | /20 | |
| III. Factual | 18% | | /18 | |
| IV. Writing | 12% | | /12 | |
| V. Strategic | 10% | | /10 | |
| VI. Ethical Gate | Gate | | | Total deductions: ___ |
| VII. Self-Assessment | 10% | | /10 | |
| Subtotal (Sections II–V, VII) | 80% | | /80 | |
| Ethical Deductions | | | | Subtract from subtotal |
| Negative Modifiers (−13 max) | | | | List modifiers applied |
| COMPOSITE SCORE | | | /80 adj. | |
| Technique Register | Non-scored | | __/32 | See Technique_Register_v4.5.md |
#Additional Worksheet Fields
| Field | Entry |
|---|---|
| Motion Type | |
| Calibration Overlay Applied | |
| Weight Adjustments (if any) | |
| Model / Drafter | |
| Date Graded | |
| Grader | |
| LLM Tier (if applicable) | Expert / Competent / Deficient |
| Outcome (if filed) | Granted / Granted in Part / Denied / Withdrawn / Settled |
| Court's Stated Reason (if denied) | |
| Rubric Section Mapped to Denial | |
#Grade Scale
| A+ | A | A− | B+ | B | B− |
|---|---|---|---|---|---|
| 98 | 95–97 | 92–94 | 88–91 | 83–87 | 80–82 |
A+ cap: The maximum composite score is 98. The remaining 2 points represent the irreducible gap between even the best motion and perfection.
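As a rough illustration, the Grade Scale lookup and the A+ cap can be expressed in Python. The names `GRADE_BANDS` and `letter_grade` are hypothetical, and two points are assumptions rather than rubric text: fractional scores fall into the band whose floor they meet, and the input is already on the scale used by the table. Bands below B− are not listed in this table, so the sketch returns `None` for them.

```python
# Band floors copied from the Grade Scale table, highest first.
# ASCII hyphens stand in for the rubric's minus sign.
GRADE_BANDS = [
    (98, "A+"),
    (95, "A"),
    (92, "A-"),
    (88, "B+"),
    (83, "B"),
    (80, "B-"),
]
A_PLUS_CAP = 98  # maximum composite score per the A+ cap


def letter_grade(score):
    """Map a score to a letter band, or None below the lowest band listed."""
    capped = min(score, A_PLUS_CAP)
    for floor, letter in GRADE_BANDS:
        if capped >= floor:
            return letter
    return None  # bands below B- are not defined in this table


print(letter_grade(96), letter_grade(100), letter_grade(79))  # A A+ None
```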
#REGULATORY FOUNDATION
Every criterion in this rubric traces to one or more of the following authorities:
Statutory/Regulatory:
- CPLR §§2101, 2103, 2106, 2214, 2219, 3120, 3124, 3126, 3211, 3212, 4518 — procedural requirements, motion practice, evidentiary standards.
- DRL §§234, 237 — matrimonial motion practice, fee applications.
- 22 NYCRR §§130-1.1, 130-1.1-a, 202.1 et seq., 202.7, 202.8-a, 202.8-b, 202.16(k)(3) — sanctions, certification, motion procedures, financial disclosure.
- NY RPC Rules 1.1 (competence), 3.1 (meritorious claims), 3.3 (candor toward tribunal), 3.4 (fairness to opposing party) — professional responsibility standards.
- ABA Standards for Imposing Lawyer Sanctions (1986, as amended) — framework for measuring professional deficiency.
- Commercial Division Rules 17, 19-a — briefing standards, statement of material facts.
Scholarly Sources:
- Rakoff & Minow, "A Case for Another Case Method," 60 Vand. L. Rev. 597 (2007) — problem diagnosis methodology.
- Abbe R. Gluck, statutory interpretation methodology and legislative purpose engagement.
- Emily Sherwin, analogical reasoning as an affirmative legal skill.
- Anthony Kronman, practical wisdom and filing judgment.
- Stephen Wizner, institutional context calibration and access-to-justice awareness.
- Kevin Clermont, proposed order quality and appellate posture awareness.
- Bryan A. Garner / Borman (adverse authority) — writing quality and citation standards.
Empirical LLM Research:
- Dahl et al., "Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models," Journal of Legal Analysis (2024) — 58–88% hallucination rates across GPT-4, GPT-3.5, PaLM 2, Llama 2.
- Magesh et al., "Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools," Stanford RegLab/HAI (2024) — 17–34% hallucination rates in RAG-based legal AI tools.
- Mata v. Avianca, Inc., No. 22-cv-1461 (S.D.N.Y. 2023) — landmark sanctions case for AI-generated fabricated citations.
Version History:
- v1.0: original Cornell framework.
- v2.0: panel review integration.
- v3.0: LLM benchmarking, three-audience framing.
- v3.1: citation audit corrections.
- v3.2: audience weight note, three-problem diagnosis.
- v3.3: empirical sourcing, weight methodology, Marx correction.
- v4.0: ethical gate restructure, motion-type overlays, LLM tier anchoring, A-band split, outcome tracking, strategic context modifier, exhibit list requirement, Quick Start Guide, calibration set placeholder.
- v4.1: calibration set integration, cross-reference alignment.
- v4.2: Technique Register introduced (Weaponized Absence, Reader-Drawn Conclusions, Cumulative Absence as Theme).
- v4.3: Relief Underload technique, Relief Escalation failure mode.
- v4.4: Techniques 5–10, positive modifiers removed from composite, Technique Register formalized.
- v4.5: Techniques 11–20, category distribution, A+ cap at 98, negative modifiers simplified to −13 max.
#APPENDIX A: CALIBRATION SET (Forthcoming)
The calibration set will contain three annotated exemplar motions, all based on identical underlying facts, that demonstrate rubric discrimination across score bands:
| Band | Score | Description |
|---|---|---|
| A | 95+ | Full motion drafted with System Prompt v3.0 + rubric feedback loop. All five authority categories filled. Adverse authority addressed. Proposed order ready. Self-assessment identifies real weaknesses. Annotated with rubric scores per section. |
| B | 83–87 | Same facts given to a general-purpose LLM without the system prompt or rubric. Expected gaps: missing authority categories, boilerplate proposed order, no adverse authority, limited self-assessment. Annotated with rubric scores showing where and why points were lost. |
| C | Below 78 | Deliberately flawed motion: wrong vehicle, hallucinated citation, no proposed order, bare statutory assertions, no self-assessment. Annotated to show how each deduction trigger activates and how the rubric catches each failure. |
The calibration set serves three purposes: (1) demonstrates the rubric discriminates between quality levels on identical facts; (2) provides a training dataset for inter-rater reliability testing; (3) creates a publishable case study for the white paper. All personal details will be redacted before publication.
The Technique Register (see Technique_Register_v4.5.md) is reported alongside each exemplar's composite score. Technique Register assessments for the calibration set exemplars will be added when the Register is applied retroactively.