Claude Code for TorchScript Workflow (2026)
TorchScript is PyTorch’s solution for serializing and optimizing models for production deployment. When combined with Claude Code’s intelligent assistance, you can dramatically accelerate your TorchScript development workflow, from initial model tracing to production debugging. This guide walks you through practical strategies for working with TorchScript using Claude Code.
Understanding TorchScript Basics
Before diving into workflows, it’s essential to understand what TorchScript actually does. TorchScript is a subset of Python that PyTorch can compile into a serializable format, allowing your models to run in environments where Python isn’t available, such as mobile devices, embedded systems, or C++ backends.
TorchScript solves a real problem in ML production pipelines: PyTorch’s eager execution mode is excellent for research and experimentation, but its dependency on Python’s runtime makes it difficult to deploy. TorchScript bridges this gap by compiling your model into an intermediate representation (IR) that can be executed by the LibTorch C++ runtime. This is what enables mobile deployments via PyTorch Mobile and server-side inference without a Python interpreter.
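As a minimal sketch of what compilation produces (not tied to any particular project), the snippet below scripts a one-line function and prints the IR graph that the C++ runtime would execute:

```python
import torch

@torch.jit.script
def scale_and_shift(x: torch.Tensor) -> torch.Tensor:
    return x * 2.0 + 1.0

# The printed graph is the intermediate representation that LibTorch executes;
# once saved, no Python interpreter is involved.
print(scale_and_shift.graph)
```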
Claude Code can help you understand TorchScript concepts and generate appropriate code. When working on a new TorchScript project, start by explaining your model’s architecture:
Tell Claude: "Help me create a simple CNN model compatible with TorchScript"
```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.fc = nn.Linear(32 * 8 * 8, 10)
        self.pool = nn.MaxPool2d(2, 2)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))
        x = self.pool(self.relu(self.conv2(x)))
        x = x.view(-1, 32 * 8 * 8)
        x = self.fc(x)
        return x
```
The above model is written in a TorchScript-compatible style from the start: no Python-only constructs, explicit module composition, and a straightforward forward pass with no conditional branches. This is the ideal starting point before you move on to tracing or scripting.
Tracing vs Scripting: When to Use Each
Claude Code can guide you toward the right TorchScript approach for your use case. There are two primary methods for creating TorchScript models, and choosing incorrectly leads to subtle correctness bugs that are difficult to trace later.
| Feature | torch.jit.trace | torch.jit.script |
|---|---|---|
| How it works | Records ops on sample inputs | Compiles Python source to IR |
| Dynamic control flow | Not supported (silently ignored) | Fully supported |
| Python data structures | Limited support | Broader support with annotations |
| Error messages | Often opaque | More descriptive |
| Best for | CNNs, fixed-topology models | RNNs, transformers, branchy models |
| Speed | Slightly faster to convert | Slightly more compilation overhead |
| @torch.jit.export methods | Not applicable | Supported for multi-method modules |
The most dangerous thing about tracing is that it silently ignores dynamic control flow. If your forward method contains an if statement that branches based on tensor values, tracing will record only the path taken during the sample run, producing a model that behaves incorrectly on other inputs. Always prefer scripting when your model has any data-dependent logic.
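Here is a minimal sketch of that failure mode; the Gate module is hypothetical. The traced copy bakes in whichever branch the example input happened to take, while the scripted copy preserves the branch:

```python
import torch
import torch.nn as nn

class Gate(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Data-dependent branch: tracing records only the path taken on the example input
        if x.sum() > 0:
            return x + 1
        return x - 1

model = Gate()
model.eval()

traced = torch.jit.trace(model, torch.ones(3))   # PyTorch emits a TracerWarning here
scripted = torch.jit.script(model)

neg = -torch.ones(3)
print(traced(neg))    # tensor([0., 0., 0.])   -- wrong: the "+1" path was baked in
print(scripted(neg))  # tensor([-2., -2., -2.]) -- correct: the branch is preserved
```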
Tracing with torch.jit.trace
Tracing records the execution path through your model with sample inputs. It’s ideal for models with static control flow. Ask Claude to help you with tracing:
```python
# Claude-assisted tracing example
model = SimpleCNN()
model.eval()

# Create sample input matching expected dimensions
sample_input = torch.randn(1, 3, 32, 32)

# Trace the model
traced_model = torch.jit.trace(model, sample_input)

# Verify graph before saving
print(traced_model.graph)

# Save for deployment
traced_model.save('model_traced.pt')
```
A good practice when tracing is to immediately print and inspect the generated graph. Claude can help you interpret the IR output and flag potential issues, such as graph nodes suggesting that dynamic operations were not captured. You can also mark additional methods with @torch.jit.export so they remain callable in the saved model, as in the sketch below.
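As a sketch of that pattern, the hypothetical Denoiser below exports an extra method so it survives scripting and saving, and prints the compiled code for inspection:

```python
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 0.5

    @torch.jit.export
    def denoise_twice(self, x: torch.Tensor) -> torch.Tensor:
        # Exported, so it is compiled and saved alongside forward
        return self.forward(self.forward(x))

scripted = torch.jit.script(Denoiser())
print(scripted.code)                            # human-readable TorchScript for forward
print(scripted.denoise_twice(torch.ones(2)))    # tensor([0.2500, 0.2500])
```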
Scripting with torch.jit.script
Scripting compiles the actual source code, supporting dynamic control flow. Use this for models with loops or conditionals:
```python
# Claude-assisted scripting example
@torch.jit.script
def process_sequence(features: torch.Tensor, hidden_size: int):
    # Dynamic control flow supported
    outputs = []
    h = torch.zeros(hidden_size)
    for i in range(features.size(0)):
        h = torch.tanh(features[i] + h)
        outputs.append(h)
    return torch.stack(outputs)
```
When scripting an entire nn.Module, use torch.jit.script(model) rather than the decorator form. This gives you more control over error reporting and lets you keep your model definition clean:
```python
import torch
import torch.nn as nn
from typing import Tuple

class GRUCell(nn.Module):
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.weight_ih = nn.Linear(input_size, 3 * hidden_size)
        self.weight_hh = nn.Linear(hidden_size, 3 * hidden_size)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        gates_ih = self.weight_ih(x)
        gates_hh = self.weight_hh(h)
        r_i, z_i, n_i = gates_ih.chunk(3, dim=1)
        r_h, z_h, n_h = gates_hh.chunk(3, dim=1)
        r = torch.sigmoid(r_i + r_h)
        z = torch.sigmoid(z_i + z_h)
        n = torch.tanh(n_i + r * n_h)
        h_new = (1 - z) * n + z * h
        return h_new, h_new

# Script the module
cell = GRUCell(128, 256)
scripted_cell = torch.jit.script(cell)
```
Note the explicit Tuple[torch.Tensor, torch.Tensor] return type annotation. TorchScript treats unannotated arguments as Tensors, so non-Tensor signatures such as tuple returns should be annotated explicitly; this also keeps compilation errors much easier to read.
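A quick smoke test of the scripted cell, assuming the GRUCell definition above (the batch size is arbitrary):

```python
x = torch.randn(4, 128)     # (batch, input_size)
h0 = torch.zeros(4, 256)    # (batch, hidden_size)

h1, _ = scripted_cell(x, h0)
print(h1.shape)             # torch.Size([4, 256])

# The scripted cell serializes like any other TorchScript module
scripted_cell.save('gru_cell.pt')
```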
Debugging TorchScript Compilation Errors
One of the most valuable Claude Code use cases is debugging TorchScript compilation errors. These errors often occur because TorchScript has restrictions on Python’s dynamic features. Here’s a practical workflow:
When you hit a compilation error, paste the full error message into your Claude Code session along with the relevant model code. Claude will typically identify the exact line causing the issue and explain which Python feature is unsupported in TorchScript. Common categories include:
- Unsupported Python types: set, tuple literals without type annotations, None without Optional
- Dynamic attribute access: getattr(self, name) where name is a runtime string (see the sketch after this list)
- Python standard library: most os, sys, and string manipulation functions are unavailable
- Class hierarchy issues: scripting a subclass that calls methods defined in a parent not yet scripted
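For the dynamic attribute case, a common workaround is to replace getattr with explicit branching over named submodules. A minimal sketch with a hypothetical MultiHead module:

```python
import torch
import torch.nn as nn

class MultiHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.coarse = nn.Linear(64, 10)
        self.fine = nn.Linear(64, 100)

    def forward(self, x: torch.Tensor, head_name: str) -> torch.Tensor:
        # Explicit branching replaces getattr(self, head_name),
        # which TorchScript cannot resolve at compile time
        if head_name == 'coarse':
            return self.coarse(x)
        return self.fine(x)

scripted = torch.jit.script(MultiHead())
print(scripted(torch.randn(2, 64), 'coarse').shape)  # torch.Size([2, 10])
```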
Common Issues and Solutions
Issue 1: Dynamic List Operations
TorchScript needs to know the element type of any list you build at runtime, and appending inside loops is a frequent source of compilation errors. When Claude detects this pattern, it will suggest modifications:
```python
# Dynamic list appending: problematic in TorchScript without annotations
def forward(self, x):
    results = []
    for i in range(x.size(0)):
        results.append(self.layer(x[i]))
    return torch.stack(results)
```
```python
# Claude will suggest using torch.jit.annotate or preallocated tensor operations
def forward(self, x):
    batch_size = x.size(0)
    results = torch.zeros(batch_size, self.output_dim)
    for i in range(batch_size):
        results[i] = self.layer(x[i])
    return results
```
If you need to keep appending to a list of tensors, annotate it explicitly, either with a List[torch.Tensor] type annotation as shown below or with torch.jit.annotate:
```python
from typing import List

def forward(self, x: torch.Tensor) -> torch.Tensor:
    results: List[torch.Tensor] = []
    for i in range(x.size(0)):
        results.append(self.layer(x[i]))
    return torch.stack(results)
```
Issue 2: Dictionary Comprehensions
Dict comprehensions don’t work in TorchScript. Claude can refactor these:
```python
# Problematic pattern
def get_weights(self):
    return {name: param for name, param in self.named_parameters()}

# Claude-suggested fix
def get_weights(self):
    weights = {}
    for name, param in self.named_parameters():
        weights[name] = param
    return weights
```
Issue 3: Optional Type Annotations
Claude can help add proper type annotations that TorchScript requires:
```python
# Without annotations (fails in TorchScript)
def forward(self, x, mask=None):
    ...

# With Claude's suggested annotations
from typing import Optional

def forward(self, x: torch.Tensor, mask: Optional[torch.Tensor] = None) -> torch.Tensor:
    if mask is not None:
        x = x * mask
    return self.layer(x)
```
Issue 4: Unsupported super() calls
TorchScript has limited super() support. When subclassing scripted modules, Claude will suggest using explicit parent class calls:
```python
# Problematic in some TorchScript versions
class MyModel(BaseModel):
    def forward(self, x):
        x = super().preprocess(x)
        return self.head(x)

# More reliable pattern
class MyModel(BaseModel):
    def forward(self, x):
        x = BaseModel.preprocess(self, x)
        return self.head(x)
```
Issue 5: Python Enums and Named Constants
TorchScript has only limited support for Python Enum (simple int- or string-valued enums in newer releases). A portable alternative is plain integer constants annotated as int:
```python
# Fails in TorchScript
from enum import Enum

class ActivationType(Enum):
    RELU = 0
    GELU = 1

# TorchScript-compatible pattern
ACTIVATION_RELU: int = 0
ACTIVATION_GELU: int = 1

def apply_activation(x: torch.Tensor, act_type: int) -> torch.Tensor:
    if act_type == ACTIVATION_RELU:
        return torch.relu(x)
    else:
        return torch.nn.functional.gelu(x)
```
Optimizing TorchScript Performance
Once your model compiles, Claude can help you optimize it for inference speed. Key strategies include:
Graph Optimization with torch.jit.optimize_for_inference
```python
# Claude-assisted optimization
optimized_model = torch.jit.optimize_for_inference(traced_model)
```
This applies a set of inference-only passes, including:
- Freezing of weights and module attributes
- Conv-BatchNorm folding and dropout removal
- Operator fusion where the backend supports it
You should always benchmark before and after applying optimize_for_inference. The gains vary significantly by model architecture. For small models with many element-wise operations, the speedup can be 20-40%. For large models dominated by matrix multiplication, the impact is often marginal since BLAS routines are already heavily optimized.
A minimal benchmark harness that Claude can help you generate:
```python
import time
import torch

def benchmark_model(model, input_tensor, n_warmup=20, n_runs=100):
    model.eval()
    # Warmup
    with torch.no_grad():
        for _ in range(n_warmup):
            _ = model(input_tensor)
    # Timed runs
    start = time.perf_counter()
    with torch.no_grad():
        for _ in range(n_runs):
            _ = model(input_tensor)
    end = time.perf_counter()
    avg_ms = (end - start) / n_runs * 1000
    return avg_ms

# Use the same input shape the model was traced with
sample = torch.randn(1, 3, 32, 32)
base_time = benchmark_model(traced_model, sample)
opt_time = benchmark_model(optimized_model, sample)
print(f"Base model: {base_time:.2f} ms")
print(f"Optimized model: {opt_time:.2f} ms")
print(f"Speedup: {base_time / opt_time:.2f}x")
```
Fusion Strategies
Ask Claude to analyze your model for fusion opportunities:
```python
# Claude can suggest using scripted modules for better fusion
class OptimizedBlock(nn.Module):
    def __init__(self):
        super().__init__()
        # Fused operations for better performance
        self.conv_relu = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1),
            nn.ReLU()
        )

    def forward(self, x):
        return self.conv_relu(x)
```
For GPU inference, PyTorch's JIT backend can fuse pointwise operations (relu, sigmoid, add) into a single kernel. To encourage fusion, avoid storing intermediate tensors in Python variables unnecessarily; instead, chain operations directly in the forward method, as in the sketch below.
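Here is a sketch of that style with a hypothetical FusedActivation block; the element-wise tail of the forward pass is written as a single chained expression so the fuser sees one contiguous pointwise region:

```python
import torch
import torch.nn as nn

class FusedActivation(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(64, 64, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # relu, add, and sigmoid are all pointwise, so chaining them directly
        # gives the JIT fuser a contiguous block of element-wise work
        return torch.sigmoid(torch.relu(self.conv(x)) + 1.0)

scripted = torch.jit.script(FusedActivation()).eval()
```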
Quantization for Edge Deployment
For mobile and embedded targets, post-training quantization can dramatically reduce model size and improve inference speed. Claude can help you add a quantization wrapper around your TorchScript model:
```python
import torch.quantization

# Prepare model for quantization
model = SimpleCNN()
model.eval()

# Static quantization needs a qconfig; 'fbgemm' targets x86 servers ('qnnpack' for ARM)
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')

# Insert observers
model_prepared = torch.quantization.prepare(model)

# Calibrate with representative data
with torch.no_grad():
    for batch in calibration_loader:
        model_prepared(batch)

# Convert to quantized model
# (for a full static flow, the forward pass should also route inputs through
#  torch.quantization.QuantStub / DeQuantStub)
model_quantized = torch.quantization.convert(model_prepared)

# Now script the quantized model
scripted_quantized = torch.jit.script(model_quantized)
scripted_quantized.save('model_quantized.pt')
```
Quantized int8 models typically run 2-4x faster than fp32 on CPU and consume 4x less memory, at the cost of a small accuracy drop that is usually under 1% for well-calibrated models.
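To sanity-check the memory claim on your own model, comparing saved file sizes is usually enough. This sketch assumes the scripted_quantized module from the previous block and writes two files to the working directory:

```python
import os
import torch

# Save an fp32 version of the same architecture for comparison
torch.jit.script(SimpleCNN().eval()).save('model_fp32.pt')

fp32_mb = os.path.getsize('model_fp32.pt') / 1e6
int8_mb = os.path.getsize('model_quantized.pt') / 1e6
print(f"fp32: {fp32_mb:.2f} MB, int8: {int8_mb:.2f} MB")
```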
Integration with Production Pipelines
Claude Code excels at helping you integrate TorchScript models into production workflows. Here’s a practical pattern for model serving:
```python
from typing import List

import torch

# Plain Python wrapper around a compiled ScriptModule for production serving
class ModelServer:
    def __init__(self, model_path: str):
        self.model = torch.jit.load(model_path)
        self.model.eval()

    def predict(self, input_data: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            return self.model(input_data)

    def predict_batch(self, inputs: List[torch.Tensor]) -> List[torch.Tensor]:
        batch = torch.stack(inputs)
        outputs = self.predict(batch)
        return list(outputs.unbind(0))
```
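Usage is then only a few lines; the shapes below assume the 32x32 SimpleCNN traced earlier in this guide:

```python
server = ModelServer('model_traced.pt')

single = torch.randn(1, 3, 32, 32)
print(server.predict(single).shape)      # torch.Size([1, 10])

requests = [torch.randn(3, 32, 32) for _ in range(4)]
outputs = server.predict_batch(requests)
print(len(outputs), outputs[0].shape)    # 4 torch.Size([10])
```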
For a real production setup, you will typically want preprocessing and postprocessing baked into the TorchScript model itself. This reduces the chance of training-serving skew and simplifies your serving infrastructure:
```python
from typing import List, Tuple

import torch
import torch.nn as nn

class ProductionWrapper(nn.Module):
    def __init__(self, backbone: nn.Module, class_names: List[str]):
        super().__init__()
        self.backbone = backbone
        self.class_names = class_names
        # Normalization constants baked in
        self.register_buffer('mean', torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1))
        self.register_buffer('std', torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1))

    def preprocess(self, x: torch.Tensor) -> torch.Tensor:
        # Expect uint8 [0, 255] input, normalize to float
        x = x.float() / 255.0
        x = (x - self.mean) / self.std
        return x

    def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        x = self.preprocess(x)
        logits = self.backbone(x)
        probs = torch.softmax(logits, dim=1)
        top_prob, top_idx = probs.max(dim=1)
        return top_prob, top_idx

    @torch.jit.export
    def predict_with_names(self, x: torch.Tensor) -> List[str]:
        _, indices = self.forward(x)
        # Explicit loop keeps the element types obvious to the TorchScript compiler
        names: List[str] = []
        for i in range(indices.size(0)):
            names.append(self.class_names[int(indices[i])])
        return names
```
Scripting this wrapper captures the preprocessing constants alongside the model weights, so deployment engineers do not need to implement normalization separately in their serving stack.
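Exporting the wrapper is a short script; this sketch reuses the earlier SimpleCNN as a stand-in backbone with placeholder class names:

```python
class_names = [f'class_{i}' for i in range(10)]
wrapper = ProductionWrapper(SimpleCNN().eval(), class_names)

scripted = torch.jit.script(wrapper)
scripted.save('production_model.pt')

# uint8 image batch in, class names out: no separate preprocessing code in the server
frames = torch.randint(0, 256, (2, 3, 32, 32), dtype=torch.uint8)
print(scripted.predict_with_names(frames))
```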
Serving with LibTorch (C++)
One of the primary motivations for TorchScript is enabling C++ serving. Claude can help you write the C++ loading and inference code:
```cpp
#include <torch/script.h>
#include <iostream>
#include <memory>

int main() {
    // Load the TorchScript model
    torch::jit::script::Module module;
    try {
        module = torch::jit::load("model_traced.pt");
    } catch (const c10::Error& e) {
        std::cerr << "Error loading model: " << e.what() << std::endl;
        return -1;
    }

    // Create input tensor (same shape the model was traced with)
    std::vector<torch::jit::IValue> inputs;
    inputs.push_back(torch::randn({1, 3, 32, 32}));

    // Run inference
    at::Tensor output = module.forward(inputs).toTensor();
    std::cout << output.slice(1, 0, 5) << std::endl;
    return 0;
}
```
Ask Claude to generate the corresponding CMakeLists.txt for your specific LibTorch version and platform. It will correctly configure the find_package(Torch REQUIRED) directives and link targets.
Testing TorchScript Models
Claude can help you create comprehensive tests that work both with eager mode and TorchScript:
```python
def test_model_equivalence():
    # Test that traced model produces same outputs as eager mode
    model = MyModel()
    model.eval()
    traced = torch.jit.trace(model, sample_input)
    eager_output = model(test_input)
    traced_output = traced(test_input)
    assert torch.allclose(eager_output, traced_output, atol=1e-5)
    # Verify TorchScript-specific properties
    assert isinstance(traced, torch.jit.ScriptModule)
    assert traced.graph is not None

def test_model_serialization():
    import tempfile
    import os
    model = MyModel()
    traced = torch.jit.trace(model, sample_input)
    with tempfile.NamedTemporaryFile(suffix='.pt', delete=False) as f:
        path = f.name
    traced.save(path)
    loaded = torch.jit.load(path)
    output1 = traced(test_input)
    output2 = loaded(test_input)
    assert torch.allclose(output1, output2)
    os.unlink(path)
```
Beyond output equivalence, you should also test that your TorchScript model handles edge cases correctly: empty batches, single-element batches, and inputs at the boundaries of your expected input distribution. Claude can help you generate parametrized test suites:
```python
import pytest

@pytest.mark.parametrize("batch_size", [1, 4, 16, 32])
@pytest.mark.parametrize("device", ["cpu", "cuda"] if torch.cuda.is_available() else ["cpu"])
def test_batch_sizes(batch_size, device):
    model = MyModel().to(device)
    model.eval()
    scripted = torch.jit.script(model)
    x = torch.randn(batch_size, 3, 224, 224, device=device)
    eager_out = model(x)
    scripted_out = scripted(x)
    assert eager_out.shape == scripted_out.shape
    assert torch.allclose(eager_out, scripted_out, atol=1e-5)

def test_no_grad_in_exported_methods():
    # Ensure no gradient tape overhead during inference
    model = MyModel()
    model.eval()
    scripted = torch.jit.script(model)
    x = torch.randn(1, 3, 224, 224)
    # Production inference should run under no_grad and carry no autograd state
    with torch.no_grad():
        out = scripted(x)
    assert not out.requires_grad
```
CI Integration
For continuous integration, configure your test suite to run TorchScript compilation as a separate check. This catches regressions early: if a developer adds a Python-only construct to a model intended for TorchScript, the CI pipeline will flag it before it reaches the deployment stage. Claude can generate GitHub Actions workflow configurations or Makefile targets for this:
```yaml
# .github/workflows/torchscript.yml
name: TorchScript Compilation Check
on: [push, pull_request]
jobs:
  torchscript:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - run: pip install torch torchvision pytest
      - run: pytest tests/test_torchscript.py -v
```
Best Practices Summary
When working with Claude Code for TorchScript development, keep these principles in mind:
- Start simple: Begin with tracing for straightforward models; graduate to scripting when you need dynamic control flow.
- Type annotate aggressively: The more type information you provide, the better TorchScript can optimize. Annotate all function signatures, local variables that hold tensors, and any container types.
- Test incrementally: Compile and test at each stage rather than building a complex model all at once. Add one layer at a time and verify that scripting still works after each addition.
- Profile after optimization: Use torch.jit.optimize_for_inference, but always benchmark before and after with realistic batch sizes and input shapes.
- Keep eager mode as reference: Maintain tests that verify TorchScript outputs match eager mode, using torch.allclose with a small absolute tolerance to account for floating-point ordering differences.
- Bake preprocessing into the model: Include normalization and resizing inside your TorchScript module to eliminate training-serving skew and simplify your deployment stack.
- Use @torch.jit.export deliberately: Only export the methods your serving code actually needs. Methods that are neither exported nor reachable from forward are not compiled into the module, which keeps model size and compilation time down.
- Inspect the graph: After tracing or scripting, print model.graph and review the output. Claude can help you interpret unusual nodes or identify missed optimization opportunities.
Claude Code can dramatically accelerate your TorchScript workflow by generating boilerplate, suggesting optimizations, and debugging compilation errors. The key is providing clear context about your model’s architecture and deployment targets.
Related Reading
- AI-Assisted Database Schema Design Workflow
- Automated Code Documentation Workflow with Claude Skills
- Before and After: Switching to Claude Code Workflow