axios+telemetry cleanup
125
docs/AUTH_GUIDE.md
Normal file
@@ -0,0 +1,125 @@
# Authentication Guide - Claude Code

This guide provides an overview of the various authentication methods supported by the Claude Code CLI, along with configuration steps and troubleshooting tips.

---

## 1st Party Anthropic Authentication

Claude Code primarily connects directly to the Anthropic API. There are three main ways to authenticate:

### Direct API Key

The most common method for individual developers.

- **Environment Variable**: `ANTHROPIC_API_KEY`
- **Setup**: Export your key in your shell profile (e.g., `.zshrc` or `.bashrc`).

```bash
export ANTHROPIC_API_KEY='sk-ant-api03-...'
```

- **Security Note**: This method is prioritized in CI and non-interactive environments.

### Claude.ai OAuth (Subscriber Mode)

If you have a Claude Pro or Team subscription, you can log in using your Claude.ai account.

- **Command**: Run `/login` in the CLI.
- **How it works**: This opens a browser for OAuth authentication. Once completed, your session is managed via a local secure token.
- **Internal Users**: Internal Anthropic employees use a specialized version of this flow.

### External Key Helpers

For teams using a secret manager (like 1Password CLI or AWS Secrets Manager), you can use a helper script.

- **Setting**: `apiKeyHelper` in your `~/.claude/settings.json`.
- **Example**:

```json
{ "apiKeyHelper": "op read 'op://private/Anthropic/api-key'" }
```

- **Behavior**: The CLI will execute this command to retrieve the key on startup.
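Before wiring a helper into `settings.json`, it is worth confirming that the command prints only the raw key to stdout. A minimal check, assuming a hypothetical AWS Secrets Manager secret named `anthropic/api-key`:

```bash
# Hypothetical secret name; the helper must print the bare key and nothing else
aws secretsmanager get-secret-value \
  --secret-id anthropic/api-key \
  --query SecretString --output text
```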
---

## Security & Workspace Trust

Claude Code implements a "Trust Dialog" to protect you from malicious repository settings.

### Custom Scripts

Settings that execute arbitrary code (like `apiKeyHelper`, `awsAuthRefresh`, or `awsCredentialExport`) are subject to the following rules:

- **Global Settings**: Always trusted (stored in `~/.claude/settings.json`).
- **Project Settings**: Only executed if you have explicitly "trusted" the workspace.
- **Dialog**: If a project-local script is detected, Claude Code will prompt you for approval before execution.

> [!WARNING]
> Never trust a workspace from an untrusted source, as it could use these helpers to exfiltrate your API keys or run malicious commands on your behalf.
---

## 3rd Party Cloud Providers

Claude Code supports using models hosted on major cloud platforms. To use these, you must enable the specific provider via environment variables.

### AWS Bedrock

- **Enable**: Set `CLAUDE_CODE_USE_BEDROCK=true`.
- **Authentication**: Uses standard AWS SDK credentials (IAM Roles, `~/.aws/credentials`, or `AWS_ACCESS_KEY_ID`).
- **Region**: Defaults to `us-east-1`. Override with `AWS_REGION`.
- **Custom Auth**: Supports `awsAuthRefresh` and `awsCredentialExport` settings for specialized SSO flows.
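A minimal Bedrock session, with illustrative region and profile values:

```bash
# Any standard AWS SDK credential source works; values here are illustrative
export CLAUDE_CODE_USE_BEDROCK=true
export AWS_REGION=us-west-2        # overrides the us-east-1 default
export AWS_PROFILE=my-sso-profile
claude
```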
### GCP Vertex AI

- **Enable**: Set `CLAUDE_CODE_USE_VERTEX=true`.
- **Authentication**: Uses Application Default Credentials (ADC) via `google-auth-library`.
- **Configuration**:
  - `ANTHROPIC_VERTEX_PROJECT_ID`: (Required) Your GCP project ID.
  - `CLOUD_ML_REGION`: (Optional) Your GCP region.
- **Auth Refresh**: Supports `refreshGcpCredentialsIfNeeded` logic for long-running sessions.
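Assuming ADC is already configured via `gcloud`, a typical Vertex session looks like this (project and region values are placeholders):

```bash
gcloud auth application-default login               # one-time ADC setup
export CLAUDE_CODE_USE_VERTEX=true
export ANTHROPIC_VERTEX_PROJECT_ID=my-gcp-project   # required; placeholder
export CLOUD_ML_REGION=us-east5                     # optional; placeholder
claude
```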
### Azure Foundry

- **Enable**: Set `CLAUDE_CODE_USE_FOUNDRY=true`.
- **Authentication**:
  - Uses `ANTHROPIC_FOUNDRY_API_KEY` if provided.
  - Otherwise, falls back to `DefaultAzureCredential` (Azure AD).
- **Endpoint**: Configure via `ANTHROPIC_FOUNDRY_RESOURCE` or `ANTHROPIC_FOUNDRY_BASE_URL`.
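A key-based Foundry setup might look like the following (the resource name is a placeholder); omit the key to fall back to `DefaultAzureCredential`:

```bash
export CLAUDE_CODE_USE_FOUNDRY=true
export ANTHROPIC_FOUNDRY_RESOURCE=my-foundry-resource   # placeholder; or set ANTHROPIC_FOUNDRY_BASE_URL
export ANTHROPIC_FOUNDRY_API_KEY='...'                  # omit to use DefaultAzureCredential instead
claude
```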
---

## Environment Variable Reference

| Variable | Method | Description |
| :--- | :--- | :--- |
| `ANTHROPIC_API_KEY` | Direct | Your Anthropic API key. |
| `ANTHROPIC_AUTH_TOKEN` | Direct | Use for bearer-token-based authentication. |
| `ANTHROPIC_CUSTOM_HEADERS` | All | A newline-separated list of `Name: Value` headers. |
| `API_TIMEOUT_MS` | All | Custom timeout for API requests (default: 600000 ms). |
| `CLAUDE_CODE_ADDITIONAL_PROTECTION` | All | Sets `x-anthropic-additional-protection: true`. |
| `CLAUDE_CODE_USE_BEDROCK` | Bedrock | Enables the AWS Bedrock provider. |
| `CLAUDE_CODE_USE_VERTEX` | Vertex | Enables the GCP Vertex AI provider. |
| `CLAUDE_CODE_USE_FOUNDRY` | Foundry | Enables the Azure Foundry provider. |
| `CLAUDE_CODE_SKIP_*_AUTH` | 3P | Bypasses local SDK auth for proxy/testing scenarios. |
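As an example of the header syntax, `ANTHROPIC_CUSTOM_HEADERS` expects newline-separated `Name: Value` pairs; in bash, a `$'...'` quoted string is a convenient way to embed the newline (header names here are illustrative):

```bash
# Illustrative header names
export ANTHROPIC_CUSTOM_HEADERS=$'X-Team: platform\nX-Request-Source: ci'
```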
---

## Advanced Configuration & Priority

When multiple authentication methods are available, Claude Code follows this priority:

1. **Managed Context**: CCR or Claude Desktop sessions always force OAuth to ensure session isolation. These sessions ignore local API keys and settings to prevent credential leakage.
2. **Environment Variables**: `ANTHROPIC_API_KEY` or `ANTHROPIC_AUTH_TOKEN` (unless in "Homespace").
3. **Key Helper**: The `apiKeyHelper` script if defined in settings.
4. **Local Store**: Credentials saved from a prior `/login` or `~/.claude/settings.json`.

> [!NOTE]
> Using the `--bare` flag forces the CLI into a hermetic mode that only respects `ANTHROPIC_API_KEY` and explicitly passed settings, ignoring the local keychain and OAuth tokens.
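For instance, a hermetic one-shot invocation in CI might look like this sketch (`-p` runs a single non-interactive prompt; the key variable is a placeholder):

```bash
# Only the env var key and explicitly passed settings are consulted in --bare mode
ANTHROPIC_API_KEY="$CI_ANTHROPIC_KEY" claude --bare -p "Summarize the changes in HEAD"
```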
---

## Troubleshooting

### Common Errors

- **401 Unauthorized**: Typically indicates an expired API key or OAuth session.
- **403 Forbidden**: Your account may not have access to the requested model or feature.
- **AWS/GCP Auth Timeouts**: Often caused by the metadata server check. Ensure your credentials are fresh or set the project/region variables explicitly.

### Clearing Caches

If you encounter persistent auth issues, you can reset your local state:

1. Run `/logout` in a session.
2. Manually remove `~/.claude/config.json`.
3. (macOS only) Clear relevant entries in the Keychain via `Security`.
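A manual reset might look like the following; the keychain service name is illustrative, so inspect the entry before deleting it:

```bash
rm -f ~/.claude/config.json
# macOS only -- service name is illustrative; verify the entry before deleting
security find-generic-password -s "Claude Code"
security delete-generic-password -s "Claude Code"
```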
---

> [!TIP]
> Use `claude --doctor` to diagnose your current authentication state and connectivity.
103
docs/LLAMA_CPP.md
Normal file
@@ -0,0 +1,103 @@
# Llama.cpp Integration Guide - Claude Code

This guide explores how to implement a custom API provider for Claude Code using `llama.cpp`'s `llama-server`. This setup is ideal for local-first development or when using high-end hardware like **AMD Strix Halo** or **Apple Silicon M2 Max**.

---

## 1. Architecture Overview

`llama-server` provides a REST API that can be configured to mimic the OpenAI or Anthropic message formats. To integrate it into Claude Code, you will need to modify the client initialization.

### Provider Hook Location

The primary location for adding new providers is [`services/api/client.ts`](file:///Users/vlad/Developer/vlad/claude-code/services/api/client.ts).

1. **Add Provider Type**: Update `APIProvider` in `utils/model/providers.ts` to include `'llama-cpp'`.
2. **Environment Variable**: Use a toggle like `CLAUDE_CODE_USE_LLAMA_CPP=true`.
3. **Client Configuration**:

```typescript
if (isEnvTruthy(process.env.CLAUDE_CODE_USE_LLAMA_CPP)) {
  return new Anthropic({
    apiKey: 'local-key', // llama-server often ignores this
    baseURL: process.env.LLAMA_CPP_BASE_URL || 'http://localhost:8080/v1',
    ...ARGS,
  })
}
```
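Before pointing the CLI at the server, a quick smoke test of the OpenAI-compatible surface confirms the base URL is reachable:

```bash
# llama-server exposes an OpenAI-compatible /v1 API
curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer local-secret-token"   # header only needed if --api-key was set
```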
### Remote / Proxy Authentication

If you are proxying `llama-server` through an AWS-compatible gateway (e.g., LiteLLM), you can use the `AWS_BEARER_TOKEN_BEDROCK` environment variable to authenticate.

---
## 2. Hardware Optimization

To achieve smooth inference on high-end consumer hardware, utilize the following specialized backends.

### Apple Silicon (M2 Max)

`llama.cpp` has first-class **Metal** support.

- **Flags**: Ensure `-ngl` (number of GPU layers) is set to the maximum (e.g., `-ngl 99`) to offload the entire model to the GPU.
- **Threads**: Match the number of performance cores (e.g., `-t 8`).
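On an M2 Max, those flags translate to a launch like this (the model path is a placeholder):

```bash
# Metal is the default backend on Apple Silicon builds
./llama-server -m models/model.gguf -ngl 99 -t 8
```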
### AMD Strix Halo

Strix Halo features a massive iGPU and a powerful NPU.

- **Vulkan Backend**: Use the Vulkan backend for the iGPU (`LLAMA_VULKAN=1`).
- **ROCm Backend**: For Linux users, ROCm provides near-native performance for AMD hardware.
- **NPU Integration**: If using Windows/Linux with experimental NPU drivers, ensure `llama-server` is compiled with the relevant plugin (e.g., OpenVINO).
---

## 3. Overcoming "Slow PP" (Prompt Processing)

Prompt Processing (PP) is often the bottleneck in agentic workflows where the context grows rapidly.

### Persistent KV Caching (Slots)

`llama-server` supports **slots**, which allow multiple sessions to share or persist their KV cache.

- **Persistent Slot**: Use `--slot-save-path /path/to/cache` to save the context state between CLI restarts.
- **Continuous Batching**: Use `--cont-batching` to allow the server to process new prompts while tokens are still being generated for other requests.
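You can inspect slot state at runtime through the server's `/slots` endpoint (depending on the build, it may need to be enabled with `--slots`):

```bash
# Returns per-slot state, including cached-token counts
curl http://localhost:8080/slots
```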
### Configuration Tips

- **Large Context**: Set a generous context size with `-c 32768` (or higher) to avoid frequent context shifting.
- **Flash Attention**: Always enable Flash Attention (`--flash-attn`) to reduce memory bandwidth requirements during PP.

---
## 4. Supporting OSS Models

Claude Code is tuned for Sonnet/Opus, but can be adapted for state-of-the-art open-source models:

| Model | Mapping Suggestion | Strength |
| :--- | :--- | :--- |
| **Qwen3-72B-Instruct** | Map to `claude-3-opus-latest` | Excellent reasoning and tool use. |
| **GPT-20-OSS** | Map to `claude-3-5-sonnet-latest` | High-speed, high-intelligence balance. |
| **GPT-120-OSS** | Map to `claude-3-opus-latest` | Deep complex problem solving. |
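One lightweight way to realize such a mapping is `llama-server`'s `--alias` flag, which sets the model name the server reports, so requests addressed to the mapped Claude name resolve to the local model:

```bash
# Serve Qwen3-72B under the Claude model name the CLI will request
./llama-server -m models/qwen3-72b-q4_k_m.gguf --alias claude-3-opus-latest
```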
---

## 5. Recommended `llama-server` Command

For a dedicated local Claude Code backend:

```bash
./llama-server \
  -m models/qwen3-72b-q4_k_m.gguf \
  -c 32768 \
  -ngl 99 \
  --flash-attn \
  --cont-batching \
  --host 0.0.0.0 \
  --port 8080 \
  --api-key local-secret-token \
  --slot-save-path ./llama_slots
```
---

> [!CAUTION]
> Using local models requires significant VRAM. A 70B model in 4-bit quantization requires ~40GB of VRAM. Ensure your hardware (like Strix Halo with 64GB+ shared RAM) can accommodate the model and KV cache.

---

## See Also

- **[Authentication Guide](file:///Users/vlad/Developer/vlad/claude-code/docs/AUTH_GUIDE.md)**: Details on general environment variables and credential management.
93
docs/Z_AI_GLM.md
Normal file
@@ -0,0 +1,93 @@
# Zhipu AI (Z.AI) GLM Provider Guide - Claude Code

This guide explains how to integrate **GLM-5.1** from Zhipu AI as a specialized "Coding Plan Provider" in Claude Code. This allows you to use GLM's strong reasoning capabilities for the architectural and planning phase, while retaining Claude (or another model) for the execution phase.

---

## 1. Architecture: The Planner-Executor Split

Claude Code uses a "Plan Mode" to design complex changes before executing them. This is internally managed by `permissionMode: 'plan'`.

By specializing the models:

- **Planner (GLM-5.1)**: Uses massive context and multi-step reasoning to design a robust implementation plan.
- **Executor (Claude 3.5 Sonnet)**: Follows the plan with precision to write and edit code.

---
## 2. Implementing the Z.AI Provider

### Hooking the Client

The Z.AI API is largely OpenAI-compatible. You can hook it into Claude Code's existing client initialization in [`services/api/client.ts`](file:///Users/vlad/Developer/vlad/claude-code/services/api/client.ts).

1. **Add Provider Type**: Update `APIProvider` in `utils/model/providers.ts` to include `'z-ai'`.
2. **Client Entry**:

```typescript
if (isEnvTruthy(process.env.CLAUDE_CODE_USE_Z_AI)) {
  return new Anthropic({
    apiKey: process.env.Z_AI_API_KEY,
    baseURL: process.env.Z_AI_BASE_URL || 'https://open.bigmodel.cn/api/paas/v4/',
    ...ARGS,
  })
}
```
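To verify the key and endpoint independently of the CLI, a minimal request against the OpenAI-compatible chat endpoint can help (model ID per this guide; exact payload requirements may vary by gateway):

```bash
curl -s "${Z_AI_BASE_URL:-https://open.bigmodel.cn/api/paas/v4}/chat/completions" \
  -H "Authorization: Bearer $Z_AI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "glm-5.1", "messages": [{"role": "user", "content": "ping"}]}'
```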
---

## 3. Hijacking "Plan Mode"

To ensure GLM-5.1 is only used for planning, you need to modify the model selection logic in [`utils/model/model.ts`](file:///Users/vlad/Developer/vlad/claude-code/utils/model/model.ts).

Modify `getRuntimeMainLoopModel`:

```typescript
export function getRuntimeMainLoopModel(params: {
  permissionMode: PermissionMode
  mainLoopModel: string
  exceeds200kTokens?: boolean
}): ModelName {
  const { permissionMode, mainLoopModel } = params

  // Specialized planning provider: GLM-5.1
  if (permissionMode === 'plan' && isEnvTruthy(process.env.CLAUDE_CODE_USE_Z_AI)) {
    return 'glm-5.1' // Or your specific deployment ID
  }

  // Fall back to Sonnet/Opus for execution
  return mainLoopModel
}
```
---

## 4. Configuration

To use this setup, configure the following environment variables:

| Variable | Description |
| :--- | :--- |
| `CLAUDE_CODE_USE_Z_AI=true` | Enables the Z.AI provider logic. |
| `Z_AI_API_KEY` | Your Zhipu AI API key. |
| `Z_AI_BASE_URL` | The endpoint for BigModel (e.g., `https://open.bigmodel.cn/api/paas/v4/`). |
| `ANTHROPIC_MODEL` | (Optional) The model to use for execution (e.g., `claude-3-5-sonnet-latest`). |
| `CLAUDE_CODE_ADDITIONAL_PROTECTION` | (Optional) Enable strict header validation if required by your gateway. |
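Putting it together, a typical shell setup for the hybrid flow (the key value is a placeholder):

```bash
export CLAUDE_CODE_USE_Z_AI=true
export Z_AI_API_KEY='...'                             # placeholder
export Z_AI_BASE_URL='https://open.bigmodel.cn/api/paas/v4/'
export ANTHROPIC_MODEL='claude-3-5-sonnet-latest'     # optional executor model
claude
```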
---

## 5. Optimization & Performance

### Tool-Calling

GLM-5.1 is highly proficient with the OpenAI-style tool-calling schema. Claude Code uses a similar structure, making the migration smooth. However, ensure that your `baseURL` correctly routes to the `/chat/completions` endpoint that supports these features.

### Long Context

GLM-5.1's large context window is a primary advantage for the "Plan Mode" phase, as it can ingest an entire multi-file project structure or complex documentation without truncation.

---

> [!TIP]
> This "hybrid" approach lets you leverage GLM's cost-efficient, high-reasoning planning while keeping Claude's world-class code generation for the final edits.

---

## See Also

- **[Authentication Guide](file:///Users/vlad/Developer/vlad/claude-code/docs/AUTH_GUIDE.md)**: Details on general environment variables and credential management.