# HuggingFace Prompt Renderer MCP Server

A Model Context Protocol (MCP) server for rendering conversation messages into model-specific prompt strings using HuggingFace tokenizer chat templates.

## Requirements

- [uv](https://docs.astral.sh/uv/) - Fast Python package installer and runner

## Usage

### MCP Server Mode

Run the MCP server over stdio for use with MCP clients:

```bash
uv run cmd/prompt-rendering/server.py --mcp
```

Add to your MCP client configuration (e.g., for Claude Desktop), replacing `/path/to/this/repo` with the absolute path to this repository:

```json
{
  "mcpServers": {
    "huggingface-prompt-renderer": {
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "/path/to/this/repo",
        "cmd/prompt-rendering/server.py",
        "--mcp"
      ]
    }
  }
}
```

A minimal Python client sketch for this mode appears at the end of this README.

### FastAPI Server Mode

Start a FastAPI server for manual HTTP testing:

```bash
# Start on the default port (8000)
uv run cmd/prompt-rendering/server.py --host 0.0.0.0 --port 8000

# Start on a custom port
uv run cmd/prompt-rendering/server.py --host 0.0.0.0 --port 9000
```

#### Endpoints

| Method | Path | Description |
|--------|------|-------------|
| POST | `/generate-prompt` | Generate a prompt string from messages |
| GET | `/health` | Health check |

### Test with curl

```bash
# Basic user message
curl -X POST http://localhost:8000/generate-prompt \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# With tools
curl -X POST http://localhost:8000/generate-prompt \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the weather?"}
    ],
    "model": "Qwen/Qwen3-Coder-480B-A35B-Instruct",
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather",
        "parameters": {
          "type": "object",
          "required": ["location"],
          "properties": {
            "location": {"type": "string", "description": "The city"}
          }
        }
      }
    }]
  }'

# With tool calls
curl -X POST http://localhost:8000/generate-prompt \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the weather in SF?"},
      {
        "role": "assistant",
        "tool_calls": [{
          "id": "call_1",
          "type": "function",
          "function": {
            "name": "get_weather",
            "arguments": {"location": "San Francisco"}
          }
        }]
      },
      {"role": "tool", "content": "{\"temperature\": 68}", "tool_call_id": "call_1"}
    ],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "parameters": {
          "type": "object",
          "properties": {"location": {"type": "string"}}
        }
      }
    }]
  }'
```

## Supported Message Formats

The server accepts messages in several formats:

| Format | Description |
|--------|-------------|
| OpenAI | Standard `role`, `content`, `tool_calls`, `tool_call_id` fields |
| OLMo | Adds `functions` and `function_calls` fields |
| DeepSeek | Tool call arguments must be JSON strings (see the encoding example at the end of this README) |

## Tool Support

| Setting | Description |
|---------|-------------|
| `inject_tools_as_functions=true` | Injects tools into the system message under a `functions` key (OLMo-style) |
| `inject_tools_as_functions=false` | Passes tools separately to `apply_chat_template` (standard transformers behavior) |

## Models

The server uses HuggingFace's `transformers` library and supports any model that ships a chat template.

Default: `Qwen/Qwen3-Coder-480B-A35B-Instruct`

## Dependencies

The script uses PEP 723 inline dependency metadata. When run with `uv`, dependencies are automatically installed into an isolated environment:

- `fastapi` - Web framework
- `uvicorn` - ASGI server
- `transformers` - HuggingFace tokenizers and chat templates
- `jinja2` - Template engine
- `mcp` - Model Context Protocol SDK
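For reference, a PEP 723 header declaring these dependencies looks roughly like the following. This is a sketch; the actual header in `server.py` may pin versions or differ in layout:

```python
# /// script
# dependencies = [
#     "fastapi",
#     "uvicorn",
#     "transformers",
#     "jinja2",
#     "mcp",
# ]
# ///
```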
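Of these dependencies, `transformers` does the actual rendering: the core operation the server wraps is the tokenizer's `apply_chat_template`. A minimal standalone sketch using the default model above (the server's real request handling is not shown here and may add validation and format normalization on top of this):

```python
from transformers import AutoTokenizer

# Downloads the tokenizer config, including its Jinja2 chat template.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Coder-480B-A35B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the weather?"},
]

# Render the messages to a prompt string rather than token IDs;
# add_generation_prompt appends the assistant-turn header so the
# model is cued to respond.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```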
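The DeepSeek row in the message formats table is the subtle one: tool call `arguments` must arrive as a JSON-encoded string rather than a JSON object. Shown as Python literals for contrast (the field values are illustrative):

```python
import json

args = {"location": "San Francisco"}

# OpenAI-style tool call: arguments as a JSON object.
openai_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": args},
}

# DeepSeek-style tool call: the same arguments, JSON-encoded as a string.
deepseek_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": json.dumps(args)},
}
```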
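Finally, to smoke-test the stdio MCP mode from Python, here is a sketch using the official `mcp` SDK. It only lists the tools the server exposes; the actual tool names and input schemas are defined in `server.py`:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Launch the server as a subprocess speaking MCP over stdio.
    params = StdioServerParameters(
        command="uv",
        args=["run", "cmd/prompt-rendering/server.py", "--mcp"],
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])


asyncio.run(main())
```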