LLM Access
The Research and Computing Services created a Large Language Model server to provide access to the Carleton University community. REST API endpoints can be accessed upon request. There are many models deployed in this server, including:
- all-minilm:33m
- all-minilm:latest
- command-r-plus:latest
- command-r:latest
- embeddinggemma:latest
- gemma3:27b
- gemma4:26b
- gemma4:31b
- gemma4:latest
- gpt-oss:120b
- gpt-oss:20b
- llama3.1:latest
- llama3.2-vision:90b
- llama3.2:latest
- llama3.3:latest
- llama4:maverick
- llama4:scout
- mistral-large:latest
- mixtral:8x22b
- mxbai-embed-large:latest
- qwen2.5:latest
- qwen3-coder-next:q8_0
- qwen3-embedding:latest
- qwen3.5:122b
- qwen3.5:27b
- qwen3.5:35b
- qwen3.5:9b
- qwen3:235b
- qwen3:32b
- qwen3:8b
- translategemma:latest
The backend uses Ollama to deploy models. You can use the instructions from Ollama docs, but provide the variable “x-api-key” with your RCS API Key, which can be requested here. If approved, you will receive information about the host and API Key.
The code below is an example of access using the Ollama package in Python:
from ollama import Client
from argparse import ArgumentParser
DEFAULT_HOST = “https://rcsllm.carleton.ca/rcsapi”
parser = ArgumentParser(description=”Ollama API Client Example”)
parser.add_argument(“–host”, type=str, help=”Ollama server host”, required=False, default=DEFAULT_HOST)
parser.add_argument(“–model”, type=str, help=”Model to use for requests”, required=False, default=”gpt-oss:120b”)
parser.add_argument(“–prompt”, type=str, help=”Prompt to send to the model”, required=False, default=”What is an LLM and how are you compared to other models?”)
parser.add_argument(“–stream”, action=”store_true”, help=”Stream the response”, required=False, default=True) parser.add_argument(“–api_key”, type=str, help=”API Key for authentication”, required=True)
parser.add_argument(“–list_models”, action=”store_true”, help=”List available models”, required=False, default=False)
if __name__ == “__main__”:
args = parser.parse_args()
api_key = args.api_key
custom_header = {“x-api-key”: api_key} if api_key else {}
ollama_client = Client(host=args.host, headers=custom_header)
if args.list_models:
models = ollama_client.list()
print(“Available models:”)
for model in models.models:
print(f”- {model.model}”)
exit(0)
stream = args.stream
response = ollama_client.chat(
model=args.model,
messages=[
{“role”: “user”, “content”: args.prompt} ],
stream=stream ) if stream: for chunk in response: print(chunk[‘message’][‘content’], end=””, flush=True) else: print(response[‘message’][‘content’])
If you wish to get access to this resource, fill out the form available here.