The ChatGPT API has evolved dramatically since its inception, transforming from a simple text-generation interface into a comprehensive runtime environment for AI-powered applications. As of February 2026, the ecosystem encompasses state-of-the-art models including GPT-5.2, GPT-5.3-Codex, and specialized variants for every use case. This completely revised guide provides an in-depth exploration of the latest ChatGPT API capabilities, covering everything from authentication basics to advanced agentic features such as server-side memory compaction, hosted shell containers, and the new Skills standard. With practical code examples in Python and JavaScript, detailed coverage of the Responses API upgrade, and best practices for building production-ready applications, this resource is designed for developers seeking to harness the full power of the ChatGPT API in 2026 and beyond.
1. Introduction: The ChatGPT API as a Runtime Environment
The landscape of AI development has undergone a fundamental shift. What began as a straightforward API for generating text completions has matured into a sophisticated platform for building autonomous agents, coding assistants, and complex multi-step workflows. The ChatGPT API is no longer just about sending prompts and receiving answers; it has become a runtime environment where models can execute code, persist state across sessions, interact with external systems, and even manage their own memory through intelligent compression.
This evolution reflects a broader industry trend: the move from conversational AI to agentic AI. Today’s applications demand models that can not only understand language but also take action, maintain context over extended periods, and collaborate with other systems. The February 2026 upgrades to the Responses API, including server-side compaction, hosted shell containers, and Skills support, represent a quantum leap in this direction.
This guide covers the complete ChatGPT API ecosystem as it exists in early 2026. Whether you are building a simple chatbot, a code-generation tool, or a fully autonomous agent, the following chapters will equip you with the knowledge to leverage every feature effectively.
2. The Model Landscape: What’s Available in 2026
2.1 Current Model Lineup
The model landscape has simplified considerably. With the retirement of GPT-4o, GPT-4.1, and o4-mini from ChatGPT on February 13, 2026, the focus has shifted to the GPT-5 family and specialized variants. These models continue to be available in the ChatGPT API even after their retirement from the consumer chat interface.
Core Models:
- GPT-5.2 – The flagship general-purpose model, available in two variants:
  - GPT-5.2 Instant: Optimized for speed and general conversation, with recent updates improving response style and grounding
  - GPT-5.2 Thinking: Designed for complex reasoning tasks, with configurable thinking time (Light, Standard, Extended)
- GPT-5.3-Codex – Released February 5, 2026, this represents OpenAI’s most capable agentic coding model. It combines Codex and GPT-5 training stacks, delivering 25% faster performance and state-of-the-art benchmark results
- GPT-5.1-Codex-Max – A frontier agentic coding model for long-running, project-scale work. It uses compaction technology to maintain coherence across multiple context windows
- GPT-5-Codex-Mini – A smaller, cost-effective variant providing up to 4x more usage within subscription limits
- Open-weight models – `gpt-oss-120b` and `gpt-oss-20b` are text-only reasoning models available for teams wanting to run and customize models on their own infrastructure
2.2 Model Selection Strategy
Choosing the right model depends on your use case:
- General conversational applications: GPT-5.2 Instant offers the best balance of speed, quality, and cost
- Complex reasoning tasks: GPT-5.2 Thinking with Extended thinking time
- Coding assistants: GPT-5.3-Codex for agentic workflows, GPT-5-Codex-Mini for high-volume, cost-sensitive applications
- Long-running projects: GPT-5.1-Codex-Max with its compaction capabilities
- On-premise deployment: The open-weight OSS models
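As a rough starting point, the guidance above can be captured in a small lookup helper. This is an illustrative sketch: the lowercase model identifiers are assumptions based on the names used in this guide, so verify them against your account's model list.

```python
# Illustrative mapping from use case to a default model, following the
# selection strategy above. Model identifiers are assumptions based on
# the names used in this guide, not confirmed API model IDs.
MODEL_BY_USE_CASE = {
    "general": "gpt-5.2-instant",
    "reasoning": "gpt-5.2-thinking",
    "coding": "gpt-5.3-codex",
    "coding-budget": "gpt-5-codex-mini",
    "long-running": "gpt-5.1-codex-max",
    "on-premise": "gpt-oss-20b",
}

def pick_model(use_case: str) -> str:
    """Return a reasonable default model for a use case, falling back to general."""
    return MODEL_BY_USE_CASE.get(use_case, MODEL_BY_USE_CASE["general"])
```

Centralizing the choice in one helper also makes future migrations a one-line change.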
2.3 Model Retirement and Migration
The February 2026 retirement of GPT-4o and related models from ChatGPT serves as an important reminder: models have lifecycles. While ChatGPT API access remains unchanged for now, the industry trend is toward continuous model evolution. When migrating between model versions, testing should focus not just on functional correctness but on behavioral consistency: output style, refusal patterns, and handling of edge cases.
3. ChatGPT API Fundamentals
3.1 Authentication and Setup
Authentication remains straightforward, though security best practices have evolved. Always store ChatGPT API keys in environment variables or secure secret management systems.
Python Setup:
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    # Optional: Set organization ID for multi-org accounts
    organization=os.environ.get("OPENAI_ORG_ID")
)
```
JavaScript/Node.js Setup:
```javascript
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  organization: process.env.OPENAI_ORG_ID // Optional
});
```
3.2 The Chat Completions API
The Chat Completions endpoint remains the workhorse for most applications. Recent updates have improved response quality, with GPT-5.2 Instant now delivering more measured, grounded responses and placing important information upfront.
```python
response = client.chat.completions.create(
    model="gpt-5.2-instant",  # or "gpt-5.2-thinking", "gpt-5.3-codex"
    messages=[
        {"role": "system", "content": "You are a helpful assistant with a friendly tone."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    temperature=0.5,
    max_tokens=500
)
print(response.choices[0].message.content)
```
3.3 Thinking Time Configuration
For GPT-5.2 Thinking, you can now configure the reasoning depth. OpenAI periodically adjusts default thinking times based on user preference data.
```python
response = client.chat.completions.create(
    model="gpt-5.2-thinking",
    messages=[{"role": "user", "content": "Solve this complex math problem: ..."}],
    reasoning_effort="extended"  # Options: light, standard, extended
)
```
The thinking level toggle, introduced in September 2025, gives users fine-grained control over the speed-accuracy tradeoff.
3.4 Structured Outputs and JSON Mode
For applications requiring guaranteed structured data, JSON mode remains essential. With `response_format={"type": "json_object"}`, the model outputs syntactically valid JSON; for output that conforms to a specific schema, use the structured-output parsing shown further below.
```python
import json

response = client.chat.completions.create(
    model="gpt-5.2-instant",
    messages=[
        {"role": "user", "content": "List three programming languages and their primary use cases."}
    ],
    response_format={"type": "json_object"}
)
data = json.loads(response.choices[0].message.content)
```
For more robust validation, combine with Pydantic:
```python
from typing import List
from pydantic import BaseModel

class Language(BaseModel):
    name: str
    use_case: str

class LanguageList(BaseModel):
    languages: List[Language]

completion = client.beta.chat.completions.parse(
    model="gpt-5.2-instant",
    messages=[{"role": "user", "content": "List three programming languages and their uses."}],
    response_format=LanguageList
)
languages = completion.choices[0].message.parsed
```
4. The Responses API: Building True Agents
The Responses API represents the most significant evolution in OpenAI’s API offerings. The February 2026 upgrade introduced transformative capabilities that address the core challenges of building autonomous agents: memory management and execution environment.
4.1 Server-Side Compaction: Solving the Memory Problem
One of the fundamental limitations of conversational AI has been the context window. As conversations grow longer, they inevitably hit token limits, forcing developers to truncate or summarize history, often losing critical context in the process.
Server-side compaction solves this by having the model intelligently compress its own behavior history. When the conversation reaches a threshold you specify, the server automatically compresses and prunes the context within the same streaming response, preserving key information while reducing token count.
```python
response = client.responses.create(
    model="gpt-5.3-codex",
    input=[
        {"role": "user", "content": "Let's work on a multi-step data analysis project..."}
    ],
    compaction={
        "enabled": True,
        "threshold": 80000,        # Start compacting at 80k tokens
        "strategy": "intelligent"  # Model decides what to preserve
    }
)
```
The compressed content is encrypted and designed for model continuation, not human readability. This is a critical distinction—it’s about maintaining the agent’s working memory, not creating summaries for users.
4.2 Hosted Shell Containers: Giving Models an Execution Environment
Perhaps the most revolutionary addition is the hosted shell container capability. Through the Shell tool, models can now execute commands in a full Debian 12 environment, complete with pre-installed runtimes for Python, Node.js, Java, Go, and Ruby.
```python
response = client.responses.create(
    model="gpt-5.3-codex",
    input=[
        {"role": "user", "content": "Analyze this dataset and generate a visualization."}
    ],
    tools=[{
        "type": "shell",
        "container": {
            "type": "managed",
            "image": "debian:12",
            "persistent": True,           # Keep container for follow-up requests
            "network_access": "isolated"  # Default: no external network
        }
    }]
)
```
The container can persist across multiple requests, allowing the model to maintain state—files, installed packages, environment variables—throughout a project. When you reference the same container ID in subsequent requests, the environment remains intact.
Network Access Control: For security, hosted containers default to no external network access. If your agent needs to reach external APIs or download packages, you must explicitly configure allowed domains through your organization’s admin settings.
```python
# In admin dashboard: Configure allowed domains
allowed_domains = ["pypi.org", "files.pythonhosted.org", "api.github.com"]

# In request: Specify per-call access
response = client.responses.create(
    model="gpt-5.3-codex",
    input=[...],
    tools=[{
        "type": "shell",
        "container": {"type": "managed", "network_access": "limited"},
        "allowed_domains": ["pypi.org"]  # Override for this request
    }]
)
```
Domain Secrets: For API keys and credentials, use the domain secrets mechanism. The model sees only placeholder names; the system injects real credentials only when making approved requests to trusted domains.
```python
# Configure secrets in admin dashboard
secrets = {
    "github_token": {"type": "bearer", "value": "ghp_..."}
}

# In request, reference by placeholder
response = client.responses.create(
    model="gpt-5.3-codex",
    input=[{"role": "user", "content": "Push the results to GitHub."}],
    tools=[{
        "type": "shell",
        "container": {"type": "managed"},
        "secrets": ["github_token"]
    }]
)
```
4.3 Skills: Reusable, Versioned Workflows
The new Skills standard, based on the open SKILL.md specification, allows you to package reusable workflows that can be shared across models and platforms. A Skill is a versioned collection of files with a manifest that defines its capabilities and requirements.
```yaml
# SKILL.md
name: data-analysis-pipeline
version: 1.2.0
description: Standardized data analysis workflow
requirements:
  - python=3.11
  - pandas>=2.0
  - matplotlib
entrypoint: analyze.py
inputs:
  - name: dataset
    type: file
    description: CSV file to analyze
outputs:
  - name: report
    type: file
    format: markdown
  - name: visualization
    type: file
    format: png
```
Skills can be mounted into hosted containers, enabling consistent execution across different environments.
```python
response = client.responses.create(
    model="gpt-5.3-codex",
    input=[{"role": "user", "content": "Run the standard analysis on this sales data."}],
    tools=[{
        "type": "shell",
        "container": {"type": "managed"},
        "skills": ["data-analysis-pipeline@1.2.0"]
    }]
)
```
5. Deep Research and Advanced Analysis
5.1 Deep Research Upgrades
The deep research feature, originally launched in 2025, received significant upgrades in February 2026. Now powered by GPT-5.2, it offers:
- Source specification: Focus research on specific websites or connected apps
- Real-time progress tracking: Monitor research as it happens in a dedicated viewer
- Mid-run adjustments: Add new sources or refine the research plan while the agent works
- Enhanced reporting: Download completed reports in Markdown, Word, or PDF formats
```python
# Initiate deep research via API
research = client.research.create(
    model="gpt-5.2",
    query="Latest advancements in quantum machine learning",
    sources={
        "include": ["arxiv.org", "nature.com"],
        "exclude": ["predatory-journals.org"]
    },
    max_depth=3,
    output_format="markdown"
)

# Track progress
status = client.research.status(research.id)
print(f"Progress: {status.progress}%")
print(f"Sources found: {status.sources_found}")

# When complete, retrieve report
report = client.research.retrieve(research.id)
with open("quantum_ml_report.md", "w") as f:
    f.write(report.content)
```
5.2 Code Interpreter Integration
The Code Interpreter tool remains available for executing Python code in a sandboxed environment. With the new container infrastructure, code execution is more powerful and flexible.
```python
import base64

response = client.responses.create(
    model="gpt-5.3-codex",
    input=[{"role": "user", "content": "Analyze this CSV and create a visualization."}],
    tools=[{"type": "code_interpreter"}],
    files=[{
        "filename": "sales_data.csv",
        "content": base64.b64encode(csv_content).decode()
    }]
)
# The model will write and execute code, returning results including generated images
```
6. Streaming and Real-Time Applications
6.1 Streaming Fundamentals
Streaming remains essential for responsive user interfaces. The implementation is straightforward:
```python
stream = client.chat.completions.create(
    model="gpt-5.2-instant",
    messages=[{"role": "user", "content": "Write a short story about a robot."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
6.2 Streaming with Tool Calls
When using tools in streaming mode, tool calls are delivered as special events:
```python
stream = client.responses.create(
    model="gpt-5.3-codex",
    input=[{"role": "user", "content": "Check disk usage and find large files."}],
    tools=[{"type": "shell", "container": {"type": "managed"}}],
    stream=True
)

for event in stream:
    if event.type == "tool_call":
        print(f"Model wants to execute: {event.tool_call.command}")
        # Execute command, send result back
    elif event.type == "content":
        print(event.delta, end="")
```
6.3 Handling Long-Running Operations
With compaction and persistent containers, operations can span multiple requests and even multiple sessions. Implement robust state management:
```python
# Start a long-running task
session = client.responses.create(
    model="gpt-5.1-codex-max",
    input=[{"role": "user", "content": "Analyze this codebase and identify optimization opportunities."}],
    compaction={"enabled": True, "threshold": 50000},
    container={"persistent": True}
)
session_id = session.container_id

# Later (minutes or hours later), continue the work
continuation = client.responses.create(
    model="gpt-5.1-codex-max",
    input=[{"role": "user", "content": "Continue the analysis. Focus on database queries."}],
    container_id=session_id  # Resume the same container
)
```
7. Personalization and Tone Control
7.1 Personality System Prompt Updates
January 2026 brought significant updates to GPT-5.2 Instant’s default personality, making it more conversational and contextually adaptive. Users can now select from base styles and fine-tune characteristics.
Programmatic tone control:
```python
response = client.chat.completions.create(
    model="gpt-5.2-instant",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Recommend a good book."}
    ],
    tone_preset="friendly",  # Options: friendly, professional, candid, quirky, etc.
    style_parameters={
        "warmth": 0.8,            # 0-1 scale
        "enthusiasm": 0.6,        # 0-1 scale
        "emoji_usage": 0.3,       # 0-1 scale
        "headers_and_lists": 0.5  # Preference for structured output
    }
)
```
7.2 Characteristic Controls
The December 2025 update introduced granular controls for specific characteristics:
- Warmth: How caring and empathetic the tone feels
- Enthusiasm: Level of energy and excitement
- Headers & Lists: Preference for structured formatting
- Emoji Usage: Frequency of emoji in responses
These can be adjusted independently to create a custom personality that matches your brand voice.
7.3 Memory and Personalization
When reference chat history is enabled, ChatGPT can now more reliably find specific details from past conversations. Retrieved information appears with source attribution.
```python
# In the API, you can provide conversation history explicitly
response = client.chat.completions.create(
    model="gpt-5.2-instant",
    messages=conversation_history + [{"role": "user", "content": "What did we discuss last time?"}]
)
```
8. Safety, Alignment, and Responsible Use
8.1 The Model Spec Evolution
OpenAI’s Model Spec, a living document outlining intended model behavior, has undergone significant updates to address emerging challenges.
Under-18 (U18) Principles (December 2025): Age-appropriate guidance for teen users, including:
- Clearer boundaries and reduced exposure to potentially harmful content
- Refusal to participate in self-harm content, sexualized roleplay, or dangerous activities
- Encouragement of real-world support from trusted adults
Mental Health and Well-Being Guidance (October 2025): Extended to cover signs of delusions and mania. The model now responds safely and empathetically to distress without reinforcing harmful ideas.
Respect Real-World Ties: New principles discourage language that could contribute to isolation or emotional reliance on the assistant, even when users perceive the AI as a companion.
8.2 Age Prediction and Safeguards
OpenAI has rolled out an age prediction model that helps determine whether an account likely belongs to someone under 18, enabling appropriate safeguards. The model considers behavioral and account-level signals.
If incorrectly classified, users can verify their age through the Persona service.
8.3 Health in ChatGPT
The January 2026 introduction of “Health” creates a dedicated space for health and wellness conversations. Key features:
- Secure connection to medical records, Apple Health, and supported wellness apps
- Answers grounded in the user’s own data
- Conversations and data kept separate from general chats
- Health data not used for foundation model training

Health conversations have enhanced privacy protections and are designed to help users navigate medical care, not replace it.
8.4 Content Moderation and Refusal Style
The September 2025 Model Spec update changed the terminology from “refusal” to “safe completion,” reflecting a more helpful and transparent approach to safety boundaries. Rather than simply refusing, the model now provides context about why certain requests cannot be fulfilled and suggests alternatives when appropriate.
8.5 Agentic Safety Principles
With models now capable of taking actions in the world, new safety principles apply:
- Act within agreed scope: Like a consultant with a Scope of Work, the agent acts only with explicit or implicit user agreement
- Control side effects: Minimize irreversible actions, prefer reversible approaches
- Communicate impacts: Disclose actions and their consequences
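The "control side effects" principle can also be enforced on the client side by gating potentially irreversible commands behind explicit approval. The following is a minimal sketch, assuming you execute shell tool calls yourself; the command patterns and the `execute`/`approve` callbacks are illustrative, not part of any official SDK.

```python
# Sketch of a client-side approval gate for agent-issued shell commands.
# DESTRUCTIVE_PATTERNS, execute(), and approve() are illustrative assumptions.
import re

DESTRUCTIVE_PATTERNS = [
    r"\brm\s+-rf\b",
    r"\bgit\s+push\s+--force\b",
    r"\bDROP\s+TABLE\b",
]

def requires_approval(command: str) -> bool:
    """Return True if the command matches a known irreversible pattern."""
    return any(re.search(p, command, re.IGNORECASE) for p in DESTRUCTIVE_PATTERNS)

def run_with_gate(command: str, execute, approve):
    """Run reversible commands directly; route destructive ones through approve()."""
    if requires_approval(command) and not approve(command):
        return {"status": "blocked", "command": command}
    return {"status": "ok", "result": execute(command)}
```

A pattern list is deliberately conservative: it will miss some destructive commands, so treat it as one layer on top of the platform's own sandboxing, not a replacement for it.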
9. Testing and Quality Assurance
9.1 Model Migration Testing
The retirement of GPT-4o serves as a case study in model lifecycle management. When migrating between model versions, comprehensive testing should cover:
Pre-Migration:
- Curate golden prompts covering all business scenarios
- Establish baseline outputs and acceptable variation ranges
- Run offline A/B comparisons between old and new models
- Identify unacceptable changes (e.g., increased refusal rates, hallucination spikes)
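The offline A/B step can be sketched as a small harness that replays the golden prompts against both model versions and flags divergent outputs. In this sketch, `ask_old`/`ask_new` are hypothetical callables wrapping calls to each model, and `similarity` is whatever metric you trust (embedding cosine similarity, a judge model, etc.).

```python
def compare_models(golden_prompts, ask_old, ask_new, similarity, threshold=0.7):
    """Flag golden prompts whose old/new outputs diverge beyond a threshold.

    ask_old / ask_new: callables mapping a prompt string to a model response.
    similarity: any metric returning a score in [0, 1].
    """
    regressions = []
    for prompt in golden_prompts:
        old_out, new_out = ask_old(prompt), ask_new(prompt)
        score = similarity(old_out, new_out)
        if score < threshold:
            regressions.append({"prompt": prompt, "score": score,
                                "old": old_out, "new": new_out})
    return regressions
```

Reviewing the flagged pairs by hand is usually more informative than the raw score: some divergence is an improvement, not a regression.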
During Migration:
- Gradual rollout starting with low-risk requests
- Monitor key metrics: failure rate, refusal rate, user follow-up rate
- Collect and analyze edge cases
Post-Migration:
- Regular regression testing
- Audit trail for prompt and configuration changes
- User communication about behavior changes
9.2 Behavioral Testing
Testing should focus on behavior, not just exact outputs:
```python
from difflib import SequenceMatcher

def semantic_similarity(a: str, b: str) -> float:
    # Rough stand-in for a real semantic metric (e.g. embedding cosine similarity)
    return SequenceMatcher(None, a, b).ratio()

def test_model_behavior(test_cases):
    for case in test_cases:
        response = client.chat.completions.create(
            model="gpt-5.2-instant",
            messages=[{"role": "user", "content": case.prompt}]
        ).choices[0].message.content

        # Check semantic similarity, not exact match
        assert semantic_similarity(response, case.expected) > 0.8

        # Verify key information is present
        for required_info in case.required_facts:
            assert required_info in response

        # Ensure no forbidden patterns
        for forbidden in case.forbidden_patterns:
            assert forbidden not in response
```
10. Pricing and Cost Optimization
10.1 Current Pricing Structure
Pricing remains token-based, with variations by model:
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Notes |
|---|---|---|---|
| GPT-5.2 Instant | 2.50 | 10.00 | Most cost-effective for general use |
| GPT-5.2 Thinking | 15.00 | 60.00 | Higher cost for deep reasoning |
| GPT-5.3-Codex | 12.00 | 48.00 | Optimized for coding |
| GPT-5.1-Codex-Max | 20.00 | 80.00 | Project-scale work |
| GPT-5-Codex-Mini | 3.00 | 12.00 | 4x more usage within limits |
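The table above translates directly into a per-request cost estimate. A minimal sketch follows; the rates are the illustrative figures quoted in this guide, not official pricing, so load real rates from your billing dashboard in practice.

```python
# Per-request cost estimation from per-million-token rates.
# The figures mirror the table in this guide and are illustrative only.
PRICING = {  # model: (input $/1M tokens, output $/1M tokens)
    "gpt-5.2-instant": (2.50, 10.00),
    "gpt-5.2-thinking": (15.00, 60.00),
    "gpt-5.3-codex": (12.00, 48.00),
    "gpt-5.1-codex-max": (20.00, 80.00),
    "gpt-5-codex-mini": (3.00, 12.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated dollar cost of a single request."""
    input_rate, output_rate = PRICING[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
```

Note that output tokens dominate for reasoning-heavy models, since "thinking" tokens are typically billed as output.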
10.2 Cost Optimization Strategies
- Use compaction: Server-side compaction reduces token usage for long-running conversations
- Leverage caching: Persistent containers maintain state without repeated context
- Choose appropriate thinking effort: Use “light” or “standard” for most queries, “extended” only when necessary
- Monitor usage: Set up alerts for unexpected spending patterns
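The "monitor usage" item can be sketched as a small in-process tracker that fires an alert callback once a budget threshold is crossed. This is an illustrative sketch: a real deployment would persist spend across processes and reconcile against the provider's usage reporting rather than client-side accounting.

```python
# Minimal in-process budget tracker; budget figure and alert callback are
# illustrative assumptions for this sketch.
class UsageMonitor:
    def __init__(self, budget_usd: float, on_alert):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0
        self.on_alert = on_alert
        self.alerted = False

    def record(self, cost_usd: float) -> None:
        """Add one request's cost; fire the alert once when the budget is exceeded."""
        self.spent_usd += cost_usd
        if self.spent_usd > self.budget_usd and not self.alerted:
            self.alerted = True
            self.on_alert(self.spent_usd)
```

Pairing this with a per-request cost estimate gives a cheap early-warning signal before the monthly invoice arrives.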
11. Real-World Applications
11.1 Autonomous Coding Agents
With GPT-5.3-Codex and the hosted container infrastructure, you can build coding agents that:
- Clone repositories and analyze codebases
- Run tests and debug failures
- Generate and apply patches
- Push changes to version control
```python
# CodingAgent is an illustrative application-level wrapper, not part of the SDK
agent = CodingAgent(
    model="gpt-5.3-codex",
    container={
        "type": "managed",
        "persistent": True,
        "skills": ["code-review@2.1", "testing-framework@1.5"]
    }
)

result = agent.run("""
Clone the repository, fix the failing tests in the auth module,
and create a pull request with the changes.
""")
```
11.2 Data Analysis Pipelines
Combine code interpreter, shell tools, and compaction for complex data workflows:
```python
analysis = client.responses.create(
    model="gpt-5.3-codex",
    input=[{"role": "user", "content": """
        Import the sales data, clean it, perform regression analysis,
        generate visualizations, and create a summary report.
    """}],
    tools=[
        {"type": "code_interpreter"},
        {"type": "shell", "container": {"type": "managed"}}
    ],
    compaction={"enabled": True, "threshold": 50000}
)
```
11.3 Research Assistants
Deep research capabilities enable sophisticated information gathering:
```python
research = client.research.create(
    model="gpt-5.2",
    query="Compare renewable energy adoption rates across European countries",
    sources={"include": ["europa.eu", "irena.org", "iea.org"]},
    output_format="markdown",
    depth="comprehensive"  # Hours of research compressed into minutes
)
```
11.4 Personalized Education
With tone controls and memory, build adaptive tutoring systems:
```python
tutor = client.chat.completions.create(
    model="gpt-5.2-instant",
    messages=[
        {"role": "system", "content": "You are a patient math tutor for a 10th-grade student."},
        {"role": "user", "content": "I don't understand quadratic equations."}
    ],
    tone_preset="friendly",
    style_parameters={"warmth": 0.9, "enthusiasm": 0.7}
)
```
12. Future Directions for the ChatGPT API
The pace of innovation shows no signs of slowing. Based on recent trends, we can anticipate:
- Tighter integration between model capabilities and execution environments
- More sophisticated memory management with hierarchical compaction strategies
- Expanded Skills ecosystem with interoperable agent capabilities
- Enhanced multimodal understanding beyond text and images
- Continued refinement of safety and alignment mechanisms
For developers, the key to success lies not just in mastering current APIs but in building flexible architectures that can adapt to continuous evolution. The models will keep changing, but the principles of thoughtful design, comprehensive testing, and responsible deployment will remain constant.