The ChatGPT API has evolved dramatically since its inception, transforming from a simple text-generation interface into a comprehensive runtime environment for AI-powered applications. As of February 2026, the ecosystem encompasses state-of-the-art models including GPT-5.2, GPT-5.3-Codex, and specialized variants for every use case. This completely revised guide provides an in-depth exploration of the latest ChatGPT API capabilities, covering everything from authentication basics to advanced agentic features such as server-side memory compaction, hosted shell containers, and the new Skills standard. With practical code examples in Python and JavaScript, detailed coverage of the Responses API upgrade, and best practices for building production-ready applications, this resource is designed for developers seeking to harness the full power of the ChatGPT API in 2026 and beyond.
1. Introduction: The ChatGPT API as a Runtime Environment
The landscape of AI development has undergone a fundamental shift. What began as a straightforward API for generating text completions has matured into a sophisticated platform for building autonomous agents, coding assistants, and complex multi-step workflows. The ChatGPT API is no longer just about sending prompts and receiving answers; it has become a runtime environment where models can execute code, persist state across sessions, interact with external systems, and even manage their own memory through intelligent compression.
This evolution reflects a broader industry trend: the move from conversational AI to agentic AI. Today’s applications demand models that can not only understand language but also take action, maintain context over extended periods, and collaborate with other systems. The February 2026 upgrades to the Responses API, including server-side compaction, hosted shell containers, and Skills support, represent a quantum leap in this direction.
This guide covers the complete ChatGPT API ecosystem as it exists in early 2026. Whether you are building a simple chatbot, a code-generation tool, or a fully autonomous agent, the following chapters will equip you with the knowledge to leverage every feature effectively.
2. The Model Landscape: What’s Available in 2026
2.1 Current Model Lineup
The model landscape has simplified considerably. With the retirement of GPT-4o, GPT-4.1, and o4-mini from ChatGPT on February 13, 2026, the focus has shifted to the GPT-5 family and specialized variants. These models continue to be available in the ChatGPT API even after their retirement from the consumer chat interface.
Core Models:
- GPT-5.2 – The flagship general-purpose model, available in two variants:
  - GPT-5.2 Instant: Optimized for speed and general conversation, with recent updates improving response style and grounding
  - GPT-5.2 Thinking: Designed for complex reasoning tasks, with configurable thinking time (Light, Standard, Extended)
- GPT-5.3-Codex – Released February 5, 2026, this represents OpenAI’s most capable agentic coding model. It combines Codex and GPT-5 training stacks, delivering 25% faster performance and state-of-the-art benchmark results
- GPT-5.1-Codex-Max – A frontier agentic coding model for long-running, project-scale work. It uses compaction technology to maintain coherence across multiple context windows
- GPT-5-Codex-Mini – A smaller, cost-effective variant providing up to 4x more usage within subscription limits
- Open-weight models – `gpt-oss-120b` and `gpt-oss-20b` are text-only reasoning models available for teams wanting to run and customize models on their own infrastructure
2.2 Model Selection Strategy
Choosing the right model depends on your use case:
- General conversational applications: GPT-5.2 Instant offers the best balance of speed, quality, and cost
- Complex reasoning tasks: GPT-5.2 Thinking with Extended thinking time
- Coding assistants: GPT-5.3-Codex for agentic workflows, GPT-5-Codex-Mini for high-volume, cost-sensitive applications
- Long-running projects: GPT-5.1-Codex-Max with its compaction capabilities
- On-premise deployment: The open-weight OSS models
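As a rough starting point, the guidance above can be captured in a small lookup helper. This is an illustrative sketch: the lowercase model identifiers are assumptions based on the names used in this guide, so verify them against your account's model list.

```python
# Illustrative mapping from use case to a default model, following the
# selection strategy above. Model identifiers are assumptions based on
# the names used in this guide, not confirmed API model IDs.
MODEL_BY_USE_CASE = {
    "general": "gpt-5.2-instant",
    "reasoning": "gpt-5.2-thinking",
    "coding": "gpt-5.3-codex",
    "coding-budget": "gpt-5-codex-mini",
    "long-running": "gpt-5.1-codex-max",
    "on-premise": "gpt-oss-20b",
}

def pick_model(use_case: str) -> str:
    """Return a reasonable default model for a use case, falling back to general."""
    return MODEL_BY_USE_CASE.get(use_case, MODEL_BY_USE_CASE["general"])
```

Centralizing the choice in one helper also makes future migrations a one-line change.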
2.3 Model Retirement and Migration
The February 2026 retirement of GPT-4o and related models from ChatGPT serves as an important reminder: models have lifecycles. While ChatGPT API access remains unchanged for now, the industry trend is toward continuous model evolution. When migrating between model versions, testing should focus not just on functional correctness but on behavioral consistency: output style, refusal patterns, and handling of edge cases.
3. ChatGPT API Fundamentals
3.1 Authentication and Setup
Authentication remains straightforward, though security best practices have evolved. Always store ChatGPT API keys in environment variables or secure secret management systems.
Python Setup:
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    # Optional: Set organization ID for multi-org accounts
    organization=os.environ.get("OPENAI_ORG_ID")
)
```
JavaScript/Node.js Setup:
```javascript
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  organization: process.env.OPENAI_ORG_ID // Optional
});
```
3.2 The Chat Completions API
The Chat Completions endpoint remains the workhorse for most applications. Recent updates have improved response quality, with GPT-5.2 Instant now delivering more measured, grounded responses and placing important information upfront.
```python
response = client.chat.completions.create(
    model="gpt-5.2-instant",  # or "gpt-5.2-thinking", "gpt-5.3-codex"
    messages=[
        {"role": "system", "content": "You are a helpful assistant with a friendly tone."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    temperature=0.5,
    max_tokens=500
)
print(response.choices[0].message.content)
```
3.3 Thinking Time Configuration
For GPT-5.2 Thinking, you can now configure the reasoning depth. OpenAI periodically adjusts default thinking times based on user preference data.
```python
response = client.chat.completions.create(
    model="gpt-5.2-thinking",
    messages=[{"role": "user", "content": "Solve this complex math problem: ..."}],
    reasoning_effort="extended"  # Options: light, standard, extended
)
```
The thinking level toggle, introduced in September 2025, gives users fine-grained control over the speed-accuracy tradeoff.
3.4 Structured Outputs and JSON Mode
For applications requiring guaranteed structured data, JSON mode remains essential. With `response_format={"type": "json_object"}`, the model outputs syntactically valid JSON; for output that conforms to a specific schema, use the structured-output parsing shown further below.
```python
import json

response = client.chat.completions.create(
    model="gpt-5.2-instant",
    messages=[
        {"role": "user", "content": "List three programming languages and their primary use cases."}
    ],
    response_format={"type": "json_object"}
)
data = json.loads(response.choices[0].message.content)
```
For more robust validation, combine with Pydantic:
```python
from typing import List
from pydantic import BaseModel

class Language(BaseModel):
    name: str
    use_case: str

class LanguageList(BaseModel):
    languages: List[Language]

completion = client.beta.chat.completions.parse(
    model="gpt-5.2-instant",
    messages=[{"role": "user", "content": "List three programming languages and their uses."}],
    response_format=LanguageList
)
languages = completion.choices[0].message.parsed
```
4. The Responses API: Building True Agents
The Responses API represents the most significant evolution in OpenAI’s API offerings. The February 2026 upgrade introduced transformative capabilities that address the core challenges of building autonomous agents: memory management and execution environment.
4.1 Server-Side Compaction: Solving the Memory Problem
One of the fundamental limitations of conversational AI has been the context window. As conversations grow longer, they inevitably hit token limits, forcing developers to truncate or summarize history, often losing critical context in the process.
Server-side compaction solves this by having the model intelligently compress its own behavior history. When the conversation reaches a threshold you specify, the server automatically compresses and prunes the context within the same streaming response, preserving key information while reducing token count.
```python
response = client.responses.create(
    model="gpt-5.3-codex",
    input=[
        {"role": "user", "content": "Let's work on a multi-step data analysis project..."}
    ],
    compaction={
        "enabled": True,
        "threshold": 80000,        # Start compacting at 80k tokens
        "strategy": "intelligent"  # Model decides what to preserve
    }
)
```
The compressed content is encrypted and designed for model continuation, not human readability. This is a critical distinction—it’s about maintaining the agent’s working memory, not creating summaries for users.
4.2 Hosted Shell Containers: Giving Models an Execution Environment
Perhaps the most revolutionary addition is the hosted shell container capability. Through the Shell tool, models can now execute commands in a full Debian 12 environment, complete with pre-installed runtimes for Python, Node.js, Java, Go, and Ruby.
```python
response = client.responses.create(
    model="gpt-5.3-codex",
    input=[
        {"role": "user", "content": "Analyze this dataset and generate a visualization."}
    ],
    tools=[{
        "type": "shell",
        "container": {
            "type": "managed",
            "image": "debian:12",
            "persistent": True,           # Keep container for follow-up requests
            "network_access": "isolated"  # Default: no external network
        }
    }]
)
```
The container can persist across multiple requests, allowing the model to maintain state—files, installed packages, environment variables—throughout a project. When you reference the same container ID in subsequent requests, the environment remains intact.
Network Access Control: For security, hosted containers default to no external network access. If your agent needs to reach external APIs or download packages, you must explicitly configure allowed domains through your organization’s admin settings.
```python
# In admin dashboard: Configure allowed domains
allowed_domains = ["pypi.org", "files.pythonhosted.org", "api.github.com"]

# In request: Specify per-call access
response = client.responses.create(
    model="gpt-5.3-codex",
    input=[...],
    tools=[{
        "type": "shell",
        "container": {"type": "managed", "network_access": "limited"},
        "allowed_domains": ["pypi.org"]  # Override for this request
    }]
)
```
Domain Secrets: For API keys and credentials, use the domain secrets mechanism. The model sees only placeholder names; the system injects real credentials only when making approved requests to trusted domains.
```python
# Configure secrets in admin dashboard
secrets = {
    "github_token": {"type": "bearer", "value": "ghp_..."}
}

# In request, reference by placeholder
response = client.responses.create(
    model="gpt-5.3-codex",
    input=[{"role": "user", "content": "Push the results to GitHub."}],
    tools=[{
        "type": "shell",
        "container": {"type": "managed"},
        "secrets": ["github_token"]
    }]
)
```
4.3 Skills: Reusable, Versioned Workflows
The new Skills standard, based on the open SKILL.md specification, allows you to package reusable workflows that can be shared across models and platforms. A Skill is a versioned collection of files with a manifest that defines its capabilities and requirements.
```yaml
# SKILL.md
name: data-analysis-pipeline
version: 1.2.0
description: Standardized data analysis workflow
requirements:
  - python=3.11
  - pandas>=2.0
  - matplotlib
entrypoint: analyze.py
inputs:
  - name: dataset
    type: file
    description: CSV file to analyze
outputs:
  - name: report
    type: file
    format: markdown
  - name: visualization
    type: file
    format: png
```
Skills can be mounted into hosted containers, enabling consistent execution across different environments.
```python
response = client.responses.create(
    model="gpt-5.3-codex",
    input=[{"role": "user", "content": "Run the standard analysis on this sales data."}],
    tools=[{
        "type": "shell",
        "container": {"type": "managed"},
        "skills": ["data-analysis-pipeline@1.2.0"]
    }]
)
```
5. Deep Research and Advanced Analysis
5.1 Deep Research Upgrades
The deep research feature, originally launched in 2025, received significant upgrades in February 2026. Now powered by GPT-5.2, it offers:
- Source specification: Focus research on specific websites or connected apps
- Real-time progress tracking: Monitor research as it happens in a dedicated viewer
- Mid-run adjustments: Add new sources or refine the research plan while the agent works
- Enhanced reporting: Download completed reports in Markdown, Word, or PDF formats
```python
# Initiate deep research via API
research = client.research.create(
    model="gpt-5.2",
    query="Latest advancements in quantum machine learning",
    sources={
        "include": ["arxiv.org", "nature.com"],
        "exclude": ["predatory-journals.org"]
    },
    max_depth=3,
    output_format="markdown"
)

# Track progress
status = client.research.status(research.id)
print(f"Progress: {status.progress}%")
print(f"Sources found: {status.sources_found}")

# When complete, retrieve report
report = client.research.retrieve(research.id)
with open("quantum_ml_report.md", "w") as f:
    f.write(report.content)
```
5.2 Code Interpreter Integration
The Code Interpreter tool remains available for executing Python code in a sandboxed environment. With the new container infrastructure, code execution is more powerful and flexible.
```python
import base64

response = client.responses.create(
    model="gpt-5.3-codex",
    input=[{"role": "user", "content": "Analyze this CSV and create a visualization."}],
    tools=[{"type": "code_interpreter"}],
    files=[{
        "filename": "sales_data.csv",
        "content": base64.b64encode(csv_content).decode()
    }]
)
# The model will write and execute code, returning results including generated images
```
6. Streaming and Real-Time Applications
6.1 Streaming Fundamentals
Streaming remains essential for responsive user interfaces. The implementation is straightforward:
```python
stream = client.chat.completions.create(
    model="gpt-5.2-instant",
    messages=[{"role": "user", "content": "Write a short story about a robot."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
6.2 Streaming with Tool Calls
When using tools in streaming mode, tool calls are delivered as special events:
```python
stream = client.responses.create(
    model="gpt-5.3-codex",
    input=[{"role": "user", "content": "Check disk usage and find large files."}],
    tools=[{"type": "shell", "container": {"type": "managed"}}],
    stream=True
)

for event in stream:
    if event.type == "tool_call":
        print(f"Model wants to execute: {event.tool_call.command}")
        # Execute command, send result back
    elif event.type == "content":
        print(event.delta, end="")
```
6.3 Handling Long-Running Operations
With compaction and persistent containers, operations can span multiple requests and even multiple sessions. Implement robust state management:
```python
# Start a long-running task
session = client.responses.create(
    model="gpt-5.1-codex-max",
    input=[{"role": "user", "content": "Analyze this codebase and identify optimization opportunities."}],
    compaction={"enabled": True, "threshold": 50000},
    container={"persistent": True}
)
session_id = session.container_id

# Later (minutes or hours later), continue the work
continuation = client.responses.create(
    model="gpt-5.1-codex-max",
    input=[{"role": "user", "content": "Continue the analysis. Focus on database queries."}],
    container_id=session_id  # Resume the same container
)
```
7. Personalization and Tone Control
7.1 Personality System Prompt Updates
January 2026 brought significant updates to GPT-5.2 Instant’s default personality, making it more conversational and contextually adaptive. Users can now select from base styles and fine-tune characteristics.
Programmatic tone control:
```python
response = client.chat.completions.create(
    model="gpt-5.2-instant",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Recommend a good book."}
    ],
    tone_preset="friendly",  # Options: friendly, professional, candid, quirky, etc.
    style_parameters={
        "warmth": 0.8,            # 0-1 scale
        "enthusiasm": 0.6,        # 0-1 scale
        "emoji_usage": 0.3,       # 0-1 scale
        "headers_and_lists": 0.5  # Preference for structured output
    }
)
```
7.2 Characteristic Controls
The December 2025 update introduced granular controls for specific characteristics:
- Warmth: How caring and empathetic the tone feels
- Enthusiasm: Level of energy and excitement
- Headers & Lists: Preference for structured formatting
- Emoji Usage: Frequency of emoji in responses
These can be adjusted independently to create a custom personality that matches your brand voice.
7.3 Memory and Personalization
When reference chat history is enabled, ChatGPT can now more reliably find specific details from past conversations. Retrieved information appears with source attribution.
```python
# In the API, you can provide conversation history explicitly
response = client.chat.completions.create(
    model="gpt-5.2-instant",
    messages=conversation_history + [{"role": "user", "content": "What did we discuss last time?"}]
)
```
8. Safety, Alignment, and Responsible Use
8.1 The Model Spec Evolution
OpenAI’s Model Spec, a living document outlining intended model behavior, has undergone significant updates to address emerging challenges.
Under-18 (U18) Principles (December 2025): Age-appropriate guidance for teen users, including:
- Clearer boundaries and reduced exposure to potentially harmful content
- Refusal to participate in self-harm content, sexualized roleplay, or dangerous activities
- Encouragement of real-world support from trusted adults
Mental Health and Well-Being Guidance (October 2025): Extended to cover signs of delusions and mania. The model now responds safely and empathetically to distress without reinforcing harmful ideas.
Respect Real-World Ties: New principles discourage language that could contribute to isolation or emotional reliance on the assistant, even when users perceive the AI as a companion.
8.2 Age Prediction and Safeguards
OpenAI has rolled out an age prediction model that helps determine whether an account likely belongs to someone under 18, enabling appropriate safeguards. The model considers behavioral and account-level signals.
If incorrectly classified, users can verify their age through the Persona service.
8.3 Health in ChatGPT
The January 2026 introduction of “Health” creates a dedicated space for health and wellness conversations. Key features:
- Secure connection to medical records, Apple Health, and supported wellness apps
- Answers grounded in the user’s own data
- Conversations and data kept separate from general chats
- Health data not used for foundation model training

Health conversations have enhanced privacy protections and are designed to help users navigate medical care, not replace it.
8.4 Content Moderation and Refusal Style
The September 2025 Model Spec update changed the terminology from “refusal” to “safe completion,” reflecting a more helpful and transparent approach to safety boundaries. Rather than simply refusing, the model now provides context about why certain requests cannot be fulfilled and suggests alternatives when appropriate.
8.5 Agentic Safety Principles
With models now capable of taking actions in the world, new safety principles apply:
- Act within agreed scope: Like a consultant with a Scope of Work, the agent acts only with explicit or implicit user agreement
- Control side effects: Minimize irreversible actions, prefer reversible approaches
- Communicate impacts: Disclose actions and their consequences
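The "control side effects" principle can also be enforced on the client side by gating potentially irreversible commands behind explicit approval. The following is a minimal sketch, assuming you execute shell tool calls yourself; the command patterns and the `execute`/`approve` callbacks are illustrative, not part of any official SDK.

```python
# Sketch of a client-side approval gate for agent-issued shell commands.
# DESTRUCTIVE_PATTERNS, execute(), and approve() are illustrative assumptions.
import re

DESTRUCTIVE_PATTERNS = [
    r"\brm\s+-rf\b",
    r"\bgit\s+push\s+--force\b",
    r"\bDROP\s+TABLE\b",
]

def requires_approval(command: str) -> bool:
    """Return True if the command matches a known irreversible pattern."""
    return any(re.search(p, command, re.IGNORECASE) for p in DESTRUCTIVE_PATTERNS)

def run_with_gate(command: str, execute, approve):
    """Run reversible commands directly; route destructive ones through approve()."""
    if requires_approval(command) and not approve(command):
        return {"status": "blocked", "command": command}
    return {"status": "ok", "result": execute(command)}
```

A pattern list is deliberately conservative: it will miss some destructive commands, so treat it as one layer on top of the platform's own sandboxing, not a replacement for it.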
9. Testing and Quality Assurance
9.1 Model Migration Testing
The retirement of GPT-4o serves as a case study in model lifecycle management. When migrating between model versions, comprehensive testing should cover:
Pre-Migration:
- Curate golden prompts covering all business scenarios
- Establish baseline outputs and acceptable variation ranges
- Run offline A/B comparisons between old and new models
- Identify unacceptable changes (e.g., increased refusal rates, hallucination spikes)
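The offline A/B step can be sketched as a small harness that replays the golden prompts against both model versions and flags divergent outputs. In this sketch, `ask_old`/`ask_new` are hypothetical callables wrapping calls to each model, and `similarity` is whatever metric you trust (embedding cosine similarity, a judge model, etc.).

```python
def compare_models(golden_prompts, ask_old, ask_new, similarity, threshold=0.7):
    """Flag golden prompts whose old/new outputs diverge beyond a threshold.

    ask_old / ask_new: callables mapping a prompt string to a model response.
    similarity: any metric returning a score in [0, 1].
    """
    regressions = []
    for prompt in golden_prompts:
        old_out, new_out = ask_old(prompt), ask_new(prompt)
        score = similarity(old_out, new_out)
        if score < threshold:
            regressions.append({"prompt": prompt, "score": score,
                                "old": old_out, "new": new_out})
    return regressions
```

Reviewing the flagged pairs by hand is usually more informative than the raw score: some divergence is an improvement, not a regression.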
During Migration:
- Gradual rollout starting with low-risk requests
- Monitor key metrics: failure rate, refusal rate, user follow-up rate
- Collect and analyze edge cases
Post-Migration:
- Regular regression testing
- Audit trail for prompt and configuration changes
- User communication about behavior changes
9.2 Behavioral Testing
Testing should focus on behavior, not just exact outputs:
```python
from difflib import SequenceMatcher

def semantic_similarity(a: str, b: str) -> float:
    # Rough stand-in for a real semantic metric (e.g. embedding cosine similarity)
    return SequenceMatcher(None, a, b).ratio()

def test_model_behavior(test_cases):
    for case in test_cases:
        response = client.chat.completions.create(
            model="gpt-5.2-instant",
            messages=[{"role": "user", "content": case.prompt}]
        ).choices[0].message.content

        # Check semantic similarity, not exact match
        assert semantic_similarity(response, case.expected) > 0.8

        # Verify key information is present
        for required_info in case.required_facts:
            assert required_info in response

        # Ensure no forbidden patterns
        for forbidden in case.forbidden_patterns:
            assert forbidden not in response
```
10. Pricing and Cost Optimization
10.1 Current Pricing Structure
Pricing remains token-based, with variations by model:
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Notes |
|---|---|---|---|
| GPT-5.2 Instant | 2.50 | 10.00 | Most cost-effective for general use |
| GPT-5.2 Thinking | 15.00 | 60.00 | Higher cost for deep reasoning |
| GPT-5.3-Codex | 12.00 | 48.00 | Optimized for coding |
| GPT-5.1-Codex-Max | 20.00 | 80.00 | Project-scale work |
| GPT-5-Codex-Mini | 3.00 | 12.00 | 4x more usage within limits |
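The table above translates directly into a per-request cost estimate. A minimal sketch follows; the rates are the illustrative figures quoted in this guide, not official pricing, so load real rates from your billing dashboard in practice.

```python
# Per-request cost estimation from per-million-token rates.
# The figures mirror the table in this guide and are illustrative only.
PRICING = {  # model: (input $/1M tokens, output $/1M tokens)
    "gpt-5.2-instant": (2.50, 10.00),
    "gpt-5.2-thinking": (15.00, 60.00),
    "gpt-5.3-codex": (12.00, 48.00),
    "gpt-5.1-codex-max": (20.00, 80.00),
    "gpt-5-codex-mini": (3.00, 12.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated dollar cost of a single request."""
    input_rate, output_rate = PRICING[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
```

Note that output tokens dominate for reasoning-heavy models, since "thinking" tokens are typically billed as output.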
10.2 Cost Optimization Strategies
- Use compaction: Server-side compaction reduces token usage for long-running conversations
- Leverage caching: Persistent containers maintain state without repeated context
- Choose appropriate thinking effort: Use “light” or “standard” for most queries, “extended” only when necessary
- Monitor usage: Set up alerts for unexpected spending patterns
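The "monitor usage" item can be sketched as a small in-process tracker that fires an alert callback once a budget threshold is crossed. This is an illustrative sketch: a real deployment would persist spend across processes and reconcile against the provider's usage reporting rather than client-side accounting.

```python
# Minimal in-process budget tracker; budget figure and alert callback are
# illustrative assumptions for this sketch.
class UsageMonitor:
    def __init__(self, budget_usd: float, on_alert):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0
        self.on_alert = on_alert
        self.alerted = False

    def record(self, cost_usd: float) -> None:
        """Add one request's cost; fire the alert once when the budget is exceeded."""
        self.spent_usd += cost_usd
        if self.spent_usd > self.budget_usd and not self.alerted:
            self.alerted = True
            self.on_alert(self.spent_usd)
```

Pairing this with a per-request cost estimate gives a cheap early-warning signal before the monthly invoice arrives.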
11. Real-World Applications
11.1 Autonomous Coding Agents
With GPT-5.3-Codex and the hosted container infrastructure, you can build coding agents that:
- Clone repositories and analyze codebases
- Run tests and debug failures
- Generate and apply patches
- Push changes to version control
```python
# CodingAgent is an illustrative application-level wrapper, not part of the SDK
agent = CodingAgent(
    model="gpt-5.3-codex",
    container={
        "type": "managed",
        "persistent": True,
        "skills": ["code-review@2.1", "testing-framework@1.5"]
    }
)

result = agent.run("""
Clone the repository, fix the failing tests in the auth module,
and create a pull request with the changes.
""")
```
11.2 Data Analysis Pipelines
Combine code interpreter, shell tools, and compaction for complex data workflows:
```python
analysis = client.responses.create(
    model="gpt-5.3-codex",
    input=[{"role": "user", "content": """
        Import the sales data, clean it, perform regression analysis,
        generate visualizations, and create a summary report.
    """}],
    tools=[
        {"type": "code_interpreter"},
        {"type": "shell", "container": {"type": "managed"}}
    ],
    compaction={"enabled": True, "threshold": 50000}
)
```
11.3 Research Assistants
Deep research capabilities enable sophisticated information gathering:
```python
research = client.research.create(
    model="gpt-5.2",
    query="Compare renewable energy adoption rates across European countries",
    sources={"include": ["europa.eu", "irena.org", "iea.org"]},
    output_format="markdown",
    depth="comprehensive"  # Hours of research compressed into minutes
)
```
11.4 Personalized Education
With tone controls and memory, build adaptive tutoring systems:
```python
tutor = client.chat.completions.create(
    model="gpt-5.2-instant",
    messages=[
        {"role": "system", "content": "You are a patient math tutor for a 10th-grade student."},
        {"role": "user", "content": "I don't understand quadratic equations."}
    ],
    tone_preset="friendly",
    style_parameters={"warmth": 0.9, "enthusiasm": 0.7}
)
```
12. Future Directions for the ChatGPT API
The pace of innovation shows no signs of slowing. Based on recent trends, we can anticipate:
- Tighter integration between model capabilities and execution environments
- More sophisticated memory management with hierarchical compaction strategies
- Expanded Skills ecosystem with interoperable agent capabilities
- Enhanced multimodal understanding beyond text and images
- Continued refinement of safety and alignment mechanisms
For developers, the key to success lies not just in mastering current APIs but in building flexible architectures that can adapt to continuous evolution. The models will keep changing, but the principles of thoughtful design, comprehensive testing, and responsible deployment will remain constant.