{"id":53705,"date":"2025-08-24T15:52:19","date_gmt":"2025-08-24T05:52:19","guid":{"rendered":"https:\/\/www.cloudproinc.com.au\/?p=53705"},"modified":"2025-09-03T09:18:39","modified_gmt":"2025-09-02T23:18:39","slug":"how-to-use-the-tiktoken-tokenizer","status":"publish","type":"post","link":"https:\/\/cloudproinc.com.au\/index.php\/2025\/08\/24\/how-to-use-the-tiktoken-tokenizer\/","title":{"rendered":"How to Use the tiktoken Tokenizer"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">In this article, we\u2019ll explore <strong>How to Use the tiktoken Tokenizer<\/strong>, why it matters, and practical ways you can apply it in your projects to better control prompts, estimate API costs, and optimize large text inputs.<\/p>\n\n\n\n<!--more-->\n\n\n\n<div class=\"wp-block-yoast-seo-table-of-contents yoast-table-of-contents\"><h2>Table of contents<\/h2><ul><li><a href=\"#h-why-tokenization-matters\" data-level=\"2\">Why Tokenization Matters<\/a><\/li><li><a href=\"#h-installing-tiktoken\" data-level=\"2\">Installing tiktoken<\/a><\/li><li><a href=\"#h-basic-usage\" data-level=\"2\">Basic Usage<\/a><\/li><li><a href=\"#h-decoding-tokens-back-to-text\" data-level=\"2\">Decoding Tokens Back to Text<\/a><\/li><li><a href=\"#h-counting-tokens\" data-level=\"2\">Counting Tokens<\/a><\/li><li><a href=\"#h-using-model-specific-encodings\" data-level=\"2\">Using Model-Specific Encodings<\/a><\/li><li><a href=\"#h-working-with-long-texts\" data-level=\"2\">Working with Long Texts<\/a><\/li><li><a href=\"#h-estimating-api-costs-with-tiktoken\" data-level=\"2\">Estimating API Costs with tiktoken<\/a><\/li><li><a href=\"#h-advanced-features\" data-level=\"2\">Advanced Features<\/a><\/li><li><a href=\"#h-best-practices\" data-level=\"2\">Best Practices<\/a><\/li><li><a href=\"#h-conclusion\" data-level=\"2\">Conclusion<\/a><\/li><\/ul><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">When working with large language models (LLMs) like GPT-4, GPT-4o, or GPT-3.5, understanding 
tokens is crucial. Tokens are the basic units of text that models process\u2014think of them as word pieces rather than whole words. For example, the word <em>tokenization<\/em> might split into multiple tokens, while short words like <em>cat<\/em> are typically a single token.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The <strong><code>tiktoken<\/code><\/strong> library, developed by OpenAI, is a fast and efficient tokenizer that allows developers to understand, count, and manage tokens when interacting with LLMs. Whether you\u2019re optimizing prompts, estimating costs, or debugging issues, <code>tiktoken<\/code> is the tool that helps you stay in control.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1024\" height=\"683\" data-src=\"\/wp-content\/uploads\/2025\/08\/image-12-1024x683.png\" alt=\"\" class=\"wp-image-53706 lazyload\" data-srcset=\"\/wp-content\/uploads\/2025\/08\/image-12-1024x683.png 1024w, \/wp-content\/uploads\/2025\/08\/image-12-300x200.png 300w, \/wp-content\/uploads\/2025\/08\/image-12-768x512.png 768w, \/wp-content\/uploads\/2025\/08\/image-12-1080x720.png 1080w, \/wp-content\/uploads\/2025\/08\/image-12-1280x853.png 1280w, \/wp-content\/uploads\/2025\/08\/image-12-980x653.png 980w, \/wp-content\/uploads\/2025\/08\/image-12-480x320.png 480w, \/wp-content\/uploads\/2025\/08\/image-12.png 1536w\" data-sizes=\"(max-width: 1024px) 100vw, 1024px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 1024px; --smush-placeholder-aspect-ratio: 1024\/683;\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">In this post, we\u2019ll cover what <code>tiktoken<\/code> is, why it matters, and how to use it step by step.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-why-tokenization-matters\">Why Tokenization Matters<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Before diving into code, let\u2019s 
quickly revisit why tokenization is important:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Model limits<\/strong>: Each model has a maximum context window (e.g., GPT-4o supports up to 128k tokens) that covers both the prompt and the response. A request that exceeds the limit is rejected.<\/li>\n\n\n\n<li><strong>Costs<\/strong>: Most LLM APIs bill by the token, with rates usually quoted per 1,000 or per 1 million tokens. Knowing token counts helps estimate costs before you send requests.<\/li>\n\n\n\n<li><strong>Prompt engineering<\/strong>: Understanding how text is broken down helps you write more efficient prompts.<\/li>\n\n\n\n<li><strong>Debugging<\/strong>: When your input gets unexpectedly truncated or rejected, tokenization often explains why.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-installing-tiktoken\">Installing <code>tiktoken<\/code><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><code>tiktoken<\/code> can be installed directly from PyPI:<\/p>\n\n\n\n<pre class=\"wp-block-code has-white-color has-black-background-color has-text-color has-background has-link-color wp-elements-adf87e124b2866d401e5e5010e8654fb\"><code>pip install tiktoken<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">It\u2019s lightweight and has no complex dependencies, so installation is usually smooth.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-basic-usage\">Basic Usage<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The first step is to import the library and load an encoding. Encodings are tied to model families, so different models may tokenize the same text differently. For example, GPT-3.5 and GPT-4 use the <code>cl100k_base<\/code> encoding, while GPT-4o uses <code>o200k_base<\/code>.<\/p>\n\n\n\n<pre class=\"wp-block-code has-white-color has-black-background-color has-text-color has-background has-link-color wp-elements-18343b7ff6d01d7c36e0e512efd5ea69\"><code>import tiktoken\n\n# Load encoding for GPT-4 \/ GPT-3.5\nencoding = tiktoken.get_encoding(\"cl100k_base\")\n\n# Encode text\ntext = \"Hello, world! 
Tokenization with tiktoken is fast.\"\ntokens = encoding.encode(text)\nprint(tokens)\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The output will be a list of integers (token IDs). Each ID corresponds to a token in the model\u2019s vocabulary.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">With <code>cl100k_base<\/code>, the list begins as follows (remaining IDs omitted):<\/p>\n\n\n\n<pre class=\"wp-block-code has-white-color has-black-background-color has-text-color has-background has-link-color wp-elements-d156d36664e397f914a43594c74ea92a\"><code>&#91;9906, 11, 1917, 0, ...]<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-decoding-tokens-back-to-text\">Decoding Tokens Back to Text<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">If you want to check what those IDs represent, you can decode them back into a string:<\/p>\n\n\n\n<pre class=\"wp-block-code has-white-color has-black-background-color has-text-color has-background has-link-color wp-elements-7bae7966b6bec72d2b14c05c93e5e145\"><code>decoded_text = encoding.decode(tokens)\nprint(decoded_text)\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Output:<\/p>\n\n\n\n<pre class=\"wp-block-code has-white-color has-black-background-color has-text-color has-background has-link-color wp-elements-619f03c40ea80381c303be0c081ecc27\"><code>Hello, world! Tokenization with tiktoken is fast.<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Encoding and then decoding returns the original text unchanged, which confirms that the token IDs are a lossless representation of your input.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-counting-tokens\">Counting Tokens<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">One of the most common use cases is simply counting how many tokens a string contains. 
This is especially useful when preparing prompts for the OpenAI API.<\/p>\n\n\n\n<pre class=\"wp-block-code has-white-color has-black-background-color has-text-color has-background has-link-color wp-elements-af5ba03ca989c86611651abaf2f23080\"><code>def count_tokens(text: str, encoding_name: str = \"cl100k_base\") -&gt; int:\n    encoding = tiktoken.get_encoding(encoding_name)\n    return len(encoding.encode(text))\n\nsample = \"Large language models are powerful, but token limits matter.\"\nprint(count_tokens(sample))  # Output: e.g., 11\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">With an exact count, you can check a prompt against the context limit and estimate its cost before sending it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-using-model-specific-encodings\">Using Model-Specific Encodings<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Instead of manually specifying <code>cl100k_base<\/code>, you can load the encoding for a specific model with <code>encoding_for_model<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code has-white-color has-black-background-color has-text-color has-background has-link-color wp-elements-ddb577fe0b92ffe7cd76e1ff28605b21\"><code>encoding = tiktoken.encoding_for_model(\"gpt-4o\")\ntext = \"This text will be tokenized using GPT-4o's encoding.\"\ntokens = encoding.encode(text)\nprint(tokens)\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">This is safer than hardcoding an encoding name, since newer models can use different encodings: <code>gpt-4o<\/code>, for instance, resolves to <code>o200k_base<\/code> rather than <code>cl100k_base<\/code>, so the same text can produce a different token count.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-working-with-long-texts\">Working with Long Texts<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">When dealing with long documents, you may want to chunk text into smaller pieces to fit within model limits.<\/p>\n\n\n\n<pre class=\"wp-block-code has-white-color has-black-background-color has-text-color has-background has-link-color wp-elements-02ea7fba4374a6683765821f0e6bb538\"><code>def chunk_text(text: str, max_tokens: int = 500):\n    encoding = 
tiktoken.encoding_for_model(\"gpt-4o\")\n    tokens = encoding.encode(text)\n\n    # Yield consecutive slices of at most max_tokens tokens, decoded back to text\n    for i in range(0, len(tokens), max_tokens):\n        yield encoding.decode(tokens&#91;i:i+max_tokens])\n\n# Example usage\nfor chunk in chunk_text(\"Your very long document goes here...\", max_tokens=1000):\n    print(chunk)\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Each chunk contains at most <code>max_tokens<\/code> tokens, so you can size chunks to fit the model\u2019s context window. Because the split happens at arbitrary token boundaries, a chunk can start or end mid-word, and a multi-byte character that spans several tokens may be cut in two.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-estimating-api-costs-with-tiktoken\">Estimating API Costs with <code>tiktoken<\/code><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Suppose you\u2019re using GPT-4o at an illustrative rate of $0.002 per 1,000 input tokens (check the provider\u2019s current pricing for real figures). You can estimate costs like this:<\/p>\n\n\n\n<pre class=\"wp-block-code has-white-color has-black-background-color has-text-color has-background has-link-color wp-elements-683687a53e1da1991cbcdeb172850b83\"><code>def estimate_cost(text: str, model: str = \"gpt-4o\", rate_per_1k: float = 0.002):\n    encoding = tiktoken.encoding_for_model(model)\n    num_tokens = len(encoding.encode(text))\n    cost = (num_tokens \/ 1000) * rate_per_1k\n    return num_tokens, cost\n\nsample = \"OpenAI models are great for summarization, Q&amp;A, and much more.\"\ntokens, cost = estimate_cost(sample)\nprint(f\"Tokens: {tokens}, Estimated cost: ${cost:.6f}\")\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Example output (the exact token count depends on the encoding):<\/p>\n\n\n\n<pre class=\"wp-block-code has-white-color has-black-background-color has-text-color has-background has-link-color wp-elements-9b6081c144983a7ba2e6cc0871e6e2fb\"><code>Tokens: 14, Estimated cost: $0.000028<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-advanced-features\">Advanced Features<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Byte pair encoding (BPE)<\/strong>: <code>tiktoken<\/code> relies on BPE, a subword tokenization method. 
You can inspect how a word breaks into pieces by decoding each token ID on its own, for example with <code>decode_single_token_bytes<\/code>.<\/li>\n\n\n\n<li><strong>Custom encodings<\/strong>: You can construct your own <code>tiktoken.Encoding<\/code>, for example to extend an existing encoding with extra special tokens when experimenting with your own models.<\/li>\n\n\n\n<li><strong>Debugging hidden tokens<\/strong>: <code>tiktoken<\/code> helps identify newline characters, spaces, and special tokens such as <code>&lt;|endoftext|&gt;<\/code>. By default, <code>encode<\/code> raises an error when the input contains special-token text unless you allow it through <code>allowed_special<\/code>.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-best-practices\">Best Practices<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Always count tokens before API calls<\/strong> \u2013 prevent errors by validating input size.<\/li>\n\n\n\n<li><strong>Chunk smartly<\/strong> \u2013 break large texts at logical boundaries (sentences, paragraphs).<\/li>\n\n\n\n<li><strong>Cache encodings<\/strong> \u2013 loading encodings repeatedly can be slow; reuse them across calls.<\/li>\n\n\n\n<li><strong>Mind the prompt + response<\/strong> \u2013 remember that output tokens also count toward limits and costs.<\/li>\n\n\n\n<li><strong>Stay model-specific<\/strong> \u2013 different models tokenize differently, so always pick the right encoding.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-conclusion\">Conclusion<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The <code>tiktoken<\/code> library is an essential tool for anyone working with OpenAI\u2019s models. By understanding how text is tokenized, you gain the ability to optimize prompts, estimate costs, and avoid hitting model limits. From simple token counting to chunking long documents, <code>tiktoken<\/code> helps you control how your text is processed.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">If you\u2019re building applications with GPT models, learning to use <code>tiktoken<\/code> is a practical necessity. 
With just a few lines of Python, you can gain deep insight into how your prompts are represented under the hood.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this article, we\u2019ll explore How to Use the tiktoken Tokenizer, why it matters, and practical ways you can apply it in your projects to better control prompts, estimate API costs, and optimize large text 
inputs.<\/p>\n","protected":false},"author":1,"featured_media":53708,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_yoast_wpseo_opengraph-title":"","_yoast_wpseo_opengraph-description":"","_yoast_wpseo_twitter-title":"","_yoast_wpseo_twitter-description":"","_et_pb_use_builder":"off","_et_pb_old_content":"","_et_gb_content_width":"","_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[24,13,53,84],"tags":[],"class_list":["post-53705","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","category-blog","category-openai","category-tiktoken"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.3 (Yoast SEO v28.0) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>How to Use the tiktoken Tokenizer - CPI Consulting<\/title>\n<meta name=\"description\" content=\"Learn how to use the tiktoken tokenizer to effectively manage tokens in large language models for better prompt control.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.cloudproinc.com.au\/index.php\/2025\/08\/24\/how-to-use-the-tiktoken-tokenizer\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How to Use the tiktoken Tokenizer\" \/>\n<meta property=\"og:description\" content=\"Learn how to use the tiktoken tokenizer to effectively manage tokens in large language models for better prompt control.\" \/>\n<meta property=\"og:url\" 
content=\"https:\/\/www.cloudproinc.com.au\/index.php\/2025\/08\/24\/how-to-use-the-tiktoken-tokenizer\/\" \/>\n<meta property=\"og:site_name\" content=\"CPI Consulting\" \/>\n<meta property=\"article:published_time\" content=\"2025-08-24T05:52:19+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-09-02T23:18:39+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/cloudproinc.com.au\/wp-content\/uploads\/2025\/08\/how-to-use-the-tiktoken-tokenizer.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1536\" \/>\n\t<meta property=\"og:image:height\" content=\"1024\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"CPI Staff\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"CPI Staff\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.cloudproinc.com.au\\\/index.php\\\/2025\\\/08\\\/24\\\/how-to-use-the-tiktoken-tokenizer\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.cloudproinc.com.au\\\/index.php\\\/2025\\\/08\\\/24\\\/how-to-use-the-tiktoken-tokenizer\\\/\"},\"author\":{\"name\":\"CPI Staff\",\"@id\":\"https:\\\/\\\/cloudproinc.azurewebsites.net\\\/#\\\/schema\\\/person\\\/192eeeb0ce91062126ce3822ae88fe6e\"},\"headline\":\"How to Use the tiktoken 
Tokenizer\",\"datePublished\":\"2025-08-24T05:52:19+00:00\",\"dateModified\":\"2025-09-02T23:18:39+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.cloudproinc.com.au\\\/index.php\\\/2025\\\/08\\\/24\\\/how-to-use-the-tiktoken-tokenizer\\\/\"},\"wordCount\":779,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/cloudproinc.azurewebsites.net\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.cloudproinc.com.au\\\/index.php\\\/2025\\\/08\\\/24\\\/how-to-use-the-tiktoken-tokenizer\\\/#primaryimage\"},\"thumbnailUrl\":\"\\\/wp-content\\\/uploads\\\/2025\\\/08\\\/how-to-use-the-tiktoken-tokenizer.png\",\"articleSection\":[\"AI\",\"Blog\",\"OpenAI\",\"Tiktoken\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.cloudproinc.com.au\\\/index.php\\\/2025\\\/08\\\/24\\\/how-to-use-the-tiktoken-tokenizer\\\/#respond\"]}],\"accessibilityFeature\":[\"tableOfContents\"]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.cloudproinc.com.au\\\/index.php\\\/2025\\\/08\\\/24\\\/how-to-use-the-tiktoken-tokenizer\\\/\",\"url\":\"https:\\\/\\\/www.cloudproinc.com.au\\\/index.php\\\/2025\\\/08\\\/24\\\/how-to-use-the-tiktoken-tokenizer\\\/\",\"name\":\"How to Use the tiktoken Tokenizer - CPI Consulting\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/cloudproinc.azurewebsites.net\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.cloudproinc.com.au\\\/index.php\\\/2025\\\/08\\\/24\\\/how-to-use-the-tiktoken-tokenizer\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.cloudproinc.com.au\\\/index.php\\\/2025\\\/08\\\/24\\\/how-to-use-the-tiktoken-tokenizer\\\/#primaryimage\"},\"thumbnailUrl\":\"\\\/wp-content\\\/uploads\\\/2025\\\/08\\\/how-to-use-the-tiktoken-tokenizer.png\",\"datePublished\":\"2025-08-24T05:52:19+00:00\",\"dateModified\":\"2025-09-02T23:18:39+00:00\",\"description\":\"Learn how to use the tiktoken tokenizer to effectively manage tokens in large language 
models for better prompt control.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.cloudproinc.com.au\\\/index.php\\\/2025\\\/08\\\/24\\\/how-to-use-the-tiktoken-tokenizer\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.cloudproinc.com.au\\\/index.php\\\/2025\\\/08\\\/24\\\/how-to-use-the-tiktoken-tokenizer\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.cloudproinc.com.au\\\/index.php\\\/2025\\\/08\\\/24\\\/how-to-use-the-tiktoken-tokenizer\\\/#primaryimage\",\"url\":\"\\\/wp-content\\\/uploads\\\/2025\\\/08\\\/how-to-use-the-tiktoken-tokenizer.png\",\"contentUrl\":\"\\\/wp-content\\\/uploads\\\/2025\\\/08\\\/how-to-use-the-tiktoken-tokenizer.png\",\"width\":1536,\"height\":1024},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.cloudproinc.com.au\\\/index.php\\\/2025\\\/08\\\/24\\\/how-to-use-the-tiktoken-tokenizer\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.cloudproinc.com.au\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How to Use the tiktoken Tokenizer\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/cloudproinc.azurewebsites.net\\\/#website\",\"url\":\"https:\\\/\\\/cloudproinc.azurewebsites.net\\\/\",\"name\":\"Cloud Pro Inc - CPI Consulting Pty Ltd\",\"description\":\"Cloud, AI &amp; Cybersecurity Consulting | Melbourne\",\"publisher\":{\"@id\":\"https:\\\/\\\/cloudproinc.azurewebsites.net\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/cloudproinc.azurewebsites.net\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/cloudproinc.azurewebsites.net\\\/#organization\",\"name\":\"Cloud 
Pro Inc - Cloud Pro Inc - CPI Consulting Pty Ltd\",\"url\":\"https:\\\/\\\/cloudproinc.azurewebsites.net\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/cloudproinc.azurewebsites.net\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"\\\/wp-content\\\/uploads\\\/2022\\\/01\\\/favfinalfile.png\",\"contentUrl\":\"\\\/wp-content\\\/uploads\\\/2022\\\/01\\\/favfinalfile.png\",\"width\":500,\"height\":500,\"caption\":\"Cloud Pro Inc - Cloud Pro Inc - CPI Consulting Pty Ltd\"},\"image\":{\"@id\":\"https:\\\/\\\/cloudproinc.azurewebsites.net\\\/#\\\/schema\\\/logo\\\/image\\\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/cloudproinc.azurewebsites.net\\\/#\\\/schema\\\/person\\\/192eeeb0ce91062126ce3822ae88fe6e\",\"name\":\"CPI Staff\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/2d96eeb53b791d92c8c50dd667e3beec92c93253bb6ff21c02cfa8ca73665c70?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/2d96eeb53b791d92c8c50dd667e3beec92c93253bb6ff21c02cfa8ca73665c70?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/2d96eeb53b791d92c8c50dd667e3beec92c93253bb6ff21c02cfa8ca73665c70?s=96&d=mm&r=g\",\"caption\":\"CPI Staff\"},\"sameAs\":[\"http:\\\/\\\/www.cloudproinc.com.au\"],\"url\":\"https:\\\/\\\/cloudproinc.com.au\\\/index.php\\\/author\\\/cpiadmin\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"How to Use the tiktoken Tokenizer - CPI Consulting","description":"Learn how to use the tiktoken tokenizer to effectively manage tokens in large language models for better prompt control.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.cloudproinc.com.au\/index.php\/2025\/08\/24\/how-to-use-the-tiktoken-tokenizer\/","og_locale":"en_US","og_type":"article","og_title":"How to Use the tiktoken Tokenizer","og_description":"Learn how to use the tiktoken tokenizer to effectively manage tokens in large language models for better prompt control.","og_url":"https:\/\/www.cloudproinc.com.au\/index.php\/2025\/08\/24\/how-to-use-the-tiktoken-tokenizer\/","og_site_name":"CPI Consulting","article_published_time":"2025-08-24T05:52:19+00:00","article_modified_time":"2025-09-02T23:18:39+00:00","og_image":[{"width":1536,"height":1024,"url":"https:\/\/cloudproinc.com.au\/wp-content\/uploads\/2025\/08\/how-to-use-the-tiktoken-tokenizer.png","type":"image\/png"}],"author":"CPI Staff","twitter_card":"summary_large_image","twitter_misc":{"Written by":"CPI Staff","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.cloudproinc.com.au\/index.php\/2025\/08\/24\/how-to-use-the-tiktoken-tokenizer\/#article","isPartOf":{"@id":"https:\/\/www.cloudproinc.com.au\/index.php\/2025\/08\/24\/how-to-use-the-tiktoken-tokenizer\/"},"author":{"name":"CPI Staff","@id":"https:\/\/cloudproinc.azurewebsites.net\/#\/schema\/person\/192eeeb0ce91062126ce3822ae88fe6e"},"headline":"How to Use the tiktoken Tokenizer","datePublished":"2025-08-24T05:52:19+00:00","dateModified":"2025-09-02T23:18:39+00:00","mainEntityOfPage":{"@id":"https:\/\/www.cloudproinc.com.au\/index.php\/2025\/08\/24\/how-to-use-the-tiktoken-tokenizer\/"},"wordCount":779,"commentCount":0,"publisher":{"@id":"https:\/\/cloudproinc.azurewebsites.net\/#organization"},"image":{"@id":"https:\/\/www.cloudproinc.com.au\/index.php\/2025\/08\/24\/how-to-use-the-tiktoken-tokenizer\/#primaryimage"},"thumbnailUrl":"\/wp-content\/uploads\/2025\/08\/how-to-use-the-tiktoken-tokenizer.png","articleSection":["AI","Blog","OpenAI","Tiktoken"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.cloudproinc.com.au\/index.php\/2025\/08\/24\/how-to-use-the-tiktoken-tokenizer\/#respond"]}],"accessibilityFeature":["tableOfContents"]},{"@type":"WebPage","@id":"https:\/\/www.cloudproinc.com.au\/index.php\/2025\/08\/24\/how-to-use-the-tiktoken-tokenizer\/","url":"https:\/\/www.cloudproinc.com.au\/index.php\/2025\/08\/24\/how-to-use-the-tiktoken-tokenizer\/","name":"How to Use the tiktoken Tokenizer - CPI 
Consulting","isPartOf":{"@id":"https:\/\/cloudproinc.azurewebsites.net\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.cloudproinc.com.au\/index.php\/2025\/08\/24\/how-to-use-the-tiktoken-tokenizer\/#primaryimage"},"image":{"@id":"https:\/\/www.cloudproinc.com.au\/index.php\/2025\/08\/24\/how-to-use-the-tiktoken-tokenizer\/#primaryimage"},"thumbnailUrl":"\/wp-content\/uploads\/2025\/08\/how-to-use-the-tiktoken-tokenizer.png","datePublished":"2025-08-24T05:52:19+00:00","dateModified":"2025-09-02T23:18:39+00:00","description":"Learn how to use the tiktoken tokenizer to effectively manage tokens in large language models for better prompt control.","breadcrumb":{"@id":"https:\/\/www.cloudproinc.com.au\/index.php\/2025\/08\/24\/how-to-use-the-tiktoken-tokenizer\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.cloudproinc.com.au\/index.php\/2025\/08\/24\/how-to-use-the-tiktoken-tokenizer\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.cloudproinc.com.au\/index.php\/2025\/08\/24\/how-to-use-the-tiktoken-tokenizer\/#primaryimage","url":"\/wp-content\/uploads\/2025\/08\/how-to-use-the-tiktoken-tokenizer.png","contentUrl":"\/wp-content\/uploads\/2025\/08\/how-to-use-the-tiktoken-tokenizer.png","width":1536,"height":1024},{"@type":"BreadcrumbList","@id":"https:\/\/www.cloudproinc.com.au\/index.php\/2025\/08\/24\/how-to-use-the-tiktoken-tokenizer\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.cloudproinc.com.au\/"},{"@type":"ListItem","position":2,"name":"How to Use the tiktoken Tokenizer"}]},{"@type":"WebSite","@id":"https:\/\/cloudproinc.azurewebsites.net\/#website","url":"https:\/\/cloudproinc.azurewebsites.net\/","name":"Cloud Pro Inc - CPI Consulting Pty Ltd","description":"Cloud, AI &amp; Cybersecurity Consulting | 
Melbourne","publisher":{"@id":"https:\/\/cloudproinc.azurewebsites.net\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/cloudproinc.azurewebsites.net\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/cloudproinc.azurewebsites.net\/#organization","name":"Cloud Pro Inc - Cloud Pro Inc - CPI Consulting Pty Ltd","url":"https:\/\/cloudproinc.azurewebsites.net\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/cloudproinc.azurewebsites.net\/#\/schema\/logo\/image\/","url":"\/wp-content\/uploads\/2022\/01\/favfinalfile.png","contentUrl":"\/wp-content\/uploads\/2022\/01\/favfinalfile.png","width":500,"height":500,"caption":"Cloud Pro Inc - Cloud Pro Inc - CPI Consulting Pty Ltd"},"image":{"@id":"https:\/\/cloudproinc.azurewebsites.net\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/cloudproinc.azurewebsites.net\/#\/schema\/person\/192eeeb0ce91062126ce3822ae88fe6e","name":"CPI Staff","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/2d96eeb53b791d92c8c50dd667e3beec92c93253bb6ff21c02cfa8ca73665c70?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/2d96eeb53b791d92c8c50dd667e3beec92c93253bb6ff21c02cfa8ca73665c70?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/2d96eeb53b791d92c8c50dd667e3beec92c93253bb6ff21c02cfa8ca73665c70?s=96&d=mm&r=g","caption":"CPI 
Staff"},"sameAs":["http:\/\/www.cloudproinc.com.au"],"url":"https:\/\/cloudproinc.com.au\/index.php\/author\/cpiadmin\/"}]}},"jetpack_featured_media_url":"\/wp-content\/uploads\/2025\/08\/how-to-use-the-tiktoken-tokenizer.png","jetpack-related-posts":[{"id":53774,"url":"https:\/\/cloudproinc.com.au\/index.php\/2025\/09\/03\/integrate-tiktoken-in-python-applications\/","url_meta":{"origin":53705,"position":0},"title":"Integrate Tiktoken in Python Applications","author":"CPI Staff","date":"September 3, 2025","format":false,"excerpt":"Learn what Tiktoken is and how to use it in Python to count tokens, budget prompts, and chunk text with a practical, step-by-step example.","rel":"","context":"In &quot;AI&quot;","block_context":{"text":"AI","link":"https:\/\/cloudproinc.com.au\/index.php\/category\/ai\/"},"img":{"alt_text":"","src":"\/wp-content\/uploads\/2025\/09\/integrate-tiktoken-in-python-applications.png","width":350,"height":200,"srcset":"\/wp-content\/uploads\/2025\/09\/integrate-tiktoken-in-python-applications.png 1x, \/wp-content\/uploads\/2025\/09\/integrate-tiktoken-in-python-applications.png 1.5x, \/wp-content\/uploads\/2025\/09\/integrate-tiktoken-in-python-applications.png 2x, \/wp-content\/uploads\/2025\/09\/integrate-tiktoken-in-python-applications.png 3x, \/wp-content\/uploads\/2025\/09\/integrate-tiktoken-in-python-applications.png 4x"},"classes":[]},{"id":53573,"url":"https:\/\/cloudproinc.com.au\/index.php\/2025\/08\/06\/how-to-code-and-build-a-gpt-large-language-model\/","url_meta":{"origin":53705,"position":1},"title":"How to Code and Build a GPT Large Language Model","author":"CPI Staff","date":"August 6, 2025","format":false,"excerpt":"In this blog post, you\u2019ll learn how to code and build a GPT LLM from scratch or fine-tune an existing one. We\u2019ll cover the architecture, key tools, libraries, frameworks, and essential resources to get you started fast. 
Table of contents: Understanding GPT LLM Architecture, Model Architecture Diagram, Tools and Libraries to Build a\u2026","rel":"","context":"In &quot;AI&quot;","block_context":{"text":"AI","link":"https:\/\/cloudproinc.com.au\/index.php\/category\/ai\/"},"img":{"alt_text":"","src":"\/wp-content\/uploads\/2025\/08\/CreateLLM.png","width":350,"height":200,"srcset":"\/wp-content\/uploads\/2025\/08\/CreateLLM.png 1x, \/wp-content\/uploads\/2025\/08\/CreateLLM.png 1.5x, \/wp-content\/uploads\/2025\/08\/CreateLLM.png 2x, \/wp-content\/uploads\/2025\/08\/CreateLLM.png 3x, \/wp-content\/uploads\/2025\/08\/CreateLLM.png 4x"},"classes":[]},{"id":53834,"url":"https:\/\/cloudproinc.com.au\/index.php\/2025\/09\/15\/how-text-chunking-works-for-rag-pipelines\/","url_meta":{"origin":53705,"position":2},"title":"How Text Chunking Works for RAG Pipelines","author":"CPI Staff","date":"September 15, 2025","format":false,"excerpt":"A practical guide to text chunking for RAG and search. Learn strategies, token sizes, overlap, and code to lift retrieval quality without inflating cost or latency.","rel":"","context":"In &quot;Blog&quot;","block_context":{"text":"Blog","link":"https:\/\/cloudproinc.com.au\/index.php\/category\/blog\/"},"img":{"alt_text":"","src":"\/wp-content\/uploads\/2025\/09\/how-text-chunking-works-for-rag-pipelines-and-search-quality.png","width":350,"height":200,"srcset":"\/wp-content\/uploads\/2025\/09\/how-text-chunking-works-for-rag-pipelines-and-search-quality.png 1x, \/wp-content\/uploads\/2025\/09\/how-text-chunking-works-for-rag-pipelines-and-search-quality.png 1.5x, \/wp-content\/uploads\/2025\/09\/how-text-chunking-works-for-rag-pipelines-and-search-quality.png 2x, \/wp-content\/uploads\/2025\/09\/how-text-chunking-works-for-rag-pipelines-and-search-quality.png 3x, \/wp-content\/uploads\/2025\/09\/how-text-chunking-works-for-rag-pipelines-and-search-quality.png 
4x"},"classes":[]},{"id":53555,"url":"https:\/\/cloudproinc.com.au\/index.php\/2025\/07\/29\/counting-tokens-using-the-openai-python-sdk\/","url_meta":{"origin":53705,"position":3},"title":"Counting Tokens Using the OpenAI Python SDK","author":"CPI Staff","date":"July 29, 2025","format":false,"excerpt":"This post provides a comprehensive guide on counting tokens using the OpenAI Python SDK, covering Python virtual environments, managing your OpenAI API key securely, and the role of the requirements.txt file. In the world of Large Language Models (LLMs) and Artificial Intelligence (AI), the term \"token\" frequently arises. Tokens are\u2026","rel":"","context":"In &quot;AI&quot;","block_context":{"text":"AI","link":"https:\/\/cloudproinc.com.au\/index.php\/category\/ai\/"},"img":{"alt_text":"","src":"\/wp-content\/uploads\/2025\/07\/image-23.png","width":350,"height":200,"srcset":"\/wp-content\/uploads\/2025\/07\/image-23.png 1x, \/wp-content\/uploads\/2025\/07\/image-23.png 1.5x, \/wp-content\/uploads\/2025\/07\/image-23.png 2x"},"classes":[]},{"id":53864,"url":"https:\/\/cloudproinc.com.au\/index.php\/2025\/09\/15\/preparing-input-text-for-training-llms\/","url_meta":{"origin":53705,"position":4},"title":"Preparing Input Text for Training LLMs","author":"CPI Staff","date":"September 15, 2025","format":false,"excerpt":"Practical steps to clean, normalize, chunk, and structure text for training and fine-tuning LLMs, with clear explanations and runnable code.","rel":"","context":"In &quot;AI&quot;","block_context":{"text":"AI","link":"https:\/\/cloudproinc.com.au\/index.php\/category\/ai\/"},"img":{"alt_text":"","src":"\/wp-content\/uploads\/2025\/09\/preparing-input-text-for-training-llms-that-perform-in-production.png","width":350,"height":200,"srcset":"\/wp-content\/uploads\/2025\/09\/preparing-input-text-for-training-llms-that-perform-in-production.png 1x, \/wp-content\/uploads\/2025\/09\/preparing-input-text-for-training-llms-that-perform-in-production.png 1.5x, 
\/wp-content\/uploads\/2025\/09\/preparing-input-text-for-training-llms-that-perform-in-production.png 2x, \/wp-content\/uploads\/2025\/09\/preparing-input-text-for-training-llms-that-perform-in-production.png 3x, \/wp-content\/uploads\/2025\/09\/preparing-input-text-for-training-llms-that-perform-in-production.png 4x"},"classes":[]},{"id":53745,"url":"https:\/\/cloudproinc.com.au\/index.php\/2025\/08\/31\/understanding-openai-embedding-models\/","url_meta":{"origin":53705,"position":5},"title":"Understanding OpenAI Embedding Models","author":"CPI Staff","date":"August 31, 2025","format":false,"excerpt":"A practical guide to OpenAI\u2019s embedding models\u2014what they are, how they work, and how to use them for search, RAG, clustering, and more.","rel":"","context":"In &quot;AI&quot;","block_context":{"text":"AI","link":"https:\/\/cloudproinc.com.au\/index.php\/category\/ai\/"},"img":{"alt_text":"","src":"\/wp-content\/uploads\/2025\/08\/understanding-openai-embedding-models.png","width":350,"height":200,"srcset":"\/wp-content\/uploads\/2025\/08\/understanding-openai-embedding-models.png 1x, \/wp-content\/uploads\/2025\/08\/understanding-openai-embedding-models.png 1.5x, \/wp-content\/uploads\/2025\/08\/understanding-openai-embedding-models.png 2x, \/wp-content\/uploads\/2025\/08\/understanding-openai-embedding-models.png 3x, \/wp-content\/uploads\/2025\/08\/understanding-openai-embedding-models.png 
4x"},"classes":[]}],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/cloudproinc.com.au\/index.php\/wp-json\/wp\/v2\/posts\/53705","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cloudproinc.com.au\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cloudproinc.com.au\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cloudproinc.com.au\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cloudproinc.com.au\/index.php\/wp-json\/wp\/v2\/comments?post=53705"}],"version-history":[{"count":1,"href":"https:\/\/cloudproinc.com.au\/index.php\/wp-json\/wp\/v2\/posts\/53705\/revisions"}],"predecessor-version":[{"id":53707,"href":"https:\/\/cloudproinc.com.au\/index.php\/wp-json\/wp\/v2\/posts\/53705\/revisions\/53707"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cloudproinc.com.au\/index.php\/wp-json\/wp\/v2\/media\/53708"}],"wp:attachment":[{"href":"https:\/\/cloudproinc.com.au\/index.php\/wp-json\/wp\/v2\/media?parent=53705"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cloudproinc.com.au\/index.php\/wp-json\/wp\/v2\/categories?post=53705"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cloudproinc.com.au\/index.php\/wp-json\/wp\/v2\/tags?post=53705"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}