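Ideogram 4.0 is an open-source image model with 9.3 billion parameters that excels at text rendering, bounding-box layout control, and color palettes. The weights are published on HuggingFace, the inference code is on GitHub, and the whole system is built around a JSON prompt format, which left many people confused at launch.
This guide explains why the JSON system exists, what each field controls, and provides copy-ready prompts for a range of scenarios.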
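Why does Ideogram 4.0 use JSON prompts?
Most image models are trained on (image, caption) pairs. The captions are natural-language sentences from which the model learns to associate words with visual concepts. This works, but it is ambiguous: when you write "a red car on the left, next to a blue building", the model has to guess which object the red belongs to and where in the frame "left" actually is.
Ideogram 4.0 takes a different route. According to the official documentation, it was trained entirely on structured JSON annotations. The JSON paired with each training image separates the scene description, the style parameters, and the position of every element. The official wording is that the training annotations are "deliberately exhaustive": each JSON describes everything in the image.
This means the model does not have to guess spatial relationships. Bounding box coordinates map directly to precise positions in the training data, and hex values map directly to color associations built during training rather than to a fuzzy understanding of color words.
The practical takeaway: JSON prompts unlock a level of layout, typography, and color precision that plain text cannot reach. Plain text still works, though. Ideogram's Magic Prompt uses an LLM to turn casual input into JSON before the image is generated.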
Plain text vs JSON: how big is the difference in practice?
The same concept, written two ways:
Plain text:
A jazz festival poster with bold typography, warm colors, and a saxophone silhouette
JSON:
{"high_level_description":"A vibrant jazz festival poster featuring bold typography and a saxophone silhouette against warm-toned geometric shapes","style_description":{"aesthetics":"retro, grain texture, bold contrast","lighting":"warm stage lighting with amber tones","medium":"graphic_design","art_style":"vintage concert poster with screen-print texture","color_palette":["#E8572A","#F2A03D","#1B1B2F","#F5E6CC","#C2185B"]},"compositional_deconstruction":{"background":"Deep navy blue with subtle radial gradient and halftone dot pattern","elements":[{"type":"text","bbox":[50,100,250,900],"text":"JAZZ\nFESTIVAL","desc":"Large bold sans-serif title in warm orange, slightly tilted 3 degrees clockwise"},{"type":"obj","bbox":[300,200,850,750],"desc":"Golden saxophone silhouette with geometric art deco fragmentation, pieces floating upward"},{"type":"text","bbox":[870,150,950,850],"text":"JUNE 28-30 • RIVERSIDE PARK • TICKETS AT JAZZFEST.COM","desc":"Small caps tracking-wide footer text in cream color"}]}}

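Jazz poster generated from plain text: the model chooses the layout and colors on its own

JSON version of the same concept: exact palette, controlled text placement, deliberate composition
The plain-text version produces a usable image. The JSON version places every element where you specified, using the palette you specified.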
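JSON structure at a glance
A complete annotation has three top-level fields:
| Field | Required? | Controls |
|---|---|---|
| high_level_description | Recommended | Overall description in 1-2 sentences |
| style_description | Optional | Lighting, medium, aesthetics, color palette |
| compositional_deconstruction | Required | Background scene plus each element and its position |
Inside style_description you must pick one of two fields: photo (camera and lens parameters) or art_style (illustration or design style). You cannot use both.
Each element is one of two types: obj (a visual object) or text (text rendered in the image). The bounding box format is [y_min, x_min, y_max, x_max], normalized to a 0-1000 coordinate grid. The global palette holds at most 16 hex values, and a single element at most 5.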
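To make the coordinate convention concrete, here is a minimal Python sketch that maps a normalized box to pixel coordinates for a given output size, plus a small helper for people used to x-first boxes. The function names `bbox_to_pixels` and `xyxy_to_bbox` are illustrative and not part of any Ideogram tooling, and the rounding rule is an assumption.

```python
from typing import List, Tuple


def bbox_to_pixels(bbox: List[int], width: int, height: int) -> Tuple[int, int, int, int]:
    """Map [y_min, x_min, y_max, x_max] on the 0-1000 grid to pixel
    coordinates, returned as (left, top, right, bottom)."""
    y_min, x_min, y_max, x_max = bbox
    left = round(x_min * width / 1000)
    top = round(y_min * height / 1000)
    right = round(x_max * width / 1000)
    bottom = round(y_max * height / 1000)
    return left, top, right, bottom


def xyxy_to_bbox(x_min: int, y_min: int, x_max: int, y_max: int) -> List[int]:
    """Reorder an x-first box into the y-first order described above."""
    return [y_min, x_min, y_max, x_max]


if __name__ == "__main__":
    # The title box from the jazz poster example, on a 1024x1024 canvas.
    print(bbox_to_pixels([50, 100, 250, 900], 1024, 1024))
```

Checking a few boxes this way before generating is a cheap way to confirm that a title really spans the width you intended.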
A blank template you can copy:
{"high_level_description":"[1-2 sentence summary]","style_description":{"aesthetics":"[style keywords]","lighting":"[lighting setup]","medium":"[photograph|illustration|3d_render|painting|graphic_design]","art_style":"[style description, or use photo in place of this field]","color_palette":["#HEXCODE","#HEXCODE"]},"compositional_deconstruction":{"background":"[background/environment description]","elements":[{"type":"obj","bbox":[y_min,x_min,y_max,x_max],"desc":"[detailed element description]"},{"type":"text","bbox":[y_min,x_min,y_max,x_max],"text":"[text to render]","desc":"[text styling description]"}]}}
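If you would rather fill the template from code than by hand, the sketch below builds the same structure in Python and serializes it to a single line. The helper names (`text_element`, `obj_element`, `build_prompt`) are hypothetical. The key order follows the field order this guide gives in its common-mistakes section, and relies on Python dicts and `json.dumps` preserving insertion order.

```python
import json


def text_element(bbox, text, desc):
    # Key order for text elements: type -> bbox -> text -> desc.
    return {"type": "text", "bbox": list(bbox), "text": text, "desc": desc}


def obj_element(bbox, desc):
    # Key order for object elements: type -> bbox -> desc.
    return {"type": "obj", "bbox": list(bbox), "desc": desc}


def build_prompt(description, background, elements, *, aesthetics, lighting,
                 medium, color_palette, art_style=None, photo=None):
    """Return a compact JSON prompt string with exactly one of
    art_style or photo set in style_description."""
    if (art_style is None) == (photo is None):
        raise ValueError("set exactly one of art_style or photo")
    style = {"aesthetics": aesthetics, "lighting": lighting, "medium": medium}
    if art_style is not None:
        style["art_style"] = art_style
    else:
        style["photo"] = photo
    style["color_palette"] = list(color_palette)
    prompt = {
        "high_level_description": description,
        "style_description": style,
        "compositional_deconstruction": {
            "background": background,
            "elements": list(elements),
        },
    }
    # Compact separators give the one-line form used throughout this guide.
    return json.dumps(prompt, ensure_ascii=False, separators=(",", ":"))


if __name__ == "__main__":
    print(build_prompt(
        "A modern geometric logo mark",
        "Pure white, no texture",
        [obj_element([200, 200, 800, 800], "Abstract leaf shape")],
        aesthetics="minimal", lighting="flat, no shadows",
        medium="graphic_design", art_style="flat vector logo design",
        color_palette=["#2D5F2D", "#4CAF50", "#FFFFFF"],
    ))
```

Raising an error when both or neither of `art_style` and `photo` is given is a design choice of this sketch, chosen so the mistake is caught before a prompt is ever sent.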
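Prompt examples by scenario
Event poster
Posters are the best showcase for Ideogram 4.0's text rendering: several text blocks sit at precise positions, each with its own size and style.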
{"high_level_description":"A minimalist tech conference poster with clean typography and geometric accents","style_description":{"aesthetics":"clean, modern, Swiss design influenced","lighting":"flat, even studio lighting","medium":"graphic_design","art_style":"minimalist poster design with strong grid structure","color_palette":["#0D0D0D","#FFFFFF","#4ECDC4","#FF6B6B"]},"compositional_deconstruction":{"background":"Pure white background with subtle 12-column grid lines in light gray","elements":[{"type":"text","bbox":[80,60,300,940],"text":"DEVCON\n2026","desc":"Ultra-bold grotesque typeface in black, massive size, tight leading"},{"type":"obj","bbox":[350,100,700,900],"desc":"Abstract geometric composition of overlapping circles and rectangles in teal and coral, suggesting network nodes and connections"},{"type":"text","bbox":[750,60,900,940],"text":"SEPTEMBER 15-17\nSAN FRANCISCO\nREGISTER AT DEVCON.IO","desc":"Light weight mono-spaced text in dark gray, left-aligned, generous line spacing"}]}}

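Grid layout, precise text positioning, and a palette with two accent colors
Product photography
Switch to photo mode and supply camera parameters. The bounding box controls where the product sits and how much negative space surrounds it.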
{"high_level_description":"A premium skincare bottle photographed on a marble surface with soft natural lighting","style_description":{"aesthetics":"editorial, clean, luxury","lighting":"soft diffused window light from upper left, subtle reflection on marble","photo":"85mm f/2.8, shallow depth of field, color-graded","medium":"photograph","color_palette":["#F7F3EE","#D4C5B2","#8B7355","#FFFFFF","#E8DDD3"]},"compositional_deconstruction":{"background":"Polished white marble surface with subtle gray veining, soft gradient to warm cream in the background","elements":[{"type":"obj","bbox":[150,300,850,700],"desc":"Tall frosted glass skincare bottle with minimal gold typography label, cap removed and placed beside the bottle, casting soft shadow to the right"}]}}

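Controlled light direction, marble texture, and carefully planned negative space
Logo design
Logos need flat colors and clean edges. Use art_style to get a vector look, and keep the elements simple.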
{"high_level_description":"A modern geometric logo mark for a sustainable energy company","style_description":{"aesthetics":"minimal, geometric, professional","lighting":"flat, no shadows","medium":"graphic_design","art_style":"flat vector logo design, no gradients, clean edges","color_palette":["#2D5F2D","#4CAF50","#FFFFFF"]},"compositional_deconstruction":{"background":"Pure white, no texture","elements":[{"type":"obj","bbox":[200,200,800,800],"desc":"Abstract leaf shape formed by three overlapping chevrons pointing upward, creating a subtle upward arrow in the negative space, rendered in two shades of green"}]}}

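Vector-style logo: flat colors, no gradients, precise geometry
Photorealistic photography
Photorealism depends on the level of camera-parameter detail in the photo field.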
{"high_level_description":"A candid street photograph of a woman walking through a rain-soaked Tokyo alley at night","style_description":{"aesthetics":"cinematic, moody, high contrast","lighting":"neon signs reflecting off wet pavement, warm tungsten from shop interiors mixing with cool blue ambient","photo":"35mm f/1.4, shot wide open, rain droplets visible on lens edge, slight motion blur on passing figures","medium":"photograph","color_palette":["#1A1A2E","#E94560","#F5A623","#16213E","#0F3460"]},"compositional_deconstruction":{"background":"Narrow Tokyo back-alley at night, wet asphalt reflecting neon kanji signs, steam rising from a ramen shop vent on the left","elements":[{"type":"obj","bbox":[100,350,900,650],"desc":"Young woman in a dark trench coat holding a transparent umbrella, mid-stride, face partially lit by warm shop light, looking slightly to camera right"},{"type":"obj","bbox":[50,50,400,200],"desc":"Glowing red and pink neon sign with Japanese characters, slightly out of focus due to shallow depth of field"}]}}

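Cinematic street photography: neon reflections, shallow depth of field, mixed warm and cool light
Social media graphics
Social graphics usually need eye-catching text in brand colors. Give the headline a wide bounding box and keep the number of elements low.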
{"high_level_description":"An Instagram carousel cover slide announcing a product launch with bold headline and gradient background","style_description":{"aesthetics":"bold, contemporary, startup","lighting":"soft ambient, no harsh shadows","medium":"graphic_design","art_style":"modern social media graphic with rounded corners and soft gradients","color_palette":["#6C5CE7","#A29BFE","#FFFFFF","#DFE6E9","#2D3436"]},"compositional_deconstruction":{"background":"Smooth gradient from deep purple at top-left to soft lavender at bottom-right","elements":[{"type":"text","bbox":[150,100,450,900],"text":"SOMETHING\nBIG IS\nCOMING","desc":"Extra-bold sans-serif headline in white, left-aligned, stacked on three lines with tight leading"},{"type":"text","bbox":[550,100,650,900],"text":"JUNE 30 • 9AM PST • BE FIRST IN LINE","desc":"Medium weight text in light gray, same left alignment as headline"},{"type":"obj","bbox":[700,300,950,700],"desc":"Abstract 3D blob shape in frosted glass material with purple and pink internal refraction, floating with subtle shadow beneath"}]}}

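Instagram-style launch teaser: gradient background, stacked headline, 3D decorative element
Packaging design
Product packaging needs precise text placement and consistent brand colors.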
{"high_level_description":"A flat-lay photograph of artisan chocolate bar packaging on a dark slate surface","style_description":{"aesthetics":"artisan, premium, textured","lighting":"soft overhead light with slight directional warmth from the right","photo":"50mm f/4, even focus across the surface, high color fidelity","medium":"photograph","color_palette":["#2C1810","#D4A574","#F5E6D3","#1A1A1A","#8B6914"]},"compositional_deconstruction":{"background":"Dark charcoal slate surface with subtle texture, scattered cocoa nibs and gold foil fragments around the edges","elements":[{"type":"obj","bbox":[100,150,900,850],"desc":"Rectangular chocolate bar wrapper in matte cream paper with embossed cocoa pod illustration, partially unwrapped to reveal dark chocolate squares, gold foil inner wrapper visible at one end"}]}}

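Flat lay of artisan chocolate packaging: controlled surface texture, exact palette, premium feel
Infographics
Infographics combine multiple text blocks with visual elements. Without bounding boxes, overlapping text is almost inevitable.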
{"high_level_description":"A vertical infographic showing 4 steps of a morning routine with icons and numbered labels","style_description":{"aesthetics":"friendly, clean, informational","lighting":"flat, no shadows","medium":"graphic_design","art_style":"flat illustration style with rounded shapes and soft colors","color_palette":["#FF9F43","#54A0FF","#5F27CD","#10AC84","#F8F9FA"]},"compositional_deconstruction":{"background":"Light warm gray, clean and minimal","elements":[{"type":"text","bbox":[30,100,120,900],"text":"YOUR PERFECT\nMORNING ROUTINE","desc":"Bold rounded sans-serif header in dark purple, centered"},{"type":"obj","bbox":[140,50,350,450],"desc":"Circular icon of a glass of water with lemon slice, numbered 01 in orange beside it"},{"type":"text","bbox":[160,500,330,950],"text":"HYDRATE FIRST\nDrink 500ml water before coffee","desc":"Left-aligned text, title in bold dark, subtitle in medium gray"},{"type":"obj","bbox":[370,50,580,450],"desc":"Circular icon of a person stretching, numbered 02 in blue beside it"},{"type":"text","bbox":[390,500,560,950],"text":"MOVE YOUR BODY\n10 minutes of stretching or yoga","desc":"Left-aligned text, title in bold dark, subtitle in medium gray"},{"type":"obj","bbox":[600,50,810,450],"desc":"Circular icon of a journal and pen, numbered 03 in purple beside it"},{"type":"text","bbox":[620,500,790,950],"text":"WRITE 3 GOALS\nPrioritize before checking email","desc":"Left-aligned text, title in bold dark, subtitle in medium gray"},{"type":"obj","bbox":[830,50,950,450],"desc":"Circular icon of a healthy breakfast bowl, numbered 04 in green beside it"},{"type":"text","bbox":[840,500,940,950],"text":"EAT WELL\nProtein-rich breakfast, no sugar","desc":"Left-aligned text, title in bold dark, subtitle in medium gray"}]}}

Structured infographic: every text block and icon has its own bounding box to prevent overlap
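Working out row coordinates by hand gets tedious once an infographic has more than a few steps. The following Python sketch computes evenly spaced, non-overlapping rows on the 0-1000 grid. The function name `stacked_rows` and its parameters are illustrative assumptions, not part of the Ideogram format.

```python
from typing import List


def stacked_rows(n_rows: int, top: int, bottom: int, gap: int,
                 x_min: int = 50, x_max: int = 950) -> List[List[int]]:
    """Split the vertical band [top, bottom] into n_rows equal rows
    separated by `gap`, returned as [y_min, x_min, y_max, x_max] boxes."""
    if n_rows < 1:
        raise ValueError("n_rows must be at least 1")
    row_height = (bottom - top - gap * (n_rows - 1)) // n_rows
    if row_height <= 0:
        raise ValueError("band is too small for the requested rows and gap")
    boxes = []
    for i in range(n_rows):
        y_min = top + i * (row_height + gap)
        boxes.append([y_min, x_min, y_min + row_height, x_max])
    return boxes


if __name__ == "__main__":
    # Icon column on the left, text column on the right, four steps each.
    icons = stacked_rows(4, 140, 960, 20, x_min=50, x_max=450)
    labels = stacked_rows(4, 140, 960, 20, x_min=500, x_max=950)
    for icon, label in zip(icons, labels):
        print(icon, label)
```

Calling the function once per column with the same vertical settings keeps each icon aligned with its label while the two columns stay horizontally separate.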
Character design
Character concepts rely on detailed desc fields plus a restricted palette to stay visually consistent.
{"high_level_description":"A character design sheet for a cyberpunk courier, showing front and side view on a neutral background","style_description":{"aesthetics":"cyberpunk, detailed, concept art","lighting":"soft studio rim light with cyan accent from the left","medium":"illustration","art_style":"semi-realistic character concept art, clean linework with cel shading","color_palette":["#0D1B2A","#1B3A5C","#00E5FF","#FF6B35","#E0E0E0"]},"compositional_deconstruction":{"background":"Flat medium gray background with subtle grid, suitable for character turnaround sheet","elements":[{"type":"obj","bbox":[50,50,950,480],"desc":"Front view of a young woman in a fitted dark navy tactical jacket with glowing cyan piping along the seams, cargo pants with orange accent straps, short asymmetric black hair with one cyan-highlighted strand, wearing augmented reality goggles pushed up on forehead"},{"type":"obj","bbox":[50,520,950,950],"desc":"Three-quarter side view of the same character, showing a messenger bag with holographic patches on the back, utility belt visible, boots with magnetic soles, same outfit and color scheme as front view"}]}}

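Character concept sheet: a locked palette keeps multiple views consistent
Magic Prompt: the low-effort option
Not everyone wants to write JSON by hand. Ideogram 4.0 ships with Magic Prompt, an LLM that expands plain text into a complete JSON prompt before the image is generated.
Type "a cozy coffee shop interior with morning light" and Magic Prompt produces a full JSON with style description, compositional elements, color palette, and bounding boxes. For quick exploration and ideation it saves a lot of work.
Use Magic Prompt when:
- You are exploring ideas quickly
- You do not need a precise layout
- You want the model to make the creative decisions
Write JSON by hand when:
- The design is typography-heavy (posters, social graphics, packaging)
- You need exact hex values for brand consistency
- You are shooting product photography with a specific composition
- Multiple elements must not overlap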
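Common mistakes and how to avoid them
Reversing the bounding box order. The format is [y_min, x_min, y_max, x_max]: Y comes first, not X. If you swap them, elements land in unexpected places.
Using photo and art_style together. Pick one. The two fields are mutually exclusive in the training data, so including both lowers output quality.
Scrambling the field order. The model was trained with a fixed field order. The correct order is aesthetics → lighting → medium → art_style/photo → color_palette. Inside an element: type → bbox → desc (for text elements: type → bbox → text → desc).
Using shorthand hex values. #FFF does not work; write all six digits: #FFFFFF.
Overlapping text bounding boxes. Two text elements occupying the same region render illegibly. Give each text block its own non-overlapping region.
Over-specifying simple scenes. A detailed high_level_description plus style controls is often enough. Use bounding boxes and multi-element compositions only when you genuinely need spatial precision.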
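Several of the mistakes above can be caught mechanically before a prompt is sent. The Python sketch below is one possible pre-flight check. `lint_prompt`, `expand_hex`, and `boxes_overlap` are hypothetical helpers, and the rules they enforce are taken only from this guide: exclusive photo/art_style, six-digit hex values, at most 16 global colors, boxes within 0-1000, and no overlapping text boxes.

```python
import re
from itertools import combinations

HEX6 = re.compile(r"#[0-9A-Fa-f]{6}")
HEX3 = re.compile(r"#[0-9A-Fa-f]{3}")


def expand_hex(color):
    """Expand shorthand #RGB to #RRGGBB; return other values unchanged."""
    if HEX3.fullmatch(color):
        return "#" + "".join(ch * 2 for ch in color[1:])
    return color


def boxes_overlap(a, b):
    """True if two [y_min, x_min, y_max, x_max] boxes share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def lint_prompt(prompt):
    """Return a list of human-readable problems found in a prompt dict."""
    problems = []
    style = prompt.get("style_description", {})
    if "photo" in style and "art_style" in style:
        problems.append("style_description has both photo and art_style")
    palette = style.get("color_palette", [])
    if len(palette) > 16:
        problems.append("color_palette has more than 16 colors")
    for color in palette:
        if not HEX6.fullmatch(color):
            problems.append(f"color {color!r} is not a six-digit hex value")
    elements = prompt.get("compositional_deconstruction", {}).get("elements", [])
    for i, element in enumerate(elements):
        y_min, x_min, y_max, x_max = element["bbox"]
        if not (0 <= y_min < y_max <= 1000 and 0 <= x_min < x_max <= 1000):
            problems.append(f"element {i} bbox is out of range or inverted")
    text_boxes = [(i, e["bbox"]) for i, e in enumerate(elements)
                  if e.get("type") == "text"]
    for (i, a), (j, b) in combinations(text_boxes, 2):
        if boxes_overlap(a, b):
            problems.append(f"text elements {i} and {j} overlap")
    return problems
```

A swapped x/y order cannot be detected reliably from the numbers alone, so this check only flags boxes that are inverted or outside the grid. Boxes that merely touch along an edge are not counted as overlapping.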
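FAQ
Do I have to use JSON to get good results?
No. Plain text works too, especially with Magic Prompt turned on. JSON gives you control over layout, color, and text placement that plain text cannot, but if you do not need a strict composition, plain text is entirely sufficient.
How many colors can the palette hold?
The global style_description palette holds at most 16 hex values, and a single element at most 5. These colors steer the dominant tones but do not guarantee an exact match; treat them as a strong suggestion to the model.
What resolutions does Ideogram 4.0 support?
From 256×256 to 2048×2048, with dimensions that are multiples of 16. When running locally, 2048×2048 with the V4_QUALITY_48 sampling preset gives the best results.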
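As a small illustration of the size constraint, this Python sketch clamps a requested size into the supported range and rounds each side down to a multiple of 16. `snap_resolution` is a hypothetical helper, and treating width and height independently is an assumption, since the answer above only states the range and the multiple-of-16 rule.

```python
def snap_resolution(width: int, height: int, lo: int = 256,
                    hi: int = 2048, step: int = 16):
    """Clamp each side to [lo, hi] and round it down to a multiple of step."""
    def snap(value: int) -> int:
        clamped = max(lo, min(hi, value))
        return clamped // step * step
    return snap(width), snap(height)


if __name__ == "__main__":
    print(snap_resolution(1000, 3000))
```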
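How is it different from Ideogram 3.0?
Ideogram 4.0 is open source with open weights (3.0 was API-only), replaces natural-language training with structured JSON training, and renders text far better. Bounding-box layout and palette control are new in 4.0.
Can I run it locally?
Yes. fp8 and nf4 versions of the weights are on HuggingFace. The fp8 version needs at least 24GB of VRAM. ComfyUI has community node support, including KJ's prompt-builder node, which makes constructing the JSON easier.
Where can I use it online?
Editly supports Ideogram 4.0 with both plain-text and JSON input. Ideogram's official platform also offers it through its API and web app.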

