빠른 시작
사전 준비
Agents SDK의 기본 빠른 시작 안내를 따라 가상 환경을 설정했는지 확인하세요. 그런 다음, SDK에서 제공하는 선택적 음성 관련 종속성을 설치하세요:
개념
핵심 개념은 VoicePipeline
이며, 3단계 프로세스입니다:
- 음성을 텍스트로 변환하기 위해 음성 인식 모델을 실행
- 보통 에이전트 기반 워크플로인 여러분의 코드를 실행하여 결과 생성
- 결과 텍스트를 다시 오디오로 변환하기 위해 음성 합성 모델을 실행
graph LR
%% Input
A["🎤 Audio Input"]
%% Voice Pipeline
subgraph Voice_Pipeline [Voice Pipeline]
direction TB
B["Transcribe (speech-to-text)"]
C["Your Code"]:::highlight
D["Text-to-speech"]
B --> C --> D
end
%% Output
E["🎧 Audio Output"]
%% Flow
A --> Voice_Pipeline
Voice_Pipeline --> E
%% Custom styling
classDef highlight fill:#ffcc66,stroke:#333,stroke-width:1px,font-weight:700;
에이전트
먼저 에이전트를 설정해 보겠습니다. 이 SDK로 에이전트를 만들어 보신 적이 있다면 익숙할 것입니다. 에이전트 몇 개와 핸드오프, 그리고 도구를 사용합니다.
import asyncio
import random
from agents import (
Agent,
function_tool,
)
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions
@function_tool
def get_weather(city: str) -> str:
"""Get the weather for a given city."""
print(f"[debug] get_weather called with city: {city}")
choices = ["sunny", "cloudy", "rainy", "snowy"]
return f"The weather in {city} is {random.choice(choices)}."
spanish_agent = Agent(
name="Spanish",
handoff_description="A spanish speaking agent.",
instructions=prompt_with_handoff_instructions(
"You're speaking to a human, so be polite and concise. Speak in Spanish.",
),
model="gpt-4.1",
)
agent = Agent(
name="Assistant",
instructions=prompt_with_handoff_instructions(
"You're speaking to a human, so be polite and concise. If the user speaks in Spanish, handoff to the spanish agent.",
),
model="gpt-4.1",
handoffs=[spanish_agent],
tools=[get_weather],
)
음성 파이프라인
워크플로로 SingleAgentVoiceWorkflow
를 사용하여 간단한 음성 파이프라인을 설정하겠습니다.
from agents.voice import SingleAgentVoiceWorkflow, VoicePipeline
pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
파이프라인 실행
import numpy as np
import sounddevice as sd
from agents.voice import AudioInput
# For simplicity, we'll just create 3 seconds of silence
# In reality, you'd get microphone data
buffer = np.zeros(24000 * 3, dtype=np.int16)
audio_input = AudioInput(buffer=buffer)
result = await pipeline.run(audio_input)
# Create an audio player using `sounddevice`
player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16)
player.start()
# Play the audio stream as it comes in
async for event in result.stream():
if event.type == "voice_stream_event_audio":
player.write(event.data)
전체 통합
import asyncio
import random
import numpy as np
import sounddevice as sd
from agents import (
Agent,
function_tool,
set_tracing_disabled,
)
from agents.voice import (
AudioInput,
SingleAgentVoiceWorkflow,
VoicePipeline,
)
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions
@function_tool
def get_weather(city: str) -> str:
"""Get the weather for a given city."""
print(f"[debug] get_weather called with city: {city}")
choices = ["sunny", "cloudy", "rainy", "snowy"]
return f"The weather in {city} is {random.choice(choices)}."
spanish_agent = Agent(
name="Spanish",
handoff_description="A spanish speaking agent.",
instructions=prompt_with_handoff_instructions(
"You're speaking to a human, so be polite and concise. Speak in Spanish.",
),
model="gpt-4.1",
)
agent = Agent(
name="Assistant",
instructions=prompt_with_handoff_instructions(
"You're speaking to a human, so be polite and concise. If the user speaks in Spanish, handoff to the spanish agent.",
),
model="gpt-4.1",
handoffs=[spanish_agent],
tools=[get_weather],
)
async def main():
pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
buffer = np.zeros(24000 * 3, dtype=np.int16)
audio_input = AudioInput(buffer=buffer)
result = await pipeline.run(audio_input)
# Create an audio player using `sounddevice`
player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16)
player.start()
# Play the audio stream as it comes in
async for event in result.stream():
if event.type == "voice_stream_event_audio":
player.write(event.data)
if __name__ == "__main__":
asyncio.run(main())
이 예제를 실행하면 에이전트가 여러분에게 말을 겁니다! 직접 에이전트와 대화할 수 있는 데모는 examples/voice/static를 참고하세요.