(Continued from a previous post due to the forum's per-post character limit; see http://www.e-ticket.cn/bbs/2778146.html.)
Main thread code
Model inference thread
def llama_thread():
    global bool_Chinese_tts
    # Load the quantized Qwen chat model via llama-cpp-python
    model = llama_cpp.Llama(
        model_path=f"{current_dir}/models/llama/qwen1_5-0_5b-chat-q4_0.gguf",
        # n_ctx=4096,
        verbose=False,
    )
    ch_punctuations_re = "[，。？；]"
    llama_load_done.set()
    messages_history = []
    max_msg_history = 2
    print("Load llama model done")
    while True:
        # Wait until the speech-recognition side signals a new question
        trig_llama_event.wait()
        trig_llama_event.clear()
        ask_text = ask_text_q.get()
        print(ask_text)
        messages_history.append({"role": "user", "content": ask_text})
        if len(messages_history) > max_msg_history:
            messages_history = messages_history[-max_msg_history:]
        ans_text = model.create_chat_completion(
            messages=messages_history,
            logprobs=False,
            # stream=True,
            repeat_penalty=1.2,
            max_tokens=100,
        )
        ans_text = ans_text["choices"][0]["message"]["content"]
        messages_history.append({"role": "assistant", "content": ans_text})
        if len(messages_history) > max_msg_history:
            messages_history = messages_history[-max_msg_history:]
        print(ans_text)
        # Replace Chinese commas with full stops so Piper TTS pauses between clauses
        ans_text_tts = ans_text.replace("，", "。")
        # Detect whether the reply contains Chinese characters
        bool_Chinese_tts = bool(re.search(r"[\u4e00-\u9fff]", ans_text_tts))
        ans_text_q.put(ans_text_tts)
        model_doing_event.clear()
The inference step includes a few special-purpose tweaks:
- The messages passed to the model include the previous round's input and output, so each inference has some conversational context. A longer context slows inference down, so the context length should be chosen to match the available compute.
- Several generation parameters are set, such as a cap on the number of output tokens (max_tokens) and a repetition penalty (repeat_penalty), to keep the model from producing overly long or repetitive, low-value output.
- The model's output text is post-processed: all Chinese commas are replaced with Chinese full stops. This is a simple workaround for Piper TTS, which produces almost no pause at a Chinese comma.
- The language of the model's output is detected so that Piper TTS can load the matching voice model (see the small standalone example after this list).
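For illustration, here is a minimal, self-contained sketch of the post-processing described above. The function name normalize_for_tts is ours, not from the project code; the replacement and regex match what the thread does inline:

    import re

    def normalize_for_tts(text):
        # Replace Chinese commas with full stops so Piper inserts a pause
        text_tts = text.replace("，", "。")
        # Any character in the CJK Unified Ideographs block marks the reply as Chinese
        is_chinese = bool(re.search(r"[\u4e00-\u9fff]", text_tts))
        return text_tts, is_chinese

    print(normalize_for_tts("你好，很高興見到你"))  # ('你好。很高興見到你', True)
    print(normalize_for_tts("Nice to meet you"))     # ('Nice to meet you', False)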
Notes:
- To switch models, simply point model_path in the code at the corresponding gguf file.
- Because one previous round of conversation context is carried along, inference speed is somewhat reduced.
- For the sake of coherent TTS output later on, the model does not use streaming output, so there is a noticeably long wait between submitting the input and receiving the complete reply (a streaming sketch follows these notes).
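If faster perceived response matters more than sentence-level TTS coherence, streaming could be re-enabled instead. A minimal sketch, assuming the OpenAI-style streaming interface exposed by llama-cpp-python (chunk dictionaries with a "delta" field); this is not how the project code works, only an illustration of the trade-off:

    # With stream=True, create_chat_completion yields chunks as tokens are generated
    for chunk in model.create_chat_completion(
        messages=messages_history, stream=True, max_tokens=100
    ):
        piece = chunk["choices"][0]["delta"].get("content")
        if piece:
            print(piece, end="", flush=True)  # tokens arrive incrementally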
Text-to-speech thread
def tts_thread():
    global bool_Chinese_tts
    # Piper writes raw 16-bit, 22050 Hz PCM to stdout, which aplay plays directly
    piper_cmd_zh = f"{current_dir}/piper/piper --model {current_dir}/models/piper/zh_CN-huayan-medium.onnx --output-raw | aplay -r 22050 -f S16_LE -t raw -"
    piper_cmd_en = f"{current_dir}/piper/piper --model {current_dir}/models/piper/en_GB-jenny_dioco-medium.onnx --output-raw | aplay -r 22050 -f S16_LE -t raw -"
    while True:
        tts_text = ans_text_q.get()
        if bool_Chinese_tts:
            command = f'echo "{tts_text}" | {piper_cmd_zh}'
        else:
            # English text is escaped so shell metacharacters cannot break the command
            command = f"echo {shlex.quote(tts_text)} | {piper_cmd_en}"
        process = subprocess.Popen(
            command, shell=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # Poll for completion, but abort immediately if the record key is pressed
        while True:
            if stop_tts_event.is_set():
                terminate_process(process.pid)
                break
            if process.poll() is not None:
                break
            time.sleep(0.01)
        process.wait()
Piper TTS is invoked via the command line, so for English text shlex.quote() is used to escape special characters, preventing characters in the text from breaking the structure of the shell command. In addition, if the record key is pressed during playback, the TTS process is actively terminated.
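The terminate_process() helper is defined elsewhere in the repository. Because the command is started with shell=True, process.pid refers to the shell, and the piper/aplay children it spawned need to be stopped as well. A minimal sketch of one possible implementation, assuming the psutil package is installed (the project's actual helper may differ):

    import psutil

    def terminate_process(pid):
        # Terminate the shell and every child it spawned (piper, aplay)
        try:
            parent = psutil.Process(pid)
        except psutil.NoSuchProcess:
            return
        for child in parent.children(recursive=True):
            child.terminate()
        parent.terminate()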
Display thread
def oled_thread(oled_device, dir):
    # Show the start-up logo on this eye (left/right)
    with Image.open(f"{current_dir}/img/{dir}_logo.bmp") as img:
        img_resized = img.convert("1").resize((128, 64))
        img_resized = ImageOps.invert(img_resized)
        oled_device.display(img_resized)
    # Wait until both the llama and speech models have finished loading
    llama_load_done.wait()
    senvc_load_done.wait()
    # Pre-decode the blinking-eye animation frames
    frames_eye = []
    durations_eye = []
    with Image.open(f"{current_dir}/img/{dir}_eye.gif") as img:
        for frame in ImageSequence.Iterator(img):
            frames_eye.append(frame.convert("1").resize((128, 64)))
            durations_eye.append(frame.info.get("duration", 100) / 1000.0)
    # Pre-decode the recording (waveform) animation frames
    frames_rcd = []
    durations_rcd = []
    with Image.open(f"{current_dir}/img/record.gif") as img:
        for frame in ImageSequence.Iterator(img):
            frames_rcd.append(frame.convert("1").resize((128, 64)))
            durations_rcd.append(frame.info.get("duration", 100) / 1000.0)
    while True:
        if show_record_event.is_set():
            # Recording: play the waveform animation until the key is released
            for frame, duration in zip(frames_rcd, durations_rcd):
                oled_device.display(frame)
                time.sleep(duration)
                if not show_record_event.is_set():
                    break
        else:
            # Idle / thinking: play the eye animation
            for frame, duration in zip(frames_eye, durations_eye):
                if model_doing_event.is_set() and duration > 1:
                    # While the model is busy, skip the long-duration frame
                    continue
                if duration > 1:
                    duration = duration * 2
                oled_device.display(frame)
                show_record_event.wait(timeout=duration)
                if show_record_event.is_set():
                    break
            else:
                # Full blink cycle finished: handshake so both eyes stay in sync
                if dir == "left":
                    oled_events["left"].set()
                    oled_events["right"].wait()
                    oled_events["right"].clear()
                else:
                    oled_events["right"].set()
                    oled_events["left"].wait()
                    oled_events["left"].clear()
The display threads coordinate to show the program's current state: an audio waveform while recording, closed "thinking" eyes during model inference, and open, blinking eyes once inference is complete. The two eyes are kept in step with a pair of events in the for-else branch above; a small sketch of that handshake follows.
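A minimal, stripped-down sketch of the left/right handshake. The event names match the project code, but run_eye and the sleep standing in for a blink cycle are simplified placeholders:

    import threading, time

    oled_events = {"left": threading.Event(), "right": threading.Event()}

    def run_eye(side, other):
        for cycle in range(3):
            time.sleep(0.2)  # ... play one full blink cycle on this eye ...
            # Signal that this eye finished its cycle, then wait for the other eye
            oled_events[side].set()
            oled_events[other].wait()
            oled_events[other].clear()
            print(f"{side} eye finished cycle {cycle}")

    threading.Thread(target=run_eye, args=("left", "right")).start()
    threading.Thread(target=run_eye, args=("right", "left")).start()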
Demo
Images:
Video:
Open-source code:
Gitee repository: https://gitee.com/ETRD/pi-ai-robot
Additional model files: https://pan.huang1111.cn/s/MNgwOhx
Summary
This article described a multilingual, offline voice chatbot built on a local large language model running on an embedded device. The project was implemented on a Raspberry Pi 5, the code is fully open source, and in principle it can run on any embedded device with comparable compute and resources.