Journal Article
DeepLEX (深度詞庫): Toward a Knowledge-Oriented Foundation for Artificial Intelligence

DOI: 10.6129/CJP.201909_61(3).0004
中華心理學刊 (Chinese Journal of Psychology), ROC year 108, Vol. 61, No. 3, 231-246
Chinese Journal of Psychology 2019, Vol.61, No.3, 209-224


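Shu-Kai Hsieh 謝舒凱 (Graduate Institute of Linguistics, National Taiwan University), Yu-Hsiang Tseng 曾昱翔 (Graduate Institute of Linguistics, National Taiwan University)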

 

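Abstract (Chinese version)

Against the backdrop of big data and high-performance computing, recent deep-learning neural networks have achieved major successes in speech processing and other recognition tasks. In particular, since the introduction of word embeddings, a distributional vector-semantic representation, computers have gradually come to grasp the lexical semantic relations of human language. However, the rich hierarchical relations found in language and conceptual knowledge remain difficult for current neural network architectures to represent and generalize. In computational linguistics, researchers working from different lexical-theoretic hypotheses have developed a variety of lexical resources in an attempt to supply the paradigmatic knowledge that computers find hard to learn from syntagmatic data, so that computers can move closer to the human abilities to reason in unfamiliar situations from small amounts of data and to understand, or even empathize with, human emotions. What these human abilities have in common is that they involve the interaction of individual, social, and cultural contexts and are highly context-variable, which makes them difficult for computers to learn from massive amounts of thin data. This study adopts the perspective of computational functional linguistics and regards the lexicon as an explicit repository of human linguistic knowledge. Built through human annotation and automatic extraction, it is one of the important foundations for autonomous learning in artificial general intelligence. The study further argues that linguistic knowledge in the lexicon, beyond the pairing of form and meaning, should also reflect the fluidity of expressive forms in Chinese and the linkage between form and meaning. The aim of this study is to integrate and develop a "deep lexicon" that includes variables at multiple levels, such as linguistic, psychological, and Chinese-language-teaching variables, together with an annotation tool that lets users freely determine Chinese formulae (語式), and to discuss possible future applications of this language resource.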

 

Keywords (Chinese version): artificial intelligence, computational lexicon, dialogue system, semantic representation

 


DeepLEX: Toward a Knowledge-yielding Approach and Resource for AI

Shu-Kai Hsieh (Graduate Institute of Linguistics, National Taiwan University), Yu-Hsiang Tseng (Graduate Institute of Linguistics, National Taiwan University)

 

Abstract  

Deep learning and neural networks have made substantial progress in recent years. Since the introduction of word embeddings, a form of distributional vector semantics, computers have been better able to model the lexical semantic relationships between words. However, the hierarchical nature of human language and concepts is still difficult to model with current approaches. In computational linguistics, researchers have developed lexical resources from different theoretical perspectives. These resources attempt to bridge the gap between syntagmatic relationships, which computers can readily model from data, and paradigmatic knowledge, which computers do not readily grasp. Such knowledge is essential for reasoning in unfamiliar contexts from only a small amount of data, and is also vital for developing empathy with human emotions. What these capabilities have in common is high context variance: individual, social, and cultural contexts are intertwined, which poses a great challenge for computers that learn in a data-hungry way. The current study, in line with computational functional linguistics, regards the lexicon as an explicit knowledge base of human language. Human annotation aided by automatic extraction is an essential building block of strong artificial intelligence. Moreover, the knowledge stored in the lexicon is not limited to pairings between forms and meanings; it should also address the fluidity of formulae and the dynamics of form-meaning pairings. The goal of the current study is thus to integrate and develop a novel lexicon model called DeepLex that includes multilevel lexical properties, such as linguistic, psychological, and pedagogical ones. A web-based tool is also developed to help users freely determine and annotate formulae in Chinese. Further applications of DeepLex are also discussed.

 

Keywords: AI, computational lexicon, dialogue system, semantic representation

 
