Domain-specific word-BERT pre-trained model #59
Comments
Good question, I was about to ask the same thing, since the vocabulary changes with the application scenario.
The model uses jieba segmentation by default. Does irrelevant text need to be cleaned out before feeding input to the model?
First, build a new vocabulary. For word-based models, pre-trained models built on different corpora should each have their own vocabulary. I have added the concrete steps to the README; see the Word-based pre-training model section there for details.
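A minimal sketch of the "build a new vocabulary" step described above, assuming a simple frequency-based selection; this is illustrative code, not the repository's actual script. In practice `tokenize` would be `jieba.lcut`, the default segmenter mentioned in this thread, and the size cap and special-token list may differ from the project's.

```python
from collections import Counter

# BERT-style special tokens, placed at the start of the vocab (assumed layout).
SPECIALS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]

def build_vocab(corpus_lines, tokenize, max_size=50000):
    """Segment a domain corpus and keep the most frequent words.

    corpus_lines: iterable of raw text lines.
    tokenize: word segmenter, e.g. jieba.lcut for Chinese text.
    max_size: hypothetical cap on total vocabulary size.
    """
    counter = Counter()
    for line in corpus_lines:
        counter.update(w for w in tokenize(line.strip()) if w.strip())
    words = [w for w, _ in counter.most_common(max_size - len(SPECIALS))]
    return SPECIALS + words

# Example usage: write one token per line, the usual vocab.txt format.
# vocab = build_vocab(open("domain_corpus.txt", encoding="utf-8"), jieba.lcut)
# with open("vocab.txt", "w", encoding="utf-8") as f:
#     f.write("\n".join(vocab))
```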
Hello, I converted a model from PubMedBERT on Hugging Face. They have their own vocab file, which differs from the one used here, and the vocabulary sizes don't match. How should I modify things in this case? I also noticed the field names in the config files are different.
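One generic way to reconcile two vocabularies of different sizes is to copy embedding rows for tokens present in both vocabs and randomly initialize rows for new tokens. The sketch below shows that idea with NumPy; the function name and initialization scale are assumptions, not the repository's actual conversion code.

```python
import numpy as np

def remap_embeddings(old_emb, old_vocab, new_vocab, seed=0):
    """Build an embedding matrix for new_vocab from old_emb.

    old_emb: (len(old_vocab), dim) array of pre-trained embeddings.
    old_vocab / new_vocab: token lists, index = row in the matrix.
    Shared tokens keep their pre-trained rows; new tokens get
    small random vectors (std 0.02, a common BERT-style choice).
    """
    rng = np.random.default_rng(seed)
    dim = old_emb.shape[1]
    old_index = {tok: i for i, tok in enumerate(old_vocab)}
    new_emb = rng.normal(0.0, 0.02, size=(len(new_vocab), dim))
    for j, tok in enumerate(new_vocab):
        if tok in old_index:
            new_emb[j] = old_emb[old_index[tok]]
    return new_emb
```

If you stay inside the Hugging Face ecosystem, `model.resize_token_embeddings(len(tokenizer))` does the size change for you, but the row remapping for a completely different vocab still has to be done explicitly as above.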
If I want a domain-specific pre-trained model and many of the domain's words are not in the provided vocab.txt, do I need to build my own vocab.txt manually, train a pre-trained model with it, and then fine-tune afterwards?