Email: i[email protected]
Github: xzhao-github
Phone: 080-4562-4604
LinkedIn: xin-zhao.linkedin
<aside> <img src="/icons/bookmark_gray.svg" alt="/icons/bookmark_gray.svg" width="40px" /> I am a dedicated researcher with a strong and extensive background in natural language processing. My consistent research on knowledge learning topics, including domain adaptation in language models and interpretability for understanding knowledge representation, combined with my experience as a software engineer in developing semantic-based search backends, has equipped me with deep expertise in research, problem-solving, and technical development, along with strong project management and teamwork skills.
</aside>
<aside> <img src="/icons/graduate_gray.svg" alt="/icons/graduate_gray.svg" width="40px" /> Yoshinaga Lab, Information Science and Technology, The University of Tokyo (April 2023 - Mar 2026)
Doctoral Courser, Conducted research in Natural Language Processing
</aside>
<aside> <img src="/icons/graduate_gray.svg" alt="/icons/graduate_gray.svg" width="40px" /> Matsumoto Lab, Computer science, Nara Institute of Science and Technology (April 2018 - Mar 2020)
Master Courser, Conducted research in Natural Language Processing and Knowledge Graph
</aside>
<aside> <img src="/icons/graduate_gray.svg" alt="/icons/graduate_gray.svg" width="40px" /> Foreign Languages and Literatures, Xi'an Jiaotong University (Sep 2013 - Jul 2017)
Bachelor Courser, Specialized in Japanese
</aside>
Research Assistant at LLMC, National Institute of Informatics, Japan (Jul 2024 - Present)
- Developing medical-domain Japanese language models
- Participating in research projects on achieving cross-lingual transfer for Japanese
Software Engineer at Big Data Department, Rakuten Group, Inc. (Apr 2020 - Mar 2023)
- Optimized the Japanese and Chinese tokenizers used in the search engine to improve accuracy and reduce costs
- Reduced the response latency of Rakuten's search engines by optimizing their processing logic
- Enhanced semantic search, for example by developing a related-keywords module (a toy sketch follows this list)
- Prepared project documentation and developed tests
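As an illustration of the related-keywords idea only: a minimal, hypothetical sketch that ranks candidate keywords by embedding similarity to a query. The vocabulary, vectors, and function name are invented for this sketch; Rakuten's production system is not public.

```python
import numpy as np

# Toy pre-computed keyword embeddings. In a real system these would come from
# a trained encoder over search logs; the words and 2-d vectors are invented.
vocab = {
    "laptop": np.array([0.90, 0.10]),
    "notebook pc": np.array([0.85, 0.20]),
    "banana": np.array([0.10, 0.95]),
}

def related_keywords(query, k=2):
    """Return the k keywords whose embeddings are most similar to the query's."""
    q = vocab[query]
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    ranked = sorted((w for w in vocab if w != query),
                    key=lambda w: cos(q, vocab[w]), reverse=True)
    return ranked[:k]

print(related_keywords("laptop", k=1))  # ['notebook pc']
```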
Intern, Data Scientist at iQIYI (Jan 2019 - Mar 2019)
- Assisted in building knowledge graphs of artworks from Baidu Encyclopedia (Baidu Baike), the Chinese online encyclopedia
Developing domain-adapted Japanese language models through cross-lingual transfer (Aug 2024 - Present)
Adapting pre-trained LLMs to specific domains requires training on large amounts of domain-specific text, which is challenging for languages with limited domain data. We aim to learn domain knowledge from English and transfer it to the target language, exploring the optimal recipe for cross-lingual transfer of domain knowledge.
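As a sketch of one plausible recipe only, not the project's actual setup: continued pre-training on a mixture of English domain text and target-language general text with the standard causal-LM objective, using Hugging Face Transformers. The base model, toy corpus, and mixing ratio are placeholder assumptions.

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; in practice a Japanese-capable base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Toy mixed corpus: English domain text plus Japanese general text.
# The EN-domain / JA-general mixing ratio is the central knob of such a recipe.
corpus = [
    "Aspirin inhibits cyclooxygenase and reduces inflammation.",  # EN, domain
    "今日は良い天気なので、公園を散歩しました。",  # JA, general
]

def collate(batch):
    enc = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
    labels = enc["input_ids"].clone()
    labels[enc["attention_mask"] == 0] = -100  # ignore padding in the loss
    enc["labels"] = labels
    return enc

loader = DataLoader(corpus, batch_size=2, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for batch in loader:  # one epoch over the toy corpus
    loss = model(**batch).loss  # standard causal-LM objective
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```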
Interpreting knowledge representation and achieving neuron-level output control (Jul 2024 - Present)
To deepen our understanding of the mechanisms by which LLMs store knowledge, we analyzed neuron behavior in mapping knowledge queries to outputs. We found that neurons exhibit linear control over LLM outputs, and we are developing neuron-level output control to enhance knowledge recall.
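A minimal sketch of what a neuron-level intervention can look like, assuming a GPT-2-style model loaded via Hugging Face Transformers. The layer and neuron indices below are arbitrary placeholders (the research would locate them by attribution analysis), and this is my illustration rather than the project's method.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

LAYER, NEURON, SCALE = 6, 1234, 5.0  # hypothetical neuron, chosen arbitrarily

def scale_neuron(module, inputs, output):
    # 'output' is the post-activation MLP hidden state (batch, seq, 4 * d_model);
    # linearly scaling one coordinate is the simplest neuron-level intervention.
    output[..., NEURON] *= SCALE
    return output

# Hook the activation function of the chosen layer's MLP.
handle = model.transformer.h[LAYER].mlp.act.register_forward_hook(scale_neuron)

ids = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**ids).logits[0, -1]
handle.remove()
print(tok.decode([logits.argmax().item()]))  # next token under the intervention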
Understanding the knowledge memorization and recall process in language models (Aug 2023 - Jun 2024)
We analyzed how LLMs learn and represent knowledge to improve pre-training and retrieval. Our analysis of multilingual fact representations revealed three distinct patterns. We also conducted comprehensive evaluations of factual knowledge across various LLMs to identify key factors that influence fact learning during pre-training.
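For illustration, a toy cloze-style probe of factual recall of the kind such evaluations build on; the model, prompts, and exact-match metric are assumptions for this sketch, not the project's evaluation suite.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# Each fact is a (prompt, gold answer) pair; multilingual probes would use
# translations of the same triple to compare representations across languages.
facts = [
    ("The capital of Japan is", "Tokyo"),
    ("Mount Fuji is located in", "Japan"),
]

correct = 0
for prompt, gold in facts:
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        next_id = model(ids).logits[0, -1].argmax().item()
    correct += tok.decode([next_id]).strip() == gold  # exact-match on next token

print(f"recall@1 = {correct / len(facts):.2f}")
```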
Cross-lingual entity alignment via optimal transport (Feb 2019 - Dec 2019)
Cross-lingual entity alignment helps extend knowledge graphs in low-resource languages. We developed an unsupervised cross-lingual entity alignment method using the optimal transport algorithm.
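A toy sketch of the core idea, assuming the POT library (`pip install pot`): given entity embeddings from two knowledge graphs, entropy-regularized optimal transport yields a soft matching whose row-wise argmax predicts an alignment. The random embeddings are placeholders; the published method involves more than this single step.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
src = rng.normal(size=(5, 16))  # embeddings of 5 entities, source-language KG
tgt = rng.normal(size=(5, 16))  # embeddings of 5 entities, target-language KG

# Uniform marginals and a cosine-distance cost matrix between embedding sets.
a = np.full(5, 1 / 5)
b = np.full(5, 1 / 5)
cost = ot.dist(src, tgt, metric="cosine")

# Entropy-regularized OT (Sinkhorn). The row-wise argmax of the transport plan
# is the predicted target entity for each source entity.
plan = ot.sinkhorn(a, b, cost, reg=0.05)
print(plan.argmax(axis=1))
```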