Xiaofeng YU(余晓峰)


Postdoctoral Research Fellow (March 2011 to present)

Department of Computer Science & Engineering
The Chinese University of Hong Kong

Rm 910, Ho Sin-Hang Engineering Building,
Department of Computer Science and Engineering,
CUHK, Shatin, N.T., Hong Kong
Email: xfyu@cse.cuhk.edu.hk

Biography

Dr. Yu received his Ph.D. degree, supervised by Prof. Wai Lam, from the Department of Systems Engineering & Engineering Management, The Chinese University of Hong Kong, in 2010. He joined the Web Intelligence & Social Computing group as a postdoctoral research fellow in March 2011, supervised by Prof. Irwin King and Prof. Michael R. Lyu.

Research Interests

  • Text mining, information extraction, and natural language processing
  • Information retrieval, social computing, Web search and Web data mining
  • Machine learning and artificial intelligence

Publications

  • Xiaofeng Yu and Wai Lam. Probabilistic joint models incorporating logic and learning via structured variational approximation for information extraction. Knowledge and Information Systems (KAIS), 2010. To appear.
  • Ritesh Agrawal, Xiaofeng Yu, Irwin King and Remi Zajac. Enrichment and reductionism: Two approaches for Web query classification. To appear in Proceedings of the 18th International Conference on Neural Information Processing (ICONIP-11), Shanghai, China, 2011.
  • Xiaofeng Yu, Irwin King, and Michael R. Lyu. Towards a top-down and bottom-up bidirectional approach to joint information extraction. To appear in CIKM-11, Glasgow, Scotland, UK, 2011.
  • Xiaofeng Yu and Wai Lam. Jointly identifying entities and extracting relations in encyclopedia text via a graphical model approach. In Proceedings of COLING-10, pages 1399-1407, Beijing, China, 2010.
  • Xiaofeng Yu and Wai Lam. Accelerated training of maximum margin Markov models for sequence labeling: A case study of NP chunking. In Proceedings of COLING-10, pages 1408-1416, Beijing, China, 2010.
  • Xiaofeng Yu and Wai Lam. Bidirectional integration of pipeline models. In Proceedings of AAAI-10, pages 1045-1050, Atlanta, Georgia, USA, 2010.
  • Xiaofeng Yu, Wai Lam, and Bo Chen. An integrated discriminative probabilistic approach to information extraction. In Proceedings of CIKM-09, pages 325-334, Hong Kong, China, 2009.
  • Ki Chan, Wai Lam, and Xiaofeng Yu. Coreference resolution using expressive logic models. In Proceedings of CIKM-08, pages 1373-1374, Napa Valley, California, USA, 2008.
  • Xiaofeng Yu and Wai Lam. An integrated probabilistic and logic approach to encyclopedia relation extraction with multiple features. In Proceedings of COLING-08, pages 1065-1072, Manchester, United Kingdom, 2008.
  • Xiaofeng Yu and Wai Lam. Hidden dynamic probabilistic models for labeling sequence data. In Proceedings of AAAI-08, pages 739-745, Chicago, Illinois, 2008.
  • Xiaofeng Yu, Wai Lam, and Shing-Kit Chan. A framework based on graphical models with logic for Chinese named entity recognition. In Proceedings of the Third International Joint Conference on Natural Language Processing (IJCNLP-08), pages 335-342, Hyderabad, India, 2008.
  • Shing-Kit Chan, Wai Lam, and Xiaofeng Yu. An online cascaded approach to biomedical named entity recognition. In Proceedings of the Third International Joint Conference on Natural Language Processing (IJCNLP-08), pages 595-600, Hyderabad, India, 2008.
  • Xiaofeng Yu, Wai Lam, Shing-Kit Chan, Yiu Kei Wu, and Bo Chen. Chinese NER using CRFs and logic for the fourth SIGHAN bakeoff. In Proceedings of the 6th SIGHAN Workshop on Chinese Language Processing (SIGHAN-6), pages 102-105, Hyderabad, India, 2008.
  • Shing-Kit Chan, Wai Lam, and Xiaofeng Yu. A cascaded approach to biomedical named entity recognition using a unified model. In Proceedings of ICDM-07, pages 93-102, Omaha NE, USA, 2007.
  • Xiaofeng Yu. Chinese named entity recognition with cascaded hybrid model. In Proceedings of HLT/NAACL-07, pages 197-200, Rochester, New York, 2007.
  • Marine Carpuat, Yihai Shen, Xiaofeng Yu and Dekai Wu. Toward integrating word sense and entity disambiguation into statistical machine translation. In Proceedings of the 3rd International Workshop on Spoken Language Translation (IWSLT-06), pages 37-44, Kyoto, Japan, 2006.
  • Xiaofeng Yu, Marine Carpuat and Dekai Wu. Boosting for Chinese named entity recognition. In Proceedings of the 5th SIGHAN Workshop on Chinese Language Processing (COLING/ACL-06 Workshop), Sydney, Australia, 2006.

Professional Activities

Professional Membership

  • Student Member, Association for the Advancement of Artificial Intelligence (AAAI)
  • Student Member, Association for Computing Machinery (ACM)
  • Student Member, Institute of Electrical and Electronics Engineers (IEEE)
  • Student Member, Association for Computational Linguistics (ACL)

Workshop Program Committee Member

  • The 20th ACM Conference on Information and Knowledge Management (CIKM 2011, KM Track)
  • The 20th ACM Conference on Information and Knowledge Management (CIKM 2011, Session Chair of IE)
  • The 5th SIGHAN Workshop on Chinese Language Processing (SIGHAN-5, 2006)
  • The 6th SIGHAN Workshop on Chinese Language Processing (SIGHAN-6, 2008)

Journal Reviewer

  • ACM Transactions on Knowledge Discovery from Data (TKDD)
  • ACM Transactions on Information Systems (TOIS)
  • IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
  • International Journal of Information Processing and Management (IJIPM)
  • IEEE Transactions on Knowledge and Data Engineering (TKDE)
  • Journal of Information Retrieval (IR)

Conference Reviewer

  • ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2008, 2009, 2010)
  • ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2007, 2008, 2009)
  • The International World Wide Web Conference (WWW 2008, 2009)
  • ACM Conference on Web Search and Data Mining (WSDM 2009)
  • The International Conference on Empirical Methods in Natural Language Processing (EMNLP 2008)
  • ACM International Conference on Information and Knowledge Management (CIKM 2007, 2008, 2009, 2010)
  • IEEE International Conference on Data Mining (ICDM 2007, 2009)
  • SIAM Conference on Data Mining (SDM 2008)
  • The Asia Information Retrieval Symposium (AIRS 2008, 2009)
  • The International Conference on the Computer Processing of Oriental Languages (ICCPOL 2007, 2008)
  • The International Conference on Machine Learning and Cybernetics (ICMLC 2008)

Research Projects

RGC Funded Projects

During my Ph.D. training, I mainly worked on several funded research projects, including:

Project Code | Project Title | Amount (HK$'000) | Funder
CUHK413510 | Incorporating Non-local Interactions and Logical Inference into Sequence Classification Model for Practical Text Mining | 1646.168 | RGC
CUHK4128/07 | A Framework for Cooperative Information Extraction and Relation Learning From Texts | 391.776 | RGC

Other Research Projects

Moreover, I have contributed to and participated in several well-known international evaluations and shared tasks, including:

TREC 2010

I participated in the entity track of TREC 2010. The goal of this new track is to perform entity-related search on the World Wide Web: given a source entity, return a ranked list of entities of a specified type that engage in a given relationship with it. The motivation is that many user information needs are better answered by specific entities than by documents of any type.

  • Project leader and chief system architect.
  • Designed and developed the homepage identification component based on machine learning techniques, and analyzed several effective features it exploits.
  • Designed and researched the target entity finding component.
  • Designed and developed other major components, such as webpage filtering, source entity identification, and entity homepage and document ID mapping.

SIGHAN-6

Among the 23 groups participating in the official SIGHAN-6 evaluation, our group obtained the best performance on the CityU corpus and fourth place on the MSRA corpus. Moreover, we were the only group to consistently obtain an F-measure above 90 on all the benchmark corpora in the NER open track.

  • Project leader and chief system architect.
  • Designed and implemented the NER system based on probabilistic graphical models with first-order logic.
  • Investigated the use of first-order logic and computational linguistics (e.g., domain knowledge) to improve the system performance.

SIGHAN-5

I participated in the Chinese named entity recognition (NER) shared task of the third SIGHAN Chinese language processing bakeoff (SIGHAN-5), which provides large-scale benchmark data for evaluation. Our system employed a boosting technique. Even though we did no other Chinese-specific tuning and used only one-third of the MSRA and CityU corpora to train the system, reasonable results were obtained.

  • Project leader and major investigator.
  • Designed and implemented the Chinese NER system.
  • Researched boosting, a machine learning technique, for the Chinese NER problem, and compared it with other algorithms such as support vector machines and maximum entropy models.

Senseval-3

I participated in the Senseval-3 WSD evaluation, which was organized by ACL-SIGLEX and held in conjunction with ACL 2004. Senseval-3 included 14 different tasks for core word sense disambiguation, as well as identification of semantic roles, multilingual annotations, logic forms, and sub-categorization acquisition.

  • Designed and implemented a toolkit (GUI) for word sense selection.
  • Provided a benchmark test dataset for this evaluation.

NIST-MT 2004

I participated in the 2004 NIST machine translation (MT) evaluation. As part of the DARPA TIDES program, the objective of the NIST MT evaluation series is to support research in, and help advance the state of the art of, machine translation technologies.

  • Pre-processed and word-aligned the NIST-MT 2004 corpus (a bilingual parallel corpus of 1.9 GB).
  • Researched bilingual semantic lexicon construction, and compared six semantic similarity measures to enhance the lexicon quality.

National 863 and NSFC Projects

Work on these projects covered translation optimization in the CEMT2K translation system, word sense disambiguation (WSD) based on bilingual information, and automatic building of bilingual semantic lexicons for translation selection.

  • Optimized translation rules to improve system performance.
  • Researched automatic acquisition of translation knowledge and translation rules.

Work Experience

  • Sep 2005 - Jan 2007, Research Assistant, Dept. of Computer Science & Engineering, The Hong Kong University of Science & Technology
  • July 2007 - Oct 2010, Teaching Assistant, Dept. of Systems Engineering & Engineering Management, The Chinese University of Hong Kong

Teaching Assistantships

  • Spring 2010: CSC 2100E/F (Data Structures)
  • Fall 2009: SEG 3460 (Computer Processing System Concepts)
  • Spring 2009: SEG 3550 (Fundamentals in Information Systems)
  • Fall 2008: SEG 3460 (Computer Processing System Concepts)
  • Spring 2008: SEG 3460 (Computer Processing System Concepts)
  • Fall 2007: SEG 3460 (Computer Processing System Concepts)

Skills

Technical

  • Familiar with object-oriented development in C++, VC++, and Java; familiar with C and Perl
  • Development and analysis on UNIX/Linux, Windows, and Solaris systems
  • Knowledge of relational databases and SQL; knowledge of web languages such as HTML, JavaScript, and XML

Soft

  • Motivated, passionate about technology
  • Proactive, self-starter, autonomous
  • Good team spirit, good verbal and written communication skills
  • Able to work in an environment with ambiguous and changing requirements

Personal Hobbies

Hiking, running, and ping-pong

 
Last modified: 2011/09/27 20:29 by xfyu