About the course :
The goal of information extraction (IE) is to design systems that analyze only those text passages containing relevant information, that is, information the system has been told to look for. Such systems do not attempt a comprehensive analysis of every text document; they purposely skip the irrelevant information.
Textual Question/Answering (QA) systems represent the most current trend in information extraction from free on-line sources of text. The goal is to build systems that can identify the answer to a natural-language question within a large collection of on-line text documents. In contrast to information retrieval systems, which return a set of documents as the result of a simple word-based search, QA systems identify the exact passages in the relevant documents that constitute the concrete answer. In addition, there are no restrictions on the subject of the natural-language questions.
Topics to be addressed include, but are not limited to:
- Question answering language resources (LR) and scientific
algorithm developments
- Guidelines, standards, specifications, models and best
practices for question answering LR
- Methods, tools, and procedures for the acquisition,
creation, management, access, distribution, and use of
question answering LR
- LR and evaluation and benchmarking of question
answering systems and algorithms for tasks including:
- Advanced question analysis
- Answer discovery and integration
- Answer explanation and presentation generation
- Interactive question answering
Possible joint products to be created include:
- List of existing resources and ones under development
(with planned release dates)
- Updates to ARDA Q&A Roadmap (www-nlpir.nist.gov/projects/duc/papers/qa.Roadmap-paper_v2.doc)
- List of Evaluation methods and benchmarks of question
answering systems
- List of unresolved research problems and/or areas
in question answering
- Shared knowledge of research groups and efforts
Prerequisites : Advanced Algorithms + Instructor's Permission
Class Time:
| Section 03 : | Monday, Wednesday 9:30-10:50 | at Holman Hall 128. |
Textbooks:
| I. | "Modern Information Retrieval" by: R. Baeza-Yates, B. Ribeiro-Neto. Published by Addison Wesley. ISBN 0-201-39829-X |
| II. | "Mathematical Foundations of Information Retrieval" by: S. Dominich. Published by Kluwer Publishing. ISBN 0-7923-6861-4 |
Instructor:
| Dr. Miroslav Martinovic. |
E-mail Address :
| mmmartin@tcnj.edu |
Telephone :
| (609) 771-2789. |
Office :
| Holman Hall 243. |
Monday :
|
Wednesday :
|
Thursday :
|
Grading Policy:
|
|
Tentative Schedule |
|
Week 1 and 2
|
|
| Introduction to Corpus-Based Question Answering |
| What is corpus-based Q&A ?
Evaluations of Q&A Systems : TREC
Current Approaches to Q&A
NLP & IR for Q&A Systems
Semantics in Q&A Systems
Slides (transparencies used in class) Courtesy of : C. Monz and M. de Rijke |
|
Week 2 and 3
|
|
| What's in Store for Question Answering ? Ask Jeeves |
| "Take-home" messages when considering Q&A task
Some anecdotes and a few statistics
Prognostications
Slides (transparencies used in class) Courtesy of : J.B. Lowe |
|
Week 4
|
|
| Web Information Retrieval : Google's Success | Paper presentation and critique. |
|
Week 5
|
|
| Essential Properties of Information Retrieval : NLP for IR | Paper presentation and critique. |
|
Week 6
|
|
| NLP Tools : Generic Retrieval Systems (SMART System) | Paper presentation and critique with a demonstration session. |
| Week 7 | |
| NLP Tools : Part-of-Speech Tagger (Eric Brill's Part-of-Speech Tagger) | Paper presentation and critique, tagger installation and demonstration.
Paper : Papers/POSTagger/aaai94-tagger.ps
Resource directory : ~mmmartin/Information Retrieval/EricBrill'sTagger/ |
| Week 8 | |
| NLP Tools : Parsers (Apple Pie Parser for English) | Paper presentation and critique with a demonstration session.
Papers : Papers/APParser/manual.ps,
Papers/APParser/APParser.htm
Resource directory (springfield) : /projects/mmmartin/Information Retrieval/NYU Parser/ |
| Week 9 | |
| NLP Tools : Electronic Lexicons (WordNet) | Paper presentation and critique with a demonstration session.
Documentation : http://www.cogsci.princeton.edu/~wn/doc.shtml
Resource directory : ~mmmartin/www/CMSC485/Papers/WordNet/ |
|
Week 10 and 11
|
|
| Advanced Question Answering : Plenty of Challenges to Go Around |
| AQUAINT Program
Introducing ARDA
Advanced Question Answering
Multiple Approaches
AQUAINT Program
Challenges from AQUAINT Perspective
Some Final Thoughts
Slides (transparencies used in class) Courtesy of : ARDA and J.D. Prange |
|
Week 12 and 13
|
|
| Issues, Tasks and Program Structures to Roadmap Research in Q&A |
| Issues in Q&A Research
Question Classes: Need for question taxonomies
Question Processing: Understanding, Ambiguities, Implicatures and Reformulations
Context and Q&A
Data Sources for Q&A
Answer Extraction: Justification and Evaluation of Answer Correctness
Answer Formulation
Real Time Question Answering
Interactive Q&A
Advanced Reasoning for Q&A
User Profiling for Q&A
Collaborative Q&A
Milestones in the Program
Evaluation Framework
Slides (transparencies used in class) Courtesy of : J. Burger et al. |
|
| Week 13 | |
| Named Entity Recognition | Paper presentation and critique.
Paper Resource Directory : Papers/NER/
|
| Week 14 | |
| Anaphora Resolution | Paper presentation and critique.
Paper Resource Directory : Papers/Anaphora/
|
| Project Presentations and Demos |
Week 14
|