THE COLLEGE OF NEW JERSEY
Computer Science / Interactive Multi-Media
CSC320/IMM320: Information Retrieval
Fall 2007
The course is aimed at advanced undergraduate students
in Computer Science, Interactive Multimedia, Information Science, and Business. It is intended to prepare students to
design, use, and evaluate information retrieval systems. The course also aims to
give students a broad understanding of the inner workings of automated information
retrieval systems, and of how such systems interact with users and affect their
productivity.
This course will discuss the theory and practice of searching
and retrieval of text and bibliographic information. Topics covered include
automated indexing, statistical and linguistic models, text classification,
Boolean and probabilistic approaches to indexing, query formulation and output
ranking, information routing and filtering, topic detection and tracking, as
well as measures of retrieval effectiveness, including relevance, utility, and miss/false-alarm. The course also covers techniques for enhancing retrieval
effectiveness, including relevance feedback, query reformulation, thesauri,
concept extraction, and automated summarization, along with experimental retrieval
approaches from the Text Retrieval Conferences (TREC) and modern Internet search
engines (Google, Yahoo, etc.).
The prerequisite for CS students taking this class
is CSC 230 (CS II), "Computer Science II: Data Structures".
For IMM students, the prerequisite is the IMM
core. For students outside the CS and IMM majors, skills equivalent to the CS
or IMM prerequisites are expected and will be assessed by the instructor before
the student is allowed to join the class. Students are expected to
be familiar with data structures and
algorithms, elementary algebra, basic statistics and probability, and elements of
logic and set theory, and to have used a library catalogue and an
Internet search system.
Dr. Miroslav Martinovic
Brief Biography
Ph.D. 1993
CS Faculty 2000-present, TCNJ
CS/Math Faculty, 1989-2000,
CS/Math Faculty, 1983-1988,
Principal Scientist, 1989-present.
Research Interests
Question-Answering Systems
Natural Language Processing
Information Retrieval
Theory of Gaming
Computer Science Education
Logic Programming
Expert Systems
Sponsors
NSF
DARPA
NIST
Microsoft
World University Service
Studenica Foundation
E-mail Address : mmmartin@tcnj.edu
Telephone : (609) 771-2789
Office : Holman Hall 207 / 230
Class Time:
Lecture Notes (e-mail instructor for a password)
Monday, Thursday 12:30-1:50 at Holman Hall 253
Instructor-supervised assigned work from the Paper / Topic List below:
360 minutes on the students' own schedule, at Holman Hall 372
Textbooks:
Course Main Text
Modern Information Retrieval
R. Baeza-Yates, B. Ribeiro-Neto. Published by Addison Wesley, 2000.
ISBN 0-201-39829-X
Additional Texts
1. Karen Sparck-Jones and Peter Willett (editors). Morgan-Kaufmann Publishers, 1997.
2. Natural Language Information Retrieval
Tomek Strzalkowski (editor). Kluwer Academic Publishers, 1999.
3. Information Retrieval: Data Structures & Algorithms
William B. Frakes and Ricardo Baeza-Yates
4. Mathematical Foundations of Information Retrieval
S. Dominich. Published by Kluwer Publishing, 1999.
ISBN 0-7923-6861-4
Office Hours :
|
Monday |
Tuesday |
Thursday |
Friday |
|
|
|
|
|
|
9:15-9:45 |
9:15-12:45 |
9:15-9:45 |
3:45-5:45 (by appointment) |
|
11:30-1:30 |
|
11:30-1:30 |
|
Grading Policy:
(i) The course topics will be examined through readings, discussion, hands-on
experience using various information retrieval systems, and
participation in the evaluation of different retrieval algorithms on various test
collections.
(ii) There will be periodic assignments and a final
paper/project.
(iii) In-class presentations (readings and/or experiments)
will require preparation that includes finding materials beyond the base
reading and working during the assigned work periods.
The final paper will be a technical paper on an IR issue.
Topics for a programming project include:
- topic detection through concept extraction / topic tracking
- pivoted normalization weighting in SMART
- query expansion tool
- self-learning concept spotter
- LMI using TTP
- automatic summarizing
- sub-categorization of retrieved set
- question-answering.
CSC320 / IMM320

Week 1
Introduction to Information Retrieval
A. What is Information Retrieval?

Week 2
Conceptual Models Discussion.
Student presentations; IR systems
A. Boolean and Extended Boolean Models; Glimpse

Week 3
Evaluation
Class exercise: evaluating web search engines by pooling method
A. Assumptions in IR performance evaluation.

Week 4 & 5
Automated Indexing
A. Properties of language collections.

Week 6
Query Languages and Query operations
Lecture
A. Keyword queries

Week 7
NLP Tools : Parsers (A Parser for English)
Paper presentation and critique with a demonstration session
Papers : Papers/APParser/manual.ps, Papers/APParser/APParser.htm

Week 8
NLP Tools : Electronic Lexicons (WordNet)
Paper presentation and critique with a demonstration session
Documentation : http://www.cogsci.princeton.edu/~wn/doc.shtml

Week 9
Automatic Classification
Lecture and presentations
A. Manual classification.

Week 10 & 11
Question Answering
Student presentation, instructor presentation; Discussion
Exercise: TREC Q&A task
A. Classical Q&A problem (student presentation)
Case studies: AnswerBus, QASTIIR

Week 12 & 13
Web Search
A. About the Web and hypertext

Week 14
Project Presentations and Demos

1. Gerard Salton. Automatic text processing: the transformation, analysis, and retrieval of information by computer.
2. C. J. van Rijsbergen. Information retrieval.
3. Text Retrieval Conference (TREC) proceedings (copies from instructor)
4. ACM SIGIR Conference Proceedings (copies from instructor)
5. Technical journals:
a. Information Processing & Management, Pergamon Press
b. Information Retrieval, Kluwer Academic Publishers
c. Computational Linguistics, MIT Press
d. Journal of the ASIS

Topic Paper and Demonstration Materials | Presenter | Presentation
1. NLP Tools - SMART IR System : Paper Presentation and a Demonstration Session Paper : Papers/SMART/SmartCourse.html | | 10/15
2. Web Information Retrieval : Google's Success Paper : Papers/Google/Google.pdf | | 10/18
3. An Extension of VS Model : Latent Semantic Indexing Paper : Papers/LatSemInd/LSI.ppt | | 10/22
4. Essential Properties of Information Retrieval : NLP for IR Paper : Papers/NLPforIR/NLP-IR.pdf | | 10/29
5. Text Annotation Techniques Paper : Papers/TAT/ | | 11/1
6. Image Retrieval : Paper Presentation Paper Resources : Papers/ImageIR/ | | 11/5
7. Thesauri in Information Retrieval Papers : Papers/IRThesauri/AutoDerofThes.ppt | | 11/8
8. Genomics Track in Information Retrieval Paper Resource : http://ir.ohsu.edu/genomics/ | | 11/12
9. Weighting Flavors in IR Paper : Papers/Weights/tfidfFlavors.ppt | | 11/15
10. Question / Answer Taxonomies and Categorizations in QA Systems Paper Resource : Papers/QuestionAnswerCategorization/ | | 11/19
11. MURAX and ASKJEEVES Paper Resource : Papers/MurAskJ/ | | 11/22
12. Topic Detection and Tracking Paper Resources: Papers/TDT/ | | 11/29
13. CYC – A Large Common Sense Knowledge Base in Information Retrieval Paper Resources : ~mmmartin/ResearchCyc | | 12/3

Paper critique and presentation guidelines
Paper Critique Guidelines
Each critique should be no more than one page long. Less than a page is OK. The purpose of a critique is not to summarize the paper; rather, you should choose one or two points about the work that you found interesting.
Examples of questions that you might address are:
Your critique should be typed (single-spaced) and should list the title of the paper and its authors at the top, along with your name.
Avoid unsupported value judgments, like "I liked..." or "I disagreed with...". If you make judgments of this sort, explain why you liked or disagreed with the point you describe.
Be sure to distinguish comments about the writing of the paper from comments about the technical content of the work.
Paper Presentation Guidelines
Length : class period (60-80 minutes)
Medium : PowerPoint, HTML slides, PDF slides, or the like.
Paper Critique Presentation Guidelines
Length : class time (a talk of up to 40 minutes, followed by a discussion of up to 40 minutes moderated by the presenter)
Medium : PowerPoint, HTML slides, PDF slides, or the like.
Note about how preparedness for other students' presentations affects the grade
(i) All listed papers must be read by every student in the class.
(ii) The discussion following the paper presentation and the paper critique presentation demonstrates whether the student has read the paper.
(iii) The student's involvement and competence in
the discussion from (ii) will directly affect the "Attendance, Class Participation
and Effort" component, which is 20% of
the student's total grade for the entire course.
2007-08-22
Each of these projects
will investigate an open issue in IR and produce a research paper that outlines
existing approaches, their strengths and weaknesses, and offers a new approach
to be investigated.
1. Natural language processing approaches in IR
2. Question answering
3. Evaluation of IR performance
4. Automatic summarization methods
5. Automated classification
6. Machine learning in IR
7. Cross-lingual IR
8. Cross-lingual summarization
9. Multi-media retrieval (speech, video, web pages)
10. Information fusion.