Computer Science (COMP) 659

Statistical Language Processing for Text Analytics (Revision 2)

Status:

Open

Delivery mode:

Grouped study. Delivered via Brightspace.

Credits:

Area of study:

Information Systems

Prerequisites:

COMP 501 (or an equivalent high-level programming language course) and the essentials of undergraduate-level probability and/or statistics, or course instructor approval.

Precluded:

None

Faculty:

Faculty of Science and Technology

Notes:

This is a graduate-level course. To take it, you must apply and be admitted to one of the graduate programs, or be approved as a non-program School of Computing and Information Systems graduate student. Minimum admission requirements must be met. Undergraduate students who do not meet admission requirements will not normally be permitted to take this course.

Course extensions will not be permitted for COMP 659 due to the nature of the course activities.

Instructor:

Dr. Dunwei (Grant) Wen

Overview

In recent years, the huge and ever-growing amount of textual data, both inside organizations and on the Internet, has driven an increasing demand for better retrieval, processing, and analysis of textual information. Well-known examples include web search engines (e.g., Google), document and content management systems, email filtering, social media sentiment analysis, automated question answering (e.g., IBM Watson on Jeopardy!), natural language interfaces in games and mobile devices, and big data text analytics for business/competitive intelligence. Natural language processing (NLP), also known as computational linguistics, aims to process and understand natural language and text, and it is the driving force that makes these tasks and systems possible.

Computer Science 659: Statistical Language Processing for Text Analytics focuses on the principles and technologies of statistical machine-learning-based NLP and their application in text analytics, including retrieval, extraction, recognition, and analysis of information from large textual collections.

Note: The Python programming language and Python-based open-source machine-learning and NLP software tools are used in this course. However, you may select either Java or C++ as an alternative language and use its relevant open-source machine-learning and NLP tools for your assignments and research projects.

Outline

COMP 659 covers the core topics in statistical NLP and several applications of text analytics:

Unit 1: Linguistics and Statistics Essentials
Unit 2: Python for Text Processing
Unit 3: Language Models for Information Retrieval
Unit 4: Hidden Markov Models for POS Tagging
Unit 5: Probabilistic Grammar and Parsing
Unit 6: Statistical Machine Learning
Unit 7: Text Classification and Clustering
Unit 8: Semantic Structures and Parsing
Unit 9: Named Entity and Relation Extraction
Unit 10: Web Search and Question Answering
Unit 11: Topic Modeling, Opinion Mining, and Sentiment Analysis

Learning outcomes

Upon successful completion of this course, you should be able to

explain fundamental concepts, principles, models, and algorithms of natural language processing (NLP), including language models, POS tagging, syntactic and semantic parsing, named entity and relation extraction, question answering, opinion mining, and sentiment analysis.
discuss state-of-the-art statistical and machine learning algorithms and techniques and their connection with the implementation of statistical NLP tasks and text analytics.
apply machine learning and NLP algorithms to real natural language data for language and text processing.
analyze large text collections by selecting and applying suitable statistical NLP approaches.
evaluate and improve the performance of a selected statistical learning machine for a specific NLP task.
design system structures and integrate open-source components for statistical NLP and text analytics applications.
review research articles from well-known NLP, machine learning, and AI journals and conference proceedings regarding NLP and text analytics.
carry out a research project and write a research proposal, report, and paper.

Evaluation

To receive credit for COMP 659, you must achieve a course composite grade of at least B– (70 percent) and an average grade of at least 60 percent on the assignments.

The weighting of the composite grade is as follows:

Activity	Weight
Assignment 1: Concepts, design, demonstration, reading	15%
Assignment 2: Demonstration, reading, analysis	15%
Assignment 3: Research project	40%
Assignment 4: Research paper	20%
Assignment 5: Discussion forum participation	10%
Total	100%

Materials

Digital course materials

Links to the following course materials will be made available in the course:

Bird, S., Klein, E., & Loper, E. (2019). Natural language processing with Python: Analyzing text with the natural language toolkit. https://www.nltk.org/book

Hastie, T., Tibshirani, R., & Friedman, J. (2009, February). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). Springer. https://web.stanford.edu/~hastie/ElemStatLearn/

Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. Cambridge University Press. http://nlp.stanford.edu/IR-book/

Athabasca University reserves the right to amend course outlines occasionally and without notice. Courses offered by other delivery methods may vary from their individualized study counterparts.

Opened in Revision 2, January 27, 2025

Updated January 27, 2025
