Overview
The volume of textual data available inside organizations and on the Internet is huge and still growing, and with it the demand for better retrieval, processing, and analysis of textual information. Well-known examples include web search engines (e.g., Google), document and content management systems, email filtering, social media sentiment analysis, automated question answering (e.g., IBM Watson on Jeopardy!), natural language interfaces in games and mobile devices, and big data text analytics for business and competitive intelligence. Natural language processing (NLP), also known as computational linguistics, aims to process and understand natural languages and text, and it is the driving force behind these tasks and systems.
Computer Science 659: Statistical Language Processing for Text Analytics focuses on the principles and technologies of statistical machine-learning-based NLP and their application in text analytics, including retrieval, extraction, recognition, and analysis of information from large textual collections.
Note: The Python programming language and Python-based open-source machine-learning and NLP software tools are used in this course. However, you may select either Java or C++ as an alternative language and use its relevant open-source machine-learning and NLP tools for your assignments and research projects.
Outline
COMP 659 covers the core topics in statistical NLP and several applications of text analytics:
- Unit 1: Linguistics and Statistics Essentials
- Unit 2: Python for Text Processing
- Unit 3: Language Models for Information Retrieval
- Unit 4: Hidden Markov Models for POS Tagging
- Unit 5: Probabilistic Grammar and Parsing
- Unit 6: Statistical Machine Learning
- Unit 7: Text Classification and Clustering
- Unit 8: Semantic Structures and Parsing
- Unit 9: Named Entity and Relation Extraction
- Unit 10: Web Search and Question Answering
- Unit 11: Topic Modeling, Opinion Mining, and Sentiment Analysis
Learning outcomes
Upon successful completion of this course, you should be able to
- explain fundamental concepts, principles, models, and algorithms of natural language processing (NLP), including language models, POS tagging, syntactic and semantic parsing, named entity and relation extraction, question answering, opinion mining, and sentiment analysis.
- discuss state-of-the-art statistical and machine learning algorithms and techniques and their connection with the implementation of statistical NLP tasks and text analytics.
- apply machine learning and NLP algorithms to real natural language data for language and text processing.
- analyze large text collections by selecting and applying suitable statistical NLP approaches.
- evaluate and improve the performance of a selected statistical learning model for a specific NLP task.
- design system structures and integrate open-source components for statistical NLP and text analytics applications.
- review research articles from well-known NLP, machine learning, and AI journals and conference proceedings regarding NLP and text analytics.
- carry out a research project and write a research proposal, report, and paper.
Evaluation
To receive credit for COMP 659, you must achieve a course composite grade of at least B– (70 percent) and an average grade of at least 60 percent on the assignments.
The weighting of the composite grade is as follows:
| Activity | Weight |
| --- | --- |
| Assignment 1: Concepts, design, demonstration, reading | 15% |
| Assignment 2: Demonstration, reading, analysis | 15% |
| Assignment 3: Research project | 40% |
| Assignment 4: Research paper | 20% |
| Assignment 5: Discussion forum participation | 10% |
| Total | 100% |
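As a rough illustration of how the weights above combine, the following Python sketch computes a composite grade and checks both credit conditions. The function names are hypothetical, and the sketch assumes that the "average grade on the assignments" is the unweighted mean of the five assignment scores; the course itself may calculate that average differently.

```python
# Weights from the evaluation table, in percent of the composite grade.
WEIGHTS = {
    "Assignment 1": 15,
    "Assignment 2": 15,
    "Assignment 3": 40,
    "Assignment 4": 20,
    "Assignment 5": 10,
}


def composite_grade(scores):
    """Return the weighted composite grade for percentage scores (0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def assignment_average(scores):
    """Return the mean assignment score.

    ASSUMPTION: an unweighted mean over all five assignments; this is an
    illustrative reading of the rule, not the official calculation.
    """
    return sum(scores[name] for name in WEIGHTS) / len(WEIGHTS)


def receives_credit(scores):
    """Check both conditions: composite >= 70 and assignment average >= 60."""
    return composite_grade(scores) >= 70 and assignment_average(scores) >= 60


# Hypothetical scores, for illustration only.
example = {
    "Assignment 1": 80,
    "Assignment 2": 75,
    "Assignment 3": 70,
    "Assignment 4": 65,
    "Assignment 5": 90,
}
print(composite_grade(example))   # 73.25
print(receives_credit(example))   # True
```

Because Assignments 3 and 4 together carry 60 percent of the weight, strong results on the research project and paper can lift the composite grade above 70 percent even when the other scores are low. Under the assumption above, the separate 60 percent average condition is what catches that case.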
Materials
Digital course materials
Links to the following course materials will be made available in the course:
Bird, S., Klein, E., & Loper, E. (2019). Natural language processing with Python: Analyzing text with the Natural Language Toolkit. https://www.nltk.org/book
Hastie, T., Tibshirani, R., & Friedman, J. (2009, February). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). Springer. https://web.stanford.edu/~hastie/ElemStatLearn/
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. Cambridge University Press. http://nlp.stanford.edu/IR-book/