CS3TM: Text Mining and Natural Language Processing
Module code: CS3TM
Module provider: Computer Science; School of Mathematical, Physical and Computational Sciences
Credits: 20
Level: 6
When you’ll be taught: Semester 2
Module convenor: Professor Xia Hong, email: x.hong@reading.ac.uk
Pre-requisite module(s): Before taking this module you must take CS2PP or CS2PP22 or CS2PP22NU or CS2PJ20 (Compulsory)
Co-requisite module(s):
Pre-requisite or Co-requisite module(s):
Module(s) excluded:
Placement information: NA
Academic year: 2025/6
Available to visiting students: Yes
Talis reading list: Yes
Last updated: 12 May 2025
Overview
Module aims and purpose
The aim of this module is to introduce the field of text mining and natural language processing. A key focus of the module is the theory and practice of processing text data at the lexical, syntactic, and semantic levels.
This module also encourages students to develop a set of professional skills, such as problem solving, critical thinking, scientific evaluation, creativity, technical report writing, organization and time management, and self-reflection.
Module learning outcomes
By the end of the module, it is expected that students will be able to:
- Understand and apply the fundamental principles of text mining and natural language processing;
- Apply methods and algorithms to process different types of textual data;
- Empirically evaluate the performance of methods and algorithms using accuracy and efficiency metrics;
- Apply analytical and programming skills by using existing NLP methods and tools such as NLTK and scikit-learn (Python); and
- Understand ethics in NLP, in particular issues in large language models.
Module content
The module covers the following topics:
- Regular expressions, text normalization
- N-grams and language models, part-of-speech tagging, lexical semantics, word senses and WordNet, syntactic and semantic parsing
- Text classification, sentiment analysis
- Information extraction, including named entity recognition and relation extraction
- Advanced topics: machine learning for NLP, word embeddings, hidden Markov models and the Viterbi algorithm, chatbots, large language models, ethics in NLP
Structure
Teaching and learning methods
The lectures will introduce students to the theories, concepts and underpinning principles specified in the indicative content. Students will be supervised in the practical sessions as they apply these concepts and principles to given problem contexts.
The lectures and practical sessions will enable students to gain practice with established NLP software, perform analysis and write reports.
Digital learning materials will also be provided where required to support learning.
There are two types of assessment (formative and summative), which will support and reinforce students’ learning. Formative assessment is carried out through weekly learning activities, either exemplar questions or sample programming problems.
Summative assessment consists of one written coursework assignment and one written examination. The written coursework assignment requires students to demonstrate scientific writing in an individual report. Appropriate feedback will be communicated to students in a timely manner to enhance learning.
Study hours
At least 40 hours of scheduled teaching and learning activities will be delivered in person, with the remaining hours for scheduled and self-scheduled teaching and learning activities delivered either in person or online. You will receive further details about how these hours will be delivered before the start of the module.