Glossary¶
Applied Natural Language Processing (NLP)¶
Applied Natural Language Processing (NLP) uses computational techniques to analyze and work with human language. Common tasks include:
- extracting information from text
- classifying documents
- identifying topics or sentiment
- summarizing or transforming text
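As a rough illustration of the "identifying topics" task above, the sketch below counts the most frequent words in a text after dropping a few common words. The function name `top_terms`, the tiny `STOP_WORDS` set, and the sample sentence are assumptions made for this example; real projects usually rely on a dedicated NLP library and a much fuller stop-word list.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "it"}


def top_terms(text: str, n: int = 3) -> list[tuple[str, int]]:
    """Return the n most frequent tokens in text, ignoring stop words."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)


sample = "The cat sat. The cat slept. A dog barked at the cat."
print(top_terms(sample, 1))  # [('cat', 3)]
```

Word frequency is only a crude proxy for a topic, but it shows the general pattern: turn raw text into tokens, then compute something over those tokens.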
Web Mining¶
Web mining is the practice of collecting and analyzing data from the web.
Typical activities include:
- retrieving web pages or API responses
- extracting structured information from HTML
- analyzing links, text, or metadata
- building datasets for further analysis
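One of the activities above, extracting structured information from HTML, can be sketched with Python's standard-library `html.parser` module. The class `LinkExtractor`, the helper `extract_links`, and the sample `page` string are hypothetical names chosen for this example; it works on a string already in memory and does not fetch anything from the network.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href and visible text of each <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            self.links.append({"href": self._href, "text": text})
            self._href = None


def extract_links(html: str) -> list[dict]:
    """Return one record per link found in the HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


page = '<p>See <a href="/docs">the docs</a> and <a href="https://example.com">Example</a>.</p>'
for link in extract_links(page):
    print(link)
```

Each dictionary in the result is one structured record, so a list of them is already a small dataset that can be saved or analyzed further.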
Project¶
A structured set of files and folders that work together to run code, store data, and produce outputs.
Repository (repo)¶
A project folder under version control (often hosted on GitHub), so that changes to its files are tracked over time.
Working Directory¶
The folder your terminal is currently operating in.
Path¶
The address of a file or folder (example: data/input.csv).
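The working directory and a path are related: a relative path such as data/input.csv is interpreted starting from the working directory. A minimal sketch using the standard-library `pathlib` module is shown below; the function name `describe_path` is an assumption made for this example, and the file does not need to exist for the code to run.

```python
from pathlib import Path


def describe_path(relative: str) -> dict:
    """Show how a relative path combines with the working directory."""
    cwd = Path.cwd()       # the folder the program is currently running in
    full = cwd / relative  # the relative path joined onto that folder
    return {
        "working_directory": cwd,
        "absolute_path": full,
        "exists": full.exists(),
    }


info = describe_path("data/input.csv")
print(info["working_directory"])
print(info["absolute_path"])
print(info["exists"])
```

Running the same script from a different folder changes the absolute path, which is a common reason for "file not found" errors.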
Dataset¶
A collection of data records used by a program.
Artifact¶
A file produced by the program (for example, results or reports).
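To make the dataset and artifact terms concrete, the sketch below reads a small dataset from CSV text and writes a one-line report file as an artifact. The names `load_records` and `write_report`, the sample data, and the output/report.txt location are all assumptions for illustration; the report is written to a temporary folder so nothing is left behind.

```python
import csv
import io
import tempfile
from pathlib import Path


def load_records(csv_text: str) -> list[dict]:
    """Parse CSV text into a list of records (one dict per row)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def write_report(records: list[dict], out_path: Path) -> Path:
    """Write a small report file (an artifact) and return its path."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(f"records: {len(records)}\n", encoding="utf-8")
    return out_path


# Hypothetical sample data standing in for a file such as data/input.csv.
SAMPLE_CSV = "name,score\nana,3\nben,5\n"
records = load_records(SAMPLE_CSV)

with tempfile.TemporaryDirectory() as tmp:
    report = write_report(records, Path(tmp) / "output" / "report.txt")
    print(report.read_text(encoding="utf-8"))
```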
Logging¶
Messages written by the program that record what it is doing.
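A minimal sketch of logging with Python's standard-library `logging` module follows. The helper `make_logger`, the logger name "demo", and the message text are assumptions for this example; the messages go to an in-memory buffer here, whereas a real project would more likely log to the terminal or a file.

```python
import io
import logging


def make_logger(name: str, stream) -> logging.Logger:
    """Create a logger that writes INFO-and-above messages to stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep messages out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = make_logger("demo", buffer)
log.info("loaded %d records", 3)           # recorded
log.debug("this is hidden at INFO level")  # filtered out by the level
print(buffer.getvalue())
```

Unlike `print`, log messages carry a level, so detailed messages can be switched on or off without editing the code that emits them.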
README.md¶
The front-page document that explains what a project is and how to run it.