Lecture 1 - Introduction to Unleashing Novel Data at Scale
- Why data accessibility is so central to the advancement of knowledge in economics (with some historical background)
- An overview of the data curation pipeline
- Step 1: Detect document layouts
- Step 2: OCR
- Step 3: Post-processing and database assembly
- Step 4: Convert information into computable format
- Why is the material covered in this course useful to social scientists?
- Why there won’t be an app/commercial product capable of end-to-end processing of social science documents anytime soon
- Why manual data entry often falls short
- Why our problems differ from those that are the central focus of computer science and the digital humanities
- At its core, deep learning is an optimization problem, which economists are well trained to understand. It would be unfortunate not to take full advantage of the very powerful methods that deep learning offers, which we are well positioned to use
Lecture 2 - Why Deep Learning?
This post compares rule-based and deep learning-based approaches to data curation. It discusses their requirements and why rule-based approaches often (but not always) produce disappointing results when applied to social science data.
- An overview of the syllabus (ultimately, the course had a few deviations from the original syllabus, based on student interests; final syllabus posted in the course section of this website)
- There are two distinct approaches to automated data curation
- Tell the computer how to process the data by defining a set of rules
- Let the computer learn how to process the data from empirical examples, using deep learning
- Overview of rules, how they are used to process image scans and text, why they often fail, and why they sometimes succeed
- Deep learning, how it contrasts with rule-based approaches, and its requirements
- Does the noise from rule-based approaches really matter?
Lecture 4 - Convolutional Neural Networks
- A brief overview of convolutions
- Benchmark datasets for image classification (following the ConvNet literature requires familiarity with the benchmarks)
- Image classification with a linear classifier (and its shortcomings)
- CNN architectures
- AlexNet
- VGG
- GoogLeNet
- ResNet
- ResNeXt
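To make the convolution operation concrete, here is a minimal sketch in PyTorch (PyTorch is my assumption for illustration; the lecture itself is framework-agnostic). It shows how a single convolutional layer and a pooling layer transform the shape of an image batch, the basic building block that the architectures above stack in different ways.

```python
import torch
import torch.nn as nn

# One convolutional layer: 3 input channels (RGB), 16 filters of size 3x3,
# stride 1 and padding 1 so spatial dimensions are preserved.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
pool = nn.MaxPool2d(kernel_size=2)

x = torch.randn(8, 3, 224, 224)   # a batch of 8 RGB images, 224x224 pixels
feats = conv(x)
print(feats.shape)                # torch.Size([8, 16, 224, 224])
print(pool(feats).shape)          # torch.Size([8, 16, 112, 112])

# AlexNet and VGG stack conv + ReLU + pool blocks like these; ResNet adds skip
# connections around them, and GoogLeNet/ResNeXt vary the block structure.
```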
Lecture 5 - Image Classification; Training Neural Nets
This post covers two topics: using CNNs for image classification (a very useful task) and training neural networks in practice. Much of the information about training neural nets is essential to implementing deep learning-based approaches, whether with CNNs or with some other architecture.
Image Classification
- Loss functions for classification
- SVM
- Softmax
- Deep document classification
Training Neural Nets
- Activation functions
- Data pre-processing
- Initialization
- Optimization
- Regularization
- Batch normalization
- Dropout
- Data augmentation
- Transfer learning
- Setting hyperparameters
- Monitoring the learning process
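As a rough illustration of several of these ingredients together (transfer learning, a softmax/cross-entropy loss, SGD with momentum and weight decay), here is a minimal PyTorch sketch that fine-tunes an ImageNet-pretrained ResNet for a hypothetical 5-class document classification task; the class count and data are placeholders, not the course's actual setup.

```python
import torch
import torch.nn as nn
from torchvision import models

# Transfer learning: start from an ImageNet-pretrained ResNet-18 and swap the final
# layer for a hypothetical 5-class document classification task.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 5)

criterion = nn.CrossEntropyLoss()   # softmax / cross-entropy classification loss
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9, weight_decay=1e-4)

# One illustrative training step on dummy data; a real loop iterates a DataLoader
# with augmentation, monitors validation loss, and tunes hyperparameters.
images = torch.randn(16, 3, 224, 224)
labels = torch.randint(0, 5, (16,))

model.train()
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```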
Lecture 6 - Other Computer Vision Problems (Including Object Detection)
This post covers object detection as well as the related problems of semantic segmentation, localization, and instance segmentation. Object detection is core to document image analysis, as it is used to determine the coordinates and classes of different document layout regions. The other problems covered are closely related.
- Semantic segmentation
- Localization
- Object detection
- Region-based CNNs (R-CNNs)
- Fast R-CNN
- Faster R-CNN
- Mask R-CNN
- Feature pyramids
- Instance segmentation
- Other frameworks (YOLO)
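For a sense of what an object detector returns, a minimal sketch using torchvision's COCO-pretrained Faster R-CNN with a feature-pyramid backbone (torchvision is my choice for illustration here; Lecture 7 covers Detectron2, the framework used in the course):

```python
import torch
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn,
    FasterRCNN_ResNet50_FPN_Weights,
)

# A COCO-pretrained Faster R-CNN with a ResNet-50 + FPN (feature pyramid) backbone.
# At inference it returns bounding boxes, class labels, and confidence scores.
model = fasterrcnn_resnet50_fpn(weights=FasterRCNN_ResNet50_FPN_Weights.DEFAULT)
model.eval()

image = torch.rand(3, 800, 800)    # stand-in for a document page image scaled to [0, 1]
with torch.no_grad():
    preds = model([image])[0]

print(preds["boxes"].shape, preds["labels"], preds["scores"])
```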
Lecture 7 - Object Detection in Practice
- Selecting an object detection model
- Overview of Detectron2
- How-to in Detectron2 (D2)
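A minimal Detectron2 usage sketch, assuming a COCO-pretrained Faster R-CNN from the model zoo and a hypothetical scan named page.png; for document layout analysis you would instead point cfg.MODEL.WEIGHTS at weights trained on layout data.

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

# Build a config from the model zoo and run single-image inference with DefaultPredictor.
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5        # confidence threshold at inference

predictor = DefaultPredictor(cfg)
image = cv2.imread("page.png")                     # hypothetical document scan
outputs = predictor(image)
print(outputs["instances"].pred_classes)
print(outputs["instances"].pred_boxes)
```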
Lecture 8 - Labeling and Deep Visualization
Labeling
- Active learning for layout annotation
- Labeling hacks
Deep visualization
- Basic visualization approaches
- Gradient ascent
- Deep Dream
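A minimal sketch of gradient ascent on the input image, the idea behind the visualization approaches above: freeze a pretrained network and optimize the pixels to maximize one class score (the network, class index, and hyperparameters here are arbitrary placeholders).

```python
import torch
from torchvision import models

# Freeze a pretrained classifier and optimize the *image*, not the weights.
model = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()
for p in model.parameters():
    p.requires_grad_(False)

target_class = 130                                  # arbitrary ImageNet class index
img = torch.zeros(1, 3, 224, 224, requires_grad=True)
optimizer = torch.optim.Adam([img], lr=0.1)

for _ in range(50):
    optimizer.zero_grad()
    score = model(img)[0, target_class]
    loss = -score + 1e-3 * img.norm()               # ascend the score; small L2 regularizer
    loss.backward()
    optimizer.step()
```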
Lecture 9 - Generative Adversarial Networks
- Overview: supervised and unsupervised learning; generative models
- Generative adversarial networks
- CycleGAN
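The adversarial game in code form: a minimal, toy sketch of one GAN training step (the architectures, sizes, and data here are placeholders; CycleGAN adds cycle-consistency losses on top of this basic setup).

```python
import torch
import torch.nn as nn

# Toy generator and discriminator for flattened 28x28 images.
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.rand(32, 784) * 2 - 1                 # stand-in for real images scaled to [-1, 1]
fake = G(torch.randn(32, 64))                      # generate from latent noise

# Discriminator step: push real toward 1 and fake toward 0 (detach so G is untouched).
d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# Generator step: try to make the discriminator label fakes as real.
g_loss = bce(D(fake), torch.ones(32, 1))
opt_g.zero_grad()
g_loss.backward()
opt_g.step()
```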
Lecture 10 - OCR Architecture
- Overview of the OCR problem
- Recurrent neural networks
- LSTMs
- Connectionist temporal classification
- Putting it together
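Putting it together typically means a CNN feature extractor feeding an LSTM whose per-timestep outputs are trained with CTC. A minimal sketch of the CTC piece in PyTorch (all shapes and data here are dummies):

```python
import torch
import torch.nn as nn

# CTC aligns T frame-level predictions (e.g., from a CNN+LSTM over a text-line image)
# with a shorter target transcription, without character-level segmentation.
T, N, C = 50, 4, 27                  # time steps, batch size, 26 letters + blank (index 0)
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(2)  # stand-in model outputs
targets = torch.randint(1, C, (N, 10))                # target character indices (no blanks)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```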
Lecture 11 - OCR and Post-Processing in Practice
This post discusses OCR, both off-the-shelf and how to implement a customized OCR model. It discusses how Layout Parser can be used for end-to-end document image analysis, and provides concrete examples of creating variable domains during post-processing. It also provides an overview of the second half of the knowledge base, which covers NLP.
Off-the-shelf OCR
Designing customized OCR
Putting it all together (and Layout Parser; see the sketch after this list)
Creating variable domains
An overview of the second half of the course (NLP)
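A minimal Layout Parser sketch of the detect-then-OCR pipeline referenced above, assuming a PubLayNet-pretrained Detectron2 model, a local Tesseract installation, and a hypothetical scan named page.png:

```python
import cv2
import layoutparser as lp

# Detect layout regions with a PubLayNet-pretrained model, then OCR the text blocks.
image = cv2.imread("page.png")[..., ::-1]          # hypothetical scan, BGR -> RGB

model = lp.Detectron2LayoutModel(
    "lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config",
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.5],
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
)
layout = model.detect(image)

ocr = lp.TesseractAgent(languages="eng")
for block in layout:
    if block.type == "Text":
        print(ocr.detect(block.crop_image(image)))   # OCR each detected text region
```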
Lecture 12 - Models of Words
Traditional models of words
Word2Vec
GloVe
Evaluation
Interpreting word vectors
Problems with word vectors
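A minimal word2vec sketch using gensim (gensim and the toy corpus are my assumptions; a real application would train on a large domain-specific corpus or load pretrained word2vec/GloVe vectors):

```python
from gensim.models import Word2Vec

# Train skip-gram word2vec on a toy corpus and inspect the resulting dense vectors.
sentences = [
    ["the", "bank", "raised", "interest", "rates"],
    ["the", "central", "bank", "cut", "rates"],
    ["farmers", "sold", "wheat", "at", "the", "market"],
]
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1, epochs=50)

print(model.wv["bank"].shape)                  # a 50-dimensional dense vector
print(model.wv.most_similar("bank", topn=3))   # nearest neighbors in embedding space
```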
Lecture 13 - Language Modeling and Other Topics in NLP
This post provides an introduction to language modeling, as well as several other important topics: dependency parsing, named entity recognition (NER), and labeling for NLP. Due to time constraints, the course provides only a very brief introduction to topics like dependency parsing and NER, which have traditionally been central questions in NLP research.
- Language Modeling
- Count-based models (see the sketch after this list)
- Bag of words
- RNN (review)
- LSTM (review)
- Dependency parsing
- Named entity recognition
- Labeling for NLP
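To fix ideas on the count-based baseline mentioned above, here is a minimal bigram language model with add-one smoothing (the toy corpus is a placeholder):

```python
from collections import Counter

# A count-based bigram language model with add-one (Laplace) smoothing: the simple
# baseline that neural language models (RNNs, LSTMs, Transformers) improve on.
corpus = "the bank raised rates . the bank cut rates .".split()
vocab = set(corpus)

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def p_next(word, prev):
    """Estimate P(word | prev) with add-one smoothing."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

print(p_next("raised", "bank"), p_next("cut", "bank"))
```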
Lecture 14 - Seq2Seq and Machine Translation
Machine translation has pioneered some of the most productive innovations in neural-based NLP and hence is useful to study even for those who care little about machine translation per se. We will focus in particular on seq2seq and attention.
- Statistical machine translation
- Neural machine translation
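A bare-bones neural machine translation sketch (my own illustration, not the lecture's code): a GRU encoder compresses the source sentence into a single hidden state that initializes a GRU decoder, the bottleneck that attention was introduced to relieve.

```python
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, EMB, HID = 1000, 1200, 64, 128

src_emb = nn.Embedding(SRC_VOCAB, EMB)
tgt_emb = nn.Embedding(TGT_VOCAB, EMB)
encoder = nn.GRU(EMB, HID, batch_first=True)
decoder = nn.GRU(EMB, HID, batch_first=True)
proj = nn.Linear(HID, TGT_VOCAB)             # maps decoder states to target-vocabulary logits

src = torch.randint(0, SRC_VOCAB, (8, 15))   # batch of 8 source sentences, length 15
tgt = torch.randint(0, TGT_VOCAB, (8, 12))   # corresponding target sentences, length 12

_, h = encoder(src_emb(src))                 # h: (1, 8, HID), the fixed-size "sentence summary"
dec_out, _ = decoder(tgt_emb(tgt), h)        # teacher forcing with the gold target tokens
logits = proj(dec_out)                       # (8, 12, TGT_VOCAB)

# (A real implementation shifts the decoder input right so each step predicts the next token.)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, TGT_VOCAB), tgt.reshape(-1))
loss.backward()
```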
Lecture 15 - Attention is All You Need
This post introduces the Transformer, a seq2seq model based entirely on attention that has transformed NLP. Given the importance of this paper, there are a bunch of very well-done web resources about it, cited in the lecture and below, that I recommend checking out directly (there are others who have much more of a comparative advantage in presenting seminal NLP papers than I do!).
A recap of attention
The Transformer
- The encoder
- Encoder self-attention
- Positional embeddings
- Add and normalize
- The decoder
- Encoder-decoder attention
- Decoder self-attention
- Linear and softmax layers
- Training
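The scaled dot-product attention at the heart of both the encoder and decoder, written out as a minimal sketch: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V.

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:                      # e.g., the causal mask in decoder self-attention
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return F.softmax(scores, dim=-1) @ V

# Self-attention on a toy batch: 2 sequences, 5 tokens each, model dimension 16.
x = torch.randn(2, 5, 16)
print(scaled_dot_product_attention(x, x, x).shape)   # torch.Size([2, 5, 16])
```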
Lecture 16 - Transformer-Based Language Models
This post provides an overview of various Transformer-based language models, discussing their architectures and which are best-suited for different contexts.
- Overview
- Contextualized word embeddings
- Models
- GPT
- BERT
- RoBERTa
- DistilBERT
- ALBERT
- T5
- GPT2/GPT3
- Transformer-XL
- XLNet
- Longformer
- BigBird
- Recap and what to use
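With the Hugging Face transformers library, the models above are all loaded the same way. A minimal sketch that extracts contextualized token embeddings from BERT; swapping the checkpoint name (e.g., "roberta-base", "distilbert-base-uncased", "allenai/longformer-base-4096") switches among the models surveyed above.

```python
from transformers import AutoModel, AutoTokenizer

# Contextualized word embeddings: each token gets a vector that depends on its context.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The bank raised interest rates.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)   # (1, number of tokens, 768): one vector per token
```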
Lecture 17 - Understanding Transformers, Visualization, and Sentiment Analysis
This post covers a variety of topics around Transformer-based language models: understanding how Transformer attention works, understanding what information is contained in their embeddings, visualizing embeddings, and using Transformer-based models to conduct sentiment analysis.
What do Transformer-based models attend to?
What’s in an embedding?
Visualizing embeddings
Sentiment analysis
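For the sentiment analysis piece, a minimal sketch with the transformers pipeline API and its default English sentiment checkpoint; noisy or domain-specific corpora would typically call for fine-tuning your own classifier instead.

```python
from transformers import pipeline

# Off-the-shelf sentiment analysis with a fine-tuned Transformer classifier.
classifier = pipeline("sentiment-analysis")
print(classifier([
    "The harvest was excellent this year.",
    "The factory closures devastated the town.",
]))
```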
Lecture 18 - NLP with Noisy Text
- The Canonical Deep NLP Training Corpus
- A definition of noise
- The problem with noise
- Approaches for denoising
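As a toy illustration of one denoising approach (dictionary lookup with fuzzy matching; a crude stand-in for the methods discussed in the lecture, with a placeholder vocabulary and example string):

```python
import difflib

# Map each out-of-vocabulary token to its closest vocabulary entry, a simple way to
# correct isolated OCR character errors before feeding text to an NLP model.
vocab = {"the", "bank", "raised", "interest", "rates"}

def denoise(text, cutoff=0.8):
    cleaned = []
    for tok in text.lower().split():
        if tok in vocab:
            cleaned.append(tok)
        else:
            match = difflib.get_close_matches(tok, vocab, n=1, cutoff=cutoff)
            cleaned.append(match[0] if match else tok)
    return " ".join(cleaned)

print(denoise("the bankk raised interost ratcs"))   # -> "the bank raised interest rates"
```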
Lecture 19 - Retrieval and Question Answering
- Reading comprehension
- Open-domain question answering
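A minimal extractive QA sketch with the transformers pipeline API (the default SQuAD-style checkpoint is assumed; open-domain QA adds a retrieval step that first finds candidate contexts in a large corpus):

```python
from transformers import pipeline

# Reading comprehension: the model selects an answer span from the supplied context.
qa = pipeline("question-answering")
result = qa(
    question="When was the mill established?",
    context="The textile mill was established in 1887 and employed over 400 workers.",
)
print(result["answer"], result["score"])
```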
Lecture 20 - Zero-Shot and Few-Shot Learning in NLP
What it means to learn in just a few shots
Zero-shot and few-shot learning in practice
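In practice, zero-shot classification is nearly a one-liner with an NLI-based model via the transformers pipeline API (the default checkpoint and the example labels below are my assumptions):

```python
from transformers import pipeline

# Zero-shot classification: score arbitrary candidate labels the model never saw in
# training, so no task-specific labeled data is required.
classifier = pipeline("zero-shot-classification")
print(classifier(
    "The railroad company announced a new line connecting the two cities.",
    candidate_labels=["transportation", "agriculture", "finance"],
))
```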
Lecture 21 - Transformers for Computer Vision
- Transformers for computer vision
- Transformers for image classification
- Transformers for object detection
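A minimal Vision Transformer sketch using a pretrained ViT checkpoint from Hugging Face (my choice for illustration; the random array stands in for a real image): the image is split into fixed-size patches that a standard Transformer encoder then classifies.

```python
import torch
from transformers import ViTForImageClassification, ViTImageProcessor

# A Vision Transformer treats an image as a sequence of 16x16 patches and classifies
# it with a Transformer encoder, with no convolutions in the backbone.
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")

image = torch.randint(0, 256, (224, 224, 3), dtype=torch.uint8).numpy()   # stand-in image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```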