NLP Interview Notes
Introduction: This project is a collection of study notes and materials for natural language processing (NLP) interviews, compiled from personal interview experience. It currently gathers interview questions from many subfields of NLP.

> NLP interview discussion group (Note: if the group is full, add the editor on WeChat: yzyykm666 to be invited!)

4. Common interview questions for NLP algorithms
4.1 Common interview questions for information extraction
4.1.1 Common interview questions for named entity recognition
- Hidden Markov model (HMM) common interview questions
- 1. Basic information
- 1.1 What is a probabilistic graphical model?
- 1.2 What is a random field?
- 2. Introduction to the Markov process
- 2.1 What is a Markov process?
- 2.2 What is the core idea of the Markov process?
- 3. The hidden Markov model
- 3.1 Introduction to the hidden Markov model
- 3.1.1 What is the hidden Markov model?
- 3.1.2 What are the two sequences in the hidden Markov model?
- 3.1.3 What are the three matrices in the hidden Markov model?
- 3.1.4 What are the two assumptions of the hidden Markov model?
- 3.1.5 What is the workflow of the hidden Markov model?
- 3.2 Computation processes of the hidden Markov model
- 3.2.1 What is the learning (training) process of the hidden Markov model?
- 3.2.2 What is the sequence labeling (decoding) process of the hidden Markov model? (See the Viterbi sketch after this block.)
- 3.2.3 What is the sequence probability computation process of the hidden Markov model?
- 3.3 Problems with the hidden Markov model
Click to view the answer
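A minimal NumPy sketch of the Viterbi algorithm (the standard HMM decoding procedure asked about in 3.2.2) may help make the above concrete; the transition/emission matrices and the two-state toy example below are hypothetical placeholders, not material from the original notes.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden-state path for an observation sequence.

    pi: initial state probabilities, shape (S,)
    A:  transition matrix, A[i, j] = P(state j | state i), shape (S, S)
    B:  emission matrix, B[i, o] = P(obs o | state i), shape (S, O)
    """
    S, T = len(pi), len(obs)
    delta = np.zeros((T, S))            # best path score ending in each state
    psi = np.zeros((T, S), dtype=int)   # backpointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        for j in range(S):
            scores = delta[t - 1] * A[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] * B[j, obs[t]]
    path = [int(np.argmax(delta[-1]))]  # backtrack from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

# Toy example: 2 hidden states, 3 observation symbols (all numbers hypothetical)
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(viterbi([0, 1, 2], pi, A, B))     # -> [0, 0, 1]
```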
- Maximum entropy Markov model (MEMM) common interview questions
- 4. The maximum entropy Markov model (MEMM)
- 4.1 Motivation for the maximum entropy Markov model (MEMM)
- 4.1.1 What are the problems with HMM?
- 4.2 Introduction to the maximum entropy Markov model (MEMM)
- 4.2.1 What does the maximum entropy Markov model (MEMM) look like?
- 4.2.2 How does the MEMM solve HMM's problems?
- 4.3 Problems with the maximum entropy Markov model (MEMM)
Click to view the answer
- Conditional random field (CRF) common interview questions
- 5. Conditional Random Field (CRF)
- 5.1 CRF motivation
- 5.1.1 What are the problems with HMM and MEMM?
- 5.2 CRF Introduction
- 5.2.1 What is CRF?
- 5.2.2 What is the main idea of CRF?
- 5.2.3 What is the definition of CRF?
- 5.2.4 What is the process of CRF?
- 5.3 CRF Pros and Cons
- 5.3.1 What are the advantages of CRF?
- 5.3.2 What are the disadvantages of CRF?
- 5.4 How to implement (reproduce) CRF?
- 6. Comparison
- 6.1 What is the difference between CRF model and HMM and MEMM model?
Click to view the answer
- DNN-CRF common interview questions
- 1. Basic information
- 1.1 What are the evaluation metrics for named entity recognition?
- 2. Traditional named entity recognition methods
- 2.1 What is rule-based named entity recognition?
- 2.2 What is named entity recognition based on unsupervised learning?
- 2.3 What is named entity recognition based on feature-based supervised learning?
- 3. Named entity recognition based on deep learning
- 3.1 What advantages do deep-learning-based named entity recognition methods have over machine-learning-based ones?
- 3.2 What is the structure of deep-learning-based named entity recognition?
- 3.3 What is the distributed-representation input layer, and what methods exist?
- 3.4 Text Encoder
- 3.4.1 BiLSTM-CRF
- 3.4.1.1 What is BiLSTM-CRF?
- 3.4.1.2 Why use BiLSTM?
- 3.4.2 IDCNN-CRF
- 3.4.2.1 What is Dilated CNN?
- 3.4.2.2 Why is there a Dilated CNN?
- 3.4.2.3 What are the advantages of Dilated CNN?
- 3.4.2.4 Introduction to IDCNN-CRF
- 3.5 Tag Decoder
- 3.5.1 What is the tag decoder?
- 3.5.2 Introduction to MLP+softmax layer?
- 3.5.3 Introduction to the conditional random field CRF layer?
- 3.5.4 Introduction to the RNN layer of recurrent neural network?
- 3.5.5 Introduction to the pointer network layer?
- 4. Comparison
- 4.1 CNN-CRF vs BiLSTM-CRF vs IDCNN-CRF?
- 4.2 Why do DNNs need a CRF layer on top?
- 4.3 CRF in TensorFlow vs CRF in standalone toolkits?
Click to view the answer
- Chinese-domain NER common interview questions
- 1. Motivation
- 1.1 What is the difference between Chinese named entity recognition and English named entity recognition?
- 2. Vocabulary enhancement
- 2.1 What is vocabulary enhancement?
- 2.2 Why is the "vocabulary enhancement" method effective for Chinese NER tasks?
- 2.3 What are the methods of vocabulary enhancement?
- 2.4 Dynamic Architecture
- 2.4.1 What is Dynamic Architecture?
- 2.4.2 What are the common methods?
- 2.4.3 What is Lattice LSTM and what are the problems?
- 2.4.4 What is FLAT and what are the problems?
- 2.5 Adaptive Embedding Paradigm
- 2.5.1 What is the Adaptive Embedding paradigm?
- 2.5.2 What are the common methods?
- 2.5.3 What is WC-LSTM and what are the problems?
- 3. Vocabulary/entity type information enhancement
- 3.1 What is vocabulary/entity type information enhancement?
- 3.2 Why is the "vocabulary/entity type information enhancement" method effective for Chinese NER tasks?
- 3.3 What are the methods for enhancing vocabulary/entity type information?
- 3.4 What is LEX-BERT?
Click to view the answer
- Named entity recognition tricks: common interview questions
- trick 1: domain dictionary matching
- trick 2: rule-based extraction
- trick 3: embedding choice: character vectors or word vectors?
- trick 4: How to choose a feature extractor?
- trick 5: How to deal with ambiguous names?
- trick 6: How to deal with insufficient labeled data?
- trick 7: How to deal with nested named entity recognition
- 7.1 What is Entity Nesting?
- 7.2 Differences from traditional named entity recognition tasks
- 7.3 Solutions:
- 7.3.1 Method 1: sequence labeling
- 7.3.2 Method 2: pointer annotation
- 7.3.3 Method 3: multi-head annotation
- 7.3.4 Method 4: span enumeration
- trick 8: Why is the "vocabulary enhancement" method effective for Chinese NER tasks?
- trick 9: What should I do if the NER entity span is too long?
- trick 10: NER labeling data noise problem?
- trick 11: Given two named entity recognition tasks, one task has enough data and the other has very little data. What can I do?
- trick 12: How to deal with class imbalance in NER labeled data?
Click to view the answer
4.1.2 Common interview questions for relation extraction
- Relation extraction common interview questions
- 1. Motivation
- 1.1 What is relation extraction?
- 1.2 What types of relation extraction techniques are there?
- 1.3 What does a common relation extraction pipeline look like?
- 2. Classic relation extraction
- 2.1 What is the template matching method? What are its pros and cons?
- 2.2 What is distant supervision for relation extraction? What are its pros and cons?
- 2.3 What are relation overlap and complex relation problems?
- 2.4 What is joint extraction? What are its difficulties?
- 2.5 What are the overall approaches to joint extraction? What are their shortcomings?
- 2.6 Introduction to joint extraction based on shared parameters?
- 2.7 Introduction to joint extraction based on joint decoding?
- 2.8 What are the cutting-edge techniques and challenges in entity relation extraction? How to extract entity relations under low-resource and complex-sample conditions?
- 3. Document-level relation extraction
- 3.1 What is the difference between document-level relation extraction and classic relation extraction?
- 3.2 What problems does document-level relation extraction face?
- 3.3 What methods exist for document-level relation extraction?
- 3.3.1 How is BERT-based document-level relation extraction done?
- 3.3.2 How is graph-based document-level relation extraction done?
- 3.4 What are the common datasets for document-level relation extraction, and how are they evaluated?
Click to view the answer
4.1.3 Common interview questions for event extraction
- Event extraction common interview questions
- 1. Fundamentals
- 1.1 What is an event?
- 1.2 What is event extraction?
- 1.3 What basic terms and tasks does event extraction involve in the ACE evaluation?
- 1.4 How has event extraction developed?
- 1.5 What problems does event extraction have?
- 2. Basic tasks
- 2.1 Trigger word detection
- 2.1.1 What is trigger word detection?
- 2.1.2 What are the methods for trigger word detection?
- 2.2 Type Identification
- 2.2.1 What is type recognition?
- 2.2.2 What are the methods of type identification?
- 2.3 Role recognition
- 2.3.1 What is role recognition?
- 2.3.2 What are the methods of role recognition?
- 2.4 Argument detection
- 2.4.1 What is argument detection?
- 2.4.2 What are the methods of argument detection?
- 3. Common methods
- 3.1 How are pattern matching methods used in event extraction?
- 3.2 How are statistical machine learning methods used in event extraction?
- 3.3 How are deep learning methods used in event extraction?
- 4. Datasets and evaluation metrics
- 4.1 What are the common English datasets for event extraction?
- 4.2 What are the common Chinese datasets for event extraction?
- 4.3 What are the evaluation metrics for event extraction? How are they calculated?
- 5. Comparison
- 5.1 What are the similarities and differences between event extraction and named entity recognition (i.e. entity extraction)?
- 5.2 What are the similarities and differences between event extraction and relationship extraction?
- 5.3 What is an event logic graph (事理图谱)? What are the event relationship types? How to build an event logic graph? What are its main technical areas and current development hotspots?
- 6. Application
- 7. Expansion
- 7.1 Summary of Event Extraction Papers
- 7.2 Event extraction FAQ
4.2 Common interviews for NLP pre-training algorithms
- 【About TF-IDF】Things you don't know
- 1. One-hot
- 1.1 Why is there one-hot?
- 1.2 What is one-hot?
- 1.3 What are the characteristics of one-hot?
- 1.4 What are the problems with one-hot?
- 2. TF-IDF
- 2.1 What is TF-IDF?
- 2.2 How does TF-IDF evaluate the importance of words?
- 2.3 What is the idea of TF-IDF?
- 2.4 What is the calculation formula for TF-IDF?
- 2.5 How to describe TF-IDF?
- 2.6 What are the advantages of TF-IDF?
- 2.7 What are the disadvantages of TF-IDF?
- 2.8 Application of TF-IDF?
Click to view the answer
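As a companion to the questions above, here is a minimal sketch of the textbook TF-IDF computation, tf(t, d) × log(N / df(t)); note that real implementations such as scikit-learn's TfidfVectorizer use different smoothing variants.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of tokenized documents. Returns one {term: tf-idf} dict per doc.
    Uses tf = count / doc_len and idf = log(N / df) -- the unsmoothed textbook form.
    """
    N = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(N / df[t])
                        for t, c in tf.items()})
    return weights

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
print(tf_idf(docs)[0])  # "cat" outweighs the ubiquitous "the" (idf("the") = 0)
```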
- 【About word2vec】Things you don't know
- 1. Introduction to Word2vec
- 1.1 What does Word2vec mean?
- 1.2 What does CBOW in Word2vec mean?
- 1.3 What does Skip-gram in Word2vec mean?
- 1.4 Which is better: CBOW or Skip-gram?
- 2. Word2vec optimizations
- 2.1 What is the Huffman tree in Word2vec?
- 2.2 Why does Word2vec need a Huffman tree?
- 2.3 What are the benefits of using a Huffman tree in Word2vec?
- 2.4 Why does Word2vec use negative sampling?
- 2.5 What does negative sampling look like in Word2vec?
- 2.6 What is the sampling method of negative sampling in Word2vec?
- 3. Word2vec comparisons
- 3.1 What is the difference between word2vec and NNLM? (word2vec vs NNLM)
- 3.2 What is the difference between word2vec and TF-IDF in similarity calculation?
- 4. Word2vec in practice
- 4.1 Word2vec training trick: how large should the window be set?
- 4.2 Word2vec training trick: what are the effects of a large vs. small word-vector dimension, and of the other parameters? (See the gensim sketch after this block.)
Click to view the answer
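For the practical questions above, a minimal training sketch using gensim is shown below (assuming gensim >= 4.0, where the dimension parameter is named vector_size); the corpus and hyperparameter values are toy placeholders.

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens
sentences = [["nlp", "interview", "notes"],
             ["hmm", "crf", "sequence", "labeling"],
             ["word", "embeddings", "for", "nlp"]]

model = Word2Vec(
    sentences,
    vector_size=50,   # embedding dimension
    window=5,         # context window (smaller -> more syntactic, larger -> more topical)
    min_count=1,      # keep rare words in this toy corpus
    sg=1,             # 1 = skip-gram, 0 = CBOW
    negative=5,       # negative sampling; set hs=1 instead for hierarchical softmax (Huffman tree)
)
print(model.wv.most_similar("nlp", topn=2))
```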
- 【About FastText】Things you don't know
- 1. FastText motivation
- 1.1 What is a word-level model?
- 1.2 What are the problems with word-level models?
- 1.3 What is a character-level model?
- 1.4 What are the advantages of the character-level model?
- 1.5 What are the problems with the character-level model?
- 1.6 What are the solutions to the character-level model's problems?
- 2. Introduction to subword n-gram information
- 2.1 Introduction
- 2.2 What is fastText?
- 2.3 What is the structure of fastText?
- 2.4 Why does fastText use subword n-gram information?
- 2.5 Introduction to fastText's subword n-gram information? (See the sketch after this block.)
- 2.6 How is fastText's subword n-gram information trained?
- 2.7 What problems does fastText's subword n-gram information have?
- 3. Introduction to hierarchical softmax
- 3.1 Why use hierarchical softmax?
- 3.2 What is the idea of hierarchical softmax?
- 3.3 What are the steps of hierarchical softmax?
- 4. Is there any problem with fastText?
Click to view the answer
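As a companion to 2.5 above, a minimal sketch of fastText-style subword n-gram extraction: each word is wrapped in boundary markers `<` and `>` and split into character n-grams, which is why morphologically related words and out-of-vocabulary words still share representations.

```python
def subword_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word with fastText-style boundary markers."""
    token = f"<{word}>"
    grams = [token[i:i + n]
             for n in range(n_min, n_max + 1)
             for i in range(len(token) - n + 1)]
    return grams + [token]  # fastText also keeps the whole word as one unit

print(subword_ngrams("where", 3, 4))
# ['<wh', 'whe', 'her', 'ere', 're>', '<whe', 'wher', 'here', 'ere>', '<where>']
```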
- 【About ELMo】Things you don't know
- 1. ELMo motivation
- 2. ELMo introduction
- 2.1 What are the features of ELMo?
- 2.2 What is the idea behind ELMo?
- 3. ELMo problems
- 3.1 What are the problems with ELMo?
Click to view the answer
4.3 BERT common interview questions
- BERT common interview questions
- 1. Motivation
- 1.1 [Evolution History] What are the problems with one-hot?
- 1.2 [Evolution History] What are the problems with word2vec?
- 1.3 [Evolution History] What are the problems with fastText?
- 1.4 [Evolution History] What are the problems with ELMo?
- 2. Bert
- 2.1 Bert Introduction
- 2.1.1【BERT】What is Bert?
- 2.1.2【BERT】What are BERT's three key points?
- 2.2 Bert Input and Output Characterization
- 2.2.1 [BERT] What does the Bert input and output characterization look like?
- 2.3 【BERT】Bert Pre-training
- 2.3.1 【BERT】Bert pre-training tasks introduction
- 2.3.2 【BERT】Bert Pre-training Task Masked LM Chapter
- 2.3.2.1 【BERT】 Why does Bert need pre-training tasks Masked LM?
- 2.3.2.2 【BERT】 How to do the Bert pre-training task Masked LM?
- 2.3.2.3 【BERT】 Is there any problem with Bert pre-training task Masked LM?
- 2.3.2.4 【BERT】 Solution to the mismatch between pre-training and fine-tuning?
- 2.3.3 【BERT】Bert Pre-training Task Next Sentence Prediction
- 2.3.3.1 [BERT] Why does Bert need pre-training tasks Next Sentence Prediction?
- 2.3.3.2 【BERT】 How to do Next Sentence Prediction in Bert pre-training task?
- 2.4 【BERT】Fine-tuning
- 2.4.1 【BERT】Why does BERT need fine-tuning?
- 2.4.2 【BERT】How to fine-tune BERT?
- 2.5 【BERT】 Bert Loss Functions?
- 2.5.1 [BERT] What is the loss function corresponding to the two pre-training tasks of BERT (expressed in formula form)?
- 3. Comparison?
- 3.1 [Comparison] What is the polysemy problem?
- 3.2 [Comparison] Why can't word2vec solve the polysemy problem?
- 3.3 [Comparison] What is the difference between GPT and BERT?
- 3.4 [Comparison] Why can ELMo, GPT, and BERT handle polysemy? (Taking ELMo as an example)
Click to view the answer
- 【About BERT Source Code Analysis I: The Main Model】Things you don't know
- 【About BERT Source Code Analysis II: Pre-training】Things you don't know
- 【About BERT Source Code Analysis III: Fine-tuning】Things you don't know
- 【About BERT Source Code Analysis IV: Sentence Vector Generation】Things you don't know
- 【About BERT Source Code Analysis V: Text Similarity】Things you don't know
4.3.1 Common interview questions for BERT model compression
- BERT model compression common interview questions
- 1. Bert model compression motivation
- 2. Bert model compression comparison table
- 3. Introduction to Bert model compression method
- 3.1 BERT compression via low-rank factorization and cross-layer parameter sharing
- 3.1.1 What is low-rank factorization?
- 3.1.2 What is cross-layer parameter sharing?
- 3.1.3 What methods does ALBERT use?
- 3.2 BERT compression via distillation
- 3.2.1 What is distillation?
- 3.2.2 Which papers use model distillation? Briefly introduce them.
- 3.3 BERT compression via quantization
- 3.3.1 What is quantization?
- 3.3.2 Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT 【Quantization】
- 3.4 BERT compression via pruning
- 4. What problems does model compression have?
Click to view the answer
4.3.2 Common interview questions for the BERT model family
- Do you know XLNet? Can you describe it? How does it differ from BERT?
- Do you know RoBERTa? Can you describe it? How does it differ from BERT?
- Do you know SpanBERT? Can you describe it? How does it differ from BERT?
- Do you know MASS? Can you describe it? How does it differ from BERT?
Click to view the answer
4.4 Common interview questions for text classification
- Text classification common interview questions
- 1. General questions
- 1.1 What are the categories of classification tasks? What are their characteristics?
- 1.2 What are the differences between text classification tasks compared with classification tasks in other fields?
- 1.3 What is the difference between text classification tasks and other tasks in the text field?
- 1.4 The process of text classification?
- 2. Data preprocessing
- 2.1 What are the data preprocessing methods for text classification tasks?
- 2.2 What word segmentation methods and tools have you used?
- 2.3 How to segment Chinese text into words?
- 2.4 What is the principle of string-matching-based word segmentation?
- 2.5 How are statistical language models applied to word segmentation? What is n-gram maximum-probability segmentation?
- 2.6 What is sequence-labeling-based word segmentation?
- 2.7 What is (Bi-)LSTM-based part-of-speech tagging?
- 2.8 What is the difference between stemming and lemmatization?
- 3. Feature extraction
- 3.1 (Concretely) What features can be used in text classification tasks?
- 3.2 (For Western-language texts) What is the difference between using words and using characters as features?
- 3.3 Can you briefly introduce the bag of words model?
- 3.4 n-gram
- 3.4.1 What is an n-gram? Why use n-grams?
- 3.4.2 What are the limitations of the n-gram algorithm?
- 3.5 Topic Modeling
- 3.5.1 Introduction to the topic modeling task?
- 3.5.2 Common methods of topic modeling
- 3.5.3 What does the TF-IDF algorithm do? A brief introduction to the TF-IDF algorithm
- 3.5.4 What does a high TF-IDF value mean?
- 3.5.5 The shortcomings of TF-IDF
- 3.6 Text Similarity
- 3.6.1 How to calculate the distance between two paragraphs of text?
- 3.6.2 What is the Jaccard distance?
- 3.6.3 What is the difference between the Dice coefficient and the Jaccard coefficient?
- 3.6.4 Both are edit distances: what is the difference between Levenshtein distance and Hamming distance?
- 3.6.5 A coding question: compute the edit distance (Levenshtein distance). (A sample implementation appears after this block.)
- 4. Model
- 4.1 fastText
- 4.1.1 The classification process of fastText?
- 4.1.2 What are the advantages of fastText?
- 4.2 TextCNN
- 4.2.1 The process of TextCNN performing text classification?
- 4.2.2 What parameters can TextCNN adjust?
- 4.2.3 When using CNN as a text classifier, what text information do the different channels correspond to?
- 4.2.4 What does the length and width of the convolution kernel in TextCNN represent?
- 4.2.5 What is the difference between pooling operations in TextCNN and pooling operations in general CNN?
- 4.2.6 Limitations of TextCNN?
- 4.3 DPCNN
- 4.3.1 How to solve the long text classification task?
- 4.3.2 Briefly introduce the improvements of the DPCNN model compared to TextCNN?
- 4.4 TextRCNN
- 4.4.1 Briefly introduce the improvements of TextRCNN compared to TextCNN?
- 4.5 RNN+Attention
- 4.5.1 The idea of RNN+Attention for text classification, and why is the attention mechanism added?
- 4.6 GNN (graph neural networks)
- 4.6.1 How are graph neural networks (GNNs) applied to text classification?
- 4.7 Transformer
- 4.7.1 How to apply pre-trained models based on Transformer to the field of text classification?
- 4.8 Pre-trained model
- 4.8.1 What pre-trained models do you know? What are their characteristics?
- 5. Loss functions
- 5.1 The sigmoid activation function
- 5.1.1 Introduction to the sigmoid activation function used for binary classification?
- 5.1.2 What are the disadvantages of sigmoid?
- 5.2 The softmax activation function
- 5.2.1 What is the softmax function?
- 5.2.2 How to compute the derivative of the softmax function?
- 5.3 What other loss functions are used for classification problems?
- 6. Model evaluation and algorithm comparison
- 6.1 What evaluation methods and metrics are used in text classification tasks?
- 6.2 Briefly introduce the confusion matrix and the kappa coefficient?
Click to view the answer
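For coding question 3.6.5 above, a minimal dynamic-programming sketch of Levenshtein distance with a rolling row (O(|b|) memory):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming: prev[j] holds the distance
    between the processed prefix of a and b[:j], rolled one row at a time."""
    prev = list(range(len(b) + 1))      # distance from "" to b[:j]
    for i, ca in enumerate(a, start=1):
        cur = [i]                       # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

assert levenshtein("kitten", "sitting") == 3
assert levenshtein("abc", "abc") == 0
```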
- Text classification tricks: common interview questions
- 1. How to preprocess text classification data?
- 2. How to choose a pre-trained model for text classification?
- 3. How to tune text classification parameters?
- 4. What are the hard problems in text classification?
- 5. How to construct the label taxonomy for text classification?
- 6. How to construct text classification strategies?
Click to view the answer
- Retrieval-based text classification: common interview questions
- Why do text classification with retrieval?
- What is the idea of retrieval-based text classification?
- How to build the recall corpus for the retrieval-based method?
- How is the training stage of the retrieval-based method done?
- How is the prediction stage of the retrieval-based method done?
- What are the applicable scenarios for retrieval-based text classification?
Click to view the answer
4.5 Common interview questions for text matching
- Text matching model ESIM common interview questions
- Why do we need ESIM?
- Can you introduce the ESIM model?
Click to view the answer
- Common interview questions on BERT for semantic similarity matching
- 1. Sentence-pair classification task: using the [CLS] vector
- 2. cosine similarity
- 3. The difference between long and short texts
- 4. sentence/word embedding
- 5. Siamese network method
Click to view the answer
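For item 2 above, a minimal NumPy sketch of cosine similarity between two sentence embeddings; the vectors are toy placeholders standing in for [CLS] or mean-pooled BERT outputs.

```python
import numpy as np

def cosine_sim(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two sentence embeddings."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy vectors standing in for sentence embeddings
u = np.array([0.2, 0.9, 0.4])
v = np.array([0.1, 0.8, 0.5])
print(cosine_sim(u, v))  # close to 1.0 -> semantically similar
```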
4.6 Common interview questions for Q&A systems
4.6.1 Common interview questions for FAQ retrieval-based Q&A systems
- 1. Motivation
- 1.1 Motivation of the Q&A system?
- 1.2 What is the Q&A system?
- 2. Introduction to the FAQ retrieval-based Q&A system
- 2.1 What is an FAQ retrieval-based Q&A system?
- 2.2 What is the core of matching a query against standard QA pairs?
- 3. Solutions for the FAQ retrieval-based Q&A system
- 3.1 What are the commonly used solutions?
- 3.2 Why is QQ (query-question) matching more commonly used?
- 3.2.1 What are the advantages of QQ matching?
- 3.2.2 What about the semantic space of QQ matching?
- 3.2.3 What about the stability of the QQ matching corpus?
- 3.2.4 What about QQ matching's decoupling of business answers from the algorithm model?
- 3.2.5 What about the discovery and deduplication of new questions in QQ matching?
- 3.2.6 What about the online running speed of QQ matching?
- 3.3 What is the general processing pipeline for QQ matching? [Assuming the standard question bank has been built]
- 4. Building the FAQ standard question bank
- 4.1 How to find standard questions for the FAQ?
- 4.2 How to split FAQ entries?
- 4.3 How to merge FAQ entries?
- 4.4 How to update the FAQ standard question bank in real time?
- 5. Optimizing answers in the FAQ standard question bank
- 5.1 How to optimize the answers in the FAQ standard question bank?
Click to view the answer
4.6.2 Common interview questions for Q&A system tools
- Faiss common interview questions
- 1. Motivation
- 1.1 What are the problems with traditional similarity algorithms?
- 2. Introduction
- 2.1 What is Faiss?
- 2.2 How to use Faiss?
- 2.3 Faiss principle and core algorithm
- 3. Faiss Practical Chapter
- 3.1 How to install Faiss?
- 3.2 What index types does Faiss provide?
- 3.3 How to use Faiss indexes?
- 3.3.1 Data preparation
- 3.3.2 Brute force: IndexFlatL2
- 3.3.3 The speedster: IndexIVFFlat
- 3.3.4 The memory saver: IndexIVFPQ
- 3.4 How to use Faiss on GPU?
- 4. Faiss comparison
- 4.1 Which is better: sklearn's cosine_similarity or Faiss?
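As a companion to the practical questions above, a minimal sketch of the basic Faiss workflow using the brute-force IndexFlatL2 index from 3.3.2; the vectors are random placeholders.

```python
import numpy as np
import faiss

d = 64                                                 # vector dimension
xb = np.random.random((10000, d)).astype("float32")    # database vectors
xq = np.random.random((5, d)).astype("float32")        # query vectors

index = faiss.IndexFlatL2(d)   # exact (brute-force) L2 search
index.add(xb)                  # index the database
D, I = index.search(xq, 4)     # 4 nearest neighbours per query
print(I)                       # neighbour ids, shape (5, 4)
print(D)                       # squared L2 distances
```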
4.7 Common interview questions for dialogue systems
- Dialogue system common interview questions
- 1. Introduction to dialogue systems
- 1.1 What types of dialogue systems are there?
- 1.2 What are the differences between these dialogue systems?
- 2. Introduction to multi-turn dialogue systems
- 2.1 Why use a multi-turn dialogue system?
- 2.2 What are the common solutions for multi-turn dialogue systems?
- 3. Introduction to task-oriented dialogue systems
- 3.1 What is a task-oriented dialogue system?
- 3.2 What is the workflow of a task-oriented dialogue system?
- 3.3 Task-oriented dialogue systems: language understanding (SLU)
- 3.3.1 What is spoken language understanding (SLU)?
- 3.3.2 What are the inputs and outputs of SLU?
- 3.3.3 What techniques are used in SLU?
- 3.4 Task-oriented dialogue systems: dialogue state tracking (DST)
- 3.4.1 What is dialogue state tracking (DST)?
- 3.4.2 What are the inputs and outputs of DST?
- 3.4.3 What problems does DST face, and what are the solutions?
- 3.4.4 What are the implementation methods of DST?
- 3.5 Task-oriented dialogue systems: dialogue policy learning (DPO)
- 3.5.1 What is dialogue policy learning (DPO)?
- 3.5.2 What are the inputs and outputs of dialogue policy learning (DPO)?
- 3.5.3 What are the implementation methods of dialogue policy learning (DPO)?
- 3.6 Task-oriented dialogue systems: natural language generation (NLG)
- 3.6.1 What is natural language generation (NLG)?
- 3.6.2 What are the inputs and outputs of NLG?
- 3.6.3 What are the implementation methods of NLG?
Click to view the answer
4.8 Common interview questions for knowledge graphs
4.8.1 Common interview questions for knowledge graphs
- 1. Introduction to the knowledge graph
- 1.1 Introduction
- 1.2 What is a knowledge graph?
- 1.2.1 What is Graph?
- 1.2.2 What is Schema?
- 1.3 What are the categories of knowledge graphs?
- 1.4 What is the value of the knowledge graph?
- 2. How to build a knowledge graph?
- 2.1 Where does the data from the knowledge graph come from?
- 2.2 What are the difficulties in information extraction?
- 2.3 The technologies involved in building a knowledge graph?
- 2.4 What specific technologies are used to build a knowledge graph?
- 2.4.1 Named Entity Recognition
- 2.4.2 Relation Extraction
- 2.4.3 Entity Resolution
- 2.4.4 Coreference resolution
- 3. How to store knowledge graphs?
- 4. What can the knowledge graph do?
Click to view the answer
4.8.2 KBQA common interview questions
- 1. Dictionary- and rule-based methods
- How to implement KBQA with dictionaries and rules?
- What is the process of implementing KBQA with dictionaries and rules?
- 2. Information-extraction-based methods
- What is the process of implementing KBQA with information extraction?
Click to view the answer
4.8.3 Neo4j common interview questions
- 1. Neo4j introduction and installation
- 1.1 Introduction
- 1.2 How to download Neo4j?
- 1.3 How to install Neo4j?
- 1.4 Introduction to the Neo4j web interface
- 1.5 What is the Cypher query language?
- 2. Create, read, update, and delete in Neo4j
- 2.1 Introduction
- 2.2 How to create a node in Neo4j?
- 2.3 How to create a relationship in Neo4j?
- 2.4 How to create a birthplace relationship in Neo4j?
- 2.5 How to query in Neo4j?
- 2.6 How to delete and update in Neo4j?
- 3. How to operate the Neo4j graph database from Python?
- 3.1 The neo4j module: how to execute CQL (Cypher) statements?
- 3.2 What is the py2neo module?
- 4. Importing data into the Neo4j graph database
Click to view the answer
4.9 Common interview questions for text summarization
- 1. Motivation
- 1.1 What is text summarization?
- 1.2 What types of text summarization techniques are there?
- 2. Extractive summarization
- 2.1 How to do extractive summarization?
- 2.1.1 What sentence-importance scoring algorithms are there?
- 2.1.2 What constraint-based summary generation methods are there?
- 2.1.3 How does the TextTeaser algorithm extract a summary?
- 2.1.4 How does the TextRank algorithm extract a summary?
- 2.2 What is the readability problem of extractive summaries?
- 3. Compressive summarization
- 3.1 How to do compressive summarization?
- 4. Abstractive summarization
- 4.1 How to do abstractive summarization?
- 4.2 What problems does abstractive summarization have?
- 4.3 What problems does the pointer-generator network solve?
- 5. Summary quality evaluation methods
- 5.1 What types of summary quality evaluation methods are there?
- 5.2 What is ROUGE?
- 5.3 What are the differences among the ROUGE variants?
- 5.4 What is the difference between BLEU and ROUGE?
Click to view the answer
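For the ROUGE questions above, a minimal pure-Python sketch of ROUGE-1 (unigram overlap with clipped counts); real evaluations normally rely on an established ROUGE package.

```python
from collections import Counter

def rouge_1(candidate: str, reference: str):
    """ROUGE-1: recall counts how many reference unigrams the candidate recovers."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())        # clipped unigram matches
    recall = overlap / sum(ref.values())
    precision = overlap / sum(cand.values())
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"p": precision, "r": recall, "f1": f1}

print(rouge_1("the cat sat on the mat", "the cat is on the mat"))
# {'p': 0.833..., 'r': 0.833..., 'f1': 0.833...}
```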
4.10 Common interview questions for text error correction
- 1. Introduction
- 1.1 What is text error correction?
- 1.2 Common text error types?
- 1.3 Common methods for text error correction?
- 2. Introduction to the pipeline method
- How is error detection implemented in the pipeline?
- How is candidate recall implemented in the pipeline?
- How is correction ranking implemented in the pipeline?
- How is ASR output (echo) optimization implemented in the pipeline?
Click to view the answer
4.11 Common interview questions for text generation
- Common interview questions on decoding methods for generative models
- What is a generative model?
- What are the search-based decoding methods?
- What are the sampling-based decoding methods?
Click to view the answer
3. Common interview questions for deep learning algorithms
- CNN common interview questions
- 1. Motivation
- 2. CNN convolutional layer
- 2.1 What is the essence of a convolutional layer?
- 2.2 What is the connection between CNN convolutional layer and fully connected layer?
- 2.3 What does channel mean?
- 3. CNN pooling layer
- 3.1 What region does the pooling layer operate on?
- 3.2 What are the types of pooling layers?
- 3.3 What is the function of the pooling layer?
- 3.4 What does backpropagation through the pooling layer look like?
- 3.5 What does mean-pooling backpropagation look like?
- 3.6 What does max-pooling backpropagation look like?
- 4. CNN overall
- 4.1 What is the process of CNN?
- 4.2 What are the characteristics of CNN?
- 4.3 Why do convolutional neural networks have translation invariance?
- 4.4 How is im2col implemented in convolutional neural networks?
- 4.5 What are the limitations of CNN?
- 5. Iterated Dilated CNN
- 5.1 What is dilated (atrous) convolution in Dilated CNN?
- 5.2 What is Iterated Dilated CNN?
- 6. Deconvolution
- 6.1 Explain the principles and uses of deconvolution?
Click to view the answer
- RNN common interview questions
- 1. RNN
- 1.1 Why do we need RNN?
- 1.2 What is the RNN structure?
- 1.3 What is the RNN forward computation formula?
- 1.4 What are the problems with RNN?
- 2. Long short-term memory networks (LSTM)
- 2.1 Why do we need LSTM?
- 2.2 What is the structure of LSTM?
- 2.3 How does LSTM mitigate RNN's vanishing and exploding gradient problems?
- 2.4 What is the process of LSTM?
- 2.5 What are the differences among the activation functions in LSTM?
- 2.6 What is the complexity of LSTM?
- 2.7 What problems does LSTM have?
- 3. GRU (Gated Recurrent Unit)
- 3.1 Why do you need GRU?
- 3.2 What is the structure of GRU?
- 3.3 Forward calculation of GRU?
- 3.4 What is the difference between GRU and other RNN series models?
- 4. RNN series model
- 4.1 What are the characteristics of the RNN series model?
Click to view the answer
- Attention common interview questions
- 1. seq2seq
- 1.1 What is seq2seq (Encoder-Decoder)?
- 1.2 What does the encoder in seq2seq do?
- 1.3 What does the decoder in seq2seq do?
- 1.4 How to understand seq2seq from a mathematical perspective?
- 1.5 What problems does seq2seq have?
- 2. Attention
- 2.1 What is Attention?
- 2.2 Why is the Attention mechanism introduced?
- 2.3 What is the function of Attention?
- 2.4 What is the Attention process?
- Step 1: run the encoder (same as seq2seq)
- Step 2: compute the alignment coefficients a
- Step 3: compute the context vector C
- Step 4: update the decoder state
- Step 5: compute the output predicted word
- 2.5 What are the application areas of Attention?
- 3. Attention variant
- 3.1 What is Soft Attention?
- 3.2 What is Hard Attention?
- 3.3 What is Global Attention?
- 3.4 What is Local Attention?
- 3.5 What is self-attention?
Click to view the answer
- Generative adversarial network (GAN) common interview questions
- 1. Motivation
- 2. Introduction
- 2.1 Basic ideas of GAN
- 2.2 Basic introduction to GAN
- 2.2.1 Basic Structure of GAN
- 2.2.2 Basic ideas of GAN
- 3. Training
- 3.1 Introduction to the generator
- 3.2 Introduction to discriminator
- 3.3 Training Process
- 3.4 Related theoretical basis for training
- 4. Summary
Click to view the answer
3.1 Transformer common interview questions
- Transformer common interview questions
- 1. Motivation
- 1.1 Why do we need the Transformer?
- 1.2 What is the function of Transformer?
- 2. Overall structure
- 2.1 What is the overall structure of Transformer?
- 2.2 What is the Transformer-encoder structure?
- 2.3 What is the Transformer-decoder structure?
- 3. Module
- 3.1 self-attention module
- 3.1.1 What is traditional attention?
- 3.1.2 Why is there a self-attention?
- 3.1.3 What is the core idea of self-attention?
- 3.1.4 What is the purpose of self-attention?
- 3.1.5 How to calculate self-attention?
- 3.1.6 In self-attention, why are Q and K generated with different weight matrices? Why can't the same matrix be used for its own dot product?
- 3.1.7 Why does self-attention use the dot-product model instead of the additive model?
- 3.1.8 Why divide by $\sqrt{d}$ when computing self-attention in the Transformer?
- 3.1.9 How to solve the problem of long-distance dependency?
- 3.1.10 How to parallelize self-attention?
- 3.2 multi-head attention module
- 3.2.1 What is the idea of multi-head attention?
- 3.2.2 What are the steps of multi-head attention?
- 3.2.3 Why does the Transformer use multi-head attention? (Why not use a single head?)
- 3.2.4 Why do we need to reduce the dimensionality of each head in multi-head attention?
- 3.2.5 Multi-head attention code introduction
- 3.3 Position encoding module
- 3.3.1 Why do you need to add Position encoding?
- 3.3.2 What is the idea of position encoding?
- 3.3.3 What is the role of position encoding?
- 3.3.4 What are the steps for position encoding?
- 3.3.5 Why is position encoding added rather than concatenated?
- 3.3.6 What is the difference between Position encoding and Position embedding?
- 3.3.7 Why did the Transformer (proposed in 2017) use position encoding while BERT uses position embeddings?
- 3.3.8 Code introduction of Position encoding
- 3.4 Residual Module
- 3.4.1 Why add residual module?
- 3.5 Layer normalization module
- 3.5.1 Why do you need to add the Layer normalization module?
- 3.5.2 What is the Layer normalization module?
- 3.5.3 What is the difference between Batch normalization and Layer normalization?
- 3.5.4 Why should we abandon Batch normalization and use Layer normalization in Transformer?
- 3.5.5 Layer normalization module code introduction
- 3.6 Mask module
- 3.6.1 What is Mask?
- 3.6.2 How many types of Masks are used in Transformer?
- 3.6.3 Can you introduce each type of mask used in the Transformer?
Click to view the answer
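As a companion to the self-attention and mask questions above, a minimal NumPy sketch of scaled dot-product attention, $Attention(Q, K, V) = softmax(QK^T / \sqrt{d_k})V$, with an optional causal (look-ahead) mask; shapes and data are toy placeholders.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)   # (batch, T_q, T_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)          # masked positions -> ~0 weight
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V, weights

# Toy shapes: batch=1, 3 positions, d_k = d_v = 4
Q = np.random.randn(1, 3, 4)
K = np.random.randn(1, 3, 4)
V = np.random.randn(1, 3, 4)
# Causal (look-ahead) mask: position t may only attend to positions <= t
mask = np.tril(np.ones((1, 3, 3), dtype=bool))
out, w = scaled_dot_product_attention(Q, K, V, mask)
print(w[0].round(2))  # lower-triangular attention weights
```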
- 【About Transformer Problems and Improvements】Things you don't know
- 1. Transformer problems
- 1.1 Transformer is powerful, but does it still have problems?
- 2. What are the solutions to each problem?
- 2.1 Problem 1: Transformer cannot handle overly long inputs well
- 2.1.1 How does Transformer fix the sentence length?
- 2.1.2 What is the purpose of fixing the sentence length in Transformer?
- 2.1.3 How does Transformer deal with this problem?
- 2.2 Problem 2: Transformer loses direction and relative position information
- 2.3 Problem 3: Transformer lacks recurrent inductive bias
- Problem 4: Transformer is not Turing-complete: a popular way to understand this is that it cannot solve all computable problems.
- Problem 5: Transformer lacks conditional computation;
- Problem 6: Transformer's time and space complexity are too large;
5. NLP tricks
5.1 Few-shot problems
5.1.1 Data augmentation (EDA) common interview questions
- 1. Motive
- 1.1 What is data augmentation?
- 1.2 Why is data augmentation required?
- 2. Common data augmentation methods
- 2.1 Word replacement
- 2.1.1 What is dictionary-based replacement?
- 2.1.2 What is word-vector-based replacement?
- 2.1.3 What is MLM-based replacement?
- 2.1.4 What is TF-IDF-based word replacement?
- 2.2 Word insertion
- 2.2.1 What is random insertion?
- 2.3 Word swap
- 2.3.1 What is random swap?
- 2.4 Word deletion
- 2.4.1 What is random deletion?
- 2.5 Back-translation
- 2.5.1 What is back-translation?
- 2.6 Cross augmentation
- 2.6.1 What is cross augmentation?
- 2.7 Syntax trees
- 2.7.1 What are syntax-tree operations?
- 2.8 Adversarial augmentation
- 2.8.1 What is adversarial augmentation?
Click to view the answer
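As a companion to the methods above, a minimal sketch of two EDA operations (random swap and random deletion); synonym replacement and random insertion additionally require a thesaurus or embedding table.

```python
import random

def random_swap(tokens, n=1):
    """Swap two randomly chosen token positions, n times."""
    tokens = tokens[:]
    for _ in range(n):
        i, j = random.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p=0.1):
    """Drop each token independently with probability p."""
    kept = [t for t in tokens if random.random() > p]
    return kept or [random.choice(tokens)]   # never return an empty sentence

sent = "the quick brown fox jumps over the lazy dog".split()
print(random_swap(sent))
print(random_deletion(sent, p=0.2))
```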
5.1.2 Active learning common interview questions
- 1. Motivation
- 1.1 What is active learning?
- 1.2 Why do we need active learning?
- 2. Active learning
- 2.1 What is the idea of active learning?
- 2.2 What are the value points of active learning methods?
- 3. Sample selection strategies
- 3.1 Categorized by how unlabeled samples are obtained
- 3.2 Selecting the most "informative" samples in the test set for labeling
- 3.2.1 Selecting the most "informative" samples in the test set for labeling
- 3.2.2 Uncertainty-based sample selection strategies (Uncertainty Sampling, US)
- 3.2.3 Query-by-committee methods (Query-By-Committee, QBC)
Click to view the answer
5.1.3 Adversarial training for data augmentation: common interview questions
- 1. Introduction
- 1.1 What is adversarial training?
- 1.2 Why can adversarial training improve model performance?
- 1.3 What are the characteristics of adversarial training?
- 1.4 What is the role of adversarial training?
- 2. Concepts
- 2.1 What are the basic concepts of adversarial training?
- 2.2 How to compute the perturbation?
- 2.3 How to optimize?
- 3. Practice
- 3.1 Classic adversarial training in NLP: Fast Gradient Method (FGM) (see the sketch after this block)
- 3.2 Classic adversarial training in NLP: Projected Gradient Descent (PGD)
Click to view the answer
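For 3.1 above, a minimal PyTorch sketch of the widely circulated FGM recipe (perturb the embedding weights along the gradient direction, run a second backward pass, then restore); `emb_name` is a placeholder that must match your model's embedding parameter name.

```python
import torch

class FGM:
    """Fast Gradient Method: adds r_adv = eps * g / ||g|| to the embedding weights."""
    def __init__(self, model, epsilon=1.0):
        self.model, self.epsilon, self.backup = model, epsilon, {}

    def attack(self, emb_name="embedding"):
        for name, param in self.model.named_parameters():
            if param.requires_grad and emb_name in name and param.grad is not None:
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm != 0:
                    param.data.add_(self.epsilon * param.grad / norm)

    def restore(self, emb_name="embedding"):
        for name, param in self.model.named_parameters():
            if param.requires_grad and emb_name in name and name in self.backup:
                param.data = self.backup[name]
        self.backup = {}

# Usage inside a training step (sketch):
#   loss.backward()            # 1. normal gradients
#   fgm.attack()               # 2. perturb embeddings
#   loss_adv = compute_loss()  # 3. adversarial forward/backward (gradients accumulate)
#   loss_adv.backward()
#   fgm.restore()              # 4. restore embeddings
#   optimizer.step(); optimizer.zero_grad()
```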
5.2 Handling "dirty data": common interview questions
- 1. Motivation
- 1.1 What is "dirty data"?
- 1.2 What consequences does "dirty data" bring?
- 2. Handling "dirty data"
- 2.1 How to handle "dirty data"?
- 2.2 Confident learning
- 2.2.1 What is confident learning?
- 2.2.2 What are the advantages of confident learning?
- 2.2.3 How is confident learning done?
- 2.2.4 How to use confident learning? What open-source frameworks are available?
- 2.2.5 What is the working principle of confident learning?
Click to view the answer
5.3 Setting batch_size: common interview questions
- 1. When training a model, how should batch_size and the learning rate be set?
Click to view the answer
5.4 Early stopping (EarlyStopping): common interview questions
- 1. Why use early stopping (EarlyStopping)?
- 2. What is early stopping (EarlyStopping)?
- 3. How to implement early stopping in PyTorch? (a minimal sketch follows below)
Click to view the answer
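For question 3 above, a minimal framework-agnostic early-stopping sketch (the "torch version" is just this object called once per epoch inside the training loop); the patience and min_delta values are illustrative.

```python
class EarlyStopping:
    """Stop when validation loss has not improved by min_delta for `patience` epochs."""
    def __init__(self, patience=5, min_delta=0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best = float("inf")
        self.counter = 0
        self.should_stop = False

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss      # improvement: reset the counter
            self.counter = 0
        else:
            self.counter += 1         # no improvement this epoch
            if self.counter >= self.patience:
                self.should_stop = True
        return self.should_stop

# stopper = EarlyStopping(patience=3)
# for epoch in range(max_epochs):
#     train(...); val_loss = evaluate(...)
#     if stopper.step(val_loss):
#         break   # typically reload the best checkpoint here
```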
5.5 Label smoothing (LabelSmoothing): common interview questions
- 1. Why do we need label smoothing (LabelSmoothing)?
- 2. What is label smoothing?
- 3. How to reproduce label smoothing in PyTorch? (a minimal sketch follows below)
Click to view the answer
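For question 3 above, a minimal PyTorch sketch of label-smoothing cross entropy; recent PyTorch versions also ship this built in as nn.CrossEntropyLoss(label_smoothing=eps).

```python
import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, target, eps=0.1):
    """Cross entropy against a smoothed target: 1 - eps mass on the true
    class, eps spread uniformly over the other classes."""
    n_class = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    smooth = torch.full_like(log_probs, eps / (n_class - 1))
    smooth.scatter_(1, target.unsqueeze(1), 1.0 - eps)
    return -(smooth * log_probs).sum(dim=-1).mean()

logits = torch.randn(4, 10)            # batch of 4, 10 classes
target = torch.tensor([1, 3, 5, 7])
print(label_smoothing_ce(logits, target))
```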
5.6 BERT tricks: common interview questions
5.6.1 Handling BERT out-of-vocabulary (OOV) words: common interview questions
- What are BERT out-of-vocabulary (OOV) words?
- How to handle BERT OOV words?
- What are the pros and cons of the various OOV handling methods?
Click to view the answer
5.6.2 Adding extra features to BERT's input layer: common interview questions
Click to view the answer
5.6.3 Continued pre-training of BERT: common interview questions
- What is continued pre-training?
- Why is there a large [data distribution / domain gap] problem?
- How to do continued pre-training?
- What problems remain to be solved?
- Solutions to the training data problem?
- Solutions to the knowledge deficiency problem?
- Solutions to the knowledge understanding deficiency problem?
Click to view the answer
5.6.4 How BERT handles document-level long text: common interview questions
- Why can't BERT handle long text?
- What methods does BERT have for handling document-level long text?
Click to view the answer
6. Prompt Tuning common interview questions
6.1 Prompt common interview questions
- What is a prompt?
- How to design a prompt?
- Going further: how to learn prompts automatically?
- What are the key points of prompts?
- How to implement prompts?
Click to view the answer
6.2 Prompt-based text generation common interview questions
- What evaluation methods exist for prompt-based text generation?
- What are the specific tasks of prompt-based text generation?
Click to view the answer
6.3 LoRA common interview questions
- What is LoRA?
- How does LoRA work?
- Why can LoRA be done this way?
- Describe LoRA in one sentence?
- What are the advantages of LoRA?
- What are the disadvantages of LoRA?
- How to implement LoRA? (a minimal sketch follows below)
Click to view the answer
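For the "How to implement LoRA?" question above, a minimal PyTorch sketch of a LoRA-augmented linear layer: the pretrained weight is frozen and only the low-rank factors A and B (scaled by alpha / r) are trained; all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = W x + (alpha / r) * B A x, with W frozen and only A, B trainable."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():          # freeze pretrained weights
            p.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))  # B = 0 -> no update at init
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(768, 768)   # dimensions are illustrative (e.g. a BERT-sized layer)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # only A and B train
```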
6.4 PEFT (State-of-the-art Parameter-Efficient Fine-Tuning) common interview questions
- 1. Fine-tuning
- 1.1 What is fine-tuning?
- 1.2 What is the basic idea of fine-tuning?
- 2. Lightweight fine-tuning
- 2.1 What is lightweight fine-tuning?
- 3. Adapter-tuning
- 3.1 What is adapter-tuning?
- 3.2 What are the variants of adapter-tuning?
- 4. Prompting
- 4.1 What is prompting?
- 4.2 What is the purpose of prompting?
- 4.3 What are the representative prompting methods?
- 4.3.1 Prefix-tuning
- 4.3.1.1 What is prefix-tuning?
- 4.3.1.2 What is the core of prefix-tuning?
- 4.3.1.3 What are the technical details of prefix-tuning?
- 4.3.1.4 What are the advantages of prefix-tuning?
- 4.3.1.5 What are the disadvantages of prefix-tuning?
- 4.3.2 Prompt-tuning
- 4.3.2.1 What is prompt-tuning?
- 4.3.2.2 What is the core idea of prompt-tuning?
- 4.3.2.3 What are the advantages/contributions of prompt-tuning?
- 4.3.2.4 What are the disadvantages of prompt-tuning?
- 4.3.2.5 What is the difference between prompt-tuning and prefix-tuning?
- 4.3.2.6 What is the difference between prompt-tuning and fine-tuning?
- 4.3.3 P-tuning
- 4.3.3.1 What is the motivation of P-tuning?
- 4.3.3.2 What is the core idea of P-tuning?
- 4.3.3.3 What improvements does P-tuning make?
- 4.3.3.4 What are the advantages/contributions of P-tuning?
- 4.3.3.5 What are the disadvantages of P-tuning?
- 4.3.4 P-tuning v2
- 4.3.4.1 Why do we need P-tuning v2?
- 4.3.4.2 What is P-tuning v2?
- 4.3.4.3 What are the advantages of P-tuning v2?
- 4.3.4.4 What are the disadvantages of P-tuning v2?
- 4.3.5 PPT
- 4.3.5.1 Why do we need PPT?
- 4.3.5.2 What is the core idea of PPT?
- 4.3.5.3 How does PPT work concretely?
- 4.3.5.4 What are the common soft prompt initialization methods?
- 4.3.5.5 What are the advantages of PPT?
- 4.3.5.6 What are the disadvantages of PPT?
- 4.4 What are the advantages of prompting?
- 4.5 What is the essence of prompting?
- 5. Instruction tuning (Instruct-tuning)
- 5.1 Why do we need instruction tuning (Instruct-tuning)?
- 5.2 What is instruction tuning (Instruct-tuning)?
- 5.3 What are the advantages of instruction tuning (Instruct-tuning)?
- 5.4 Instruction tuning (Instruct-tuning) vs prompting?
- 5.5 Instruction tuning (Instruct-tuning) vs prompting vs fine-tuning?
- 6. Instruct Prompt tuning
- 6.1 Why do we need instruct prompt tuning?
- 6.2 What is instruct prompt tuning?
- 6.3 How does instruct prompt tuning perform on different tasks?
- 7. Self-instruct
- 8. Chain-of-Thought
- 8.1 Why do we need Chain-of-Thought?
- 8.2 What is Chain-of-Thought?
- 8.3 What is the idea behind Chain-of-Thought?
- 8.4 What are the advantages of Chain-of-Thought?
- 8.5 Why does chain-of-thought succeed?
- 9. LoRA
- 9.1 LoRA
- 9.1.1 What is the core idea of LoRA?
- 9.1.2 What is the specific approach of LoRA?
- 9.1.3 What are the advantages of LoRA?
- 9.1.4 What are the disadvantages of LoRA?
- 9.2 AdaLoRA
- 9.2.1 What is the core idea of AdaLoRA?
- 9.2.2 What is the implementation idea of AdaLoRA?
- 9.3 DyLoRA
- 9.3.1 What is the motivation of DyLoRA?
- 9.3.2 What is the core idea of DyLoRA?
- 9.3.3 What are the advantages of DyLoRA?
- 10. BitFit
- 10.1 What is the core idea of BitFit?
- 10.2 What are the advantages of BitFit?
- 10.3 What are the disadvantages of BitFit?
Click to view the answer
7. LLMs common interview questions
7.1 What fine-tuning methods exist for today's large language models (LLMs)? What are their pros and cons?
- What fine-tuning methods exist for today's large language models (LLMs)? What are their pros and cons?
Click to view the answer
7.2 GLM (ChatGLM's base model): common interview questions
- What is the core of GLM?
- What is the model architecture of GLM?
- How does GLM do multi-task training?
- During NLG, how does GLM handle the unknown length of the generated span?
- What are the differences among GLM's multi-task fine-tuning approaches?
- What are the advantages of GLM's multi-task fine-tuning approaches?
Click to view the answer
1. Common interview questions for basic algorithms
- Overfitting and underfitting common interview questions
- 1. What are overfitting and underfitting?
- 2. Overfitting / high variance
- 2.1 What is overfitting and how to detect it?
- 2.2 What causes overfitting?
- 2.3 What are the solutions to overfitting?
- 3. Underfitting / high bias
- 3.1 What is underfitting and how to detect it?
- 3.2 What causes underfitting?
- 3.3 What are the solutions to underfitting?
Click to view the answer
- BatchNorm vs LayerNorm common interview questions
- 1. Motivation
- 1.1 Independent and identically distributed (i.i.d.) data and whitening
- 1.2 What is Internal Covariate Shift (ICS)?
- 1.3 What are the consequences of the ICS problem?
- 2. Normalization
- 2.1 The general framework and basic idea of Normalization
- 3. Batch Normalization
- 3.1 What is Batch Normalization (vertical normalization)?
- 3.2 What problems does Batch Normalization (vertical normalization) have?
- 3.3 What scenarios is Batch Normalization (vertical normalization) suited to?
- 3.4 What problems does BatchNorm have?
- 4. Layer Normalization (horizontal normalization)
- 4.1 What is Layer Normalization (horizontal normalization)?
- 4.2 What is Layer Normalization (horizontal normalization) for?
- 5. BN vs LN (see the sketch after this block)
- 6. Why are mainstream Normalization methods effective?
Click to view the answer
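As a companion to the BN vs LN comparison above, a minimal NumPy sketch contrasting the two normalization axes on a (batch, features) activation matrix; the learnable scale and shift parameters are omitted for brevity.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """BN: statistics over axis 0 (each feature, across the batch)."""
    mu, var = x.mean(axis=0), x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    """LN: statistics over axis 1 (each sample, across its features) -- batch-size independent."""
    mu = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.random.randn(4, 6)          # batch of 4 samples, 6 features
print(batch_norm(x).mean(axis=0))  # ~0 per feature
print(layer_norm(x).mean(axis=1))  # ~0 per sample
```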
Activation functions common interview questions
- 1. Motivation
- 2. Introduction to activation functions
- 2.1 The sigmoid function
- 2.1.1 What is the sigmoid function?
- 2.1.2 Why choose sigmoid as an activation function?
- 2.1.3 What are the drawbacks of the sigmoid function?
- 2.2 The tanh function
- 2.2.1 What is the tanh function?
- 2.2.2 Why choose tanh as an activation function?
- 2.2.3 What are the drawbacks of the tanh function?
- 2.3 The ReLU function
- 2.3.1 What is the ReLU function?
- 2.3.2 Why choose ReLU as an activation function?
- 2.3.3 What are the drawbacks of the ReLU function?
- 3. Choosing an activation function
Regularization common interview questions
- 1. L0, L1, and L2 regularization
- 1.1 What is regularization?
- 1.2 What is L0 regularization?
- 1.3 What is L1 regularization (the sparsity-inducing operator, Lasso regularization)?
- 1.4 What is L2 regularization (ridge regression, a.k.a. weight decay)?
- 2. Comparison
- 2.1 What is structural risk minimization?
- 2.2 Understanding L1 and L2 regularization from the perspective of structural risk minimization
- 2.3 L1 vs L2
- 3. Dropout
- 3.1 What is dropout?
- 3.2 How does dropout operate during training and testing?
- 3.3 How does dropout prevent overfitting?
Click to view the answer
- Optimization algorithms and functions common interview questions
- 1. Motivation
- 1.1 Why do we need optimizers?
- 1.2 What is the basic framework of an optimizer?
- 2. Introduction to optimizers
- 2.1 What is gradient descent?
- 2.2 What is stochastic gradient descent?
- 2.3 What is Momentum?
- 2.4 What is SGD with Nesterov Acceleration?
- 2.5 What is Adagrad?
- 2.6 What are RMSProp/AdaDelta?
- 2.7 What is Adam?
- 2.8 What is Nadam?
- 3. Optimizer study notes
Click to view the answer
- Normalization (feature scaling) common interview questions
- 1. Motivation
- 2. Introduction
- 2.1 What normalization methods are there?
- 2.2 What are the characteristics of each normalization method?
- 2.3 What is the significance of normalization?
- 3. Applications
- 3.1 Which machine learning algorithms need normalization?
- 3.2 Which machine learning algorithms do not need normalization?
Click to view the answer
- Discriminative vs. generative models common interview questions
- 1. Discriminative models
- 1.1 What is a discriminative model?
- 1.2 What is the idea behind discriminative models?
- 1.3 What are the advantages of discriminative models?
- 2. Generative models
- 2.1 What is a generative model?
- 2.2 What is the idea behind generative models?
- 2.3 What are the advantages of generative models?
- 2.4 What are the disadvantages of generative models?
Click to view the answer
2. Common interview questions for machine learning algorithms
Click to view the answer
- Support vector machine (SVM) common interview questions
- 1. Principles
- 1.1 What is SVM?
- 1.2 How did SVM develop?
- 1.3 What problems does SVM have?
- 2. Algorithms
- 2.1 What is the chunking algorithm?
- 2.2 What are decomposition algorithms?
- 2.3 What is sequential minimal optimization (SMO)?
- 2.4 What are incremental algorithms?
- 3. Other SVM variants
- 3.1 What is the least squares support vector machine?
- 3.2 What is the fuzzy support vector machine?
- 3.3 What is the granular support vector machine?
- 3.4 What are multi-class training algorithms?
- 3.5 What is the twin support vector machine?
- 3.6 What is the ranking support vector machine?
- 4. Applications
- 4.1 Pattern recognition
- 4.2 Web page classification
- 4.3 System modeling and system identification
- 4.4 Others
- 5. Comparison
- 6. Extensions
Click to view the answer
- Ensemble learning common interview questions
- 1. Motivation
- 2. Introduction to ensemble learning
- 2.1 Introduction
- 2.1.1 What is the basic idea of ensemble learning?
- 2.1.2 Why is ensemble learning effective?
- 3. Boosting
- 3.1 Summarize Boosting in one sentence?
- 3.2 What are the characteristics of Boosting?
- 3.3 What is the basic idea of Boosting?
- 3.4 What are the characteristics of Boosting?
- 3.5 What is GBDT?
- 3.6 What is XGBoost?
- 4. Bagging
- 4.1 Summarize Bagging in one sentence?
- 4.2 What are the characteristics of Bagging?
- 4.3 What is the basic idea of Bagging?
- 4.4 How to choose base classifiers for Bagging?
- 4.5 What are the advantages of Bagging?
- 4.6 What are the characteristics of Bagging?
- 4.7 What is a random forest?
- 5. Stacking
- 5.1 Summarize Stacking in one sentence?
- 5.2 What are the characteristics of Stacking?
- 5.3 What is the basic idea of Stacking?
- 6. FAQ
- 6.1 Why use decision trees as base learners?
- 6.2 Why are unstable learners more suitable as base learners?
- 6.3 Which models are suitable as base learners?
- 6.4 Can linear classifiers be used as base learners in Bagging? What about Boosting?
- 6.5 What is the relationship between Boosting/Bagging and bias/variance?
- 7. Comparison
Click to view the answer
9. 【About Python】Things you don't know
- 【About Python】Things you don't know
- 1. What are *args and **kwargs?
- 1.1 Why do *args and **kwargs exist?
- 1.2 What are *args and **kwargs used for?
- 1.3 What is *args?
- 1.4 What is **kwargs?
- 1.5 What is the difference between *args and **kwargs?
- 2. What is a decorator?
- 3. Python garbage collection (GC)
- 3.1 What garbage collection algorithms are there?
- 3.2 What is reference counting (the primary mechanism)?
- 3.3 What is mark-and-sweep?
- 3.4 What is generational collection?
- 4. Sorting a dict by key or by value with Python's sorted function
- 4.1 What is Python's sorted function?
- 4.2 Examples of Python's sorted function? (a minimal sketch appears at the end of this section)
- 5. Direct assignment, shallow copy, and deep copy
- 5.1 Concepts
- 5.2 Introduction
- 5.3 The variable definition process
- 5.4 Assignment
- 5.5 Shallow copy
- 5.6 Deep copy
- 5.7 The core: immutable vs. mutable object types
- 5.7.1 Immutable object types
- 5.7.2 Mutable object types
- 6. Processes, threads, and coroutines
- 6.1 Processes
- 6.1.1 What is a process?
- 6.1.2 How do processes communicate with each other?
- 6.2 Threads
- 6.2.1 What is a thread?
- 6.2.2 How do threads communicate with each other?
- 6.3 Processes vs. threads
- 6.4 Coroutines
- 6.4.1 What is a coroutine?
- 6.4.2 What are the advantages of coroutines?
- 7. The global interpreter lock (GIL)
- 7.1 What is the global interpreter lock (GIL)?
- 7.2 What does the GIL do?
- 7.3 What impact does the GIL have?
- 7.4 How to avoid the impact of the GIL?
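For item 4 above (sorting a dict by key or by value), a minimal sketch using sorted(); the data is a toy placeholder.

```python
# sorted() returns a list of (key, value) pairs; the dict itself is unchanged
scores = {"bert": 92, "elmo": 85, "word2vec": 95, "crf": 88}

by_key = sorted(scores.items())                                        # ascending by key
by_value = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)  # descending by value

print(by_key)    # [('bert', 92), ('crf', 88), ('elmo', 85), ('word2vec', 95)]
print(by_value)  # [('word2vec', 95), ('bert', 92), ('crf', 88), ('elmo', 85)]
```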
10. 【About TensorFlow】Things you don't know
- 【About TensorFlow loss functions】Things you don't know
- 1. Motivation
- 2. What is a loss function?
- 3. What are the relationships and differences among the objective function, loss function, and cost function?
- 4. Categories of loss functions
- 4.1 Loss functions for regression models
- (1) L1 regularized loss (i.e. absolute-value loss)
- (2) L2 regularized loss (i.e. Euclidean loss)
- (3) Mean squared error (MSE)
- (4) Pseudo-Huber loss
- 4.2 Loss functions for classification models
- (1) Hinge loss
- (2) Binary cross-entropy loss
- (3) Sigmoid cross-entropy loss
- (4) Weighted cross-entropy loss
- (5) Softmax cross-entropy loss
- (6) SparseCategoricalCrossentropy vs sparse_categorical_crossentropy
- 5. Summary