This project covers six major modules: large language model (LLM) topics, computer vision and perception algorithm topics, deep learning fundamentals and framework topics, industry verticals such as autonomous driving and smart healthcare, hand-written project code topics, and recommendations of excellent open-source resources. We continuously collect and summarize the latest interview questions and analyze them in detail. Beyond interview scenarios, our questions also draw on the innovations of recent academic papers. We hope this serves as effective supplementary material for academic research, innovation at work, and landing job offers.
The 2024 algorithm interview questions are continuously updated; for details, follow the 2024 deep learning algorithm and large model interview guide. If you like this project, please click the Star in the upper right corner. You are also welcome to help build the project together.
The project is continuously updated:

| 01. Model fine-tuning: How do the principles of LoRA and P-Tuning, two fine-tuning methods commonly used for large models, differ from traditional fine-tuning? |
|---|
| 30. Model fine-tuning: What is the difference between Instruction Tuning and Prompt Tuning? |
| 07. Model fine-tuning: Why can LLM performance decline after supervised fine-tuning (SFT)? |
| 18. Model fine-tuning: How is LoRA trained for large-model fine-tuning? |
| 19. Model fine-tuning: How are LoRA's matrices initialized, and why is one of them initialized to all zeros? (See the sketch after this table.) |
| 33. Model fine-tuning: When performing SFT, should the Chat or the Base model be used as the base model? |
| 03. Model structure: Why do most current large models use a decoder-only architecture? |
| 15. Model structure: Can you summarize ChatGPT's training process? |
| 16. Model structure: What are tokens in the context of large language models (LLMs)? |
| 40. Model structure: What is the difference between the layer normalization used in GPT-3 and in LLaMA? |
| 04. Model optimization: How can the repetition ("parroting") problem of LLMs be alleviated? |
| 14. Model optimization: What strategies reduce hallucinations in large language models (LLMs)? |
| 29. Model optimization: How can the prompt generalization of large language models be improved? |
| 34. Model optimization: Open-source large models add books, papers, and similar data during pre-training. How should this data be organized and processed? |
| 38. Model optimization: How can the catastrophic forgetting problem be solved when fine-tuning ChatGLM? |
| 10. What are BERT's advantages for classification tasks, and what follow-up work has improved on it? |
| 23. What are BERT's pre-training tasks? Why was the next-sentence-prediction task introduced? |
| 37. What positional encoding and attention mechanisms are used during BERT pre-training? |
| 38. LangChain is usually used as the "glue" connecting the modules needed to build LLM applications. Please introduce its core modules. |
| 39. Model optimization: To improve inference efficiency, Llama 3 adopts grouped-query attention (GQA). Briefly describe this module. |
| 40. Model architecture: What attention mechanism does Llama 2 use? |
| 41. Model architecture: Have you looked at the pre-training losses of the mainstream large models? What are their similarities and differences? |
| 42. Model architecture: What are the characteristics and application scenarios of rotary position embedding (RoPE) and ALiBi position encoding? |
| 43. Model architecture: What three components make up the overall network architecture of Qwen-VL? Introduce the function and origin of each. |
| 44. Model architecture: How are images processed for input to Qwen-VL? What feature sequence is produced after the visual encoder and adapter? |
| 45. Data preparation: What format does the training set for fine-tuning a large language model take? How should training data generated by GPT be handled? |
| 46. Model fine-tuning: What are the limitations of supervised fine-tuning (SFT) compared with RLHF? If SFT data is cleaned and filtered with a reward model (RM), can it replace RLHF? |
| 47. Data preparation: What algorithms are used for data reuse when processing dialogue and corpus data, and what data augmentation is applied in the corpus-training stage? |
| 48. Data preparation: LLaMA 3.1 is fine-tuned over several rounds. What data is used for reward-model training and for SFT? |
| 49. Model inference: Under the existing technical paradigm, how can the broad-sense and narrow-sense hallucinations of large models be alleviated? |
| 50. Model training: What advantages does the distributed training framework DeepSpeed have over PyTorch's native torchrun? |
| 51. Model inference: During LLM inference, the prefill stage processes many tokens in parallel and is compute-bound. What are the corresponding acceleration methods? |
| 52. Model inference: During LLM inference, the decode stage generates one token per iteration and is memory-bound. What are the corresponding acceleration methods? |
| 53. Model optimization: Architecturally, LLM optimization mainly targets the attention and FFN modules. What attention optimizations exist? |
| 54. Model inference: What is the memory footprint of large-model training and fine-tuning? |
| 55. Model training: Where does the time go in the training phase of a large model, for example when training on thousands of GPUs? |
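
As a concrete companion to question 19, here is a minimal LoRA sketch in PyTorch. The rank `r`, scaling `alpha`, and layer sizes are illustrative assumptions; the initialization follows the common convention of a Gaussian-initialized `A` and an all-zero `B`, so the adapter contributes nothing at step zero and fine-tuning starts exactly from the pretrained model.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA wrapper: y = W0 x + (alpha / r) * B(A(x)).

    The frozen base weight W0 is left untouched; only A and B are trained.
    B starts at zero so the adapted model initially equals the base model.
    """

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weight
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.normal_(self.lora_a.weight, std=0.02)  # A: small random init
        nn.init.zeros_(self.lora_b.weight)             # B: zeros => no initial drift
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

layer = LoRALinear(nn.Linear(768, 768))
print(layer(torch.randn(2, 768)).shape)  # torch.Size([2, 768])
```
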
| 02. Vision models: What are the key innovations in DINOv2's architectural design? |
|---|
| 01. How is text used to control image generation in Stable Diffusion? |
| 21. What main problems does Stable Diffusion solve compared with the original Diffusion models? |
| 22. Why is a random timestep chosen for each training sample in Stable Diffusion? (See the sketch after this table.) |
| 39. What do the training and inference processes of Stable Diffusion look like? |
| 11. Foundation models: What prompt types does the segment-anything model SAM support, and how are they fed into the network? |
| 26. Foundation models: General object detectors are often trained on multi-source images. How is the discrimination of new categories handled? |
| 27. Foundation models: Grounding DINO can detect arbitrary targets from text prompts. Briefly describe the network's basic architecture. |
| 28. Foundation models: How does Grounding DINO perform zero-shot transfer, for example detecting capacitors and resistors on a circuit board? |
| 29. Foundation models: What are the main ideas for lightweight SAM networks, and what is the representative work? |
| 30. Stable Diffusion XL is a two-stage cascaded diffusion model. Briefly describe its workflow. |
| 31. An attention mechanism fuses the semantic information of text and image, yet the text condition is a 3-D tensor while the latent feature is 4-D. How are they combined? |
| 32. Walk through the full text-encoding process of the SDXL model with examples. |
| 33. Among the classic failure cases of SD 1.4/1.5, what is the root cause of generated cats missing their heads, and how can it be optimized? |
| 34. DINOv2 builds a new high-quality dataset whose pipeline uses deduplication and retrieval. Briefly describe the steps. |
| 35. Briefly describe the image-level and patch-level objectives used in DINOv2 training. |
| 36. In the decoder of the visual pre-training model MAE, what are the hidden vectors corresponding to the unmasked and masked parts? |
| 37. Model questions: Multimodal large models often use an MLP as the vision projector, mapping visual tokens one-to-one into the text space. How can the number of visual tokens be compressed to improve efficiency? |
| 38. Model questions: In what ways do VLMs reduce the number of tokens for high-resolution images? |
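
As a companion to question 22, here is a minimal sketch of one diffusion training step. The linear beta schedule, tensor shapes, and the toy stand-in denoiser are assumptions for illustration; the point is that every sample draws its own random timestep, so the network sees all noise levels uniformly.

```python
import torch
import torch.nn.functional as F

# Minimal DDPM-style training step (assumed linear beta schedule, toy denoiser).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def training_step(model, x0):
    b = x0.shape[0]
    t = torch.randint(0, T, (b,))            # one random timestep per sample
    noise = torch.randn_like(x0)
    ab = alphas_bar[t].view(b, *([1] * (x0.dim() - 1)))
    xt = ab.sqrt() * x0 + (1 - ab).sqrt() * noise   # forward diffusion q(x_t | x_0)
    pred = model(xt, t)                      # predict the added noise
    return F.mse_loss(pred, noise)

# toy denoiser that ignores t, just to make the step runnable
model = lambda x, t: torch.zeros_like(x)
print(training_step(model, torch.randn(4, 3, 8, 8)))
```
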
| 01. Why is LayerNorm used instead of BatchNorm in Transformers? |
|---|
| 06. Why does the Transformer use multi-head attention? |
| 32. What is the computational complexity of attention in the Transformer, and how can it be improved? |
| 12. How is layer fusion achieved in the Transformer, and how are the residual connection and LayerNorm fused? |
| 41. What is the difference between multi-head attention (MHA) and multi-query attention (MQA)? |
| 17. What is Adaptive Softmax used for in large language models? |
| 31. Knowledge distillation transfers knowledge from a complex model to a simple one. What improvements to knowledge distillation exist? |
| 42. What role does the inference-optimization technique Flash Attention play? |
| 43. What are the three stages of the Zero Redundancy Optimizer (ZeRO)? |
| 44. What changes does Mamba make to RNNs so that it computes faster on GPUs? |
| 45. Multi-head attention (MHA) is the core component of the Transformer. What is the core idea behind the KV Cache and GQA optimizations? (See the sketch after this table.) |
| 46. How do BPE (byte-pair encoding) and tokenization affect model performance and the training process? |
| 47. What causes loss spikes when pre-training large models above 100B parameters, and what are the solutions? |
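
As a companion to question 45, here is a minimal sketch of scaled-dot-product attention with a KV cache during greedy decoding. The Q/K/V projection matrices are omitted (the toy reuses the token embedding as key and value), so this only illustrates the caching pattern, not a full MHA layer.

```python
import torch

def attend(q, k, v):
    # q: (B, Tq, D); k, v: (B, Tk, D) -> standard scaled dot-product attention
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

# Decode loop with a KV cache: past keys/values are stored once and reused,
# so each step attends with a single-token query instead of recomputing
# attention over the whole prefix.
B, D, steps = 1, 16, 5
k_cache = torch.empty(B, 0, D)
v_cache = torch.empty(B, 0, D)
x = torch.randn(B, 1, D)                      # current token embedding
for _ in range(steps):
    k_cache = torch.cat([k_cache, x], dim=1)  # append this step's key
    v_cache = torch.cat([v_cache, x], dim=1)  # ...and value
    out = attend(x, k_cache, v_cache)         # (B, 1, D): one-token query
    x = out                                   # toy: feed the output back in
print(out.shape)  # torch.Size([1, 1, 16])
```
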
| 01. Give examples to illustrate where reinforcement learning plays a role. |
|---|
| 28. How should reward maximization in reinforcement learning be understood? (See the sketch after this table.) |
| 24. After training on domain data, general capability often degrades. How can the model's forgetting of general capabilities be alleviated? |
| 25. How is the alignment of data modes handled in large language models (LLMs)? |
| 35. Can you provide some examples of alignment problems in large language models? |
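
To make question 28's "reward maximization" concrete, here is a minimal REINFORCE sketch on a hypothetical 3-armed bandit: the policy raises the log-probability of sampled actions in proportion to the reward they earned, so probability mass drifts toward the highest-reward arm. The reward values are made-up assumptions.

```python
import torch

torch.manual_seed(0)
logits = torch.zeros(3, requires_grad=True)        # policy parameters
true_rewards = torch.tensor([0.1, 0.5, 0.9])       # assumed expected rewards
opt = torch.optim.SGD([logits], lr=0.5)

for _ in range(200):
    probs = torch.softmax(logits, dim=0)
    action = torch.multinomial(probs, 1).item()    # sample an arm
    reward = true_rewards[action]                  # environment feedback
    loss = -torch.log(probs[action]) * reward      # policy-gradient surrogate
    opt.zero_grad()
    loss.backward()
    opt.step()

print(torch.softmax(logits, dim=0))  # mass should shift toward arm 2
```
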
| 01. Large convolution kernels: Can larger kernels achieve higher accuracy in CNNs? |
|---|
| 02. Optimization algorithms: Hungarian matching can be used for problems such as positive/negative sample assignment. Introduce its working principle. |
| 03. Loss functions: How are focal loss's parameters tuned, and what problems does it have? |
| 04. Lightweight models: Give some representative lightweight models that optimize for parameter count, FLOPs, and inference latency. |
| 05. Image processing: What are the shortcomings of ORB feature extraction, and how can it be improved? |
| 06. General modules: Why does FPN fuse features with an addition operation? |
| 07. General modules: How should the two common feature-map fusion methods, concat and add, be understood? |
| 08. General modules: The Transformer's attention mechanism typically uses softmax. Can sigmoid be used instead? |
| 09. General modules: What basic principles guide lightweight model design? Which is more time-consuming, concat or add? |
| 10. General modules: Lightweight CNNs often use depthwise-separable convolutions. How are the FLOPs and MAC of the pointwise convolution calculated? (See the sketch after this table.) |
| 11. Loss functions: Focal loss supports discrete category labels such as 0/1. What should be done when the label is a continuous value in [0, 1]? |
| 12. Loss functions: Focal loss pays so much attention to hard samples that it is affected by outliers. How can both easy samples and extremely hard samples be attenuated at the same time? |
| 13. General modules: How do Dropout's training and inference differ? During training, a layer's neuron outputs are randomly zeroed with probability p; how is this handled at inference? |
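
As a companion to question 10, here is a back-of-the-envelope FLOPs comparison for a depthwise-separable convolution. The channel counts, kernel size, and output resolution are assumed values chosen only for illustration.

```python
# Multiply-accumulate (MAC) counts for a depthwise-separable convolution.
# Assumed shapes: Cin=64, Cout=128, kernel k=3, output H=W=56, stride 1.
Cin, Cout, k, H, W = 64, 128, 3, 56, 56

depthwise = Cin * k * k * H * W          # one k x k filter per input channel
pointwise = Cin * Cout * 1 * 1 * H * W   # 1x1 conv mixes channels
standard  = Cin * Cout * k * k * H * W   # regular convolution, for comparison

print(f"depthwise : {depthwise:,} MACs")
print(f"pointwise : {pointwise:,} MACs")
print(f"separable : {depthwise + pointwise:,} MACs")
print(f"standard  : {standard:,} MACs "
      f"(~{standard / (depthwise + pointwise):.1f}x more)")
```
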
| 01. Loss functions: Why does ArcFace outperform CosFace in face recognition tasks? |
|---|
| 02. General modules: Introduce CBAM attention. |
| 03. General modules: How is local attention implemented? |
| 04. Data augmentation: Introduce mixup and its variants. |
| 05. Scenario questions: What are common solutions to long-tail problems in vision tasks? |
| 06. Scenario questions: In a classification task where several categories overlap (small inter-class differences), what can be done, and how should the network structure be designed? |
| 07. Scenario questions: After annotating and training on targets in scenario A, how can good results be achieved in scenario B? |
| 08. Scenario questions: How can a binary classification task be trained well when 80% of the data is labeled correctly and 20% is mislabeled? |
| 09. Foundation models: Introduce CLIP's core innovations and how it handles text input. |
| 10. Foundation models: How do ViT and DeiT handle variable-length sequence input? |
| 11. Foundation models: How does ViT's input processing turn image patches into tokens? (See the sketch after this table.) |
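
As a companion to question 11, here is a minimal patch-embedding sketch: a strided convolution cuts the image into non-overlapping patches and linearly projects each one to a token. The 224/16/768 sizes follow the standard ViT-Base configuration and are assumptions here.

```python
import torch
import torch.nn as nn

patch, dim = 16, 768
proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)  # patchify + project

x = torch.randn(1, 3, 224, 224)
tokens = proj(x)                            # (1, 768, 14, 14)
tokens = tokens.flatten(2).transpose(1, 2)  # (1, 196, 768): one token per patch
print(tokens.shape)
```
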
| 01. Sample matching strategies: How does FCOS resolve the GT ambiguity caused by overlapping samples during training? |
|---|
| 02. Sample matching strategies: Why can CenterNet drop NMS, and how does it define positive and negative samples? |
| 03. Sample matching strategies: How does YOLOv5 define positive and negative samples, and can one target be assigned to different FPN layers? |
| 04. Sample matching strategies: How does YOLOv7 define positive and negative samples? |
| 05. Sample matching strategies: How does YOLOv8 define positive and negative samples? |
| 06. Sample matching strategies: How does YOLOv9 define positive and negative samples? |
| 07. Sample matching strategies: How does YOLOv1 define positive and negative samples? |
| 08. Sample matching strategies: DETR implements label assignment via bipartite matching. Briefly describe the process. |
| 09. Sample matching strategies: How is the problem of multiple targets whose center points lie close together handled? |
| 10. Sample matching strategies: How can an anchor-based detector remove its dependence on anchors in the label-assignment stage? |
| 11. Sample matching strategies: The choice of positive and negative samples strongly affects the final detection quality. How does ATSS, for example, handle it? |
| 12. Loss-function optimization: What role does centerness play in the FCOS loss? |
| 12. Sample matching strategies: In FCOS's positive/negative assignment stage, what happens when targets of very different scales overlap, such as an apple held in a person's hand? |
| 12. Loss-function optimization: FCOS resolves the ambiguity of positive-sample assignment with an area-based rule, which is unfriendly to large targets. Is there a better solution? |
| 13. Loss-function optimization: What methods can address the positive/negative sample imbalance in object detection? |
| 14. Details: What is the difference between YOLOv5 and YOLOv4? |
| 15. Details: What is the difference between YOLOv5's Focus layer and the passthrough layer? |
| 16. Details: What role does objectness play in YOLOv5, and how is the final output probability score obtained? |
| 17. Model questions: Describe how DETR serializes the input image for processing by the encoder. |
| 18. Decoding questions: Explain the meaning of YOLOv5's output tensor (1, 25200, 85) and the decoding process. (See the sketch after this table.) |
| 19. Decoding questions: Explain the meaning of CenterNet's three output heads (offset/scale/heatmap) and the decoding process. |
| 20. Scenario questions: How is the IoU of rotated boxes computed in object detection? |
| 21. Scenario questions: How can YOLOv5 be modified to perform rotated object detection? |
| 22. Scenario questions: In crowded scenes, a false detection often appears between two real targets. How is this handled? |
| 23. Scenario questions: Can adding more prior anchors improve performance on small and unusually sized targets, and besides computation speed, what other problems exist? |
| 24. Scenario questions: Detection currently relies on non-maximum suppression (NMS) as post-processing. Are there ways to avoid NMS? |
| 25. Model questions: How should the concept of object queries in DETR be understood, and how can the model be designed to give cross-attention a better positional prior? |
| 26. Model questions: What are the head output channels of YOLOv5 and YOLOv8, respectively, assuming a 2-class detection task? |
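
As a companion to question 18, this sketch shows where the (1, 25200, 85) shape comes from for a 640x640 input (strides 8/16/32, 3 anchors per cell, 80 COCO classes) and how the 85 channels split during decoding. The random tensor stands in for real model output.

```python
import torch

strides, anchors, num_classes = (8, 16, 32), 3, 80
cells = sum((640 // s) ** 2 for s in strides)     # 80^2 + 40^2 + 20^2 = 8400
print(cells * anchors)                            # 25200 predictions

pred = torch.rand(1, 25200, 4 + 1 + num_classes)  # stand-in for model output
xywh = pred[..., :4]                   # box center/size (decoded per grid cell)
obj = pred[..., 4:5]                   # objectness
cls = pred[..., 5:]                    # per-class scores
conf, cls_id = (obj * cls).max(-1)     # final score = objectness * class prob
keep = conf > 0.25                     # confidence threshold before NMS
print(keep.sum().item(), "candidate boxes above threshold")
```
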
| 01. Model questions: In the UNet architecture, are four downsampling stages necessary for a segmentation network? |
|---|
| 02. Model questions: Why can UNet++ be pruned, and how is the amount of pruning decided? |
| 03. Model questions: How does the segment-anything network SAM handle the target's segmentation-mask output? |
| 04. Model questions: SAM's local inference results are noticeably worse than the online web version's. Is there a way to optimize its quality? |
| 05. Foundation models: What problems arise when ViT is used directly for dense-prediction tasks such as segmentation and detection? |
| 06. Model questions: In the decoder, what are the differences between upsampling feature maps with transposed convolution, dilated convolution, and bilinear interpolation? |
| 07. Model questions: The max-pooling and downsampling commonly used in segmentation encoders provide invariance but hurt localization accuracy. How does a fully connected conditional random field (CRF) refine localization? |
| 08. Model questions: SAM's prompt_encoder supports several input types. How are point prompts encoded? |
| 08. Model questions: How does matting differ from traditional segmentation, and what is the principle behind matting? |
| 01. Monocular 3D: How are positive and negative samples defined in FCOS3D's training stage? |
|---|
| 02. Monocular 3D: Briefly describe the structure of FCOS3D's head and the reference-point definition used to predict the 2.5D center offset. |
| 03. Monocular 3D: Briefly describe FCOS3D's decoding process and how the 3D box is obtained from the 2D image. |
| 04. Monocular 3D: FCOS3D, like most monocular 3D methods, estimates depth from isolated instances or pixels while ignoring the geometric relationships between objects. What strategies could improve this? |
| 05. Point-cloud 3D: Describe the process by which PointPillars converts point clouds into sparse pseudo-images, and detail the scatter step. (See the sketch after this table.) |
| 06. BEV: What are the ways to perform PV-to-BEV view transformation? Besides the camera intrinsics and extrinsics, what parameters do model-based methods minimally require? |
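
As a companion to question 05, here is a minimal sketch of the PointPillars-style scatter step: encoded pillar features are written back to their (row, col) grid cells to form a dense BEV pseudo-image. The grid size, feature dimension, and coordinates are made-up illustrative values.

```python
import torch

C, H, W = 64, 4, 4
num_pillars = 3
pillar_feats = torch.randn(num_pillars, C)          # output of the pillar encoder
coords = torch.tensor([[0, 1], [2, 3], [3, 0]])     # (row, col) of each pillar

canvas = torch.zeros(C, H * W)                      # empty BEV canvas
flat_idx = coords[:, 0] * W + coords[:, 1]          # flatten (row, col) indices
canvas[:, flat_idx] = pillar_feats.t()              # scatter features into cells
pseudo_image = canvas.view(C, H, W)                 # (C, H, W) dense pseudo-image
print(pseudo_image.shape)
```
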
| 01. Adversarial networks: How is mode collapse in GANs identified and resolved? |
|---|
| 02. Depth estimation: Briefly describe the photometric reconstruction losses commonly used in depth estimation tasks. |
| 01. PyTorch training often combines multiple datasets. What exactly does ConcatDataset do? |
|---|
| 02. How is multi-GPU BatchNorm handled in PyTorch? |
| 03. What are the main parameters of PyTorch's DataLoader? |
| 04. How can .to(device) operations be avoided in PyTorch code? |
| 05. What are the application scenarios of nn.Identity(), .chunk, .masked_select, and .gather in PyTorch? |
| 06. What are common strategies for saving GPU memory in PyTorch? |
| 07. Common questions about the attributes of PyTorch's Module |
| 08. What are the differences and use cases of ModuleList versus Sequential in PyTorch? |
| 09. What are the use cases and usage of ConcatDataset in PyTorch? |
| 10. What is the difference between nn.Upsample and interpolate in PyTorch? |
| 11. What is the difference between Dataset and DataLoader in PyTorch, and what is needed to customize a Dataset? |
| 12. The common normalization operations in PyTorch are BN, LN, IN, and GN. What are their differences? |
| 13. What is the difference between nn.Linear() and nn.Embedding() in PyTorch? |
| 14. Dataset is PyTorch's base class for representing datasets. Which methods must be overridden to create a custom dataset? (See the sketch after this table.) |
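
As a companion to question 14, here is a minimal custom Dataset sketch: a map-style torch.utils.data.Dataset only needs __len__ and __getitem__ overridden, after which a DataLoader handles batching and shuffling. The toy squares data is an assumption for illustration.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class SquaresDataset(Dataset):
    """Toy map-style dataset: __len__ and __getitem__ are all that is required."""

    def __init__(self, n: int = 10):
        self.xs = torch.arange(n, dtype=torch.float32)

    def __len__(self) -> int:
        return len(self.xs)                  # how many samples exist

    def __getitem__(self, i: int):
        return self.xs[i], self.xs[i] ** 2   # (input, target) pair

loader = DataLoader(SquaresDataset(), batch_size=4, shuffle=True)
for x, y in loader:
    print(x.shape, y.shape)                  # batches of 4, 4, then 2
```
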
| 01. Why can TensorRT make models run faster? |
|---|
| 02. What are MMEngine's features, and what are its basic configuration options? |
| 03. To add a custom backbone network to MMDetection, which code needs to change? |
| 04. Introduce the hook mechanism in MMCV and how to create a new hook. |
| 05. What is PyTorch Lightning's design philosophy, and what do you find easy to use about it? |
| 06. MMDetection is flexible and convenient when building model structures; for example, ResNet's optional style parameter accepts pytorch or caffe. What is the difference between the two? |
| 07. Briefly describe the two kinds of box assigners in MMDetection. |
| 08. Briefly describe the types of positive/negative sample samplers in MMDetection, such as RandomSampler. |
| 09. How are input_names, output_names, and dynamic_axes set in torch.onnx.export()? (See the sketch after this table.) |
| 10. How is torch.onnx.is_in_onnx_export() used to make the model behave differently when converting to ONNX? |
| 11. Large-model training generally uses torch 2.0 or above, where torch.compile can accelerate training. How is it used, and does it work on ordinary Python code? |
| 12. Briefly describe what you see as the advantages and disadvantages of MMCV. |
| 13. Training questions: Taking 2 machines with 8 GPUs each as an example, what do the distributed-training quantities rank, local_rank, and world_size mean? |
| 14. Training questions: How is data sharding implemented in distributed training? |
| 15. Training questions: How can continuously growing memory usage during PyTorch training be solved? |
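
As a companion to question 09, here is a minimal torch.onnx.export sketch with named inputs/outputs and a dynamic batch axis. The toy model and the names "input"/"output"/"batch" are illustrative assumptions.

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU()).eval()
dummy = torch.randn(1, 16)                # example input used to trace the graph

torch.onnx.export(
    model,
    dummy,
    "toy.onnx",
    input_names=["input"],                # names shown in the exported graph
    output_names=["output"],
    dynamic_axes={                        # axis 0 (batch) may vary at inference
        "input": {0: "batch"},
        "output": {0: "batch"},
    },
)
```
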
| 01. Operator questions: How are convolution and BN fused to improve inference speed? (See the sketch after this table.) |
|---|
| 02. Operator questions: Why might performance drop after introducing an attention mechanism into a neural network? |
| 03. Operator questions: Compare the common activation functions and their advantages and disadvantages. |
| 04. Operator questions: Compare the time complexity of Transformer, CNN, and RNN. |
| 05. Operator questions: Explain depthwise-separable convolution. |
| 06. Operator questions: What is the difference between a CNN and an MLP? |
| 06. Operator questions: How does max pooling operate? In which scenarios is average pooling more suitable than max pooling? |
| 07. Loss functions: Where is hinge loss applied? |
| 08. Loss functions: Why can cross-entropy be used as a loss function? |
| 09. Optimization algorithms: What are the similarities and differences among SGD, AdaGrad, and Adam? |
| 10. Optimization algorithms: What weight-initialization methods exist? |
| 11. Optimization algorithms: Why are bias terms not regularized in deep learning? |
| 12. Optimization algorithms: Why can regularization improve a model's generalization ability? |
| 13. Optimization algorithms: Why does Adam often fail to beat SGD? What are the key issues and the proposed improvements? |
| 14. FAQ: How can mislabeled samples be distinguished from hard samples during deep learning training? |
| 15. FAQ: What role does learning-rate warmup play when training deep models? |
| 16. FAQ: Convolving with the filter [-1 -1 -1; 0 0 0; 1 1 1], which edges are extracted from the input image? |
| 17. Scenario questions: How can traditional image-processing features be incorporated into deep learning models, and what problems arise from direct concatenation and fusion? |
| 18. Scenario questions: How should the weight of each task's loss be designed in multi-task learning? |
| 19. Scenario questions: How are imbalanced datasets handled? |
| 20. Scenario questions: How can a large model be effectively split into several sub-models, and how are the sub-models assigned to multiple nodes for parallel training? |
| 21. Optimization questions: Why can't neural-network weights be initialized to zero, while logistic-regression parameters can? |
| 22. FAQ: When the batch size increases, how should the learning rate change, and by exactly how much? |
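
As a companion to question 01, here is a minimal inference-time Conv+BN fusion sketch: the BN statistics are folded into the convolution's weight and bias, so one fused op replaces two. Layer sizes are illustrative.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold BN into the conv (inference only):
    w' = w * gamma / sqrt(var + eps);  b' = (b - mean) * gamma / sqrt(var + eps) + beta.
    """
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, conv.dilation, conv.groups,
                      bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)   # gamma / std
    fused.weight.data = conv.weight * scale.view(-1, 1, 1, 1)
    bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.data = (bias - bn.running_mean) * scale + bn.bias
    return fused

conv, bn = nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8)
conv.eval(); bn.eval()                       # BN must use its running statistics
x = torch.randn(1, 3, 16, 16)
print(torch.allclose(bn(conv(x)), fuse_conv_bn(conv, bn)(x), atol=1e-5))  # True
```
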
| 01. What do camera intrinsics and extrinsics mean? If the image is enlarged by a factor of two, how do the intrinsics and extrinsics change? |
|---|
| 02. What is the transformation from the world coordinate system to the image coordinate system? |
| 03. What are the affine transformation and the inverse projection transformation? |
| 04. How are Q and R adjusted in Kalman filtering? |
| 05. How should BEV space be understood, and how are BEV features generated? |
| 08. Why isn't online learning used for guardrail detection? |
| 09. How can the same filter adapt to scenes where vehicles cross paths at the same time? |
| 10. How can BEV features be enhanced? |
| 11. In assisted-driving scenarios, the predicted bbox coordinates of medium and large targets within 60 m jitter heavily, making distance measurement unstable. How can this be solved? |
| 12. In assisted-driving scenarios, how can false detections on specific backgrounds such as bus stops and houses be solved? |
| 13. In assisted-driving scenarios, how can the jumping of vehicle classifications beyond 100 m be solved? |
| 16. Explain the meaning of the noise matrices in a KF. Does the estimated noise grow or shrink in the motion equation? Does it grow or shrink in the correction equation? |
| 20. Lane-line detection usually adopts a segmentation approach. How can the approach be simplified to detection, or even to lane-line classification? |
| 21. How are special road structures, such as intersections, handled in lane-line detection tasks? |
| 24. Briefly describe the decoder logic of BEVFormer. |
| 25. What are the steps of spatial cross-attention in BEVFormer? |
| 26. How are the images from a car's multiple cameras projected onto a 2D plane? |
| 27. If your car has 4 lidars, how would you design a point-cloud segmentation algorithm? |
| 28. If you are asked to segment the bricks in the scene, can point-cloud segmentation identify them correctly? |
| 29. How can water mist be removed from a point cloud? |
| 30. What prior knowledge do vehicle-width ranging and ground-point ranging rely on? If these priors do not hold, what means can relax the restrictions? |
| 31. What are three methods of estimating the pitch angle while a vehicle is driving? |
| 32. How are corner points eliminated from a pile of 3D point clouds? |
| 33. How are 3D world-coordinate points converted to 2D image coordinates? (See the sketch after this table.) |
| 34. What information does monocular 3D object detection predict? How are truncated targets handled when predicting the center offset of the 3D box? |
| 35. When estimating depth through geometric relationships, height error makes the depth estimate highly uncertain. How can this be alleviated? |
| 36. What are the camera sensor configuration and the annotation contents of the nuScenes dataset? |
| 37. Briefly describe how the tensor shapes transform during feature extraction in the BEVFormer model. |
| 38. Briefly describe the several ways to generate BEV feature maps. What exactly does the lift operation in LSS do? |
| 39. Perception algorithms want both high-resolution and large-FOV input images; the common industry approach is to set an ROI region. How is it chosen? |
| 40. Suppose we need to develop a vision-language model to handle the corner cases that general perception faces in autonomous driving. How would you do it? |
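
As a companion to question 33, here is a minimal pinhole-projection sketch: a world point is moved into the camera frame by the extrinsics [R | t], multiplied by the intrinsics K, and divided by depth. All parameter values are made up for illustration.

```python
import numpy as np

K = np.array([[1000.0, 0.0, 640.0],    # fx,  0, cx
              [0.0, 1000.0, 360.0],    #  0, fy, cy
              [0.0, 0.0, 1.0]])
R = np.eye(3)                          # world-to-camera rotation (identity here)
t = np.array([0.0, 0.0, 5.0])          # assumed translation along the optical axis

X_world = np.array([1.0, 0.5, 10.0])   # a 3D point in world coordinates
X_cam = R @ X_world + t                # world -> camera frame
u, v, w = K @ X_cam                    # camera frame -> homogeneous pixels
print(u / w, v / w)                    # divide by depth to get pixel coordinates
```
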
| 01. Data annotation: How can the inter-annotator inconsistency caused by differing expertise in medical-image labeling be resolved? How can algorithms reduce the error? |
|---|
| 02. Model questions: How can medical-history information be added to the model to improve the final classification? |
| 03. Model questions: Segmentation suffers from hard-edge problems; in retinal vessel segmentation, for example, how can edge segmentation be optimized? |
| 04. Model questions: Stacked objects can leave a potential target partially occluded, yet target completeness is the basis for further judgment. How can the segmentation boundary of the occluded target be completed? |
| 05. Model questions: Object detection on digital pathology slides is affected by the scanning device, e.g., defocus blur and motion blur during scanning. What are some feasible optimizations? |
| 06. Model questions: How can prior knowledge be added to a model, and by what methods? |
| 01. Natural language processing: Given the current query, the historical queries, and their corresponding entities, how is the entity of the current query modeled? |
|---|
| 02. Machine learning: A bank manager receives a dataset containing the records of thousands of loan applicants. How can an AI algorithm help the manager understand which loans can be approved? |
| 03. Image recognition: Deploying face recognition in practice requires learning new face identities from continuous data streams. How is class-incremental learning done? |
| 01. How can a model be trained on data whose annotations contain errors? |
|---|
| 02. What exactly differs between object detection in video and in still images? |
| 03. Name several optical-flow methods and explain how LK optical flow is modeled. |
| 04. How is a suitable feature combination chosen when the data volume is very limited but the number of features is extremely large? |
| 05. What are the input dimensions of SAM's point prompt and box prompt? Does the box prompt support multiple boxes? |
| 06. Why does a larger batch size have a greater impact on contrastive learning than on supervised learning? |
| 07. Given an image dataset in which some images are noisy, i.e., mislabeled, how can the model be trained for the best results? |
| 08. Suppose we must predict a target's rotation angle in an image, e.g., the rotation angle of vehicles in remote-sensing imagery. How should the angle be encoded elegantly for better prediction? |
| 09. Fisheye cameras have a larger field of view and are often used in surveillance scenarios. How are detection and segmentation performed on fisheye images? |
| 10. How is day-night cross-domain vehicle re-identification handled, i.e., identifying the same vehicle across daytime and nighttime domains? |
| 11. If a dataset contains almost no cats in a certain region of the image, will the object detector be poor at detecting cats in that region? |
| 01. PyTorch: implement the attention mechanism and multi-head attention |
|---|
| 02. PyTorch: build a basic module consisting of Conv+BN+ReLU |
| 03. PyTorch: build a CNN (convolutional neural network) |
| 04. PyTorch: fuse convolution and BatchNorm |
| 05. PyTorch: implement the segmentation loss function Dice Loss |
| 06. PyTorch: implement focal loss |
| 07. PyTorch: implement batch normalization (BN) |
| 08. PyTorch: build a CustomL1Loss class that clamps the input values of the L1 loss |
| 08. PyTorch: implement the SGD optimization algorithm |
| 08. PyTorch: implement triplet loss |
| 09. NumPy: use broadcasting to compute the L2 distances between matrices |
| 10. NumPy: given boxA and boxB, implement bbox_iou |
| 11. NumPy: given two sets of coordinates, implement the IoU calculation |
| 12. NumPy: implement focal loss |
| 13. NumPy: implement non-maximum suppression (NMS) (see the sketch after this table) |
| 14. NumPy: implement the improved non-maximum suppression Soft-NMS |
| 15. NumPy: implement a function that computes the cosine similarity between two vectors |
| 16. NumPy: implement the sigmoid function |
| 17. NumPy: implement the softmax function |
| 18. NumPy: implement K-means clustering |
| 18. NumPy: complete a sparse-matrix class implementing add and multiply operations |
| 19. C++: describe the image-resize process and implement it |
| 20. C++: implement Conv2D convolution |
| 21. NumPy: implement the least-squares loss for linear regression; input the coordinate points of the line, output the loss |
| 22. NumPy: implement linear regression given the learning rate, iteration count, and coordinate points |
| 23. NumPy: implement one-hot encoding of integer class targets |
| 24. NumPy: implement the cross-entropy loss function |
| 25. PyTorch: implement image normalization operations |
| 26. NumPy: implement the max-pooling operation |
| 27. PyTorch: use the torch.utils.data.Dataset class to build a custom image-classification dataset based on file-name suffixes |
| 28. Python: implement inverse perspective mapping (IPM, bird's-eye view) |
| 29. NumPy: implement the multiplication of two matrices and verify that the result matches torch.matmul in PyTorch |
| 30. PyTorch: build a custom layer implementing a simple LeakyReLU activation function |
| 31. PyTorch: write a data-augmentation class implementing random horizontal flips and channel transformations |
| 32. PyTorch: implement the image-to-patch embedding process (hint: it can be implemented with a convolution) |
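
As a companion to question 13 above, here is a minimal NumPy NMS sketch: greedily keep the highest-scoring box and drop any remaining box whose IoU with it exceeds a threshold. The example boxes and threshold are illustrative.

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thr: float = 0.5):
    """Plain NMS: keep the top-scoring box, suppress heavy overlaps, repeat."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # IoU of the top box with every remaining box
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thr]   # drop boxes that overlap too much
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]: box 1 is suppressed by box 0
```
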
| 01. Recommendations of several excellent data-structure and algorithm projects |
|---|
| 02. A summary of interviews for large-model jobs: 24 companies in total, 9 offers |
| 03. Source code and online demos for visual detect-and-segment-anything |
| 04. Hands-on deep learning with PyTorch |
| 05. A tool for saving, searching, accessing, exploring, and chatting with all your favorite websites, documents, and files |
| 06. A collection of free ChatGPT mirror sites |
| 07. Everything about large language models (LLMs) |
| 08. Deep Learning Tuning Guide, Chinese edition |
| 09. A collection of the latest papers and datasets on multimodal large language models |
| 10. ChatPaper: a ChatGPT tool for accelerating research workflows |
| 11. Fine-tuning LLaMA on consumer hardware |
| 12. A series of generative models provided by Stability AI |
| 13. DINOv2: a framework for learning powerful visual features with self-supervision |
| 14. FastSAM |
| 15. Language model interview questions |
| 16. Awesome-Chinese-LLM: a curated collection of open-source Chinese language models |
| 17. Tech Enthusiast Weekly: excellent open-source projects gathered and published every Friday |
| 18. An open-source project for large-scale line estimation |
| 19. Several ways to read Medium articles for free in 2024 |