Object tracking draws on theory from many fields, such as image processing, photoelectric technology, pattern recognition, and machine learning, and plays an important role in transportation, security, medicine, the military, and other areas. Improving moving-target tracking algorithms for complex scenes therefore has significant research and practical value. This thesis reviews the development of target tracking algorithms in recent years and analyzes the principles of the related tracking algorithms. On this basis, two robust online target tracking algorithms are proposed to address the main challenges in the current tracking field. The work is summarized as follows:

(1) A long-term object tracking method based on background constraints and convolutional features is proposed to solve the target loss problem caused by background clutter and occlusion in long-term tracking. Firstly, the features of the input image are fused and reduced in dimension to improve the discriminative power of the target features and lower the cost of feature computation. Secondly, background constraints are introduced into the filter training process, so that the filter focuses more on the target response and becomes more robust to interference. Finally, a memory filter and Peak to Sidelobe Ratio detection allow the tracker to judge whether the target has been lost. If the target is lost, a convolutional-feature filter is introduced to re-detect it. Experiments on public datasets show that the algorithm performs well under background clutter with occlusion and achieves stable long-term tracking.

(2) To address the low feature learning efficiency and the imbalance between positive and negative samples in target tracking algorithms based on the Siamese network structure, a target tracking algorithm based on a dual self-attention Siamese network is proposed.
Firstly, channel self-attention and spatial self-attention modules are introduced into the backbone network of the Siamese structure to refine the network features. Secondly, a deconvolution strategy is used to concatenate deep features from different levels, making full use of the representational capacity of each stage of the convolutional network. Finally, online hard example mining is introduced to alleviate the imbalance between positive and negative samples. It also concentrates training on hard samples, which improves the network's discriminative ability in scenes with background clutter. Comparative experiments under multiple complex-scene challenges, such as background clutter, fast motion, rotation, and deformation, demonstrate the effectiveness of the proposed algorithm.
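The online hard example mining step in (2) can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes per-sample losses have already been computed, that labels are 1 for positive and 0 for negative samples, and that all positives are kept together with the highest-loss negatives at a fixed ratio. The function names `select_hard_examples` and `mined_loss` and the parameter `neg_pos_ratio` are hypothetical.

```python
from typing import List, Sequence


def select_hard_examples(
    losses: Sequence[float],
    labels: Sequence[int],
    neg_pos_ratio: int = 3,  # assumed ratio, not taken from the thesis
) -> List[int]:
    """Return sorted indices of all positives plus the hardest negatives."""
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y != 1]

    # Allow neg_pos_ratio negatives per positive; if there are no
    # positives, still keep neg_pos_ratio negatives.
    num_keep = max(1, len(positives)) * neg_pos_ratio

    # "Hard" negatives are the ones the network currently gets most wrong,
    # i.e. those with the largest loss.
    hardest = sorted(negatives, key=lambda i: losses[i], reverse=True)[:num_keep]
    return sorted(positives + hardest)


def mined_loss(
    losses: Sequence[float],
    labels: Sequence[int],
    neg_pos_ratio: int = 3,
) -> float:
    """Average the loss over the mined subset only."""
    kept = select_hard_examples(losses, labels, neg_pos_ratio)
    if not kept:
        return 0.0
    return sum(losses[i] for i in kept) / len(kept)


# One positive and five negatives; only the two hardest negatives are kept.
example_losses = [0.9, 0.1, 0.05, 0.8, 0.02, 0.6]
example_labels = [1, 0, 0, 0, 0, 0]
kept_indices = select_hard_examples(example_losses, example_labels, neg_pos_ratio=2)
```

Dropping the many easy negatives keeps them from dominating the gradient, which is how this kind of mining both rebalances the samples and concentrates training on hard cases.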