Media Industry Review Report: GPT-4o's Multimodal Capabilities Leap Again, and AI Application Rollout May Accelerate

2024-05-15

OpenAI has released GPT-4o, with greatly improved multimodal capabilities and free access for users. On May 14, OpenAI released its new flagship generative model, GPT-4o. GPT-4o is a large model built for a future paradigm of human-computer interaction: it understands text, voice, and images, responds very quickly, and both expresses and recognizes emotion. GPT-4o has several main characteristics. (1) Greatly improved multimodal capability, with real-time reasoning across text, audio, and video: GPT-4o matches GPT-4 Turbo on English text and code and is significantly better on non-English text. Compared with existing models, it is particularly strong at visual and audio understanding. Users can upload pictures, videos, and documents containing pictures and text, then discuss their contents, which makes human-computer interaction more natural. (2) More "human-like": GPT-4o can speak in a natural, human-sounding voice and can analyze emotion from audio and images. (3) Millisecond-level response and lower API cost: before GPT-4o, the average latency of talking to ChatGPT in voice mode was 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4), while GPT-4o's average response time is 320 ms. The API is also faster, and its cost is reduced by 50%. (4) 3D visual content generation: GPT-4o can perform 3D reconstruction from six generated images. In addition, GPT-4o will be available to all users free of charge, and OpenAI will launch a desktop version of ChatGPT that offers a lightweight experience and can be integrated into any workflow.
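To put the latency figures in point (3) in proportion, the short sketch below divides the earlier voice-mode averages by GPT-4o's 320 ms. It takes the quoted averages at face value; the function and constant names are illustrative and not part of any OpenAI API. On those figures, GPT-4o's response time is roughly 9 times shorter than the GPT-3.5 voice mode and roughly 17 times shorter than the GPT-4 voice mode.

```python
# Average voice-mode latencies quoted in the text, in seconds.
# Names here are illustrative only, not part of any vendor API.
PREVIOUS_LATENCY_S = {"GPT-3.5": 2.8, "GPT-4": 5.4}
GPT_4O_LATENCY_S = 0.320  # 320 ms


def speedup(old_seconds: float, new_seconds: float) -> float:
    """Return how many times shorter the new latency is than the old one."""
    if new_seconds <= 0:
        raise ValueError("new_seconds must be positive")
    return old_seconds / new_seconds


if __name__ == "__main__":
    for model, old in PREVIOUS_LATENCY_S.items():
        factor = speedup(old, GPT_4O_LATENCY_S)
        print(f"{model} voice mode -> GPT-4o: about {factor:.1f}x shorter latency")
```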
Competition among large models at home and abroad is intensifying, with performance rising and usage costs falling; we continue to favor positioning in AI applications. The Google I/O developer conference will be held at 1:00 a.m. on May 15, Beijing time, and important updates to the Gemini model may be released there. On the domestic side, on May 7, DeepSeek, the AI company under High-Flyer Quant, released its second-generation MoE model, DeepSeek-V2. DeepSeek-V2 has 236 billion parameters. Its overall Chinese ability (AlignBench) is higher than that of GPT-4 and in the same tier as GPT-4-Turbo and Wenxin 4.0 (ERNIE 4.0). Its overall English ability (MT-Bench) is in the same tier as the strongest open-source model, LLaMA3-70B, and surpasses the strongest open-source MoE model, Mixtral 8x22B. The computation required to train DeepSeek-V2 may be only 20% of that for GPT-4, yet its performance is not much different. At present, the DeepSeek-V2 API is priced at only "1 yuan per million input tokens and 2 yuan per million output tokens (32K context)."

We believe that, with OpenAI's launch of GPT-4o and the recent run of model upgrades from vendors at home and abroad, competition may center on multimodal capability, Agent capability, and API cost optimization. These are the key factors for deploying and commercializing large-model applications, and they may help AI applications in film and television, music, education, marketing, search, office work, and other fields become "more useful and more cost-effective," expanding their commercial potential. We suggest continuing to build positions in AI applications:

(1) AI film and television: Shanghai Film and China Literature Group are recommended; beneficiaries include Enlight Media, Huace Film & TV, Jetsen, Zhongguang Tianze, and others.
(2) AI music: Shengtian Network and Cloud Music are recommended; beneficiaries include Kunlun Wanwei, Tencent Music, and others.
(3) AI education: beneficiaries include Century Tianhong, Southern Media, Shengtong, and others.
(4) AI marketing: beneficiaries include Gravitation Media, Insai Group, BlueFocus, and others.
(5) AI Agent: Alpha Animation & Culture is recommended, and Tom Cat is a beneficiary.
(6) AI+3D: beneficiaries include Fengshang Culture, Fengyuzhu, Fan Tuo Shuangchuang, Silk Road Vision, Hengxin Oriental, and others.

Risk warning: progress on multimodal large models may fall short of expectations, and the commercialization of AIGC may fall short of expectations.

[Disclaimer] This article represents only the views of a third party and does not represent the position of Hexun. Investors who act on it do so at their own risk.
