Wednesday, June 12, 2024

Qwen2: Alibaba Cloud's Open-Source LLM


Many new companies have been popping up in recent years and releasing open-source Large Language Models. As time progresses, these models are getting closer and closer to the paid closed-source models. The companies release them in various sizes and with permissive licenses so that anyone can use them commercially. One such family of models is Qwen. Its earlier models proved to be among the best open-source models, alongside Mistral and Zephyr, and the team has now announced version 2 of the family, called Qwen2.


Learning Objectives

  • Learn about Qwen, Alibaba Cloud's open-source language models.
  • Discover Qwen2's new features.
  • Review Qwen2's performance benchmarks.
  • Try out Qwen2 with the HuggingFace Transformers library.
  • Recognize Qwen2's commercial and open-source potential.

This article was published as a part of the Data Science Blogathon.

What is Qwen?

Qwen refers to a family of Large Language Models backed by Alibaba Cloud, a firm based in China. It has made a great contribution to the AI space by releasing many open-source models that are on par with the top models on the HuggingFace leaderboard. Qwen has released its models in different sizes, ranging from 7-billion-parameter up to 72-billion-parameter models. The team has not just released the models but has finetuned them in a way that placed them at the top of the leaderboard when they launched.

But Qwen did not stop there. It has also released chat-finetuned models, LLMs heavily trained on mathematics and code, and even vision-language models. The Qwen team is moving into the audio domain as well, with text-to-speech models. Qwen is trying to create an ecosystem of open-source models that is readily available for everyone to start building applications with, without restrictions and for commercial purposes.

What is Qwen2?

Qwen received much appreciation from the open-source community when it was released, and several derivative models were created from it. Recently, the Qwen team announced a series of successor models to the previous generation, called Qwen2, with more models and more finetuned variants compared to earlier generations.

Qwen2 was released in five different sizes: 0.5B, 1.5B, 7B, 57B-A14B (a Mixture of Experts model), and 72B. These models were pretrained on data covering more than 27 languages and are significantly improved in the areas of code and mathematics compared to the earlier generation of models. A nice touch is that even the 0.5B and 1.5B models come with a 32k context length, while the 7B and 72B come with a 128k context length.

All these models use Grouped Query Attention, which greatly speeds up the attention computation and reduces the amount of memory required to store the intermediate key-value results during inference.
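The idea behind Grouped Query Attention is that several query heads share a single key/value head, so only a fraction of the K/V tensors has to be computed and cached. The snippet below is a minimal illustrative sketch in NumPy with made-up head counts, not the actual Qwen2 implementation:

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """Minimal GQA sketch: q has more heads than k/v, and each
    group of query heads attends using the same shared K/V head."""
    num_q_heads, seq, d = q.shape
    num_kv_heads = k.shape[0]
    group = num_q_heads // num_kv_heads
    # Broadcast each K/V head across its group of query heads
    k = np.repeat(k, group, axis=0)   # (num_q_heads, seq, d)
    v = np.repeat(v, group, axis=0)
    # Standard scaled dot-product attention per head
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                # (num_q_heads, seq, d)

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 4, 16))  # 8 query heads
k = rng.normal(size=(2, 4, 16))  # only 2 K/V heads are stored
v = rng.normal(size=(2, 4, 16))
out = grouped_query_attention(q, k, v)
print(out.shape)
```

Here only 2 of 8 attention heads need K/V storage, which is where the memory saving during inference comes from.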

Performance and Benchmarks

Coming to the base model comparisons, the Qwen2 72B Large Language Model outperforms the newly released Llama3 70B model and the Mixture of Experts Mixtral 8x22B model. We can see the benchmark scores in the picture below. The Qwen2 model outperforms both Llama3 and Mixtral on many benchmarks, such as MMLU, MMLU-Pro, TheoremQA, HumanEval, GSM8K, and many more.

Qwen2: Performance and Benchmarks

Coming to the smaller model, i.e. the Qwen2 7B Instruct model, it also outperforms newly released state-of-the-art (SOTA) models like the Llama3 8B model and the GLM4 9B model. Despite Qwen2 being the smallest of the three, it outperforms both of them, and the results for all the benchmarks can be seen in the picture below.

Qwen2 7B Instruct Model

Qwen2 in Action

We will be working in Google Colab to try out the Qwen2 model.

Step 1: Install Libraries

To get started, we need to install a few helper libraries with the code below:

!pip install -U -q transformers accelerate
  • transformers: a popular Python package from HuggingFace, with which we can download deep learning models and work with them.
  • accelerate: another package developed by HuggingFace. It helps increase the inference speed of Large Language Models when they run on a GPU.

Step 2: Download the Qwen Model

Now we will write the code to download the Qwen model and test it. The code for this will be:

from transformers import pipeline

device = "cuda"

pipe = pipeline(
    "text-generation",
    model="Qwen/Qwen2-1.5B-Instruct",
    device=device,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
  • We start by importing the pipeline function from the transformers library.
  • Then we set the device the model should be mapped to. Here we set it to cuda, which means the model will be sent to the GPU if one is available.
  • model="Qwen/Qwen2-1.5B-Instruct": This specifies the pretrained model to work with.
  • device=device: This tells which device to use for running the model.
  • max_new_tokens=512: Here, we give the maximum number of new tokens to be generated.
  • do_sample=True: This enables sampling during generation, for increased diversity in the output.
  • temperature=0.7: This controls the randomness of the generated text. Higher values lead to more creative and unpredictable outputs.
  • top_p=0.95: This sets the probability mass to be considered for the next token during generation.
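To make the temperature and top_p settings concrete, here is a hypothetical, minimal re-implementation of temperature scaling plus nucleus (top-p) sampling over a toy logit vector. The real sampling happens inside transformers; the function name and values below are illustrative only:

```python
import math
import random

def sample_top_p(logits, temperature=0.7, top_p=0.95, seed=0):
    # Temperature scaling: values below 1 sharpen the distribution
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax over the scaled logits
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Nucleus filtering: keep the smallest set of tokens whose
    # cumulative probability reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Renormalise over the kept tokens and draw one
    mass = sum(probs[i] for i in kept)
    r = random.Random(seed).random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]

token = sample_top_p([2.0, 1.0, 0.2, -1.0])
print(token)
```

Lowering top_p or temperature shrinks the pool of candidate tokens, which is why those knobs trade creativity against predictability.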

Step 3: Giving a List of Messages to the Model

Now, let us try giving the model a list of messages as input and see the output it generates for that list of messages.

messages = [
    {"role": "system",
     "content": "You are a funny assistant. You must respond to user questions in a funny way"},
    {"role": "user", "content": "What is life?"},
]

response = pipe(messages)
print(response[0]["generated_text"][-1]["content"])

  • Here, the first message is a system message that instructs the assistant to be funny.
  • The second message is a user message that asks "What is life?".
  • We put both of these messages as objects in a list.
  • Then we give this list of messages to the pipeline object, that is, to our model.
  • The model then processes these messages and generates a response.
  • Finally, we extract the content of the last generated text from the response.
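Behind the scenes, the pipeline's tokenizer turns this list of messages into a single prompt string via the model's chat template; Qwen2 chat models use the ChatML format. The helper below is only a rough hand-rolled sketch of that structure, not the real tokenizer.apply_chat_template logic:

```python
def to_chatml(messages):
    # Each turn is wrapped in <|im_start|>role ... <|im_end|> markers
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
             for m in messages]
    # Open an assistant turn so the model knows to reply next
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

messages = [
    {"role": "system", "content": "You are a funny assistant."},
    {"role": "user", "content": "What is life?"},
]
prompt = to_chatml(messages)
print(prompt)
```

This is why roles like "system" and "user" matter: they end up as explicit markers in the prompt the model actually sees.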

Running this code produced the following output:


We see that the model indeed tried to generate a funny answer.

Step 4: Testing the Model with Mathematics Questions

Now let us test the model with a few mathematics questions. The code for this will be:

messages = [
    {"role": "user", "content": "If a car travels at a constant speed of 60 miles per hour, how far will it travel in 45 minutes?"},
    {"role": "assistant", "content": "To find the distance, use the formula: distance = speed × time. Here, speed = 60 miles per hour and time = 45 minutes = 45/60 hours. So, distance = 60 × (45/60) = 45 miles."},
    {"role": "user", "content": "How far will it travel in 2.5 hours? Explain step by step"}
]

response = pipe(messages)

  • Here again, we create a list of messages.
  • The first message is a user message that asks how far a car will travel in 45 minutes at a constant speed of 60 miles per hour.
  • The second message is an assistant message that provides the solution to the user's question using the formula distance = speed × time.
  • The third message is again a user message asking the assistant another question.
  • Then we give this list of messages to the pipeline.
  • The model will then process these messages and generate a response.

The output generated by running the code can be seen below:


We can see that the Qwen2 1.5B model started thinking step by step to answer the user's question. It first defined the formula for calculating distance. It then wrote down the information it had about the speed and time, and finally put these pieces together to produce the answer. Despite being just a 1.5-billion-parameter model, it performs remarkably well.
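The model's arithmetic here is easy to double-check with the same distance = speed × time formula:

```python
def distance(speed_mph, hours):
    # distance = speed × time
    return speed_mph * hours

print(distance(60, 45 / 60))  # 45.0 miles, matching the assistant turn
print(distance(60, 2.5))      # 150.0 miles, the expected answer here
```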

Testing with More Examples

Let us test the model with a few more examples:

messages = [
    {"role": "user", "content": "A clock shows 12:00 p.m. now. How many degrees will the minute hand move in 15 minutes?"},
    {"role": "assistant", "content": "The minute hand moves 360 degrees in one hour (60 minutes). Therefore, in 15 minutes, it will move (15/60) * 360 degrees = 90 degrees."},
    {"role": "user", "content": "How many degrees does the hour hand move in 15 minutes?"}
]

response = pipe(messages)

messages = [
    {"role": "user", "content": "Convert 100 degrees Fahrenheit to Celsius."},
    {"role": "assistant", "content": "To convert Fahrenheit to Celsius, use the formula: C = (F - 32) × 5/9. So, for 100 degrees Fahrenheit, C = (100 - 32) × 5/9 = 37.78 degrees Celsius."},
    {"role": "user", "content": "What is 0 degrees Celsius in Fahrenheit?"}
]

response = pipe(messages)

messages = [
    {"role": "user", "content": "What gets wetter as it dries?"},
    {"role": "assistant", "content": "A towel gets wetter as it dries because it absorbs the water from the body, becoming wetter itself."},
    {"role": "user", "content": "What has keys but can't open locks?"}
]

response = pipe(messages)


Here we have additionally tested the model with three more examples. The first two are again on mathematics. We see that Qwen2 1.5B understood the questions well and generated good answers.

But in the last example, it failed. The answer to the riddle is a piano: a piano has keys but cannot open locks. The model failed to answer this and came up with a different answer. It answered "a keychain" and even gave a supporting statement for it. We cannot say it failed entirely, because technically a keychain itself does not open locks, but the keys on it do.
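For the two math examples above, the expected answers can be verified with a few lines of plain Python:

```python
def minute_hand_degrees(minutes):
    # The minute hand covers 360 degrees in 60 minutes
    return minutes * 360 / 60

def hour_hand_degrees(minutes):
    # The hour hand covers 360 degrees in 12 hours (720 minutes)
    return minutes * 360 / 720

def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9

def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

print(minute_hand_degrees(15))               # 90.0
print(hour_hand_degrees(15))                 # 7.5
print(round(fahrenheit_to_celsius(100), 2))  # 37.78
print(celsius_to_fahrenheit(0))              # 32.0
```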

Overall, we see that despite being a 1.5-billion-parameter model, Qwen2 1.5B answered the mathematical questions correctly and provided good reasoning around the answers it generated. This suggests that the bigger models in the family, like Qwen2 7B and 72B, can perform extremely well across different tasks.


Conclusion

Qwen2, a new series of open-source models from Alibaba Cloud, represents a great advancement in the field of large language models (LLMs). Building on the success of its predecessor, Qwen2 offers a range of models from 0.5B to 72B parameters, excelling in performance across numerous benchmarks. The models are designed to be versatile and commercially accessible, supporting multiple languages and featuring improved capabilities in code, mathematics, and more. Qwen2's impressive performance and open accessibility position it as a formidable competitor to closed-source alternatives, fostering innovation and application development in AI.

Key Takeaways

  • Qwen2 continues the trend of high-quality open-source LLMs, providing strong alternatives to closed-source models.
  • The Qwen2 series includes models from 0.5 billion to 72 billion parameters, catering to diverse computational needs and use cases.
  • Qwen2 models are pretrained on more than 27 languages, enhancing their applicability in global contexts.
  • Licenses that allow commercial use promote widespread adoption of and innovation with Qwen2 models.
  • Developers and researchers can easily integrate and use the models via popular tools like HuggingFace's transformers library, making them accessible.

Frequently Asked Questions

Q1. What is Qwen?

A. Qwen is a family of large language models created by Alibaba Cloud. They release open-source models in various sizes that are competitive with paid models.

Q2. What is Qwen2?

A. Qwen2 is the latest generation of Qwen models with improved performance and more features. It comes in different sizes, ranging from 0.5 billion to 72 billion parameters.

Q3. How do I use Qwen2 for text generation?

A. You can use the pipeline function from the Transformers library to generate text with Qwen2. The code example in the article shows how to do this.

Q4. How does Qwen2 perform?

A. Qwen2 outperforms other leading models on many benchmarks, including language understanding, code generation, and mathematical reasoning.

Q5. Can Qwen2 answer math questions?

A. Yes, Qwen2, especially the larger models, can answer math questions and provide explanations for its answers.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.
