AI lookups get easier with Wikipedia's 120-million-entry database

Cát Tiên (according to TechCrunch)

The Wikidata Embedding Project makes Wikipedia data easier for AI to access, improving its ability to understand and use accurate information.

Wikimedia Deutschland has just announced the Wikidata Embedding Project, a new system that makes it easier for AI models to access and understand Wikipedia's rich knowledge base.

The system applies vector-based semantic search, a technique that helps computers capture the meaning of words and the relationships between them, to nearly 120 million entries from Wikipedia and its sister platforms.
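
To illustrate the general idea behind vector-based semantic search, here is a minimal sketch, not the Wikidata Embedding Project's actual implementation. It assumes each entry has already been turned into a numeric vector (the tiny 3-dimensional vectors and entry names below are hypothetical and made up for illustration; real systems use learned embedding models with hundreds of dimensions and a dedicated vector database). Entries are ranked by cosine similarity to the query vector, so items with related meanings score high even when they share no keywords.

```python
import math

# Hypothetical toy embeddings, written by hand for illustration only.
# A real system would compute these with a learned embedding model.
ENTRIES = {
    "nuclear physicist": [0.9, 0.8, 0.1],
    "researcher": [0.8, 0.6, 0.2],
    "Bell Labs": [0.7, 0.9, 0.3],
    "banana": [0.1, 0.0, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vector, entries, top_k=3):
    """Return the names of the top_k entries most similar to the query vector."""
    scored = [
        (name, cosine_similarity(query_vector, vec)) for name, vec in entries.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:top_k]]


# Hypothetical embedding for the query "scientist".
query = [0.85, 0.7, 0.15]
print(semantic_search(query, ENTRIES))
```

With these toy vectors, the science-related entries rank above the unrelated "banana" entry, which is the behavior semantic search aims for; a keyword search for "scientist" would have matched none of them.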

The project also supports the Model Context Protocol (MCP), a standard that lets AI systems communicate directly with data sources.

As a result, large language models (LLMs) can run natural-language queries against the data, making it easier to retrieve and use accurate information from Wikipedia.

The project is being carried out by Wikimedia Deutschland in collaboration with Jina.AI and DataStax, a real-time training-data company owned by IBM.

Previously, Wikidata only supported keyword search and SPARQL queries, which limited how effectively AI systems could use its data.

The new system works well with retrieval-augmented generation (RAG) systems, which let AI models pull in external information and ground their answers in knowledge verified by Wikipedia editors.

The data is also structured to provide semantic context. For example, a query for the word "scientist" returns lists of prominent nuclear scientists and of researchers who worked at Bell Labs, translations of the word into multiple languages, Wikimedia-cleared images, and related concepts such as "researcher" and "scholar".

The database is publicly accessible on Toolforge, and Wikidata will host an online webinar for developers on October 9.

The project arrives as AI developers search for high-quality data sources to fine-tune their models.

As AI training systems grow more complex, the need for reliable data becomes even more pressing, and Wikipedia offers more accurate information than broad data collections such as Common Crawl.

Philippe Saade, Wikidata's AI project lead, emphasized the project's independence and collaborative nature: "Powerful AI does not have to be controlled by a handful of companies. It can be open, collaborative and serve everyone."
