🚀 Understanding Wikidata - Using Wiki Data for Free!!

Wikidata is a free, open knowledge graph operated by the Wikimedia Foundation.
It provides information about people, places, concepts, and more from around the world in a structured form.

🧠 “The world knowledge that GPT draws on: can we use it for free, too?”
👉 The answer is the combination of Wikidata + SPARQL!


📌 What is Wikidata?

  • The data version of Wikipedia
  • A huge database that organizes the properties and values of entities from around the world
  • Example:
    • Q937 = Albert Einstein
    • P31 = instance of
    • Q5 = human
      → In other words, “Q937 is an instance of (P31) Q5 (human)”
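
To make the entity / property / value idea concrete, here is a minimal Python sketch that models the statement above as a triple and expands the short IDs into the URI namespaces behind the `wd:` and `wdt:` prefixes. The `Statement` class and the `to_uris` helper are illustrative names made up for this post, not part of any Wikidata library.

```python
from typing import NamedTuple

# Namespaces behind the wd: and wdt: prefixes used in Wikidata SPARQL queries.
WD = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"


class Statement(NamedTuple):
    """Hypothetical helper: one Wikidata statement as a triple of IDs."""
    subject: str  # entity ID, e.g. "Q937"
    prop: str     # property ID, e.g. "P31"
    value: str    # entity ID of the value, e.g. "Q5"


def to_uris(s: Statement) -> tuple[str, str, str]:
    """Expand the short IDs into full RDF URIs."""
    return (WD + s.subject, WDT + s.prop, WD + s.value)


einstein_is_human = Statement("Q937", "P31", "Q5")
print(to_uris(einstein_is_human))
```

Every fact in Wikidata can be read this way: a subject, a property, and a value.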

🛠️ Querying Data with SPARQL

SPARQL is a query language for RDF-based databases.
Wikidata runs its own SPARQL endpoint.

▶ Basic example: finding Einstein's date of birth

SELECT ?person ?personLabel ?birthDate WHERE {
  ?person wdt:P31 wd:Q5;  # instance of: human
          rdfs:label "Albert Einstein"@en;
          wdt:P569 ?birthDate.  # date of birth
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 1

▶ The same basic example in Python: finding Einstein's date of birth

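import requests

# Same query as above: a human whose English label is "Albert Einstein".
query = """
SELECT ?person ?personLabel ?birthDate WHERE {
  ?person wdt:P31 wd:Q5;
          rdfs:label "Albert Einstein"@en;
          wdt:P569 ?birthDate.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 1
"""

url = "https://query.wikidata.org/sparql"
headers = {
    "Accept": "application/sparql-results+json",
    # Wikimedia asks clients to identify themselves; this value is a placeholder.
    "User-Agent": "wikidata-blog-example/0.1 (replace with your contact info)",
}
response = requests.get(url, headers=headers, params={"query": query}, timeout=60)
response.raise_for_status()  # fail early on HTTP errors instead of parsing an error page

data = response.json()
for result in data["results"]["bindings"]:
    print(result["personLabel"]["value"], "-", result["birthDate"]["value"])

The result!? Albert Einstein - 1879-03-14T00:00:00Z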
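
The endpoint answers in the SPARQL JSON results format: each row in `results.bindings` maps a variable name to an object with a `value` field. As a rough sketch, the snippet below walks a hand-written sample shaped like that response (it is not live output) and converts the timestamp into a Python `date`. The helper name `parse_birth_date` is made up for illustration.

```python
from datetime import date, datetime

# Hand-written sample in the SPARQL JSON results shape (not fetched live).
sample = {
    "results": {
        "bindings": [
            {
                "personLabel": {"type": "literal", "value": "Albert Einstein"},
                "birthDate": {"type": "literal", "value": "1879-03-14T00:00:00Z"},
            }
        ]
    }
}


def parse_birth_date(raw: str) -> date:
    """Convert a timestamp like '1879-03-14T00:00:00Z' into a date."""
    return datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ").date()


for row in sample["results"]["bindings"]:
    name = row["personLabel"]["value"]
    born = parse_birth_date(row["birthDate"]["value"])
    print(f"{name} was born on {born.isoformat()}")
```

This assumes a full, positive-year timestamp. Dates stored with lower precision or BCE years would likely need extra handling.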


🌐 What kinds of data can you get?

Wikidata serves as a structured-data backend for Wikipedia and offers a wide range of data, such as:

  • 👤 People: date of birth, occupation, nationality, achievements, etc.
  • 🗺️ Countries and geography: capital, population, location, country code
  • 🎬 Films: director, release date, cast, country of origin
  • 📐 Academic concepts: mathematical formulas, scientific theories, historical events, cultural heritage
  • 🧪 Scientific data: chemical substances, biological species, genetic information, etc.

👉 Much of the structured data shown in Wikipedia's summary boxes also exists in Wikidata.
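
Any of these categories can be queried with the same pattern as the Einstein example, just with a different entity and property. The sketch below builds such a query as a string. `build_property_query` is a hypothetical helper written for this post, and P106 (occupation) is a property ID that does not appear earlier in the post, so treat it as an illustrative choice.

```python
import re


def build_property_query(entity_id: str, property_id: str, lang: str = "en") -> str:
    """Hypothetical helper: SPARQL listing the values of one property of one entity."""
    # Validate the inputs so arbitrary text cannot be spliced into the query.
    if not re.fullmatch(r"Q[1-9]\d*", entity_id):
        raise ValueError(f"not an entity ID: {entity_id!r}")
    if not re.fullmatch(r"P[1-9]\d*", property_id):
        raise ValueError(f"not a property ID: {property_id!r}")
    if not re.fullmatch(r"[A-Za-z,-]+", lang):
        raise ValueError(f"unexpected language code: {lang!r}")
    return (
        "SELECT ?value ?valueLabel WHERE {\n"
        f"  wd:{entity_id} wdt:{property_id} ?value.\n"
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}". }}\n'
        "}\n"
        "LIMIT 20"
    )


# Q937 = Albert Einstein (from the example above); P106 = occupation.
print(build_property_query("Q937", "P106"))
```

The printed query can be pasted into the Wikidata query editor or sent with `requests` exactly like the earlier example.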


✅ Why is it useful?

| Reason | Description |
|---|---|
| 💸 Free | CC0 license → commercial use is allowed |
| 🌍 Multilingual | Labels in 300+ languages (e.g. Korean, English) |
| 🔗 Connected | Entities are linked to each other through relationships (RDF triples) |
| ⚡ Fast | Uses SPARQL rather than SQL → relationship-based queries can be written directly |
| 🤖 AI use | Well suited to LLM preprocessing, RAG document generation, and knowledge base construction |

❗ What are the drawbacks?

Wikidata is a powerful open knowledge graph, but it also has the following limitations:

SPARQL in particular looks really hard!!

| Limitation | Description |
|---|---|
| 🧹 Accuracy | Anyone can edit → some information is wrong or insufficiently verified |
| 🔍 Missing data | Coverage can lag, especially for non-English-speaking regions and recent information |
| 💬 Little unstructured information | Descriptive sentences and detailed narrative are not included (→ refer to Wikipedia) |
| 📚 SPARQL learning curve | The query language may be unfamiliar (→ requires study) |
| 📦 Update cycle | Edits are reflected in real time through the API, but batch-based use can lag behind |

Solving the drawback!!! - Connect an LLM and have it write the SPARQL automatically!!

An example in Python!!

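import re

import openai
import requests

openai_client = openai.OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Ask the LLM to turn a natural-language question into SPARQL
user_q = "When is the birthday of the physicist Albert Einstein?"
prompt = f"""
You are a SPARQL expert. Convert the following user question into a valid SPARQL query using Wikidata schema.
Only return the query in SPARQL syntax.

Question: "{user_q}"
"""

response = openai_client.chat.completions.create(
    model="gpt-4o-mini",  # You can use "gpt-4" or "gpt-3.5-turbo"
    messages=[
        {"role": "user", "content": prompt}
    ],
    max_tokens=300,  # leave room so longer queries are not cut off
    temperature=0,   # deterministic output is preferable for query generation
)
raw = response.choices[0].message.content.strip()
# Keep only the body of a Markdown code fence if the model wrapped the query in one
match = re.search(r"`{3}(?:sparql)?\s*(.*?)`{3}", raw, re.DOTALL | re.IGNORECASE)
query = match.group(1).strip() if match else raw
print(f"Generated Query : {query}")

# 2. Send the SPARQL query to Wikidata
url = "https://query.wikidata.org/sparql"
headers = {
    "Accept": "application/sparql-results+json",
    # Wikimedia asks clients to identify themselves; this value is a placeholder.
    "User-Agent": "wikidata-blog-example/0.1 (replace with your contact info)",
}
response = requests.get(url, headers=headers, params={"query": query}, timeout=60)
response.raise_for_status()  # fail early if the generated query is rejected
data = response.json()

# 3. Print the results
bindings = data["results"]["bindings"]
result_lines = []
for b in bindings:
    line = " · ".join([f"{k}: {v['value']}" for k, v in b.items()])
    result_lines.append(line)
print("FINAL RESULTS")
print("\n".join(result_lines))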
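
Because the query comes from an LLM, it may help to sanity-check it before sending it to the endpoint. The sketch below is one possible guard, not part of the original code: `prepare_query` is a hypothetical helper that pulls the query out of an optional code fence, accepts only read-only SELECT / ASK queries, and appends a LIMIT when none is present. The default limit of 50 is an arbitrary choice.

```python
import re

# Matches an optional Markdown code fence (three backticks, optional "sparql" tag).
FENCE_RE = re.compile(r"`{3}(?:sparql)?\s*(.*?)`{3}", re.DOTALL | re.IGNORECASE)


def prepare_query(llm_output: str, default_limit: int = 50) -> str:
    """Hypothetical guard: clean up and sanity-check an LLM-generated SPARQL query."""
    match = FENCE_RE.search(llm_output)
    query = (match.group(1) if match else llm_output).strip()

    # Ignore PREFIX declarations, then require a read-only query form.
    body = re.sub(r"(?im)^\s*PREFIX\s+\S+\s+<[^>]*>\s*", "", query)
    if not re.match(r"\s*(SELECT|ASK)\b", body, re.IGNORECASE):
        raise ValueError("expected a SELECT or ASK query")

    # Cap the result size for SELECT queries that do not set a LIMIT themselves.
    is_select = re.match(r"\s*SELECT\b", body, re.IGNORECASE) is not None
    if is_select and not re.search(r"\bLIMIT\s+\d+", query, re.IGNORECASE):
        query = f"{query}\nLIMIT {default_limit}"
    return query


tick = "`" * 3  # build the fence at runtime to keep this example readable
llm_output = f"{tick}sparql\nSELECT ?birthDate WHERE {{ wd:Q937 wdt:P569 ?birthDate. }}\n{tick}"
print(prepare_query(llm_output))
```

A check like this is only a first filter. The generated query can still be syntactically valid but semantically wrong, so it is worth printing and reading it, as the code above already does.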

✨ Wrap-up

  • “How to work with a worldwide knowledge graph, without crawling!”
  • The Wikidata + SPARQL combination makes it possible.
  • Connect it to an LLM!! It can look up knowledge in Wikidata and give more accurate answers!!

This post is licensed under CC BY 4.0 by the author.