Dataset:

0x22almostEvil/multilingual-wikihow-qa-16k

Dataset card for multilingual WikiHow, with ~16.8K entries in total (roughly 2K to 2.2K per language).

Warning [1]

The WikiHow team contacted me and made it clear that they forbid the use of their data for machine learning purposes. I am not encouraging anything: this dataset only demonstrates the concept, and I strongly advise against violating their ToS. That said, consultation with lawyers made it clear that the dataset can be used for such purposes if the project is for research.

Warning [2]

The source code is pretty bad, and I'm too lazy to fix it.

Dataset Summary

Contains Parquet files with a list of instructions and WikiHow articles in different languages.

Each row consists of:

  • INSTRUCTION
  • RESPONSE
  • SOURCE (*.wikihow.com)
  • METADATA (JSON with url and language)
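A minimal sketch of reading rows with the four columns above and grouping them by language. It assumes METADATA is stored as a JSON-encoded string whose keys are literally "url" and "language"; the key names, the language codes, and the example rows are hypothetical illustrations, not taken from the actual files.

```python
import json
from collections import Counter
from typing import Any, Iterable, Mapping


def parse_metadata(row: Mapping[str, Any]) -> dict:
    """Decode the METADATA column (assumed to be a JSON string)."""
    meta = row["METADATA"]
    # Some loaders may already hand back a dict; only decode strings.
    return json.loads(meta) if isinstance(meta, str) else dict(meta)


def count_per_language(rows: Iterable[Mapping[str, Any]]) -> Counter:
    """Count rows per language, using the assumed "language" key in METADATA."""
    return Counter(parse_metadata(row)["language"] for row in rows)


# Hypothetical rows mirroring the INSTRUCTION / RESPONSE / SOURCE / METADATA schema.
example_rows = [
    {
        "INSTRUCTION": "How to do something?",
        "RESPONSE": "Step 1 ...",
        "SOURCE": "www.wikihow.com",
        "METADATA": json.dumps(
            {"url": "https://www.wikihow.com/Example-Article", "language": "en"}
        ),
    },
    {
        "INSTRUCTION": "Wie macht man etwas?",
        "RESPONSE": "Schritt 1 ...",
        "SOURCE": "de.wikihow.com",
        "METADATA": json.dumps(
            {"url": "https://de.wikihow.com/Beispiel-Artikel", "language": "de"}
        ),
    },
]

if __name__ == "__main__":
    print(count_per_language(example_rows))
    # On the real data (needs network and the `datasets` package), something like
    # this may work, assuming the repository exposes a "train" split:
    #   from datasets import load_dataset
    #   rows = load_dataset("0x22almostEvil/multilingual-wikihow-qa-16k", split="train")
    #   print(count_per_language(rows))
```

Decoding METADATA lazily per row keeps the Parquet columns untouched; if the real files store METADATA as a nested struct instead of a string, `parse_metadata` simply copies the dict.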

Licensing Information

Data is from WikiHow; the content license is located here: https://www.wikihow.com/wikiHow:Creative-Commons

Acknowledgements

This helped me a lot!

  • https://github.com/HelloChatterbox/PyWikiHow
  • https://pypi.org/project/pywikihow/