Dataset:
0x22almostEvil/multilingual-wikihow-qa-16k
The WikiHow team contacted me and made it clear that they forbid the use of their data for machine learning purposes. I am not calling for anything: this dataset only demonstrates the concept, and I strongly advise against violating their ToS. That said, consultation with lawyers made it clear that the dataset can be used for such purposes if the project is for research purposes.
The source code is pretty bad, and I'm too lazy to fix it.
Contains a Parquet file with a list of instructions and WikiHow articles in different languages.
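Since the data ships as Parquet, one way to look at it is through the Hugging Face `datasets` library. The snippet below is only a rough sketch: the `train` split name and the helper names `load_wikihow_rows` and `summarize_columns` are my assumptions for illustration, not something this dataset or its source code defines. It does not assume any particular column names; it just reports whatever columns the rows actually have.

```python
from collections import Counter

DATASET_ID = "0x22almostEvil/multilingual-wikihow-qa-16k"


def load_wikihow_rows(split="train"):
    """Load the dataset from the Hugging Face Hub.

    Needs `pip install datasets` and network access.
    The default split name "train" is an assumption.
    """
    # Imported lazily so summarize_columns() works without the library.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)


def summarize_columns(rows):
    """Count, per column name, how many rows hold a non-empty value."""
    counts = Counter()
    for row in rows:
        for name, value in row.items():
            if value is not None and value != "":
                counts[name] += 1
    return dict(counts)
```

Calling `summarize_columns(load_wikihow_rows())` should then list the columns present and how many rows fill each of them, which is a quick way to check the layout before relying on it.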
Each row consists of
Data is from WikiHow; the license for the content is located here: https://www.wikihow.com/wikiHow:Creative-Commons
This helped me a lot!
https://github.com/HelloChatterbox/PyWikiHow ; https://pypi.org/project/pywikihow/