Hyperlink Classification via Structured Graph Embedding


The Program

We formally define a hyperlink classification problem in web search: each hyperlink is classified into one of three classes according to its role, namely navigation, suggestion, or action. Real-world web graph datasets are generated for this task. We approach the problem from a structured graph embedding perspective and show that it can be solved by modifying recently proposed knowledge graph embedding techniques. The key idea of our modification is to introduce relation perturbation when generating negative triplets in training, whereas the original knowledge graph embedding models corrupt only entities. To the best of our knowledge, this is the first study to apply the knowledge graph embedding idea to the hyperlink classification problem. We show that our model significantly outperforms the original knowledge graph embedding models in classifying hyperlinks on web graphs.
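To make the relation perturbation idea concrete, here is a minimal Python sketch of negative triplet generation. It is an illustration under stated assumptions, not the released C++ training code: the function name `corrupt_triplet` and the mixing probability `p_relation` are hypothetical, and the sketch does not check whether a corrupted triplet happens to exist in the graph.

```python
import random

# The three hyperlink classes play the role of relations in a triplet.
RELATIONS = ["navigation", "suggestion", "action"]


def corrupt_triplet(triplet, entities, relations=RELATIONS, p_relation=0.5, rng=random):
    """Return a negative triplet by perturbing the relation or one entity.

    p_relation is a hypothetical mixing probability, not a value from the paper.
    """
    head, relation, tail = triplet
    if rng.random() < p_relation:
        # Relation perturbation: keep both pages, swap in a different class.
        candidates = [r for r in relations if r != relation]
        return (head, rng.choice(candidates), tail)
    # Entity corruption, as in standard knowledge graph embedding training.
    if rng.random() < 0.5:
        candidates = [e for e in entities if e != head]
        return (rng.choice(candidates), relation, tail)
    candidates = [e for e in entities if e != tail]
    return (head, relation, rng.choice(candidates))


# Toy usage with made-up page identifiers.
pages = ["page_a", "page_b", "page_c"]
negative = corrupt_triplet(("page_a", "navigation", "page_b"), pages)
print(negative)
```

Setting `p_relation=0.0` in this sketch recovers the entity-only corruption used by the original models, which makes the two sampling schemes easy to compare.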


Download

The code is released under the GNU General Public License (GPL). It is based on the code at https://github.com/thunlp/KB2E.
The training code is written in C++ and the test code is written in Python.

To download the code, please click here.
To download the data, please click here.


Usage

      A sample training set (train_example.txt) and a sample test set (test_example.txt) are included with each model.
      Run the commands below to train and test a model.
    
 -Train
      In the command line, please type:
            make

      You can train the model (TransE for example) using the following command:
            ./Train_TransE train_example.txt 404 437
            (404 is the number of nodes and 437 is the number of edges)

      Training produces entity2id and relation2id, which contain the embedding results of the entities and relations, respectively.
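As an illustration of how such embeddings could be used to classify a hyperlink, the sketch below applies the TransE scoring idea (head + relation ≈ tail) and picks the relation with the smallest distance. This is an assumption about one reasonable decision rule, not a description of what the released test scripts do; the function names and the toy vectors are hypothetical.

```python
import math


def transe_distance(head_vec, relation_vec, tail_vec):
    """L2 norm of head + relation - tail; smaller means more plausible."""
    return math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head_vec, relation_vec, tail_vec))
    )


def classify_hyperlink(head_vec, tail_vec, relation_vecs):
    """Return the relation name whose translation best maps head to tail."""
    return min(
        relation_vecs,
        key=lambda name: transe_distance(head_vec, relation_vecs[name], tail_vec),
    )


# Toy 2-dimensional embeddings, made up for illustration only.
relation_vecs = {
    "navigation": [1.0, 0.0],
    "suggestion": [0.0, 1.0],
    "action": [-1.0, 0.0],
}
predicted = classify_hyperlink([0.0, 0.0], [0.1, 0.9], relation_vecs)
print(predicted)  # prints "suggestion"
```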
    
 -Test
      You can evaluate the model (TransE for example) using the following command:
            python3 test_TransE.py test_example.txt

      The script writes result.txt, which contains the confusion matrix, F1 score, precision, recall, and accuracy of the model.
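For readers who want to see how these metrics relate to one another, the following sketch computes them from lists of true and predicted labels. It is not the released test script: the helper names are hypothetical, and macro-averaging over the three classes is an assumption, since the averaging scheme used in result.txt is not stated here.

```python
LABELS = ["navigation", "suggestion", "action"]


def confusion_matrix(true_labels, predicted_labels, labels=LABELS):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[index[true]][index[predicted]] += 1
    return matrix


def summarize(matrix):
    """Accuracy plus macro-averaged precision, recall, and F1 (an assumed scheme)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        true_positive = matrix[i][i]
        predicted_i = sum(matrix[r][i] for r in range(n))  # column sum
        actual_i = sum(matrix[i])                          # row sum
        precision = true_positive / predicted_i if predicted_i else 0.0
        recall = true_positive / actual_i if actual_i else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    correct = sum(matrix[i][i] for i in range(n))
    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels, made up for illustration only.
true_labels = ["navigation", "navigation", "suggestion", "action"]
predicted_labels = ["navigation", "suggestion", "suggestion", "action"]
matrix = confusion_matrix(true_labels, predicted_labels)
metrics = summarize(matrix)
print(matrix)
print(metrics)
```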
    

Citation

Please acknowledge the use of the code/data with a citation.

Hyperlink Classification via Structured Graph Embedding, G. Lee, S. Kang, and J. J. Whang, ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), July 2019.
@inproceedings{lee-sigir2019,
  author = {Lee, Geon and Kang, Seonggoo and Whang, Joyce Jiyoung},
  title = {Hyperlink Classification via Structured Graph Embedding},
  booktitle = {Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year = {2019},
  pages = {1017--1020}
}
Bug reports and comments are always appreciated.