---
title: "Using rdomains"
author: "Gaurav Sood"
date: "`r Sys.Date()`"
vignette: >
  %\VignetteIndexEntry{Using rdomains}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

### rdomains: Get the category of content hosted by a domain


#### Install and Load the package

The latest development version of the package is on GitHub. To install it from GitHub:

```{r, eval=FALSE, install}
# install.packages("devtools")  # run once if devtools is not installed
devtools::install_github("themains/rdomains")
```

To install the released version from CRAN:

```{r, eval=FALSE, cran_install}
install.packages("rdomains")
```

Next, load the package:

```{r, eval=FALSE, load_pkg}
library(rdomains)
```

#### Shalla

To get the content category from Shallalist (the service has been discontinued, so the package uses archived data), first download the archived data:

```{r, eval=FALSE, down_shalla}
get_shalla_data()
```

And then, get the category using:

```{r, eval=FALSE, shalla}
shalla_cat("http://www.google.com")
```

```
##   domain_name shalla_category
## 1  google.com   searchengines
```

#### DMOZ

To get the content category from DMOZ, first download the archived, parsed CSV file:

```{r, eval=FALSE, down_dmoz}
get_dmoz_data()
```

And then, get the category using:

```{r, eval=FALSE, dmoz}
dmoz_cat("http://www.google.com")
```

#### ML

To get the probability that a domain hosts adult content, based on features of the domain name and suffix alone:

```{r, eval=FALSE, ml}
adult_ml1_cat("http://www.google.com")
```

```
##   domain_name  category
## 1  google.com 0.3133728
```
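
Because the `category` column here holds a probability rather than a label, a cutoff is needed to turn it into a yes/no flag. The sketch below is illustrative only: it uses a hand-built data frame shaped like the output above instead of calling the package, the second row is made up, and the 0.5 cutoff is an assumption, not a value recommended by the package.

```r
# Hand-built data frame mimicking the adult_ml1_cat() output shown above.
# The second row is invented purely for illustration.
scores <- data.frame(
  domain_name = c("google.com", "made-up-domain.example"),
  category = c(0.3133728, 0.82),
  stringsAsFactors = FALSE
)

# Assumed cutoff; choose one that matches your tolerance for false positives.
cutoff <- 0.5

# Flag domains whose predicted probability meets or exceeds the cutoff.
scores$adult_flag <- scores$category >= cutoff
scores
```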

#### VirusTotal

Start by getting an API key from [VirusTotal](https://www.virustotal.com/).

The package uses the VirusTotal API v3 for comprehensive domain analysis:

```{r, eval=FALSE, virustotal}
virustotal_cat("http://www.google.com")
```

#### OpenAI GPT Models

Get domain categorization using OpenAI's GPT models. You'll need an OpenAI API key:

```{r, eval=FALSE, openai}
# Set your API key
Sys.setenv(OPENAI_API_KEY = "your-api-key-here")

# Classify domains
openai_cat("google.com")
```

```
##   domain_name openai_category
## 1  google.com      technology
```
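
It can help to confirm that the key is visible to the R session before making a request. The helper below is a hypothetical sketch in base R, not part of rdomains, and assumes the key is read from the `OPENAI_API_KEY` environment variable as the example above suggests.

```r
# Hypothetical helper (not part of rdomains): fail early if a key is missing.
require_env_key <- function(name) {
  key <- Sys.getenv(name, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", name, " is not set.", call. = FALSE)
  }
  invisible(key)
}

# Set a placeholder value, then confirm the session can see it.
Sys.setenv(OPENAI_API_KEY = "your-api-key-here")
require_env_key("OPENAI_API_KEY")
```

Storing the key in `~/.Renviron` instead of setting it in a script keeps it out of version control.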

You can also specify custom categories:

```{r, eval=FALSE, openai_custom}
openai_cat(c("amazon.com", "github.com"),
           categories = c("ecommerce", "technology", "social", "other"))
```

#### Anthropic Claude

Get domain categorization using Anthropic's Claude models. You'll need an Anthropic API key:

```{r, eval=FALSE, claude}
# Set your API key
Sys.setenv(ANTHROPIC_API_KEY = "your-api-key-here")

# Classify domains
claude_cat("facebook.com")
```

```
##   domain_name claude_category
## 1 facebook.com          social
```
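
The outputs printed above all share a `domain_name` column, so results from different sources can be lined up with base R's `merge()`. The sketch below is a minimal illustration: it rebuilds two of the printed outputs by hand instead of calling the package, and it assumes the real return values are plain data frames with these column names.

```r
# Hand-built copies of the Shalla and OpenAI outputs printed earlier.
shalla <- data.frame(
  domain_name = "google.com",
  shalla_category = "searchengines",
  stringsAsFactors = FALSE
)
openai <- data.frame(
  domain_name = "google.com",
  openai_category = "technology",
  stringsAsFactors = FALSE
)

# Full join on domain_name so domains missing from one source are kept.
combined <- merge(shalla, openai, by = "domain_name", all = TRUE)
combined
```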

