Expand shortened URLs in MATLAB
Sometimes I use MATLAB scripts to analyze Twitter posts. One issue is that Twitter uses shortened URLs starting with t.co
for links, including those in retweets. To get the original URL, we need to determine the final redirect endpoint by using a URL expander.
Method
The API for getting the expanded or redirected URL
The shortened URLs can be restored by using an online URL expander, the expandurl™ API. The REST endpoint URL is
http://expandurl.com/api/v1/
The query string to get the JSON response looks like:
?url={TARGET_URL}&format=json&detailed=true
A GET request should be in the form of:
http://expandurl.com/api/v1/?url={TARGET_URL}&format=json&detailed=true
where {TARGET_URL} is the URL to be expanded.
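The request format above can be sketched in MATLAB as follows. This is a minimal sketch, not the API's documented usage: the short link is a made-up placeholder, and percent-encoding the target with urlencode is an assumption (it is the usual practice for query strings, but the post does not say whether the service requires it).

```matlab
% Build the request URL for the expandurl API described above.
% Assumption: the service accepts a percent-encoded "url" parameter.
apiEndpoint = 'http://expandurl.com/api/v1/';
targetUrl   = 'https://t.co/abc123';   % hypothetical short link

% urlencode escapes characters such as ':' and '/' in the target URL.
query   = ['?url=' urlencode(targetUrl) '&format=json&detailed=true'];
fullUrl = [apiEndpoint query];

disp(fullUrl);
```

Encoding the target keeps any '&' or '?' inside it from being read as part of the API's own query string.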
It returns a response in JSON format that looks like:
{
"url":"{TARGET_URL}",
"http_code":200,
"redirect_count":0,
"total_time":32.04,
"redirect_time":0,
"rel_meta_refresh":[{
"url":"{REDIRECTED_URL}","time":"0"
}],
"original_url":"{TARGET_URL}","error_msg":"",
"rel_canonical":false,"rel_shortlink":false,"advanced_redirect":false
}
The {REDIRECTED_URL} is the URL of the final destination, which is what we are interested in. To extract it, we can use regular expressions or a JSON parser to read the data from the JSON response.
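As an illustration of the regular-expression route, here is a minimal sketch that runs on a hard-coded sample in the shape shown above. The sample string, both URLs, and the pattern are assumptions made for illustration; a real response may contain escaped characters that this pattern does not handle.

```matlab
% Sample response in the shape shown above; both URLs are made-up placeholders.
sample = ['{"url":"https://t.co/abc123","http_code":200,' ...
          '"rel_meta_refresh":[{"url":"https://example.com/page","time":"0"}],' ...
          '"original_url":"https://t.co/abc123","error_msg":""}'];

% Capture the "url" value that directly follows the "rel_meta_refresh" key.
pattern = '"rel_meta_refresh"\s*:\s*\[\s*\{\s*"url"\s*:\s*"([^"]*)"';
token   = regexp(sample, pattern, 'tokens', 'once');

if isempty(token)
    redirectUrl = '';        % no redirect entry found
else
    redirectUrl = token{1};  % text captured by the parentheses
end
```

Anchoring the pattern on the rel_meta_refresh key avoids picking up the top-level "url" field, which holds the target URL rather than the destination.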
Sending a GET request to the REST API
In MATLAB, we can use urlread [1] to download URL content. You can also use webread [2] to read content from a RESTful web service if you use MATLAB R2014b or later. To send a request to the expandurl™ API, we can write a script like this:
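% Short link to expand; substitute the real ID for the placeholder
targetTweetUrl = 'https://t.co/<ShortenTweetID>';
apiEndpoint = 'http://expandurl.com/api/v1/';
query = '?url={TARGET_URL}&format=json&detailed=true';
% strrep also works on releases older than R2016b and in Octave
fullUrl = [apiEndpoint strrep(query, '{TARGET_URL}', targetTweetUrl)];
% urlread returns the response body as a character vector
response = urlread(fullUrl);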
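For readers on R2014b or later, the same request might be sent with webread. This is a sketch under stated assumptions: the short link is a made-up placeholder, the 10-second timeout is an arbitrary choice, and it assumes the service is reachable and returns valid JSON.

```matlab
% Sketch: query the API with webread (MATLAB R2014b or later).
apiEndpoint = 'http://expandurl.com/api/v1/';
targetUrl   = 'https://t.co/abc123';   % hypothetical short link

% Ask webread to decode the body as JSON and give up after 10 seconds.
options = weboptions('ContentType', 'json', 'Timeout', 10);

try
    % Name-value pairs after the URL become the encoded query string.
    data = webread(apiEndpoint, 'url', targetUrl, ...
                   'format', 'json', 'detailed', 'true', options);
    disp(data);
catch err
    % Network failures and HTTP errors end up here.
    fprintf('Request failed: %s\n', err.message);
end
```

Because webread builds and encodes the query string itself and decodes the JSON body, neither the manual string replacement nor a separate parser is needed on this path.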
Receiving data from the API response
For MATLAB/Octave, you can download JSONlab, a toolbox to encode/decode JSON files by Qianqian Fang, from the File Exchange on MATLAB Central. The function that converts a JSON string to a MATLAB struct is called loadjson. We can run the following script to get the expanded URL:
json = loadjson(response);
redirectUrl = json.rel_meta_refresh.url;
Note: You have to sign up for a free MathWorks Account to download from the File Exchange if you don’t have one.
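Depending on the JSON decoder and its options, an array of objects such as rel_meta_refresh may come back as a struct array or as a cell array of structs. The sketch below reads the URL from either shape. It uses hand-built stand-ins with a made-up URL rather than a real loadjson result, so it shows only the access pattern.

```matlab
% Two hand-built stand-ins for a decoded response (the URL is made up):
% one stores rel_meta_refresh as a struct array, the other as a cell array.
entry = struct('url', 'https://example.com/page', 'time', '0');
asStruct.rel_meta_refresh = entry;
asCell.rel_meta_refresh   = {entry};

decoded = {asStruct, asCell};
results = cell(size(decoded));
for k = 1:numel(decoded)
    refresh = decoded{k}.rel_meta_refresh;
    if iscell(refresh)
        refresh = refresh{1};      % unwrap the first cell element
    end
    results{k} = refresh(1).url;   % URL of the first redirect entry
end
```

Both stand-ins yield the same URL, so the same lines can follow the decoding step whichever shape the parser produces.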
Alternatively, you can extract the required part by splitting the response string, as shown below.
% The first "url" field holds the target URL, so the redirected URL is the third piece
substr = strsplit(response, { '"url":"', '","time":' });
redirectUrl = substr{3};