[MUSIC PLAYING] MALE SPEAKER: Shanghai GDG is a very interesting developer community. FEMALE SPEAKER: I'm glad somebody has asked this question. MALE SPEAKER: This is where the magic happens. FEMALE SPEAKER: This is primarily a question and answer show, so if any of you out there would like to ask questions.

Ryan Boyd: Hi, everyone. Welcome to the Ryan and Michael show. I'm Ryan Boyd. Michael Manoochehri: And I'm Michael Manoochehri. Ryan Boyd: And we're here today to talk about preparing your data for BigQuery. So if you have any questions throughout this little discussion on preparing your data and loading your data into BigQuery, feel free to ask them in the Google

Moderator that is available on developers.google.com/live. Hopefully, you're watching it there. You can ask your questions, and we'll save some time at the end to answer them. Of course, if you have any other questions after today, or you're not watching live, you can ask them on Stack Overflow and we'll answer them there, or reach out to Michael or myself on the Google+ handles you see on this page.

So we're looking forward to hearing from you. Now let's dive in. The first thing we want to talk about is the data format. What does your data need to look like when you load it into BigQuery? It's pretty simple: a CSV-formatted file. Here we can see three lines of a CSV file. This is actually data representing the--

Sorry. Michael, do you remember what this data was? [LAUGHTER] Ryan Boyd: Sorry, it's been a little bit. Michael Manoochehri: Yeah, that's right. It looks like this might be election data or maybe natality data.

Ryan Boyd: Yeah, exactly. So, three rows of data in a CSV file. And we can see some of our basic types here: we have strings, we have ints, we have floats, and we have boolean values. Those are the main types that you're used to using in BigQuery.
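To make that concrete, here's a minimal sketch in Python of producing that kind of newline-delimited CSV with the standard csv module. The column names and values are invented for illustration; they aren't the data on the slide.

```python
import csv

# Hypothetical rows mixing the basic BigQuery types mentioned above:
# a string, an integer, a float, and a boolean.
rows = [
    ("OH", 2003, 7.1, True),
    ("CA", 2004, 7.6, False),
    ("NY", 2005, 6.9, True),
]

with open("sample_data.csv", "w", newline="") as f:
    writer = csv.writer(f, quoting=csv.QUOTE_MINIMAL)  # quote strings only when needed
    for row in rows:
        writer.writerow(row)  # one record per newline-delimited line
```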

And you just format a CSV file, newline delimited. You can quote your strings and things like that, but these are basic types. Our goal through the rest of the discussion today is how to prepare these CSV files and get them into BigQuery. Now, we have developers using a variety of different data sources, and how you prepare your data and load it into BigQuery often

really depends on the data source that you're using. So let's talk about the types of data sources that BigQuery users -- our developers and customers -- are using to load into BigQuery. It's all sort of transactional data. The data sources commonly used, the ones that make sense in a BigQuery-style database, are transactional data: things where the append-only style of BigQuery works out well.

So web logs are one of the most popular. The standard Apache web log has various fields representing information about the user that's accessing the site, what page they're accessing, the host, things like that. That type of data is fantastic for BigQuery. And those web logs can be things that are on your own local web server in your local data center, or they can be cloud-based web logs.

So, App Engine logs, or logs from other cloud-based platforms. That's one of the most common. But then you might have some more specific types of log files for mobile or web application events. For instance, there are a number of applications on the Google Play Store, and one of the top applications there actually uses BigQuery to run a lot of analysis.

And I'm not sure what particular data they're running. But web application events and mobile application events are a really great type of data to load into BigQuery and analyze. You can often get lots of traffic on the initial day you launch on the Play Store, and potentially get a lot of customers immediately, and you want to be able to track how they're using your

product and use that information to improve the application you're offering. Then there's syslog-style data for machine performance. The range of use cases that developers use BigQuery for is very diverse -- we have people using it for ad monetization, but also people using it to monitor their servers. And that's what this one is about: syslog-style data for monitoring machine performance.

We actually have a case study that should be out soon that talks about how one of our developers uses it to monitor the performance of their applications and the machines hosting those applications. Then, of course, there's Google advertising click and impression data, and sales transaction records -- all very important types of transactional data that are great to load into BigQuery for analysis. And the last one I'll mention here is the App Engine datastore.

Oftentimes, you may have sensors in machines all around the world, and you want to load that data into BigQuery. It's best to use the cloud to collect that data, and App Engine is a great way to do that. So you can collect the data in App Engine, put it into the datastore, then take the data out of the datastore and load it into BigQuery in batches. And we'll talk about how we've used some of our own Google technologies for doing that shortly.

So now I want to talk about the process of loading data into BigQuery. The process is pretty simple: you prepare your data, upload it to Google Cloud Storage, and then from Cloud Storage you basically make an API call to BigQuery telling it to ingest the data from Cloud Storage. It's a fairly simple process. I will say that you can technically skip the middle

part of this process and upload your data directly to BigQuery. We generally don't recommend it, because Cloud Storage -- all the APIs and tools around Cloud Storage -- is really optimized for handling large files. So that's often the best way to do it: you get your large files into Cloud Storage, and then you let BigQuery retrieve them from Cloud Storage as part of

its ingestion process. That's the least error-prone way of doing this. We've had developers try to upload their files directly to BigQuery, and sometimes they run into issues. Those issues are usually because they're uploading very large files with a plain HTTP request and not taking advantage of the resumable upload feature that we have.

If you're using the resumable upload feature, I think you should be able to get about the same reliability even if you skip Cloud Storage in this process and upload directly to BigQuery. Try it out, let us know what you're seeing, and we'd be happy to hear if you have great success using resumable uploads to upload directly to BigQuery. Our libraries support this.
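As a rough sketch of the recommended path -- files staged in Cloud Storage, then a load job that tells BigQuery to ingest them -- here's what the call might look like through the BigQuery v2 REST API's Python client. The project, dataset, bucket, and schema are placeholders, and the credentials file is assumed to exist.

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

# Placeholder credentials and names; swap in your own project, bucket, and dataset.
credentials = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/bigquery"],
)
bigquery = build("bigquery", "v2", credentials=credentials)

job_body = {
    "configuration": {
        "load": {
            "sourceUris": ["gs://my-bucket/prepared/data-*.csv.gz"],  # files staged in Cloud Storage
            "destinationTable": {
                "projectId": "my-project",
                "datasetId": "my_dataset",
                "tableId": "my_table",
            },
            "schema": {
                "fields": [
                    {"name": "state", "type": "STRING"},
                    {"name": "year", "type": "INTEGER"},
                    {"name": "weight", "type": "FLOAT"},
                    {"name": "flag", "type": "BOOLEAN"},
                ]
            },
        }
    }
}

# Kick off the ingestion job and let BigQuery pull the files from Cloud Storage.
job = bigquery.jobs().insert(projectId="my-project", body=job_body).execute()
print(job["jobReference"]["jobId"])
```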

A lot of the libraries support the ability to do resumable uploads. We don't yet have it super well-documented on the BigQuery side -- we're working on that -- but you can see the generic library documentation on how to do resumable uploads. Now, the next thing we wanted to talk about -- and I'm going to turn it over to Michael for this -- is chunking your data,

breaking up your data into the parts required for ingesting it into BigQuery. So let's dive into it. So as you know, BigQuery is used for analyzing, or asking questions about, massive data sets. But to get your data into BigQuery, it's good to chunk it into parts. BigQuery supports ingestion, or loading, of both compressed (with gzip) and uncompressed files.

Right now, the maximum size of each file is 4 gigabytes, but you can ingest large batches of files: up to 500 files in one load job, as long as the total job is under 100 gigabytes. Anecdotally, I've run some tests using some of the data we have internally, and I've been able to ingest upwards of 300 gigabytes of uncompressed data in one load job by first gzipping that data and then ingesting it as a single batch.

So you can actually put a lot of data in one batch ingestion. This is another advantage of using Cloud Storage for ingesting: you can ingest giant batches that are stored in Cloud Storage. So it's really great -- let us deal with the infrastructure. I think that's usually a good lesson.

So let's talk about speeding up the data loading. I just told you that you can ingest an enormous amount of data in one load job. There are other ways you can improve this ingestion rate even further. We actually let you run two concurrent ingestion jobs at a time, so you can have two of these huge batches in flight at once.

So you can do that -- two concurrent -- and we also allow you to do 1,000 load jobs per day. So you can do the math: if you're ingesting hundreds of gigabytes of data per job, you can ingest 1,000 of them a day, and you can get a massive amount of data into BigQuery in one go. Ryan Boyd: Now, some people try, with that 1,000 per day, to stream data in.

Michael Manoochehri: Right. Ryan Boyd: And generally, BigQuery is not designed right now for streaming-style ingestion. You can still upload and run many ingestion jobs per hour and have near real-time data, but you're not going to get down-to-the-second data quite yet. If it's something you're interested in -- yeah, definitely let us know.

Michael Manoochehri: Yeah, it's still a batch process. We'd love to hear about it. Actually, Ryan touched on this earlier: some people are using the datastore to collect data. It's a very highly available, performant way to collect data in real time, and then they use a batch job from there to put the data into BigQuery. So there are other ways you can handle that. Ryan Boyd: And one thing I will say on the job style --

we've been asked this question a number of times on Stack Overflow and elsewhere -- is whether, when a job fails -- in some cases, you have a misformatted line or something like that -- does it ever partially complete? And the answer is no. When a job fails with BigQuery, that job has completely failed. We haven't updated your data at all.

So you can just retry that job without ever having to worry about potential error conditions along the way. And one other thing I should mention -- something we kind of forgot as we were preparing for this, but that I think is really important -- is that, at times, your data's not perfect. You're loading data in, and there are some formatting issues or what have you. You can actually specify your error tolerance when

you load data into BigQuery. Ryan Boyd: Either using the bq client or using the client libraries when you create your job JSON object, you can basically say: I can have up to, say, 50 lines of data that are bad -- misformatted, or with bad encoding. Often you'll want to set a somewhat higher threshold like that as your first experiment.
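In the load configuration, that tolerance is the maxBadRecords setting (the bq command-line tool exposes it as --max_bad_records). A small sketch, extending the hypothetical job_body from the earlier load-job example:

```python
# Allow up to 50 misformatted or badly encoded lines before the load job fails.
# 50 is an arbitrary starting threshold you might tighten once you trust your pipeline.
job_body["configuration"]["load"]["maxBadRecords"] = 50

# Roughly equivalent with the bq command-line client (verify the flag against
# your installed version):
#   bq load --max_bad_records=50 my_dataset.my_table gs://my-bucket/prepared/*.csv.gz schema.json
```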

And as you get more and more data into BigQuery, and you get confidence in your data format and your collection techniques, you can change that threshold and back it off. Michael Manoochehri: When you're dealing with huge file sizes -- terabyte or gigabyte file sizes -- the chances of having a bad line because of something your output writer

dealt with are high. So this is a great feature for that. Ryan Boyd: Yeah, exactly. So, fantastic. Michael Manoochehri: Cool. So let's talk about designing a BigQuery schema. If you're from the relational database world, you're probably used to what we call normalization of data,

which is to keep everything in one place and have just one copy of it. The example we have here is a relational database with two tables: one for parents and one for a person. A person generally has two parents, and in this type of schema the parents have a key, like an ID, and all the data lives in one place. So your birth record is in one place,

and your parents are single records in a parents table -- your father and your mother. So this is a normalized table. BigQuery, by contrast, likes to have denormalized, non-relational data. The idea is to have a record with every bit of information flattened, so all the data that you saw across the other tables is flattened into a single record.
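As a toy illustration of that flattening, assuming a normalized person/parents layout along the lines of the slide (field names invented here):

```python
# Normalized source: a person row plus separate parent rows keyed by id.
person = {"id": 1, "name": "Alice", "birth_year": 1980, "mother_id": 10, "father_id": 11}
parents = {
    10: {"name": "Carol", "birth_year": 1955},
    11: {"name": "Dave", "birth_year": 1953},
}

# Denormalized record for BigQuery: everything about the birth in one flat row,
# accepting redundancy (parent data repeated for each child) and possible nulls.
flat_row = {
    "person_name": person["name"],
    "person_birth_year": person["birth_year"],
    "mother_name": parents.get(person["mother_id"], {}).get("name"),
    "mother_birth_year": parents.get(person["mother_id"], {}).get("birth_year"),
    "father_name": parents.get(person["father_id"], {}).get("name"),
    "father_birth_year": parents.get(person["father_id"], {}).get("birth_year"),
}
```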

So, in practice, this means there can be redundancy and there can be null values -- what we call sparsity. And this is the kind of table that BigQuery actually accepts. So when you have relational data, what you want to do is flatten it out into a single-record, denormalized table for BigQuery. Ryan Boyd: Actually, I want to say something about why you're taught in school to always normalize

your data into the multiple tables that Michael just talked about. One of the real advantages is maintaining consistency of your data: if you update information about a parent, in this case, all their children see the same parent metadata. If the parent changes their age or something like that, all of the children records reflect it, and you never have inconsistent sources. But with a BigQuery-style database, where it's sort of

append-only transactional data and you're really trying to do analysis, the consistency isn't a problem, because you've had consistency through the life cycle of the data as it was being created. And as you're analyzing it, the flat structure is much better for performance, so you don't lose anything by giving up that normalization. That's great. So, building off that, another technique for BigQuery is to

actually shard your data. In this example, we're taking birth records from different years and placing them into different tables. There are a lot of advantages to doing this. First, BigQuery is actually priced by the amount of data you query, so by sharding your data you can make your queries as efficiently priced as possible.

This is also a good way to make your queries more manageable: you can say, I want to deal with data from 2011, and have the data sharded that way. So this is a great way to logically organize your data into tables -- into the kind of denormalized, flattened tables we talked about. Ryan Boyd: And, of course, if you're trying to query multiple years of data, you can just do simple union

queries to do that. Michael Manoochehri: Exactly. So we support that as well. So, great.
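A sketch of what querying those shards might look like; the dataset and table names are hypothetical, and the comma-separated table list is BigQuery's (legacy SQL) shorthand for a union over tables:

```python
# Query a single shard when one year is enough.
single_year = "SELECT state, COUNT(*) FROM mydataset.births_2011 GROUP BY state"

# Union several shards by listing them with commas (legacy table-union syntax).
multi_year = """
SELECT state, COUNT(*)
FROM mydataset.births_2010, mydataset.births_2011, mydataset.births_2012
GROUP BY state
"""
```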

So we've talked a little bit about designing these tables and how to ingest data. But let's talk about tools for data preparation: what are the actual tools you can use to get the data into BigQuery? Ryan Boyd: Yeah, there are a variety of different tools here, and it goes all over the board. A lot of it is based on what your experience is, what the source of the data is, where the data is being collected, and things like that -- that's what determines which tools you'll choose. But we'll give you an idea of some of the tools that we've used and some that our customers have used.

So there's the App Engine MapReduce and Pipelines API. We're going to talk about this a little more later, but App Engine MapReduce is an implementation of the MapReduce algorithms on App Engine -- a pure App Engine implementation. We've used it, and we've heard of others using it to great success for running large jobs to transform data. And then, of course, the Pipelines API, which is

actually what MapReduce is built, in part, on top of. The Pipelines API basically allows you to create a dependency tree of jobs and executions -- a workflow of the various processes that come into play -- again, running on App Engine. It's very helpful for doing things where you want to process a bunch of files and then load them into BigQuery only if the files processed correctly.

And we'll see an example here shortly. There are also, of course, commercial ETL tools -- there's an entire industry out there building tools to help you prepare your data. There's a wide variety of them; the licensing and the way they're structured differ, and some even have open-source models.

I'm going to talk about Pervasive a little bit more, and we also have Informatica and Talend, who have all integrated with BigQuery to make it really easy for you to load your data into Cloud Storage and ingest it into BigQuery. And then, of course, our favorite: the Unix command line. Michael Manoochehri: We've rediscovered the Unix command line.

Ryan Boyd: Yes, rediscovered the Unix command line. Oftentimes there are simple Unix command-line tools you can use to prepare your data. This is especially great if it's a one-time load job rather than something you're regularly scheduling. These tools are pretty fantastic, and we'll give you some hints as to how we've used them. And then, of course, if you have XML, use a bit of code.

We don't have too much on this in the rest of the presentation, but I will say that Wikipedia provides their revision history -- basically, all the revisions that have happened to every single page in Wikipedia, about 450 million rows -- and you can download that as one XML file. If I'm recalling correctly, I think it was something like 130 gigabytes compressed. Michael Manoochehri: Pretty massive.

Ryan Boyd: A massive XML file. You're going to want to use a streaming-based parser, or a SAX-based parser, for that XML. I've found the lxml library in Python very helpful for doing that, so check that out.
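A hedged sketch of that streaming approach using lxml's iterparse. The element tag and file name are assumptions about the revision-history dump rather than details covered in the talk:

```python
from lxml import etree

# Stream through a huge XML dump without loading it all into memory.
# The tag name ("page") is an assumption about the dump's structure.
for _event, elem in etree.iterparse("enwiki-history.xml", events=("end",), tag="page"):
    title = elem.findtext("title")
    # ... emit a CSV line for this record here ...
    elem.clear()  # free memory for the element we've just handled
    while elem.getprevious() is not None:
        del elem.getparent()[0]  # drop already-processed siblings so the tree stays small
```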

On the commercial side, here's an example: a screenshot from Pervasive's RushAnalyzer product. We've worked with them to integrate BigQuery, and you can see -- and create -- in a very visual fashion a workflow for loading your data. In this case, they're taking data from an on-premise HDFS file system, joining that data with data from a MySQL database that's either in-house or in the cloud, bringing that all together, joining it even with some, I believe, flat files, and selecting only particular fields after they do that

join -- selecting only the fields they require -- and then writing that out to BigQuery, using Cloud Storage as the intermediate step. Michael Manoochehri: I think these guys are doing the denormalization and flattening you described -- or we described -- earlier. That's really what this joining is: the denormalization process of pre-joining the data before

loading it into BigQuery. So it's a great tool for doing that type of thing, and you can do it in this visual-style workflow. The other commercial ETL tools, the Informaticas and Talends, provide a similar style of creating your workflows. And I know with Pervasive, when you build that workflow, you can execute it locally, but you can also distribute the job to actually run it in, say, one of your local on-premise data centers, or

even run it in the cloud, and have the job really scale out to a lot of machines, and schedule it, and that sort of thing. It provides all that power and flexibility, but with this nice GUI front end. So that's pretty powerful. But sometimes you don't need that level of power, and that's where the Unix command-line tools come in. Ryan Boyd: Here's a row of data -- and I do know where this row

of data came from. This is campaign finance data. Michael Manoochehri: Ah, yes. Ryan Boyd: The US government actually publishes donations to various political campaigns. So you can see here, this is a donation that I made in 2008 for $50.

It's in that record, and it's all public data that the government publishes. One thing that you'll notice, which I've highlighted on this data, is the year and the date: 2008-10-19 is when I made this donation. Exactly two weeks after my birthday, apparently, I made this donation for $50. But the thing is, BigQuery doesn't handle date formats like this very well.

I mentioned some of the basic date formats -- sorry -- the basic field formats that BigQuery provides in its schema, and they're things like ints and strings and floats. Michael Manoochehri: Booleans. Ryan Boyd: And booleans, yeah, exactly. But dates currently don't have a special format in BigQuery.

We rely on integers. And this is actually quite familiar to the Unix geeks out there, because time since epoch -- January 1, 1970 -- is a very common way to keep track of time in Unix tools. So we use that in BigQuery as well, and you can sort your data based off of that time since epoch. It's really easy once you get your data in, but some people have a little bit of a challenge doing the

conversion to get the data in. Michael Manoochehri: This is a very common data conversion issue. Ryan Boyd: Yeah, and something we'll probably resolve in the core product. Michael Manoochehri: Yes. Ryan Boyd: And there are actually functions in BigQuery, when you're querying the data, to convert that time since epoch back into a more human-readable form.

Michael Manoochehri: Yeah, exactly. Ryan Boyd: But anyway, here's our basic data. You have the time as 2008-10-19, and we want to convert it into a time that looks like this, which is that time since epoch -- the number of seconds since January 1, 1970. Doing that conversion is really, really simple with a command-line tool. And I never would have known this without the web.

Google is your friend when trying to do these things -- I'm not that big of an expert in awk. But here is an awk command. You'll notice that I'm logged onto my machine here, in the campaign finance directory. What I'm doing first is a streaming unzip with gunzip, piping that output to GNU awk. Then, basically, what it's doing is replacing the dashes in the date with spaces and passing that time as

a string into the mktime function to make a time value out of it. And that becomes our int that's the time since epoch. We're substituting column 10 of our original source file with that time, printing it all out, and then piping it back out to gzip to compress the data again. So even though there's a really large source file, we only ever decompress a couple of lines at a time -- whatever's in the buffer -- which makes this a very powerful approach with just the Unix command line.
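The same transformation is easy to sketch in Python if awk isn't your thing. This assumes the date lives in column 10 (index 9) of a comma-separated, gzipped file, mirroring the awk description above; like awk's mktime, it interprets the date in local time:

```python
import csv
import gzip
import time

# Stream a gzipped CSV, replace the YYYY-MM-DD date in column 10 with seconds
# since the Unix epoch, and stream the result back out gzipped.
with gzip.open("contributions.csv.gz", "rt") as src, \
     gzip.open("contributions_epoch.csv.gz", "wt") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        year, month, day = row[9].split("-")   # column 10, e.g. "2008-10-19"
        epoch = int(time.mktime((int(year), int(month), int(day), 0, 0, 0, 0, 0, -1)))
        row[9] = str(epoch)
        writer.writerow(row)
```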

The other Unix command-line tool we should talk about, which is pretty powerful, is something to deal with that 4-gigabyte limit that BigQuery currently allows for: your file size on each file can be up to 4 gigabytes, and you sometimes want to split your source file into chunks of that size. Let's say you have a source file that's 12 gigabytes, and

you need chunks that are less than or equal to 4 gigabytes -- you can use split. And this is how you do it. It's really simple: you say split -C, and -C makes sure that it doesn't split on anything other than a newline, so you don't get a partial line of data, while writing out 4 gigs of data per chunk. And you have your source file there.

It will just create a series of other files, numbered based on their position in that series -- in this case, it would be three separate files of four gigs each. Michael Manoochehri: And I use this command all the time. What's great about it is that it's easy to script, so you script it, put it into your ingestion job, and it's really great. Ryan Boyd: All right.

Now I'm going to hand this back over to Michael to chat a little about a project we've been working on recently to load 6 terabytes of data -- or 2 terabytes compressed. Michael Manoochehri: So, as Ryan said, we've been working on this really fun project. This is kind of putting everything we just talked about into practice: it's a study in loading 6 terabytes of data into

BigQuery using App Engine. It's been really great. We're looking at Wikipedia page views. Wikipedia publishes, hourly, the page views of all its pages, and this is huge -- the data we're looking at starts at the end of 2007, beginning of 2008. Ryan Boyd: I think we skipped the last

couple of days of 2007. Michael Manoochehri: OK, you're right, so we don't have all of it. But the data we have is 6 terabytes, and it's split up into thousands of files of, I think, hundreds of megabytes each. Compressed, they're something like 100 megs. Ryan Boyd: Anywhere between 30 --

you can actually see from the size of the files how much Wikipedia has grown, even since 2008. Michael Manoochehri: Yes, yes. Ryan Boyd: Because the file sizes increase from, I think, 30 megabytes compressed up to about 100. I think there were something like 60,000 files -- something crazy like that. Michael Manoochehri: It's an enormous amount of data. Ryan Boyd: Right.

Michael Manoochehri: So what we're doing is using App Engine for its ability to distribute tasks across a lot of instances automatically. Here's an example of what we're actually doing. The original schema of these Wikipedia page-view dumps looks something like the one on the left side: there's a language, the title of the Wikipedia page, the number of page views it got in the particular hour

that the file pertains to, and the number of bytes in the page -- which is also great, because you can see how that changes over time as well. So this is a really interesting data set. What we want to do is take this raw data and put it into a BigQuery table that looks like the schema on the right. So we're doing some of that date/time transformation that Ryan talked about earlier.

And we're breaking out the month and day to make our queries simpler. Ryan Boyd: On the date/time transformation -- we don't actually show it here, but the files are named based on the year, month, day, and hour of the requests, so there's one file for every hour since the end of 2007. We're taking the info from the file name and combining it, joining it, with the rest of

the record that's in the file. We're parsing that out, and the first five fields are based on that information. The Wikipedia project is also something we pull from the title -- that could be Wikipedia itself, or, I think, one of the other Wikipedia projects beyond the main Wikipedia -- so we don't put the language in there as well.

We parse the title, and then we take the page views and the bytes transferred just as integers and put those into the table.
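A sketch of that per-line transformation. The file-name pattern and the space-separated field layout are assumptions based on the description above, not something shown in the talk:

```python
import time

def transform_line(filename, line):
    """Turn one raw page-view line plus its file name into a flat BigQuery row.

    Assumes a name like 'pagecounts-20081019-130000.gz' and a space-separated
    line of 'project title views bytes' -- both assumptions, not from the talk.
    """
    # Hour is encoded in the file name: date in part 1, hour in part 2.
    parts = filename.split("-")
    stamp = parts[1] + parts[2][:2]                      # "2008101913"
    year, month, day, hour = int(stamp[:4]), int(stamp[4:6]), int(stamp[6:8]), int(stamp[8:10])
    epoch = int(time.mktime((year, month, day, hour, 0, 0, 0, 0, -1)))  # local time

    project, title, views, nbytes = line.split(" ")
    # First five fields come from the timestamp, then the per-page fields.
    return [epoch, year, month, day, hour, project, title, int(views), int(nbytes)]
```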

Ryan Boyd: Actually, on the date/time stuff: originally we put the date/time in literally as just a string. And it was actually quite easy to work with as a string if you knew some things about regular expressions -- you could do the sorts and the searches and everything with the strings, because you were using a regular expression to

convert it to integers in the process. But eventually we figured out that -- even though regular expressions are super fast, even over hundreds of millions or billions of rows of data with BigQuery -- not everyone wants to run that regular expression every time they run a query. So that's why we did some pre-analysis here and

separated things out: we put in the date/time since epoch, and then the year, month, day, and hour, just to make it easier to query, even though it would have worked otherwise. Michael Manoochehri: It's true, it's super convenient. Imagine running a query about noon -- give me all the pages that were looked at at noon. That's an easy query to do with this kind of format.

One thing I forgot to mention is that we're also sharding this data, as we mentioned earlier, into months. Every bit of data is sharded into a table for its month, so we can look at a single month by itself. And if we want to look at more months, we can run a union query with BigQuery -- we can even look at the entire data set, month after month, just by specifying the table names in our queries.

So it's really convenient. I think this is a really good strategy for this type and amount of data. But how did we get the data in? That's the really interesting part. We decided to use App Engine and the Pipelines API, which we mentioned earlier. What's great about Pipelines, as Ryan mentioned, is that it allows you to build a workflow and distribute the

work of ingesting all of these files across multiple instances on App Engine pretty much automatically. It lets you just worry about the code itself rather than distributing the work across a bunch of task queues yourself. This has been really convenient -- on the distributed nature of it, it's been fantastic. We're basically breaking this down by month and only processing a month at a time.

Depending on the number of days in a given month, that's 720 to 740 files per month, and we're distributing that across, I believe in our latest configuration, about 100 App Engine instances, having all of those instances process the data simultaneously. It speeds up what would probably have taken half a year on my desktop. Ryan Boyd: We can have App Engine do it in

a much faster way. Michael Manoochehri: The Pipelines API is great, too, because it provides a sort of control panel -- a status page, actually -- of what's going on. So you can follow the workflow as it's happening and check on the status of individual, let's say, leaves of the workflow tree and see what each one is doing. It's been really convenient. Ryan Boyd: Yeah, you can see here that the children listed

on the bottom right are the children of this process-month pipeline, and we can click on any of them. We actually did -- you'll see some of these. I don't know if you can see it in the video; it might be a little too small. But some of these are calling a pipeline called process file pipeline,

and that's actually doing the full transformation and processing of the file. But some of the other pipelines are called return file name. That's for the case where we've already processed the file and are just trying to make sure it gets included in the upcoming ingestion job. So we kept track of the things we'd done and stored that in the App Engine datastore, so that

if we do have an error -- if one of the files is bad, or something like that -- we can rerun this without any problems. Michael Manoochehri: Yeah, and it's been great. The other thing is, when you're dealing with this many files, you may have a corrupted file in the source -- there are all kinds of things that can happen. So adding this kind of have-you-processed-this-file check

is a really good best practice. Ryan Boyd: Yeah. So we basically just broke this up by month, processed all the files in a given month, and then went off and called a BigQuery ingestion job. And our queues -- this is built on top of the task queues -- our queues for calling BigQuery ingestion are configured so that we're only running two ingestion jobs simultaneously on BigQuery, to

stay within that quota limit. So we're running those two simultaneously, but in case we mess up, we're also detecting failures and rerunning the task if there are any. Michael Manoochehri: And Ryan, remind me -- I think each month has something like 720 files. Ryan Boyd: Yeah, that's right.
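That rerun-safely behavior leans on the bookkeeping mentioned a moment ago: remembering which files have already been processed. With App Engine's ndb datastore API, that might look roughly like this (the model, property, and helper names are invented for illustration):

```python
from google.appengine.ext import ndb

class ProcessedFile(ndb.Model):
    """One entity per source file that has already been transformed and staged."""
    filename = ndb.StringProperty(required=True)
    output_uri = ndb.StringProperty()                    # where the transformed chunk landed
    processed_at = ndb.DateTimeProperty(auto_now_add=True)

def transform_and_stage(filename):
    """Hypothetical stand-in for the real work: transform the file, upload it,
    and return the Cloud Storage URI of the result."""
    return "gs://my-bucket/processed/%s" % filename

def process_if_needed(filename):
    existing = ProcessedFile.query(ProcessedFile.filename == filename).get()
    if existing:
        return existing.output_uri                       # already done: just reuse it
    output_uri = transform_and_stage(filename)
    ProcessedFile(filename=filename, output_uri=output_uri).put()
    return output_uri
```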

Michael Manoochehri: So to ingest an entire month, we actually split it into two batches of 350-some files each. We can almost ingest an entire month in one batch, which is pretty remarkable -- it's quite a lot of data. So I think that's really about it for today's episode. You've seen Michael and Ryan's tips for how to work with

BigQuery and get your data into BigQuery, and hopefully this helps you a little as you load your own data. Let's give a quick overview of what we discussed today, and then we'll get to the questions you have on the Google Moderator. We talked about denormalization being your friend, and about sharding your data. And if you need to transform your data, you can use a variety of

different tools: commercial tools, App Engine, or your favorite Unix commands. Ryan Boyd: And now it's time for your questions. Let's see what questions exist over on our Google Moderator. It actually looks like we have one person live on the Hangout as well -- so if the person who's live on the Hangout could introduce themselves and talk a little bit about what they're doing with BigQuery, or what they'd like to do with BigQuery.

And then feel free to ask any questions that you have. Hello? I believe you're muted right now. Ryan Boyd: There we go -- I think we're hearing you now. Michael Manoochehri: All right. How's it going? Good to see you. Male Speaker: Great.

Yeah, so I'm the person in the Hangout, then. Actually, I'm not currently doing anything with big data -- I'm fascinated by it. I'm working with qualitative data for my PhD thesis, which is badly structured; basically, you're doing text analysis and things like that. And I find it inspiring what can be done with this kind of processing and analysis of data.

Now, the amount of data I have I could probably process on any recent mobile phone. But in my other life, I'm working as a project manager in data center project management, and I find it intriguing to apply some of the principles used to analyze qualitative data to all the data that's available in larger organizations -- all those files, logs, and whatever other information.

So it's a form of data mining, but one that escapes the typical data mining situation. BigQuery's really great for this kind of thing -- taking data out of silos, joining it, and then running aggregates. We actually have some customers doing this. It's a great use case, because BigQuery handles such a huge amount of data; it's sometimes the only tool that can handle it in an ad-hoc way, letting you run these queries

ad hoc. Ryan Boyd: We actually have a colleague of ours, Katherine. Michael, Katherine, and I are all speaking at upcoming Strata conferences in London and New York, and Katherine's working on preparing her talk -- she's actually, I believe, keynoting at the Strata in London. She's looking, from a data journalism perspective, at how to source a bunch of data.

A lot of the time you're looking at a subset of data and trying to come up with a bigger meaning out of that subset, and a lot of the work is really trying to find all the data that could possibly explain that small subset you're looking at. It's a lot of hard work. But hopefully, some of the tips we gave you here

today will help -- once you can get your hands on that data -- with preparing it, denormalizing it, and getting it into BigQuery, if you decide to do that. Michael Manoochehri: This use case you brought up, text analysis, is really interesting. We actually do some of that on our team as well. One thing to know about BigQuery is that the maximum record size is 64K for any single record. However, it's great for things like n-gram analysis or word

count analysis. So this is the kind of thing where you might run a MapReduce job to find, say, word counts per document, and then put all of that data into BigQuery. That's a really good use of BigQuery, especially when you have tons of data. And we actually have some example data sets doing that already, like an n-gram data set and a Wikipedia data set. Ryan Boyd: I think the n-gram one is from

Google Books, I believe -- sorry, not Wikipedia. N-gram and -- Michael Manoochehri: Shakespeare. Ryan Boyd: And Shakespeare, yes. So we have some example data sets doing that. We're actually going to work with the Wikipedia stuff soon to

basically take all of the text in Wikipedia and turn it into n-grams, just to see which words appear most commonly, when words first appear, and things like that -- some fun analysis on text data. One last thing that's really great about BigQuery is that it's an API, and it's really easy to integrate into, say, web applications -- any kind of application, actually.

You ask questions by sending queries to it via the API. We also have tools -- a browser tool and a command-line tool -- but you can write code against it very simply; it's all built together with the API, and that makes building applications super easy. You can build JavaScript apps in just a few lines of code, you can use App Engine, and you can build a very powerful application with very

little code. So it's great for some of the things you're talking about. Male Speaker: Where is the best place to get started? I'm more of a casual programmer, if at all -- I can do some stuff in Python or Java or Processing to do little things, visualizations, but it's really old school, yes, what I'm doing. So if I want to go into the big data examples, or just fool

around with it, where do I start? Is there any place where you can get a step-by-step example to get you started? Michael Manoochehri: I think I would start out in BigQuery -- I believe we have a step-by-step getting started guide that works through some of the public data samples. Ryan Boyd: So with the public data samples in BigQuery, you can actually run up to 100 gigabytes a day worth of queries for free, without putting down your

credit card. So you can start by looking at some of that existing data and not deal with the process of loading your own data, but still see what types of queries you can perform. I've had a lot of fun looking at the natality data, for instance. That's US birth statistics since 1969, I believe, and basically there's a line for every single birth in the US that's occurred in that time period.

And you find some interesting things. For example, in Ohio in 2003, if you look at the babies born to mothers who smoked cigarettes versus those who didn't, the babies born to mothers who smoked were half a pound lighter than the babies whose mothers didn't smoke. You find interesting bits of data like that, and that's actually what BigQuery is really great for: ad-hoc analysis, in an aggregate fashion, on large chunks of data.
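As a hedged example of that kind of ad-hoc aggregate, a query along these lines against the natality sample would do it; the table reference and column names are from memory, so check them against the public dataset's actual schema:

```python
# Average birth weight by mother's cigarette use, per the example above.
# Table and column names are assumptions about the public natality sample.
query = """
SELECT cigarette_use, AVG(weight_pounds) AS avg_weight
FROM [publicdata:samples.natality]
WHERE state = 'OH' AND year = 2003
GROUP BY cigarette_use
"""
```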

So I would try that out -- just play around with that data. In terms of visualization, you can see some of the demos from QlikView and Bime, who have already built visualizations on top of that data set. And then use something like Google Apps Script: in about 70 lines of code, you can visualize data that comes from BigQuery queries inside a spreadsheet, and then do graphs and pivot tables and that sort of thing.

And if you're a friend of Excel instead, we have the BigQuery connector for Excel that we launched a week or two ago, which provides similar functionality and a really easy way to get at that. So you don't really need coding experience right now. We aim BigQuery as an API for developers, but to get started and try it out, you don't need heavyweight coding experience. So on the BigQuery side, I would check some

of that stuff out. App Engine MapReduce, I believe, has a getting-started, word-count sort of MapReduce that you can run. That's going to be a little heavier on coding, and a little heavier on understanding the algorithms and what's going on there. But hopefully, between those two resources -- for BigQuery, just go to developers.google.com/bigquery, and there's a getting started guide there.

And you can try it out. Michael Manoochehri: Yeah, and the one you mentioned, the browser tool -- we have the BigQuery browser tool, which is all through the web browser, and you can run some of these queries for free by signing up. So, yeah, just go to the getting started guide on developers.google.com/bigquery. Ryan Boyd: Yep.

Michael Manoochehri: That's the best place to get started. Male Speaker: Great, great, great. Ryan Boyd: Well, thank you for stopping by. It doesn't look like we actually have any other questions in the Moderator currently, so I think the Ryan and Michael show is about ready to sign off. But thank you all for joining -- it's been a pleasure.

And hopefully we'll continue this. Let us know your feedback on Google+, let us know your technical questions on Stack Overflow, and we're excited to keep on seeing how you're using BigQuery. Thanks. Bye, everyone. Ryan Boyd: Goodbye. Male Speaker: Bye.


